SPLASH 2016
Sun 30 October - Fri 4 November 2016 Amsterdam, Netherlands
Wed 2 Nov 2016 11:20 - 11:45 at Matterhorn 1 - Optimization and Performance Chair(s): Jan Vitek

Despite the growing popularity of GPGPU programming, there is not yet
a portable and formally specified barrier that one can use to
synchronise across workgroups. Moreover, the occupancy-bound execution
model of GPUs breaks assumptions inherent in traditional software
execution barriers, exposing them to deadlock. We present an
occupancy discovery protocol that dynamically discovers a safe
estimate of the occupancy for a given GPU and kernel, allowing for a
starvation-free (and hence, deadlock-free) inter-workgroup barrier by
restricting the number of workgroups according to this estimate. We
implement this idea by adapting an existing, previously non-portable,
GPU inter-workgroup barrier to use OpenCL 2.0 atomic operations, and
prove that the barrier meets its natural specification in terms of
synchronisation.
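
The idea of the paragraph above can be sketched as a small CPU-side simulation. This is a minimal illustration under stated assumptions, not the authors' OpenCL 2.0 implementation: workgroups are modelled as Python threads, the occupancy bound is modelled by a semaphore whose slots are held until a workgroup finishes, and the names `run_kernel`, `CountingBarrier`, `poll_open` and `count` are hypothetical. The two-step poll (join under a mutex, then close the poll) is an assumed reading of the protocol as the abstract describes it.

```python
import threading


class CountingBarrier:
    """Reusable barrier for n participants (hypothetical helper)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.arrived = 0
        self.generation = 0

    def wait(self, n):
        with self.cond:
            gen = self.generation
            self.arrived += 1
            if self.arrived == n:
                # Last arrival releases everyone and resets the barrier.
                self.arrived = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                while gen == self.generation:
                    self.cond.wait()


def run_kernel(num_workgroups, occupancy, rounds=3):
    """Simulate a kernel launch; return (participants, violations)."""
    # Assumption: a workgroup holds its slot until it finishes (no preemption).
    slots = threading.Semaphore(occupancy)
    mutex = threading.Lock()
    state = {"poll_open": True, "count": 0}
    barrier = CountingBarrier()
    data = [-1] * num_workgroups
    violations = []

    def workgroup():
        slots.acquire()  # the workgroup becomes resident
        try:
            # Discovery step 1: join the poll if it is still open.
            with mutex:
                if not state["poll_open"]:
                    return  # non-participant: exit and free the slot
                my_id = state["count"]
                state["count"] += 1
            # Discovery step 2: close the poll; count is now final.
            with mutex:
                state["poll_open"] = False
                n = state["count"]
            # Only the n discovered workgroups use the barrier.
            for r in range(rounds):
                data[my_id] = r
                barrier.wait(n)
                if any(data[i] != r for i in range(n)):
                    violations.append((my_id, r))
                barrier.wait(n)
        finally:
            slots.release()

    threads = [threading.Thread(target=workgroup) for _ in range(num_workgroups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"], len(violations)


if __name__ == "__main__":
    participants, violations = run_kernel(num_workgroups=16, occupancy=4)
    print("participants:", participants, "violations:", violations)
```

In this sketch every participant joined while holding a slot and before any participant could finish, so the discovered count never exceeds the simulated occupancy and the barrier cannot starve. The count may be lower than the true occupancy, which is why the estimate is described as safe rather than exact.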

We assess the portability of our approach over eight GPUs spanning
four vendors, comparing the performance of our method against
alternative methods. Our key findings include: (1) the recall of our
discovery protocol is nearly 100%; (2) runtime comparisons vary
substantially across GPUs and applications; and (3) our method
provides portable and safe inter-workgroup synchronisation across the
applications we study.

Wed 2 Nov

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 12:10
Optimization and Performance (OOPSLA) at Matterhorn 1
Chair(s): Jan Vitek Northeastern University
10:30
25m
Talk
A Compiler for Throughput Optimization of Graph Algorithms on GPUs (AEC)
OOPSLA
Sreepathi Pai University of Texas at Austin, USA, Keshav Pingali University of Texas at Austin, USA
10:55
25m
Talk
Automatic Parallelization of Pure Method Calls via Conditional Future Synthesis
OOPSLA
Rishi Surendran Rice University, USA, Vivek Sarkar Rice University, USA
11:20
25m
Talk
Portable Inter-workgroup Barrier Synchronisation for GPUs (AEC)
OOPSLA
Tyler Sorensen Imperial College London, Alastair F. Donaldson Imperial College London, Mark Batty University of Kent, Ganesh Gopalakrishnan University of Utah, Zvonimir Rakamaric University of Utah
11:45
25m
Talk
Parallel Incremental Whole-Program Optimizations for Scala.js
OOPSLA
Sébastien Doeraene EPFL, Switzerland, Tobias Schlatter EPFL, Switzerland