Sun 30 October - Fri 4 November 2016 Amsterdam, Netherlands
Wed 2 Nov 2016 10:30 - 10:55 at Matterhorn 1 - Optimization and Performance. Chair(s): Jan Vitek

Writing high-performance GPU implementations of graph algorithms
can be challenging. In this paper, we argue that three optimizations,
which we call throughput optimizations, are key to high performance
for this application class.
Together, these optimizations span a large implementation space, which makes it unrealistic for programmers to implement them by hand.

To address this problem, we have implemented these optimizations in a compiler that
produces CUDA code from an intermediate-level program representation
called IrGL.
Compared to state-of-the-art handwritten CUDA implementations of eight graph applications,
code generated by the IrGL compiler is up to 5.95x faster (median 1.4x) for five applications and never
more than 30% slower for the other three. Throughput optimizations improve the performance of unoptimized IrGL code
by up to 4.16x (median 1.4x).

Wed 2 Nov

10:30 - 12:10: OOPSLA - Optimization and Performance at Matterhorn 1
Chair(s): Jan Vitek (Northeastern University)
10:30 - 10:55
Sreepathi Pai (University of Texas at Austin, USA), Keshav Pingali (University of Texas at Austin, USA)
10:55 - 11:20
Rishi Surendran (Rice University, USA), Vivek Sarkar (Rice University, USA)
11:20 - 11:45
Tyler Sorensen (Imperial College London), Alastair Donaldson (Imperial College London), Mark Batty (University of Kent), Ganesh Gopalakrishnan (University of Utah), Zvonimir Rakamaric (University of Utah)
11:45 - 12:10
Sébastien Doeraene (EPFL, Switzerland), Tobias Schlatter (EPFL, Switzerland)