Sun 30 October - Fri 4 November 2016 Amsterdam, Netherlands
Wed 2 Nov 2016 10:30 - 10:55 at Matterhorn 1 - Optimization and Performance Chair(s): Jan Vitek

Writing high-performance GPU implementations of graph algorithms
can be challenging. In this paper, we argue that three optimizations,
called throughput optimizations, are key to high performance
for this application class.
These optimizations describe a large implementation space, making it unrealistic for programmers to implement them by hand.

To address this problem, we have implemented these optimizations in a compiler that
produces CUDA code from an intermediate-level program representation
called IrGL.
Compared to state-of-the-art handwritten CUDA implementations of eight graph applications,
code generated by the IrGL compiler is up to 5.95x faster (median 1.4x) for five applications and never
more than 30% slower for the others. Throughput optimizations contribute an improvement
of up to 4.16x (median 1.4x) to the performance of unoptimized IrGL code.