Portable Inter-workgroup Barrier Synchronisation for GPUs (SPLASH 2016 - OOPSLA)

Sun 30 October - Fri 4 November 2016 Amsterdam, Netherlands

Who

Tyler Sorensen, Alastair F. Donaldson, Mark Batty, Ganesh Gopalakrishnan, Zvonimir Rakamaric

Track

SPLASH 2016 OOPSLA

When

Wed 2 Nov 2016 11:20 - 11:45 at Matterhorn 1 - Optimization and Performance Chair(s): Jan Vitek

Abstract

Despite the growing popularity of GPGPU programming, there is not yet
a portable and formally-specified barrier that one can use to
synchronise across workgroups. Moreover, the occupancy-bound execution
model of GPUs breaks assumptions inherent in traditional software
execution barriers, exposing them to deadlock. We present an
occupancy discovery protocol that dynamically discovers a safe
estimate of the occupancy for a given GPU and kernel, allowing for a
starvation-free (and hence, deadlock-free) inter-workgroup barrier by
restricting the number of workgroups according to this estimate. We
implement this idea by adapting an existing, previously non-portable,
GPU inter-workgroup barrier to use OpenCL 2.0 atomic operations, and
prove that the barrier meets its natural specification in terms of
synchronisation.

We assess the portability of our approach over eight GPUs spanning
four vendors, comparing the performance of our method against
alternative methods. Our key findings include: (1) the recall of our
discovery protocol is nearly 100%; (2) runtime comparisons vary
substantially across GPUs and applications; and (3) our method
provides portable and safe inter-workgroup synchronisation across the
applications we study.
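
The idea in the abstract can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions, not the paper's OpenCL 2.0 implementation: the semaphore standing in for the GPU's occupancy limit, the mutex-guarded poll, and the names `run_kernel`, `workgroup`, and `slots` are all hypothetical. Workgroups that become resident while the poll is open join it; each participant then closes the poll, and the barrier waits only for the participants, which are known to be resident at the same time.

```python
import threading

def run_kernel(num_workgroups, occupancy):
    """Simulate a launch of num_workgroups on a device that can host only
    `occupancy` workgroups at once (a hypothetical model, not GPU code)."""
    slots = threading.Semaphore(occupancy)  # models occupancy-bound scheduling
    poll_lock = threading.Lock()            # mutex guarding the poll state
    poll = {"open": True, "count": 0}
    barrier_cv = threading.Condition()
    arrived = {"n": 0}
    passed = []                             # ids of workgroups past the barrier

    def workgroup():
        with slots:                         # the workgroup becomes resident
            # Discovery: join the poll only if it is still open.
            with poll_lock:
                if not poll["open"]:
                    return                  # too late: exit without participating
                my_id = poll["count"]
                poll["count"] += 1
            # Every participant closes the poll before using the barrier,
            # so the participant count is final from here on.
            with poll_lock:
                poll["open"] = False
            # Barrier over participants only. All of them hold a slot,
            # so none of them is waiting on an unscheduled workgroup.
            with barrier_cv:
                arrived["n"] += 1
                barrier_cv.notify_all()
                barrier_cv.wait_for(lambda: arrived["n"] == poll["count"])
            passed.append(my_id)

    threads = [threading.Thread(target=workgroup) for _ in range(num_workgroups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return poll["count"], sorted(passed)

if __name__ == "__main__":
    participants, ids = run_kernel(num_workgroups=16, occupancy=4)
    print("participating workgroups:", participants, "ids:", ids)
```

In this model the discovered count varies from run to run but never exceeds `occupancy`, because every participant holds its slot from the moment it joins the poll until it has passed the barrier. A barrier that instead waited for all 16 launched workgroups would hang, since at most 4 can be resident at once.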

Link to Preprint

https://www.doc.ic.ac.uk/~afd/homepages/papers/pdfs/2016/OOPSLA.pdf

DOI

https://doi.org/10.1145/2983990.2984032

Tyler Sorensen

Imperial College London

United Kingdom

Alastair F. Donaldson

Imperial College London

United Kingdom

Mark Batty

University of Kent

Ganesh Gopalakrishnan

University of Utah

Zvonimir Rakamaric