Program Generation for Linear Algebra Using Multiple Layers of DSLs (DSLDI 2016)


Sun 30 October - Fri 4 November 2016 Amsterdam, Netherlands

Who

Daniele G. Spampinato, Diego Fabregat-Traver, Markus Püschel, Paolo Bientinesi

Track

DSLDI 2016


When

Mon 31 Oct 2016 16:05 - 16:30 at Matterhorn 1 - Session 2

Abstract

Numerical software in computational science and engineering often relies on highly optimized building blocks from libraries such as BLAS and LAPACK. Examples of such blocks include, but are not limited to, matrix multiplications, matrix factorizations, and solvers for Sylvester-like equations. While BLAS and LAPACK have been very successful in providing portable performance across a wide range of computing architectures, they still have severe limitations in terms of flexibility. First, these libraries are optimized for large matrices (sizes in the hundreds or more). Second, the interface they provide, in terms of both operations and matrix structures, specifically targets computational science. These limitations can make the libraries suboptimal in performance or code size for applications in communications, graphics, and control, which may require smaller-scale computations and a more flexible interface. To overcome these limitations, we advocate a domain-specific program generator capable of producing library routines tailored to the specific needs of the application in terms of sizes, interface, and target architecture. In this work, we introduce such a generator, which translates a desired linear algebra computation, annotated with matrix properties, into optimized C code, optionally vectorized with intrinsics. The generator unites prior work on two independent frameworks: the FLAME-based CL1CK and LGen, which was designed after Spiral. For a given linear algebra problem, such as a matrix factorization, a matrix inversion, or an equation to be solved, CL1CK synthesizes families of blocked algorithms that rely on basic computations provided by BLAS. These, in turn, are compiled into efficient, vectorized C code by (an extension of) LGen. As case studies, we consider the Cholesky decomposition and solvers for the continuous-time Lyapunov and Sylvester equations.
We compare the performance of our generated code with that of the commercial Intel MKL, showing competitive results.
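To see where the BLAS-style building blocks of a blocked algorithm come from, consider the Cholesky case study. The textbook partitioned derivation below is included for illustration only and is not reproduced from the paper; A is symmetric positive definite, L is lower triangular with A = L L^T, and the star marks the upper block, which is not referenced.

```latex
A = \begin{pmatrix} A_{11} & \star \\ A_{21} & A_{22} \end{pmatrix}, \quad
L = \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}, \quad A = L L^{T}
\;\Longrightarrow\;
\begin{aligned}
A_{11} &= L_{11} L_{11}^{T} && \text{(Cholesky of the small diagonal block)} \\
A_{21} &= L_{21} L_{11}^{T} \;\Rightarrow\; L_{21} = A_{21} L_{11}^{-T} && \text{(triangular solve, TRSM)} \\
A_{22} - L_{21} L_{21}^{T} &= L_{22} L_{22}^{T} && \text{(SYRK update, then recurse on } A_{22}\text{)}
\end{aligned}
```

Choosing a different partitioning or traversal order yields other members of the same algorithm family.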
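For the Sylvester and Lyapunov case studies, a common approach (for example, the Bartels-Stewart algorithm) first reduces the coefficient matrices to (quasi-)triangular form and then solves a triangular equation entry by entry. The sketch below covers only that triangular stage and is not the paper's generated code. It assumes the form A X + X B = C, with A upper triangular, B lower triangular, and all matrices in row-major storage. The function name is hypothetical and is unrelated to LAPACK's ?TRSYL interface. Entry (i, j) gives x_ij (a_ii + b_jj) = c_ij - sum_{k>i} a_ik x_kj - sum_{k>j} x_ik b_kj, so sweeping both i and j downward ensures that every term on the right is already known.

```c
#include <stddef.h>

/* Solves A X + X B = C for X (m x n), assuming A (m x m) is upper triangular
   and B (n x n) is lower triangular, all row-major. X overwrites C.
   Returns -1 if some a_ii + b_jj is zero (no unique solution), else 0. */
int solve_tri_sylvester(int m, int n, const double *A, const double *B,
                        double *C) {
    for (int i = m - 1; i >= 0; --i)
        for (int j = n - 1; j >= 0; --j) {
            double s = C[(size_t)i * n + j];
            /* subtract A(i,k) X(k,j): rows k > i are already solved */
            for (int k = i + 1; k < m; ++k)
                s -= A[(size_t)i * m + k] * C[(size_t)k * n + j];
            /* subtract X(i,k) B(k,j): columns k > j of row i are already solved */
            for (int k = j + 1; k < n; ++k)
                s -= C[(size_t)i * n + k] * B[(size_t)k * n + j];
            double d = A[(size_t)i * m + i] + B[(size_t)j * n + j];
            if (d == 0.0) return -1;
            C[(size_t)i * n + j] = s / d;
        }
    return 0;
}
```

With this convention, the continuous-time Lyapunov equation A X + X A^T = C with upper triangular A is simply the call with B = A^T. A unique solution requires a_ii + b_jj != 0 for all i and j, which the function checks.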

Daniele G. Spampinato (ETH Zurich)

Diego Fabregat-Traver (RWTH Aachen)

Markus Püschel (ETH Zurich, Switzerland)

Paolo Bientinesi