Using Compiler Snippets to Exploit Parallelism on Heterogeneous Hardware: A Java Reduction Case Study (VMIL 2018 - Virtual Machines and Language Implementations)

Sun 4 - Fri 9 November 2018 Boston, Massachusetts, United States

Who

Juan Fumero, Christos Kotselidis

Track

VMIL 2018

When

Sun 4 Nov 2018 10:55 - 11:20 at Stuart - I Chair(s): Mark Marron

Abstract

Parallel skeletons are essential structured design patterns for efficient heterogeneous and parallel programming. They allow programmers to express common algorithms in a way that is much easier to read, maintain, debug and implement across different parallel programming models and parallel architectures. Reductions are one of the most common parallel skeletons. Many programming frameworks have been proposed for accelerating reduction operations on heterogeneous hardware. However, for the Java programming language, little work has been done on automatically compiling and exploiting reductions in Java applications on GPUs.

In this paper we present our work in progress on using compiler snippets to express parallelism on heterogeneous hardware. Specifically, we demonstrate the use of Graal’s snippets, in the context of the Tornado compiler, to express a set of Java reduction operations for GPU acceleration. The snippets are expressed in pure Java with OpenCL semantics, simplifying the JIT compiler optimizations and code generation. We show that with our technique we are able to execute a predefined set of reductions on GPUs within 85% of the performance of the native code and reach speedups of up to 20x over the Java sequential execution.
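
As an illustration of what "pure Java with OpenCL semantics" could look like for a reduction, the following is a minimal sketch and not the paper's actual snippet code. It does not use Graal's snippet API or any Tornado API. The class name `ReductionSketch`, the constant `GROUP_SIZE`, and all method names are hypothetical. The sketch emulates, on the CPU, the common work-group tree reduction pattern: each group copies its slice into a local buffer, halves the active range at every step, and writes one partial result, which is then combined sequentially.

```java
import java.util.Arrays;

public class ReductionSketch {

    // Hypothetical work-group size (assumed to be a power of two).
    static final int GROUP_SIZE = 8;

    /** Sequential baseline: the plain Java loop that a reduction skeleton replaces. */
    static float sequentialSum(float[] input) {
        float acc = 0.0f;
        for (float v : input) {
            acc += v;
        }
        return acc;
    }

    /**
     * Emulates a work-group tree reduction in plain Java.
     * The loops over g and lid stand in for OpenCL group and local IDs;
     * on a GPU the inner iterations would run in parallel.
     */
    static float[] groupedPartialSums(float[] input) {
        int groups = (input.length + GROUP_SIZE - 1) / GROUP_SIZE;
        float[] partial = new float[groups];
        for (int g = 0; g < groups; g++) {
            // Stands in for OpenCL local memory shared by one work-group.
            float[] local = new float[GROUP_SIZE];
            for (int lid = 0; lid < GROUP_SIZE; lid++) {
                int gid = g * GROUP_SIZE + lid;
                // Pad out-of-range elements with the identity of addition.
                local[lid] = gid < input.length ? input[gid] : 0.0f;
            }
            // Tree reduction: halve the stride each step. On a GPU, a
            // work-group barrier would separate consecutive steps.
            for (int stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
                for (int lid = 0; lid < stride; lid++) {
                    local[lid] += local[lid + stride];
                }
            }
            partial[g] = local[0];
        }
        return partial;
    }

    /** Combines the per-group partial results sequentially on the host. */
    static float groupedSum(float[] input) {
        return sequentialSum(groupedPartialSums(input));
    }

    public static void main(String[] args) {
        float[] data = new float[20];
        Arrays.fill(data, 1.0f);
        System.out.println(sequentialSum(data)); // prints 20.0
        System.out.println(groupedSum(data));    // prints 20.0
    }
}
```

For 20 elements and a group size of 8, the sketch produces three partial sums (8, 8 and 4) before the final combination. Floating-point addition is not associative, so for general inputs the grouped result may differ slightly from the sequential one.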

Link to Preprint

https://www.researchgate.net/publication/327871451_Using_Compiler_Snippets_to_Exploit_Parallelism_on_Heterogeneous_Hardware_A_Java_Reduction_Case_Study

DOI

https://doi.org/10.1145/3281287.3281292

Juan Fumero

The University of Manchester

United Kingdom

Christos Kotselidis