Workshops and Tutorials

Workshop: CPATH (Full Day)

Integrating Parallelism Throughout the Undergraduate Computing Curriculum (CPATH)

Saturday, February 12, 8:30-5:00, Room 13&14

Sam Midkiff, Vijay Pai, Deborah Bennett
Purdue University

The workshop will focus on integrating parallelism, in both hardware and software, throughout the undergraduate and graduate computing curriculum, and on the effect of industrial trends on that curriculum. It will also present, discuss, and disseminate preliminary findings from data collected and analyzed as part of a National Science Foundation-sponsored CPATH grant awarded to Purdue University. The grant supports the development of a curriculum model for teaching parallelism in both hardware and software computing courses beginning in the first year. The grant will cover registration, lodging, and partial travel costs for a limited number of workshop participants.

The workshop seeks to present and discuss educational issues related to integrating parallelism into the undergraduate and graduate computing curriculum. Researchers from industry and universities will discuss: (1) which aspects of parallelism are relevant to the undergraduate and graduate curriculum, and (2) effective ways of integrating parallelism into that curriculum.

As planned, the workshop will feature a keynote, talks, panels, and a moderated discussion on parallelism in the undergraduate computing curriculum and the effect of industrial trends on it.

Tutorial: PPGA (Full Day)

Parallel Programming for Graph Analysis

Saturday, February 12, 8:30-5:00, Room 15

David A. Bader, David Ediger, Jason Riedy
Georgia Institute of Technology

An increasingly fast-paced digital world has produced an ever-growing volume of petabyte-sized datasets, while terabytes of new, unstructured data arrive daily. As the desire to ask more detailed questions about these massive streams has grown, parallel software and hardware have only recently begun to enable complex analytics in this non-scientific space.

In this tutorial, we will discuss the open problems in analyzing this "data deluge". We will present algorithms and data structures capable of analyzing spatio-temporal data at massive scale on parallel systems. We will examine the difficulties and bottlenecks in parallel graph algorithm design on current systems and show how multithreaded and hybrid systems can overcome these challenges. We will demonstrate how parallel graph algorithms can be implemented on a variety of architectures using different programming models.

The goal of this tutorial is to provide a comprehensive introduction to the field of parallel graph analysis for an audience with a computing background interested in participating in research and/or commercial applications of the field. We will also cover leading-edge technical and algorithmic developments and discuss open problems and potential solutions.

This tutorial has been cancelled.

Tutorial: X10 (Full Day)

X10 Tutorial

Sunday, February 13, 8:30-5:00, Room 15

Igor Peshansky, Bard Bloom, Vijay Saraswat
IBM Research

This full-day tutorial aims to introduce the essence of the X10 programming language and model through lectures and hands-on experience. X10 is a novel programming language from IBM Research for making parallel and distributed programming intuitive and less error-prone. It is developed as an ecosystem of tools and libraries that make it easy to start writing parallel programs without limiting expert usage when necessary.
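For readers unfamiliar with the language, the following short sketch, a variant of X10's well-known "hello, whole world" style example (not material from the tutorial itself, and requiring the X10 toolchain to build and run), shows the core finish/async/at constructs:

```
// "finish" waits for all asyncs spawned inside it; "at (p)" shifts
// execution to place p in the partitioned global address space.
public class HelloWholeWorld {
    public static def main(args:Rail[String]) {
        finish for (p in Place.places()) {
            at (p) async Console.OUT.println("Hello from " + here);
        }
    }
}
```

Each place prints its greeting concurrently; the enclosing finish ensures main does not return until every async has completed.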

Participants will be able to install X10 and its accompanying Eclipse-based development environment, X10DT, on their laptops, and will explore, debug, and adapt a non-trivial parallel, distributed, heterogeneous application written in X10. We will show how to profile and optimize X10 code, and explain the X10 compilation and optimization strategy, including how it handles both massively parallel and distributed applications.

If time permits, we will also relay prior experiences from using X10 to learn and teach parallel and distributed programming.

Tutorial: ICC-Cilk (Half Day)

Parallel Programming with Cilk and Array Notation using the Intel Compiler

Sunday, February 13, 8:30-12:00, Room 15

Chi-Keung Luk, Peng Tu, William Leiserson
Intel Corporation

The next release of the Intel Compiler (ICC v12) supports two new extensions to C/C++ for parallel programming, Cilk and Array Notation, collectively called CilkPlus. Cilk provides an easy way to convert a sequential program into a multithreaded one, exploiting the thread-level parallelism available on multicore machines. Array Notation, in turn, is an expressive way to vectorize a computational kernel and exploit the SIMD parallelism within each core.
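As a rough illustration of the two extensions (our sketch, assuming a Cilk Plus-capable compiler such as ICC v12; it is not material from the tutorial itself):

```c
#include <cilk/cilk.h>

/* Cilk: the spawned call may run in parallel with the continuation. */
int fib(int n) {
    if (n < 2) return n;
    int x = cilk_spawn fib(n - 1);  /* child strand, may run concurrently */
    int y = fib(n - 2);             /* continuation runs in the meantime */
    cilk_sync;                      /* wait for the spawned child */
    return x + y;
}

/* Array Notation: a whole-array, SIMD-friendly saxpy over n elements.
   The a[start:length] sections express the vectorizable operation. */
void saxpy(int n, float a, float *x, float *y) {
    y[0:n] = a * x[0:n] + y[0:n];
}
```

The cilk_spawn/cilk_sync pair expresses thread-level parallelism across cores, while the array sections express SIMD parallelism within a core, matching the division of labor described above.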

This tutorial consists of three parts. Part I is an introduction to Cilk. Part II is an introduction to Array Notation. Part III presents a programming methodology based on cache-oblivious techniques which uses both Cilk and Array Notation to achieve high performance.

Tutorial: Pilot (Half Day)

Using the Pilot Library

Sunday, February 13, 1:00-5:00, Room 13&14

Bill Gardner
School of Computer Science, University of Guelph, Ontario

Pilot is a new way to program high-performance clusters in C and Fortran, based on a high-level model featuring processes executing on cluster nodes and channels for passing messages among them. Designed to smooth the learning curve for novice scientific programmers, the set of library functions is small (less than one-tenth that of MPI) and easy to learn, since its C syntax mirrors the well-known printf and scanf.

The process/channel abstraction inherently reduces the opportunities for communication errors that result in deadlock, and an integrated runtime mechanism detects and diagnoses deadlocks arising from circular waiting. The Pilot library is built as a transparent layer on top of conventional MPI, and shields users from the latter’s complexity while adding minimal overhead. This tutorial assumes basic exposure to C or Fortran programming. Familiarity with MPI is not required, but will make the comparisons more meaningful.
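To give a flavor of the process/channel style, here is a rough C sketch in the spirit of the Pilot API as described in its publications. The function names (PI_Configure, PI_CreateProcess, PI_CreateChannel, PI_Write, PI_Read, PI_StartAll, PI_StopMain, and the PI_MAIN constant) follow those descriptions but may differ from the current release, and building it requires the Pilot library on top of an MPI installation:

```c
#include <pilot.h>

PI_CHANNEL *chan;

/* Worker process: computes a value and sends it back on its channel.
   Note the printf-like format string instead of MPI datatype arguments. */
int worker(int idx, void *arg) {
    PI_Write(chan, "%d", idx * idx);
    return 0;
}

int main(int argc, char *argv[]) {
    PI_Configure(&argc, &argv);                 /* wraps MPI startup */
    PI_PROCESS *w = PI_CreateProcess(worker, 0, NULL);
    chan = PI_CreateChannel(w, PI_MAIN);        /* worker -> main */
    PI_StartAll();                              /* processes begin running */

    int result;
    PI_Read(chan, "%d", &result);               /* scanf-like receive */
    PI_StopMain(0);
    return 0;
}
```

Because every channel has exactly one writer and one reader declared up front, many mismatched-send/receive errors that cause deadlock in raw MPI simply cannot be expressed.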

Tutorial: GPPMS (Half Day)

Ingredients for good parallel performance on multicore-based systems

Sunday, February 13, 1:00-5:00, Room 15

Georg Hager
Erlangen Regional Computing Center, Germany

This tutorial covers program optimization techniques for multicore processors and the systems they are used in. It concentrates on the dominating parallel programming paradigms, MPI and OpenMP. We start by giving an architectural overview of multicore processors. Peculiarities like shared vs. separate caches, bandwidth bottlenecks, and ccNUMA characteristics are pointed out. We show typical performance features like synchronization overhead, intranode MPI bandwidths and latencies, ccNUMA locality, and bandwidth saturation (in cache and memory) in order to pinpoint the influence of system topology and thread affinity on the performance of typical parallel programming constructs. Multiple ways of probing system topology and establishing affinity, either by explicit coding or separate tools, are demonstrated. Finally we elaborate on programming techniques that help establish optimal parallel memory access patterns and/or cache reuse, with an emphasis on leveraging shared caches for improving performance.

Detailed Schedule

                Saturday 12                Sunday 13
                Room 13 & 14   Room 15     Room 15        Room 13 & 14
 7:30 -  8:30   Breakfast
 8:30 - 10:00   W CPATH        T PPGA      T ICC-Cilk
10:00 - 10:30   Coffee Break
10:30 - 12:00   W CPATH        T PPGA      T ICC-Cilk
12:00 - 13:00   Lunch
13:00 - 15:00   W CPATH        T PPGA      T GPPMS        T Pilot
15:00 - 15:30   Coffee Break
15:30 - 17:00   W CPATH        T PPGA      T GPPMS        T Pilot

Room maps

Saturday Program Room Map:

Sunday Program Room Map: