BoF: Analyzing Parallel I/O

Parallel application I/O performance often does not meet user expectations. Additionally, slight access pattern modifications may lead to significant changes in performance due to complex interactions between hardware and software. These challenges call for sophisticated tools to capture, analyze, understand, and tune application I/O.

In this BoF, we will highlight recent advances in monitoring tools to help address this problem. We will also encourage community discussion to compare best practices, identify gaps in measurement and analysis, and find ways to translate parallel I/O analysis into actionable outcomes for users, facility operators, and researchers.

The BoF is held in conjunction with the Supercomputing conference. The schedule is listed here.

Date: Wednesday, Nov. 16th, 2016
Time: 17:15-19:00
Venue: Room 155-E, Salt Lake City, USA

Organization

The BoF is organized by

Agenda

Our BoF summary

The agenda consists of a series of talks (8+2 minutes each), followed by a longer discussion:

  • Introduction – Phil Carns – Slides
  • What's new with Darshan? – Shane Snyder – Slides
  • Characterizing burst buffers at extreme scale using the TOKIO framework – Glenn Lockwood – Slides
  • Characterizing Parallel I/O Behaviour Based on Server-Side I/O Counters – Salem El Sayed – Slides
  • SIOX In-Situ Optimization and Virtual Laboratory – Jakob Lüttgau – Slides
  • Statistical File Characterization / Status Update Monitoring at DKRZ – Julian Kunkel – Slides
  • Mining Supercomputer Jobs' I/O Behavior from System Logs – Xiaosong Ma – Slides
  • Discussion (30 minutes)

Speakers

  • Phil Carns is a principal software development specialist in the Mathematics and Computer Science division of ANL. He is the technical lead of the Darshan project and a key contributor to a variety of related storage simulation and prototyping activities.
  • Julian Kunkel is responsible for several projects in the research division at DKRZ. He has been working on tracing environments and tools for client- and server-side I/O for many years.
  • Glenn Lockwood is a member of NERSC's Advanced Technologies Group at Lawrence Berkeley National Laboratory who specializes in I/O performance analysis, extreme-scale storage architectures, and emerging I/O technologies and APIs. His work is centered around understanding I/O performance by correlating performance analysis across all levels of the I/O subsystem, from node-local page cache to back-end storage devices.
  • Shane Snyder is a software developer in the Mathematics and Computer Science (MCS) division of ANL. His research interests include parallel file systems, I/O middleware systems, and HPC I/O workload characterization.
  • Jakob Lüttgau is a researcher at DKRZ with a focus on I/O and parallel computing. He works on the ESiWACE project, where he helps with the convergence of climate- and weather-related applications and workflows. Among his interests is a better understanding of data placement for structured scientific data in the context of HDF5, NetCDF, and GRIB.
  • Salem El Sayed graduated from the University of Stuttgart in 2011 with a master's degree in information technology. Since then he has been working on I/O-related issues. He began at IBM, analysing the introduction of new storage technologies and their effect on operating system and HPC node performance. As part of the Jülich Supercomputing Centre (JSC), he has long been working on characterizing the I/O behaviour of HPC and scientific applications and its effect on the design of future I/O architectures such as active storage. He currently continues this research as part of the Percipient StorAGE for Exascale Data Centric Computing (SAGE) project, which aims to build a data-centric infrastructure for handling extreme data in the Exascale/Exabyte era.
  • Xiaosong Ma is currently a Senior Scientist at Qatar Computing Research Institute. Previously, she was an associate professor at NC State, as well as a joint faculty member at ORNL. She has carried out research in several HPC I/O areas, including parallel I/O libraries, caching/prefetching, and I/O workload characterization.