BoF: Analyzing Parallel I/O
Parallel application I/O performance often does not meet user expectations. Additionally, slight access pattern modifications may lead to significant changes in performance due to complex interactions between hardware and software. These challenges call for sophisticated tools to capture, analyze, understand, and tune application I/O.
In this BoF, we will highlight recent advances in monitoring tools to help address this problem. We will also encourage community discussion to compare best practices, identify gaps in measurement and analysis, and find ways to translate parallel I/O analysis into actionable outcomes for users, facility operators, and researchers.
|Date|Thursday, Nov. 16th, 2017|
|Venue|Room 402-403-404, Denver, USA|
The BoF is organized by
The session consists of a series of 8+2 minute talks followed by a longer discussion:
- Introduction – Phil Carns – Slides
- Towards Total Knowledge of I/O at NERSC through Holistic Monitoring – Glenn Lockwood – Slides
- Analyzing Lustre File System Performance With Splunk – Ross Miller – Slides
- Real-Time I/O Monitoring of HPC Applications – Eugen Betke – Slides
- Job-based I/O-monitoring with LLview – Wolfgang Frings – Slides
- Panel (15 minutes) – Moderated by Julian Kunkel
- Glenn Lockwood is a member of NERSC's Advanced Technologies Group at Lawrence Berkeley National Laboratory who specializes in I/O performance analysis, extreme-scale storage architectures, and emerging I/O technologies and APIs. His work is centered around understanding I/O performance by correlating performance analysis across all levels of the I/O subsystem, from node-local page cache to back-end storage devices.
- Ross Miller is a software developer in the Technology Integration group at Oak Ridge National Lab where he works on a variety of projects including file system monitoring tools.
- Eugen Betke completed his studies in computer science in 2015, specializing in machine learning and I/O performance. In his master's thesis he applied machine learning methods to predict I/O performance. At the beginning of 2016 he started as a researcher at the German Climate Computing Center. His key areas are the analysis and optimization of HPC I/O; for example, he developed a cluster-wide monitoring system for Lustre on Mistral. During his scientific career, he has acquired about three years of experience in these areas.
- Wolfgang Frings is a member of the Application Support Division at the Jülich Supercomputing Centre. His research interests focus on parallel I/O, I/O middleware, and HPC system monitoring. He is the author of several software tools used at many HPC centers. Among these are SIONlib, a library to support task-local parallel file I/O on large-scale systems, and LLview, a batch system monitoring tool.