CluStor: Workshop on Cluster Storage Technology

Storage systems are increasingly important for high-performance computing. Real applications from fields such as physics, bioinformatics and climate research produce huge volumes of data that have to be managed efficiently. The workshop concentrates on storage technologies that follow the principle of clustering resources. This approach has already proved viable for providing CPU cycles and is now being extended to storage. However, the research field is still young and there are many open issues. The goal of the workshop is to bring together researchers in the field and to provide a platform for discussions and, possibly, future collaborations and joint projects.

The workshop will take place at DKRZ in Hamburg, Germany. For accommodation, we recommend the Mercure Hotel Hamburg Mitte, Schröderstiftstrasse 3, 20146 Hamburg.

Important Dates

Participation Feedback	June 5, 2015
Abstract Submission	July 16, 2015
CluStor Workshop	July 30–31, 2015

Guidelines

The workshop will be run according to the following guidelines:

We want to keep the workshop small in order to have lively discussions. For the moment, participation is by invitation only. We will have at most 11 talks and the total number of participants is limited to 30.
We encourage talks that present current and future work and would like to have open discussions. Thus, we will not make slides publicly available. Access to any material will be restricted to the participants of the workshop.
We concentrate on discussions. No papers will be published; slides will only be distributed to the participants.

If you are interested in presenting your work, please send a one-page abstract of your talk or the research you want to present to the organizers.

Organization

The workshop is organized by

Michael Kuhn (University of Hamburg, Germany)
Julian Kunkel (DKRZ, Germany)

Topics of Interest

We are interested in talks from the following and other related areas:

File system internals, interfaces and semantics
Performance analysis and visualization tools
Benchmarking of all aspects of the file systems
Performance modelling
Access optimizations
Metadata management

Agenda

Talks will be 30 minutes plus 15 minutes of discussion.

The preliminary structure of the workshop is as follows:

Thursday, July 30

10:00	Welcome Coffee
10:15	Welcome – Julian Kunkel, Michael Kuhn
10:30	Persistent data as a first-class citizen in parallel programming models – Anna Queralt (BSC, Spain)
11:15	Deeper and active I/O hierarchies – Dirk Pleiter (FZJ, Germany)
12:00	Massively parallel task-local I/O with SIONlib – Wolfgang Frings (FZJ, Germany)
12:45	Lunch
13:30	Tucana: User-space I/O for μs-level storage devices – Anastasios Papagiannis (FORTH, Greece)
14:15	Exploiting Semantical Information for Performance Optimization and Data Reduction – Michael Kuhn (UHAM, Germany)
15:00	Coffee Break
15:30	SIOX Replay - Laboratory for I/O Research – Jakob Lüttgau (UHAM, Germany)
15:50	Mistral Supercomputer I/O – Julian Kunkel (DKRZ, Germany)
16:15	Computer Room Tour
16:45	Analyzing and Assessing I/O Performance – Julian Kunkel (DKRZ, Germany)
17:30	CLARISSE: A run-time middleware for coordinating data staging on large scale supercomputers – Florin Isaila (ANL, USA & UC3M, Spain)
18:15	Adaptive Compression for the Zettabyte File System – Florian Ehmke
19:00	Dinner (sponsored by DKRZ)

Friday, July 31

09:30	Learning Access Patterns of HPC I/O – Michaela Zimmer (UHAM, Germany)
09:50	Towards Automatic Learning of I/O Characteristics – Eugen Betke (UHAM, Germany)
10:10	Coffee Break
10:20	Optimizing time and energy resource usage for climate model runs – Marc Wiedemann (UHAM, Germany)
10:50	Quality of Service and Data-Lifecycle for Big Data – Paul Millar (DESY, Germany)
11:10	File-driven Input in ICON – Nathanael Hübbe (DWD, Germany)
11:40	Farewell Snack and Discussion

Abstracts

Persistent data as a first-class citizen in parallel programming models (Anna Queralt)

dataClay is a storage platform that manages data in the form of objects. It enables the applications on top to deal with distributed persistent objects transparently, in the same way as if they were in memory. In addition, dataClay takes the concept of moving computation to the data to its limit by never separating the data from the methods that manipulate it.

By always keeping data and code together, dataClay makes it easier for programming models such as COMPSs to take advantage of data locality, for instance by means of locality-aware iterators that help to exploit parallelism. The combination of these two technologies provides a powerful solution to access and compute on huge datasets, allowing applications to easily handle objects that are too big to fit in memory or that are distributed among several nodes.

In this talk we will address how persistent data can be integrated in the programming model by presenting the integration of dataClay and COMPSs, both from the point of view of an application that manages objects and from the point of view of the runtime.

Tucana: User-space I/O for μs-level storage devices (Anastasios Papagiannis)

System software overheads in the I/O path, including VFS and file system code, become more pronounced with emerging low-latency storage devices. Currently, these overheads constitute the main bottleneck in the I/O path and limit the efficiency of modern storage systems. In this paper we present Tucana, a new I/O path for applications that minimizes system software overheads in the common I/O path. The main idea is the separation of the control and data planes. The control plane consists of an unmodified Linux kernel and is responsible for handling data plane initialization and the normal processing path through the kernel for non-file-related operations. The data plane is a lightweight mechanism that provides direct access to storage devices with minimum overhead and without sacrificing strong protection semantics. Tucana requires neither hardware support from the storage devices nor changes in the user applications. We evaluate our early prototype and find that, on a single core, it achieves up to 1.7× and 2.2× better random read and write IOPS, respectively, compared to the xfs and ext4 file systems.

Quality of Service and Data-Lifecycle for Big Data (Paul Millar)

Science projects that produce large amounts of data have commensurate challenges. One such challenge is that the data must be shepherded for the duration of the project: controlling the quality-of-service (to minimise costs), controlling who has access, and controlling when data is deleted. The importance of such activity is underlined by the EU requiring all applicants for Horizon 2020 funding to have a Data Management Plan (DMP).

Despite similarities between projects, there exists no common infrastructure to allow projects to offload the management effort described in their DMP. Within the INDIGO-DataCloud project, we will address this by building a common vocabulary for describing aspects of a DMP, a standards-backed network interface for controlling storage systems, and multiple implementations that provide this interface.

The dCache software already provides this functionality, albeit in a limited form. Users may write data with different quality-of-service; they may also request data be made available with better quality-of-service. Under the auspices of INDIGO-DataCloud, the dCache team will enhance this existing functionality to provide the level of control needed to support many elements common to DMPs.

In this presentation, we describe the plans for INDIGO-DataCloud and how you can contribute to this process, along with the plan for support within dCache.

Deeper and active I/O hierarchies (Dirk Pleiter)

Non-volatile memory will have a major impact on future supercomputing architectures. In this talk we will discuss architectural design options which are enabled through these new technologies. A particular focus will be the enablement of active storage architectures like Blue Gene Active Storage. Blue Gene Active Storage is realised by integrating non-volatile memory into the Blue Gene/Q I/O nodes, thus creating an intermediate storage layer. We will present details on this new architecture, including novel developments on software interfaces for accessing the non-volatile memory. After a report on micro-benchmark results demonstrating the capabilities of this architecture, we will present in detail a number of use cases and analyse the opportunities for scientific applications as well as the challenges in exploiting them. During this talk we will also highlight a number of related architectural and technological developments.

CLARISSE: A run-time middleware for coordinating data staging on large scale supercomputers (Florin Isaila)

Currently, the I/O software stack of high-performance computing platforms consists of independently developed layers (scientific libraries, middlewares, I/O forwarding, parallel file systems), lacking global coordination mechanisms. This uncoordinated development model negatively impacts the performance of both independent applications and ensembles of applications relying on the I/O stack for data access. This talk will present the CLARISSE project's approach of redesigning the I/O stack with the aim of facilitating global optimizations, programmability, and extendability. CLARISSE is a coordination framework with separate control and data planes. The talk will give an overview of the design and implementation of the CLARISSE data and control planes and will discuss how CLARISSE can be used to support the global improvement of key aspects of data staging, including load balance, I/O scheduling, and resilience.

Massively parallel task-local I/O with SIONlib (Wolfgang Frings)

Parallel applications often store data in one or multiple task-local files. Examples are the creation of checkpoints, circumvention of memory limitations, or recording performance data. When operating at very large processor configurations, such applications often experience scalability limitations. This mainly happens when the simultaneous creation of thousands of files causes meta-data server contention, or simply when large file counts create a high load for file management, or when operations on those files even destabilize the file system. Furthermore, using POSIX-based shared-file I/O to circumvent these limitations leads to bottlenecks in meta-data handling at very large processor counts. In this talk we will cover the design principles of the parallel I/O library SIONlib, which addresses the meta-data bottleneck on directory level by transparently mapping a large number of task-local files onto shared files, and the bottleneck on single file level by optimizing the file meta-data handling with an internal mapping of data chunks to multiple physical files. This way SIONlib ensures high I/O performance at large processor counts.
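
The idea of mapping many task-local files onto one shared file can be illustrated with a small sketch. This is not SIONlib's API or file format; the class name TaskLocalContainer, the interleaved chunk layout and all parameters are assumptions made for illustration only. It uses os.pwrite and os.pread, so it assumes a Unix-like system, and it runs in a single process rather than across parallel tasks.

```python
import os
import tempfile


class TaskLocalContainer:
    """Hypothetical container: many logical task-local files in one shared file."""

    def __init__(self, path, num_tasks, chunk_size):
        self.num_tasks = num_tasks
        self.chunk_size = chunk_size
        self.sizes = [0] * num_tasks  # bytes written so far per logical file
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def _offset(self, rank, pos):
        # Assumed layout: the shared file is a sequence of blocks, and each
        # block holds exactly one chunk per task, ordered by rank.
        block, within = divmod(pos, self.chunk_size)
        return (block * self.num_tasks + rank) * self.chunk_size + within

    def write(self, rank, data):
        """Append data to the logical file of task `rank`."""
        pos = self.sizes[rank]
        view = memoryview(data)
        while len(view) > 0:
            # Never write across a chunk boundary; the next chunk of this
            # task lives in the next block of the shared file.
            room = self.chunk_size - pos % self.chunk_size
            written = os.pwrite(self.fd, view[:room], self._offset(rank, pos))
            pos += written
            view = view[written:]
        self.sizes[rank] = pos

    def read(self, rank):
        """Return everything written so far to the logical file of `rank`."""
        out = bytearray()
        pos = 0
        while pos < self.sizes[rank]:
            n = min(self.chunk_size - pos % self.chunk_size,
                    self.sizes[rank] - pos)
            out += os.pread(self.fd, n, self._offset(rank, pos))
            pos += n
        return bytes(out)

    def close(self):
        os.close(self.fd)


def demo():
    """Write four logical files of different lengths into one physical file."""
    with tempfile.TemporaryDirectory() as tmp:
        container = TaskLocalContainer(os.path.join(tmp, "shared.dat"),
                                       num_tasks=4, chunk_size=8)
        payloads = [bytes([65 + r]) * (5 + 6 * r) for r in range(4)]
        for rank, payload in enumerate(payloads):
            container.write(rank, payload)
        result = [container.read(rank) for rank in range(4)]
        container.close()
        files = os.listdir(tmp)
    return payloads, result, files
```

In this sketch the directory contains a single physical file regardless of the number of tasks, which is the property that relieves the meta-data server; each task still sees its own contiguous byte stream.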
