Research
The Scientific Computing group conducts research and development on high performance storage systems. We develop HPC concepts and apply them to simulation software with a focus on earth system models.
More information about specific projects and our publications can be found on their respective pages.
High Performance Computing and Input/Output
In high performance computing, it is important to consider I/O capacity and bandwidth. A multitude of cluster file systems exist, each with different requirements, interfaces and behaviors. Benchmarks are used to evaluate their performance characteristics for specific use cases. However, because file system performance usually depends on the access patterns used, it is difficult to compare file systems with each other. While storing large amounts of data is usually unproblematic, storing a large number of files poses a different challenge because of the associated management overhead. Some applications produce billions of files, pushing file systems to their limits. One important factor is file system semantics, which can heavily affect overall performance. The group's focus lies on evaluating the effects of these semantics and proposing new strategies with regard to them.
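The management overhead of many small files can be made visible with a very small experiment. The following is a minimal sketch, not one of the group's benchmarks: the function names, file names and sizes are assumptions chosen for illustration. It writes the same number of bytes once as a single file and once spread over many files, and reports the elapsed time for each.

```python
import os
import tempfile
import time


def write_single_file(directory, total_bytes):
    """Write total_bytes into one file and return the elapsed seconds."""
    payload = b"x" * total_bytes
    start = time.perf_counter()
    with open(os.path.join(directory, "single.dat"), "wb") as f:
        f.write(payload)
    return time.perf_counter() - start


def write_many_files(directory, total_bytes, file_count):
    """Split total_bytes evenly across file_count files and return the elapsed seconds."""
    payload = b"x" * (total_bytes // file_count)
    start = time.perf_counter()
    for i in range(file_count):
        # Every iteration creates a new file, which costs a metadata operation.
        with open(os.path.join(directory, f"part_{i:06d}.dat"), "wb") as f:
            f.write(payload)
    return time.perf_counter() - start


if __name__ == "__main__":
    total = 16 * 1024 * 1024  # 16 MiB, an arbitrary illustrative size
    with tempfile.TemporaryDirectory() as tmp:
        t_single = write_single_file(tmp, total)
        t_many = write_many_files(tmp, total, 4096)
        print(f"one file:   {t_single:.4f} s")
        print(f"4096 files: {t_many:.4f} s")
```

The absolute numbers depend heavily on the file system, caching and hardware, which is exactly why such results are hard to compare across systems. On a parallel file system, the temporary directory would have to be placed on the mount point under test.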
Universität Hamburg has become one of five Intel Parallel Computing Centers for Lustre worldwide. The project Enhanced Adaptive Compression in Lustre aims to enable compression within the Lustre file system. Since computational power continues to improve at a faster pace than storage capacity and throughput, reducing the amount of data is an important feature. As a first step, the infrastructure will be prepared to pass the compressed data through and make the backend (ZFS) handle it correctly. This already involves client-side as well as server-side changes. Each stripe will be chunked, compressed and sent over the network. Preliminary user space analysis has shown that read-ahead can become a big problem when the chunks are read with logical gaps. The next technical challenge is to integrate the changes into ZFS. Once the infrastructure is done, the actual topic of adaptivity and dynamic decision making will be investigated.
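The chunk-wise approach can be illustrated in user space. The sketch below is not the Lustre or ZFS implementation: the chunk size, the use of zlib and all function names are assumptions made for illustration only. It splits a stripe into fixed-size chunks, compresses each chunk independently, and shows that a single chunk can be read back without decompressing its neighbours.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size, not taken from the project


def compress_stripe(stripe, chunk_size=CHUNK_SIZE):
    """Split a stripe into fixed-size chunks and compress each one independently."""
    return [
        zlib.compress(stripe[offset:offset + chunk_size])
        for offset in range(0, len(stripe), chunk_size)
    ]


def read_chunk(chunks, index):
    """Decompress a single chunk without touching the others."""
    return zlib.decompress(chunks[index])


def decompress_stripe(chunks):
    """Restore the original stripe by decompressing and joining all chunks."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)


if __name__ == "__main__":
    stripe = bytes(range(256)) * 1024  # 256 KiB of repetitive sample data
    chunks = compress_stripe(stripe)
    for i, chunk in enumerate(chunks):
        print(f"chunk {i}: {CHUNK_SIZE} logical bytes -> {len(chunk)} compressed bytes")
    assert decompress_stripe(chunks) == stripe
```

Because each compressed chunk is smaller than the logical range it represents, chunks addressed by their logical offsets are no longer contiguous on disk. This is one plausible reading of the "logical gaps" that interfere with read-ahead.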
BigStorage was a European Training Network (ETN) whose main goal was to train future data scientists to apply holistic and interdisciplinary approaches for taking advantage of a data-overwhelmed world. This requires HPC and cloud infrastructures with a redefinition of the storage architectures underpinning them, while meeting highly ambitious performance and energy usage objectives. In line with the main objectives of BigStorage, power-saving and energy-efficient data reduction solutions as well as approaches for measuring and modeling power consumption were examined. Work on a framework for energy-efficient compression of scientific data is still ongoing after the end of the project. It makes use of machine learning to find optimal data reduction strategies.
Contact: Prof. Dr. Michael Kuhn
Earth System Modelling
Environmental modelling plays an important role in the use of HPC. Climate models are well known as typical users of HPC infrastructure. However, a number of other environmental modelling applications also rely on access to both high computational power and large storage facilities for the simulation results. In our group, the environmental modelling activities focus on models representing the ecosystem of the North Sea.
The project integrated Support System for Sustainability (i_SSS) started at the beginning of 2016 and runs for five years. The aim of the project is to enable farmers to determine the site characteristics of their fields in order to apply measures for resource- and environment-friendly agriculture. The information for the farmers will be provided by the geographic information system SAGA (System for Automated Geoscientific Analyses). SAGA is therefore the central development platform, which also incorporates model information, e.g. on hydrological conditions, as well as remote sensing data. To fulfil the targets of i_SSS, access to weather data is necessary to evaluate and run models for terrain analysis. For this purpose, we gained access to historical forecast data of the last two years. We developed a tool to handle weather data from the Deutscher Wetterdienst (as well as from other sources such as the Global Forecast System and RADOLAN) and to preprocess the data before it is loaded into SAGA. Different approaches and tools for loading and pre-processing input data were evaluated in order to select the most promising one.
Contact: Dr. Hermann Lenhart