The group Scientific Computing conducts research on high performance storage systems, energy efficiency, and simulation of cluster infrastructure. We have expertise in parallel programming and environmental modeling.
More information about specific projects and our publications can be found in the navigation bar on the left.
High Performance Input/Output
In high performance computing it is important to consider I/O capacity and bandwidth. A multitude of cluster file systems exist, each with different requirements, interfaces and behaviors. Benchmarks are used to evaluate their performance characteristics for specific use cases. However, because the performance of a file system usually depends on the access patterns used, it is difficult to compare file systems with each other. While storing large amounts of data is usually unproblematic, storing a large number of files poses a different challenge because of the associated management overhead. Some applications produce billions of files, pushing file systems to their limits. One important factor is file system semantics, which can heavily affect overall performance. The group focuses on evaluating the effects of these semantics and proposing new strategies for them.
Within the Marie Curie Initial Training Network “SCALing by means of Ubiquitous Storage” (SCALUS), our focus is the problem of deduplication. Deduplication saves storage space by storing blocks with the same content only once. This reduces the required storage space and increases the available bandwidth, because less data needs to be transferred between the source and the target. However, deduplication is computationally expensive, which is why we want to use hardware accelerators (FPGAs and/or GPUs) to reduce the time spent on computation. The vision of SCALUS is to deliver the foundation for ubiquitous storage systems that can be scaled in arbitrary directions (capacity, performance, distance and security).
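The idea of storing identical blocks only once can be illustrated with a minimal sketch. It is not the SCALUS implementation: the class name `DedupStore`, the fixed block size, and the choice of SHA-256 as the fingerprint are assumptions made purely for illustration.

```python
import hashlib


class DedupStore:
    """Toy block-level deduplicating store (illustrative assumption, not SCALUS code)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block content, each unique block stored once
        self.files = {}   # file name -> ordered list of fingerprints (the "recipe")

    def write(self, name, data):
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            # Only store the block if its fingerprint has not been seen before.
            self.blocks.setdefault(fingerprint, block)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def read(self, name):
        # Reassemble the file from its recipe.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = DedupStore(block_size=4)
store.write("a", b"AAAABBBBAAAA")  # three blocks, two of them identical
print(store.stored_bytes(), "bytes stored for", len(store.read("a")), "bytes written")
```

Computing the fingerprint for every block is the expensive step in such a scheme, which is the part a hardware accelerator would be expected to take over.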
Climate simulations tend to produce huge amounts of data that need to be kept available for further research and handled efficiently in terms of both performance and storage space. As part of the ICOMEX project funded by the DFG, we are researching typical temporal and spatial I/O access patterns to determine what can be done to substantially improve the performance of climate data storage. Because climate data also tends to contain substantial redundancy, we are also researching ways to compress climate model data losslessly beyond the compression ratios achieved by the current standard algorithms, which are already employed on a regular basis. It also remains to be seen whether such a compression scheme performs well enough as a software implementation or whether hardware acceleration using FPGAs is advantageous.
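As a rough illustration of what "lossless" and "compression ratio" mean here, the following sketch compresses a synthetic, strongly periodic temperature series with zlib, used as a stand-in for a standard algorithm. The data, the function names, and the use of zlib are assumptions for illustration. Real climate model output is far less regular, and this is not the scheme studied in ICOMEX.

```python
import math
import struct
import zlib


def compress_series(values):
    """Pack floats as little-endian doubles and compress them with zlib."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return raw, zlib.compress(raw, 9)


def decompress_series(packed):
    """Invert compress_series; lossless means the exact values come back."""
    raw = zlib.decompress(packed)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


# Hypothetical hourly temperatures with an exactly repeating daily cycle (30 days).
values = [round(15.0 + 10.0 * math.sin(2 * math.pi * (h % 24) / 24), 2)
          for h in range(24 * 30)]

raw, packed = compress_series(values)
ratio = len(raw) / len(packed)
print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes, ratio: {ratio:.1f}")
```

The round trip through `decompress_series` returns exactly the original values, which is the property a lossless scheme has to guarantee regardless of the ratio it achieves.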
Contact: Dr. Michael Kuhn
Simulation of Distributed Systems
Understanding supercomputer architecture is important for assessing observed application performance and identifying bottlenecks. Simulating application behavior on various systems with individual components makes it possible to project performance on future systems. Furthermore, the impact of replacing parts of the system, such as I/O subsystems, with faster components can be analyzed in simulation. Simulation also makes it possible to evaluate the impact of MPI-internal implementations without coding them into existing MPI libraries.
- Execution of MPI applications is recorded in trace files.
- A cluster model describes the characteristics of the hardware environment and the interconnect topology. Application traces are mapped to the cluster model.
- A discrete-event simulation is performed. Internal events like server I/O operations and network activity are recorded as well.
- Results can be inspected visually in the same viewer as the original trace files.
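The steps above can be sketched as a very small discrete-event loop. This is a simplified illustration, not the group's simulator: the trace format, the `ClusterModel` fields, and the cost formula (latency plus size divided by bandwidth) are assumptions, and effects such as network contention and message matching are ignored.

```python
import heapq
from dataclasses import dataclass


@dataclass
class ClusterModel:
    """Hypothetical cluster description: one latency and one bandwidth for all links."""
    latency_s: float
    bandwidth_bytes_per_s: float


def simulate(trace, model):
    """Replay a trace of per-rank operations on the cluster model.

    trace maps a rank to a list of ("compute", seconds) or ("send", nbytes).
    Returns the finish time per rank and a log of the simulated events.
    """
    queue = []  # entries: (time, sequence number, rank, index of next operation)
    seq = 0
    for rank in trace:
        heapq.heappush(queue, (0.0, seq, rank, 0))
        seq += 1

    finish, log = {}, []
    while queue:
        now, _, rank, index = heapq.heappop(queue)  # always the earliest event
        operations = trace[rank]
        if index == len(operations):
            finish[rank] = now
            continue
        kind, amount = operations[index]
        if kind == "compute":
            duration = amount
        elif kind == "send":
            duration = model.latency_s + amount / model.bandwidth_bytes_per_s
        else:
            raise ValueError(f"unknown operation: {kind}")
        log.append((now, rank, kind, duration))  # recorded for later inspection
        heapq.heappush(queue, (now + duration, seq, rank, index + 1))
        seq += 1
    return finish, log


model = ClusterModel(latency_s=1.0, bandwidth_bytes_per_s=100.0)
trace = {0: [("compute", 2.0), ("send", 200)], 1: [("send", 100)]}
finish, log = simulate(trace, model)
print(finish)
```

Replacing a component with a faster one then amounts to changing the model parameters and replaying the same trace, which is what makes this approach useful for projecting performance on systems that do not yet exist.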
Contact: Dr. Julian Kunkel
Energy Efficiency of High Performance Computing Installations
In recent years we have seen a strong increase in energy consumption and related costs in high performance computing. These costs are already in the range of the acquisition costs of the whole computer. Our project, called Energy-Efficient Cluster Computing, aims at making high performance computing more efficient with respect to economic and ecological aspects. The basic idea is to determine relationships between the behavior of parallel programs and their impact on the energy consumption of the underlying compute cluster. Further, strategies will be developed to reduce energy consumption with as little impact as possible on program performance.
Contact: Timo Minartz
Earth System Modelling
Environmental modelling plays an important role in the use of HPC. Climate models are well known as typical users of HPC infrastructure. However, a number of other environmental modelling applications also rely on access to both high computational power and large storage facilities for the simulation results. In our group, environmental modelling activities focus on models representing the ecosystem of the North Sea. For example, the effects of offshore wind farm installations are examined using the hydrodynamical model HAMSOM (Hamburg Shelf Ocean Model). The changes in the marine environment are analyzed in relation to the wake effect, which results from the rotation of the propellers. Calculating the nearly one million wet grid points of the North Sea topography requires up-to-date computational power and the ability to store large volumes of simulation results.
Another example of our work is the EU project Coastal Biomass Observatory Services (CoBiOS). CoBiOS aims to integrate satellite products and ecological models into a user-relevant information service to predict the development of high-biomass algal blooms in Northern Europe’s coastal waters. These blooms are potentially harmful because, when they decay, they can consume most of the oxygen present in the water, causing dead zones. The project needs readily available HPC resources and a fit-for-purpose scheduled interaction between satellite-derived data products and ecosystem models, including large data storage capacities.
Contact: Dr. Hermann Lenhart