**The group Scientific Computing conducts research and development on high performance storage systems. We develop HPC concepts and apply them to simulation software with a focus on earth system models.**
  
More information about specific [[.:projects:|projects]] and our [[publications|publications]] can be found on their respective pages.
  
===== High Performance Computing and Input/Output =====
  
In high performance computing it is important to consider I/O capacity and bandwidth. A multitude of cluster file systems exist, each with different requirements, interfaces and behaviors. Benchmarks are used to evaluate their performance characteristics for specific use cases. However, because the performance of file systems usually depends on the access patterns used, it is difficult to compare them with each other. While storing large amounts of data is usually unproblematic, storing a large number of files poses another challenge because of the associated management overhead. Some applications produce billions of files, pushing file systems to their limits. One important factor is file system semantics, which can affect the overall performance heavily. The group's focus lies on evaluating their effects and proposing new strategies with regard to these semantics.
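
As a minimal illustration of how strongly access patterns can influence measured throughput, the sketch below times contiguous versus strided writes to a single file. The file name, block size and stride are made up for the example, and the resulting numbers say nothing about any particular cluster file system; real evaluations rely on established parallel I/O benchmarks.

<code python>
import os
import time

FILE_NAME = "testfile.bin"   # hypothetical scratch file
BLOCK_SIZE = 1024 * 1024     # 1 MiB blocks
BLOCK_COUNT = 256            # 256 MiB of payload in total
STRIDE = 4                   # leave gaps of three blocks between writes

def write_contiguous():
    """Write all blocks back to back."""
    data = b"x" * BLOCK_SIZE
    with open(FILE_NAME, "wb") as f:
        for _ in range(BLOCK_COUNT):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())

def write_strided():
    """Write the same amount of data, but with gaps between blocks."""
    data = b"x" * BLOCK_SIZE
    with open(FILE_NAME, "wb") as f:
        for i in range(BLOCK_COUNT):
            f.seek(i * STRIDE * BLOCK_SIZE)
            f.write(data)
        f.flush()
        os.fsync(f.fileno())

for name, func in [("contiguous", write_contiguous), ("strided", write_strided)]:
    start = time.perf_counter()
    func()
    elapsed = time.perf_counter() - start
    mib = BLOCK_SIZE * BLOCK_COUNT / (1024 * 1024)
    print(f"{name}: {mib / elapsed:.1f} MiB/s")
    os.remove(FILE_NAME)
</code>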
  
Universität Hamburg has become one of five Intel Parallel Computing Centers for Lustre worldwide. The project [[.:projects:ipcc-l:|Enhanced Adaptive Compression in Lustre]] aims to enable compression within the Lustre file system. Since computational power continues to improve at a faster pace than storage capacity and throughput, reducing the amount of data is an important feature. At first, the infrastructure will be prepared to pass through the compressed data and make the backend (ZFS) handle them correctly. This already involves client- as well as server-side changes. Each stripe will be chunked, compressed and sent over the network. Preliminary user space analysis has shown that read-ahead can become a big problem when the chunks are read with logical gaps. The next technical challenge is to integrate the changes into ZFS. Once the infrastructure is done, the actual topic of adaptivity and dynamic decision making will be investigated.
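
The sketch below is not Lustre code; it only illustrates, in user space and under assumed parameters (chunk size, zlib as compressor), the chunk-wise compression of a stripe and why reads with logical gaps are problematic: every touched chunk has to be decompressed completely, even if only a few bytes of it are needed.

<code python>
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size within one stripe

def compress_stripe(stripe: bytes) -> list[bytes]:
    """Split a stripe into fixed-size chunks and compress each one independently."""
    return [
        zlib.compress(stripe[offset:offset + CHUNK_SIZE])
        for offset in range(0, len(stripe), CHUNK_SIZE)
    ]

def read_range(chunks: list[bytes], start: int, length: int) -> bytes:
    """Read a logical byte range: every touched chunk must be decompressed in full."""
    result = b""
    end = start + length
    first = start // CHUNK_SIZE
    last = (end - 1) // CHUNK_SIZE
    for index in range(first, last + 1):
        chunk = zlib.decompress(chunks[index])  # whole chunk, even for a few bytes
        chunk_start = index * CHUNK_SIZE
        result += chunk[max(start - chunk_start, 0):end - chunk_start]
    return result

stripe = bytes(range(256)) * 4096            # 1 MiB of sample data
chunks = compress_stripe(stripe)
assert read_range(chunks, 100_000, 10) == stripe[100_000:100_010]
</code>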
  
[[.:projects:bigstorage:|BigStorage]] was a European Training Network (ETN) whose main goal was to train future data scientists in order to enable them to apply holistic and interdisciplinary approaches for taking advantage of a data-overwhelmed world. This requires HPC and cloud infrastructures with a redefinition of the storage architectures underpinning them, while focusing on meeting highly ambitious performance and energy usage objectives. In line with the main objectives of BigStorage, power-saving and energy-efficient data reduction solutions and approaches for measuring and modeling power consumption were examined. Work on a framework for energy-efficient compression of scientific data is still ongoing even after the end of the project. It makes use of machine learning to find optimal data reduction strategies.
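
As a rough, hedged illustration of what a data reduction strategy search might look like, the sketch below scores several standard compressors on a data sample by compression ratio and time. The scoring weight and the choice of algorithms are assumptions made for the example, and the simple heuristic merely stands in for the machine learning approach used in the actual framework.

<code python>
import bz2
import lzma
import time
import zlib

COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}

def pick_compressor(sample: bytes, time_weight: float = 0.5) -> str:
    """Score each compressor on a data sample and return the name of the best one.

    Lower is better: score = compressed size fraction + time_weight * seconds.
    The elapsed time serves here as a crude stand-in for energy cost.
    """
    best_name, best_score = None, float("inf")
    for name, compress in COMPRESSORS.items():
        start = time.perf_counter()
        compressed = compress(sample)
        elapsed = time.perf_counter() - start
        score = len(compressed) / len(sample) + time_weight * elapsed
        if score < best_score:
            best_name, best_score = name, score
    return best_name

sample = b"temperature;pressure;humidity\n" + b"273.15;1013.25;0.45\n" * 2000
print(pick_compressor(sample))
</code>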
  
**Contact**: [[:people:Michael Kuhn]]

===== Earth System Modelling =====
  
Environmental modelling plays an important role in the use of HPC. Climate models are well known as typical users of HPC infrastructure. Nevertheless, a number of other environmental modelling applications also rely on access to both high computational power and large storage facilities for their simulation results. At our group, models representing the ecosystem of the North Sea are the focus of environmental modelling activities.
  
The project [[.:projects:i_sss:|integrated Support System for Sustainability]] started at the beginning of 2016 and runs for five years. The aim of the project is to enable farmers to determine the site characteristics of their fields in order to apply measures for a resource- and environment-friendly agriculture. The information for the farmers will be provided by the geographic information system SAGA (System for Automated Geoscientific Analyses). The SAGA tool is therefore the central platform for development, which also includes the incorporation of model information, e.g. on hydrological conditions, as well as remote sensing data. To fulfil the targets of i_SSS, access to weather data is necessary to evaluate and run models for terrain analysis. For this purpose we gained access to historical forecast data of the last two years. We developed a tool to handle weather data from the Deutscher Wetterdienst (but also from other sources like the Global Forecast System and RADOLAN) and to preprocess the data before they are loaded into SAGA. Different approaches and tools for loading and pre-processing input data were evaluated to select the most promising one.
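
The actual preprocessing tool is not reproduced here; as a small, hedged illustration, the sketch below writes a regular weather grid to an ESRI ASCII grid file, a raster format that SAGA can import. The grid dimensions, cell size and origin coordinates are invented for the example.

<code python>
import numpy as np

def write_ascii_grid(filename, grid, xllcorner, yllcorner, cellsize, nodata=-9999.0):
    """Write a 2D array as an ESRI ASCII grid (importable by SAGA and other GIS tools)."""
    rows, cols = grid.shape
    with open(filename, "w") as f:
        f.write(f"ncols {cols}\n")
        f.write(f"nrows {rows}\n")
        f.write(f"xllcorner {xllcorner}\n")
        f.write(f"yllcorner {yllcorner}\n")
        f.write(f"cellsize {cellsize}\n")
        f.write(f"NODATA_value {nodata}\n")
        for row in grid:  # rows are written top (north) to bottom (south)
            f.write(" ".join(f"{value:.2f}" for value in row) + "\n")

# Hypothetical 2 m temperature field on a 0.1 degree grid.
temperature = 10.0 + 5.0 * np.random.rand(50, 80)
write_ascii_grid("t2m.asc", temperature, xllcorner=6.0, yllcorner=47.0, cellsize=0.1)
</code>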
  
**Contact**: [[:people:Hermann Lenhart]]