
Publications


2018

  • Assessing the Scales in Numerical Weather and Climate Predictions: Will Exascale be the Rescue? (Philipp Neumann, Peter Düben, Panagiotis Adamidis, Peter Bauer, Matthias Brueck, Luis Kornblueh, Daniel Klocke, Bjorn Stevens, Nils Wedi, Joachim Biercamp), In Philosophical Transactions of the Royal Society A, The Royal Society Publishing, 2018-12-12 (to be published)
    BibTeX DOI
  • TweTriS: Twenty Trillion-atom Simulation (Nikola Tchipev, Steffen Seckler, Matthias Heinen, Jadran Vrabec, Fabio Gratl, Martin Horsch, Martin Bernreuther, Colin W. Glass, Christoph Niethammer, Nicolay Hammer, Bernd Krischok, Michael Resch, Dieter Kranzlmüller, Hans Hasse, Hans-Joachim Bungartz, Philipp Neumann), In International Journal of High Performance Computing Applications, SAGE, 2018-11-14 (to be published)
    BibTeX
  • Mastering the scales: a survey on the benefits of multiscale computing software (Derek Groen, Jaroslaw Knap, Philipp Neumann, Diana Suleimenova, Lourens Veen, Kenneth Leiter), In Philosophical Transactions of the Royal Society A, The Royal Society Publishing, 2018-11-06 (to be published)
    BibTeX
  • Poster: ESiWACE: Performance Predictions for Storm-Resolving Simulations of the Climate System (Philipp Neumann, Joachim Biercamp), Basel, Switzerland, PASC, 2018-07-03
    BibTeX
    Abstract: With exascale computing becoming available in the next decade, global weather prediction at the kilometer scale will be enabled. Moreover, the climate community has already begun to contemplate a new generation of high-resolution climate models. High-resolution model development is confronted with several challenges. Scalability of the models needs to be optimal, including all relevant components such as I/O, which easily becomes a bottleneck; both runtime and I/O will dictate how fine a resolution can be chosen while still being able to run the model at production level, e.g. at 1-30 years/day depending on the questions to be addressed. Moreover, given various scalability experiments from prototypical runs and additional model data, estimating performance for new simulations can become challenging. I present results achieved in the scope of the Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE) using the ICON model for global high-resolution simulations. I give an overview of the project, show results from multi-week global 5km simulations, and discuss current features and limits of the simulations. I further link the findings to the new intercomparison initiative DYAMOND for high-resolution predictions. Finally, I discuss performance prediction approaches for existing performance data.
  • Mission possible: Unify HPC and Big Data stacks towards application-defined blobs at the storage layer (Pierre Matri, Yevhen Alforov, Álvaro Brandon, María S. Pérez, Alexandru Costan, Gabriel Antoniu, Michael Kuhn, Philip Carns, Thomas Ludwig), In Future Generation Computer Systems, (Editors: Peter Sloot), Elsevier, 2018-07
    BibTeX URL DOI
    Abstract: HPC and Big Data stacks are completely separated today. The storage layer offers opportunities for convergence, as the challenges associated with HPC and Big Data storage are similar: trading versatility for performance. This motivates a global move towards dropping file-based, POSIX-IO-compliant systems. However, on HPC platforms this is made difficult by the centralized storage architecture using file-based storage. In this paper we advocate that the growing trend of equipping HPC compute nodes with local storage changes the picture by enabling object storage to be deployed alongside the application on the compute nodes. Such integration of application and storage not only allows fine-grained configuration of the storage system, but also improves application portability across platforms. In addition, the single-user nature of such application-specific storage obviates the need for resource-consuming storage features like permissions or file hierarchies offered by traditional file systems. In this article we propose and evaluate Blobs (Binary Large Objects) as an alternative to distributed file systems. We demonstrate that they offer drop-in compatibility with a variety of existing applications while improving storage throughput by up to 28%.
  • Poster: ESiWACE: Centre of Excellence in Simulation of Weather and Climate in Europe (Philipp Neumann), Frankfurt, Germany, ISC Project Poster Session, 2018-06-26
    BibTeX
    Abstract: The Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE) fosters the integration of the weather and climate communities by leveraging two established European networks: the European Network for Earth System Modelling (ENES), representing the European climate modelling community, and the European Centre for Medium-Range Weather Forecasts (ECMWF). A main goal of ESiWACE is to substantially improve the efficiency and productivity of numerical weather and climate simulation on high-performance computing platforms by supporting the end-to-end workflow of global Earth system modelling. In particular, weather and climate models are prepared for the exascale era; in this scope, ESiWACE establishes demonstrator simulations which run at the highest affordable resolutions (target 1km) on current PetaFLOP supercomputers. This will yield insights into the computability of configurations that will be sufficient to address key scientific challenges in weather and climate prediction at exascale, such as reducing uncertainties in climate model parametrization or the prediction of extreme climate events. The poster introduces the ESiWACE objectives and gives an overview of ESiWACE developments, including scalability and performance results achieved for the high-resolution demonstrators. It further gives an outlook on the expected impacts of the developments and lists upcoming project activities.
  • Poster: TaLPas: Task-Based Load Balancing and Auto-Tuning in Particle Simulations (Philipp Neumann), Frankfurt, Germany, ISC Project Poster Session, 2018-06-26
    BibTeX
    Abstract: TaLPas will provide a solution for the fast and robust simulation of many inter-dependent particle systems in peta- and exascale supercomputing environments. This will be beneficial for a wide range of applications, including sampling in molecular dynamics (rare event sampling, determination of equations of state, etc.), uncertainty quantification (sensitivity investigation of parameters on actual simulation results), or parameter identification (identification of optimal parameter sets to fit numerical model and experiment). For this purpose, TaLPas targets 1. the development of innovative auto-tuning based particle simulation software in the form of an open-source library to leverage optimal node-level performance. This will guarantee an optimal time-to-solution for small- to mid-sized particle simulations, 2. the development of a scalable task scheduler to optimally distribute inter-dependent particle simulation tasks on available HPC compute resources, 3. the investigation of performance prediction methods for particle simulations to support auto-tuning and to feed the scheduler with accurate runtime predictions, 4. the integration of auto-tuning based particle simulation, scalable task scheduler and performance prediction, augmented by visualisation of the sampling (parameter space exploration) and an approach to resilience. The latter will guarantee robustness at peta- and exascale. Work presented at ISC will focus on steps 1-3. The integration of all components (step 4) is anticipated for the year 2019. To reach its goals, TaLPas bundles interdisciplinary expert knowledge on high-performance computing, visualisation and resilience, performance modeling, and particle applications.
  • Poster: Advanced Computation and I/O Methods for Earth-System Simulations (AIMES) (Julian Kunkel, Thomas Ludwig, Thomas Dubos, Naoya Maruyama, Takayuki Aoki, Günther Zängl, Hisashi Yashiro, Ryuji Yoshida, Hirofumi Tomita, Masaki Satoh, Yann Meurdesoif, Nabeeh Jumah, Anastasiia Novikova, Anja Gerbes), Germany, Frankfurt, ISC 2018, 2018-06-26
    BibTeX URL
    Abstract: The Advanced Computation and I/O Methods for Earth-System Simulations (AIMES) project addresses the key issues of programmability, computational efficiency and I/O limitations that are common in next-generation icosahedral earth-system models. Ultimately, the project is intended to foster the development of best practices and useful norms by cooperating on shared ideas and components. During the project, we will ensure that the developed concepts and tools are applicable not only to earth science but to other scientific domains as well. In this poster we show the project's plan and progress during the first two years of the project lifecycle.
  • Poster: A user-controlled GGDML Code Translation Technique for Performance Portability of Earth System Models (Nabeeh Jumah, Julian Kunkel), Germany, Frankfurt, ISC 2018, 2018-06-26
    BibTeX URL
    Abstract: Demand for high-performance computing is increasing in earth system modeling, and in the natural sciences in general. Unfortunately, automatic optimizations done by compilers are not enough to make use of target machines' capabilities; manual code adjustments are mandatory to exploit hardware capabilities. However, optimizing for one architecture may degrade performance for other architectures. This loss of portability is a challenge. Our approach uses the GGDML language extensions to write higher-level modeling code together with a user-controlled source-to-source translation technique. Translating the code results in an optimized version for the target machine.
    The contributions of this poster are: * the use of a highly configurable code translation technique to transform higher-level code into target-machine-optimized code; * an evaluation of the code transformation for multi-core and GPU-based machines, in both single- and multi-node configurations.
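    As a rough illustration of the configuration-driven translation idea (not the actual GGDML tool or syntax), the sketch below expands an invented higher-level construct into a target-specific loop nest; the construct name FOREACH_CELL, the configuration keys, and the per-target pragmas are assumptions made for this example.

    ```python
    import re

    # Hypothetical higher-level construct: "FOREACH_CELL: <statement>" is expanded
    # into a target-specific loop nest chosen by a user-provided configuration,
    # mimicking the user-controlled source-to-source translation described above.
    CONFIGS = {
        "multicore": {"pragma": "#pragma omp parallel for", "layout": "i*NY + j"},
        "gpu":       {"pragma": "#pragma acc parallel loop", "layout": "j*NX + i"},
    }

    def translate(source: str, target: str) -> str:
        cfg = CONFIGS[target]
        def expand(match):
            body = match.group(1).replace("CELL", f"[{cfg['layout']}]")
            return (f"{cfg['pragma']}\n"
                    "for (int i = 0; i < NX; ++i)\n"
                    "  for (int j = 0; j < NY; ++j)\n"
                    f"    {body};")
        return re.sub(r"FOREACH_CELL:\s*(.+)", expand, source)

    if __name__ == "__main__":
        print(translate("FOREACH_CELL: fCELL = gCELL + hCELL", "multicore"))
        print(translate("FOREACH_CELL: fCELL = gCELL + hCELL", "gpu"))
    ```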
  • Poster: Automatic Profiling for Climate Modeling (Anja Gerbes, Nabeeh Jumah, Julian Kunkel), Bristol, United Kingdom, Euro LLVM, 2018-04-17
    BibTeX URL Publication
    Abstract: Some applications, like climate modeling, are time consuming and include lengthy simulations; hence, their code is performance-sensitive. Spending more time on the optimization of specific code parts can improve total performance, and profiling an application is a well-known technique to do that. Many tools are available for developers to obtain performance information about their code. Our Python package, the Performance Analysis and Source-Code Instrumentation Toolsuite (PASCIT), enables automatic instrumentation of a user's source code: developers mark the parts about which they need performance information. We present an effort to profile climate modeling codes with two alternative methods: • use of the GGDML translation tool to directly mark the computational kernels of an application for profiling, • use of the GGDML translation tool to generate a serial code in a first step and then use LLVM/Clang to instrument some code parts with a profiler's directives. The resulting codes are profiled with the LIKWID profiler. Alternatively, we use perf and OProfile's ocount & operf to measure hardware characteristics. The resulting performance report, which visualizes the measured hardware performance counters as radar charts, LaTeX tables and box plots, helps scientists understand the bottlenecks of their codes.
  • A Survey of Storage Systems for High-Performance Computing (Jakob Lüttgau, Michael Kuhn, Kira Duwe, Yevhen Alforov, Eugen Betke, Julian Kunkel, Thomas Ludwig), In Supercomputing Frontiers and Innovations, Series: Volume 5, Number 1, pp. 31–58, (Editors: Jack Dongarra, Vladimir Voevodin), Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia), 2018-04
    BibTeX URL DOI
    Abstract: In current supercomputers, storage is typically provided by parallel distributed file systems for hot data and tape archives for cold data. These file systems are often compatible with local file systems due to their use of the POSIX interface and semantics, which eases development and debugging because applications can easily run both on workstations and supercomputers. There is a wide variety of file systems to choose from, each tuned for different use cases and implementing different optimizations. However, the overall application performance is often held back by I/O bottlenecks due to insufficient performance of file systems or I/O libraries for highly parallel workloads. Performance problems are dealt with using novel storage hardware technologies as well as alternative I/O semantics and interfaces. These approaches have to be integrated into the storage stack seamlessly to make them convenient to use. Upcoming storage systems abandon the traditional POSIX interface and semantics in favor of alternative concepts such as object and key-value storage; moreover, they heavily rely on technologies such as NVM and burst buffers to improve performance. Additional tiers of storage hardware will increase the importance of hierarchical storage management. Many of these changes will be disruptive and require application developers to rethink their approaches to data management and I/O. A thorough understanding of today's storage infrastructures, including their strengths and weaknesses, is crucially important for designing and implementing scalable storage systems suitable for demands of exascale computing.
  • SkaSim - Skalierbare HPC-Software für molekulare Simulationen in der chemischen Industrie (Jadran Vrabec, Martin Bernreuther, Hans-Joachim Bungartz, Wei-Lin Chen, Wilfried Cordes, Robin Fingerhut, Colin W. Glass, Jürgen Gmehling, René Hamburger, Manfred Heilig, Matthias Heinen, Martin T. Horsch, Chieh-Ming Hsieh, Marco Hülsmann, Philip Jäger, Peter Klein, Sandra Knauer, Thorsten Köddermann, Andreas Köster, Kai Langenbach, Shiang-Tai Lin, Philipp Neumann, Jürgen Rarey, Dirk Reith, Gábor Rutkai, Michael Schappals, Martin Schenk, Andre Schedemann, Mandes Schönherr, Steffen Seckler, Simon Stephan, Katrin Stöbener, Nikola Tchipev, Amer Wafai, Stephan Werth, Hans Hasse), In Chemie Ingenieur Technik, Series: 90(3), pp. 295–306, (Editors: Norbert Asprion), Wiley, ISSN: 1522-2640, 2018-02-01
    BibTeX DOI
    Abstract: This review article reports on advances in molecular modeling and simulation using massively parallel high-performance computing (HPC). In the SkaSim project, partners from the HPC community collaborated with users from science and industry. The goal was to further optimize the prediction of thermodynamic property data with HPC methods in terms of efficiency, quality and reliability. In this context, various topics were addressed: atomistic simulation of homogeneous gas bubble formation, surface tension of classical fluids and ionic liquids, multicriteria optimization of molecular models, further development of the simulation codes ls1 mardyn and ms2, atomistic simulation of gas separation processes, molecular membrane structure generators, transport resistances, and mixture-type-specific assessment of predictive thermodynamic property models.
  • Poster: Quantifying the relative importance of different nitrogen sources on hypoxia development in the northern Gulf of Mexico: A biogeochemical model analysis (Fabian Große, Katja Fennel, Arnaud Laurent), Portland, OR, USA, Ocean Sciences Meeting, 2018
    BibTeX
  • A model–based projection of historical state of a coastal ecosystem: Relevance of phytoplankton stoichiometry (Onur Kerimoglu, Fabian Große, Markus Kreus, Hermann Lenhart, Justus E.E. van Beusekom), In Science of the Total Environment, Series: 639, pp. 1311–1323, Elsevier Science Publishers B. V. (Amsterdam, The Netherlands), ISSN: 0048-9697, 2018
    BibTeX URL DOI
    Abstract: We employed a coupled physical-biogeochemical modelling framework for the reconstruction of the historic (H), pre-industrial state of a coastal system, the German Bight (southeastern North Sea), and we investigated its differences with the recent, control (C) state of the system. According to our findings: i) average winter concentrations of dissolved inorganic nitrogen and phosphorus (DIN and DIP) at the surface are ~70–90% and ~50–70% lower in the H state than in the C state within the nearshore waters, and differences gradually diminish towards off-shore waters; ii) differences in average growing season chlorophyll a (Chl) concentrations at the surface between the two states are mostly less than 50%; iii) in the off-shore areas, Chl concentrations in the deeper layers are affected less than in the surface layers; iv) reductions in phytoplankton carbon (C) biomass under the H state are weaker than those in Chl, due to the generally lower Chl:C ratios; v) in some areas the differences in growth rates between the two states are negligible, due to the compensation by lower light limitation under the H state, which in turn explains the lower Chl:C ratios; vi) zooplankton biomass, and hence the grazing pressure on phytoplankton, is lower under the H state. This trophic decoupling is caused by the low nutritional quality (i.e., low N:C and P:C) of phytoplankton. These results call for increased attention to the relevance of the acclimation capacity and stoichiometric flexibility of phytoplankton for the prediction of their response to environmental change.
  • PetaFLOP Molecular Dynamics for Engineering Applications (Philipp Neumann, Nikola Tchipev, Steffen Seckler, Matthias Heinen, Jadran Vrabec, Hans-Joachim Bungartz), In High Performance Computing in Science and Engineering '18 (tba), Series: Transactions of the High Performance Computing Center Stuttgart (HLRS) 2018, Edition: tba, Springer, ISBN: tba, 2018 (to be published)
    BibTeX URL DOI
    Abstract: Molecular dynamics (MD) simulations enable the investigation of multicomponent and multiphase processes relevant to engineering applications, such as droplet coalescence or bubble formation. These scenarios require the simulation of ensembles containing a large number of molecules. We present recent advances within the MD framework ls1 mardyn which is being developed with particular regard to this class of problems. We discuss several OpenMP schemes that deliver optimal performance at node-level. We have further introduced nonblocking communication and communication hiding for global collective operations. Together with revised data structures and vectorization, these improvements unleash PetaFLOP performance and enable multi-trillion atom simulations on the HLRS supercomputer Hazel Hen. We further present preliminary results achieved for droplet coalescence scenarios at a smaller scale.
  • Recent Advances in Computing (Editorial) (Pavel Solin, José Luis Galán-García, Vit Dolejší, Philipp Neumann, Petr Svacek, Jaroslav Kruis, Volker John), In Applied Mathematics and Computation, Series: 319, pp. 1–1, Elsevier, 2018
    BibTeX DOI

2017

  • Towards Decoupling the Selection of Compression Algorithms from Quality Constraints – an Investigation of Lossy Compression Efficiency (Julian Kunkel, Anastasiia Novikova, Eugen Betke), In Supercomputing Frontiers and Innovations, Series: Volume 4, Number 4, pp. 17–33, (Editors: Jack Dongarra, Vladimir Voevodin), Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia), 2017-12
    BibTeX URL DOI
    Abstract: Data-intensive scientific domains use data compression to reduce the storage space needed. Lossless data compression preserves information accurately, but lossy data compression can achieve much higher compression rates depending on the tolerable error margins. There are many ways of defining precision and of exploiting this knowledge; therefore, the field of lossy compression is subject to active research. From the perspective of a scientist, only the qualitative definition of the implied loss of data precision should matter. With the Scientific Compression Library (SCIL), we are developing a meta-compressor that allows users to define various quantities for acceptable error and expected performance behavior. The library then picks a suitable chain of algorithms meeting the user's requirements; the ongoing work is a preliminary stage for the design of an adaptive selector. This approach is a crucial step towards a scientifically safe use of much-needed lossy data compression, because it disentangles the task of determining scientific characteristics of tolerable noise from the task of determining an optimal compression strategy. Future algorithms can be used without changing application code. In this paper, we evaluate various lossy compression algorithms for compressing different scientific datasets (Isabel, ECHAM6), and focus on the analysis of synthetically created data that serves as a blueprint for many observed datasets. We also briefly describe the available quantities of SCIL to define data precision and introduce two efficient compression algorithms for individual data points. This shows that the best algorithm depends on user settings and data properties.
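    The selection idea described above — pick a compression path from a user-defined error tolerance and respect the bound after decompression — can be sketched as follows. This is a minimal illustration and not SCIL's actual API; the function names, the zlib back-end, and the uniform-quantization lossy step are assumptions for this example.

    ```python
    import zlib
    import numpy as np

    def compress_with_tolerance(data: np.ndarray, abs_tol: float):
        """Pick a codec from the tolerable absolute error (illustration only)."""
        if abs_tol <= 0.0:                       # lossless path
            return "lossless", zlib.compress(data.astype(np.float64).tobytes())
        # lossy path: uniform quantization with step 2*abs_tol keeps |error| <= abs_tol
        q = np.round(data / (2.0 * abs_tol)).astype(np.int64)
        return "lossy", zlib.compress(q.tobytes())

    def decompress(kind, blob, abs_tol, shape):
        raw = zlib.decompress(blob)
        if kind == "lossless":
            return np.frombuffer(raw, dtype=np.float64).reshape(shape)
        q = np.frombuffer(raw, dtype=np.int64).reshape(shape)
        return q.astype(np.float64) * (2.0 * abs_tol)

    if __name__ == "__main__":
        field = np.random.rand(64, 64)
        kind, blob = compress_with_tolerance(field, abs_tol=1e-3)
        restored = decompress(kind, blob, 1e-3, field.shape)
        print(kind, len(blob), "bytes instead of", field.nbytes)
        print("max absolute error:", float(np.max(np.abs(restored - field))))
    ```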
  • A holistic scalable implementation approach of the lattice Boltzmann method for CPU/GPU heterogeneous clusters (Christoph Riesinger, Arash Bakhtiari, Martin Schreiber, Philipp Neumann, Hans-Joachim Bungartz), In Computation, Series: 5(4), pp. 48, (Editors: Karlheinz Schwarz), MDPI, ISSN: 2079-3197, 2017-11-30
    BibTeX DOI
    Abstract: Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types of computing devices, for instance CPUs and GPUs, providing a huge computational potential. Programming them in a scalable way exploiting the maximal performance introduces numerous challenges such as optimizations for different computing devices, dealing with multiple levels of parallelism, the application of different programming models, work distribution, and hiding of communication with computation. We utilize the lattice Boltzmann method for fluid flow as a representative of a scientific computing application and develop a holistic implementation for large-scale CPU/GPU heterogeneous clusters. We review and combine a set of best practices and techniques ranging from optimizations for the particular computing devices to the orchestration of tens of thousands of CPU cores and thousands of GPUs. Eventually, we come up with an implementation using all the available computational resources for the lattice Boltzmann method operators. Our approach shows excellent scalability behavior making it future-proof for heterogeneous clusters of the upcoming architectures on the exaFLOPS scale. Parallel efficiencies of more than 90% are achieved leading to 2604.72 GLUPS utilizing 24576 CPU cores and 2048 GPUs of the CPU/GPU heterogeneous cluster Piz Daint and computing more than 6.8×10^9 lattice cells.
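    For readers unfamiliar with the GLUPS metric (10^9 lattice-cell updates per second), a quick back-of-the-envelope check based on the throughput and cell count reported above:

    ```python
    # 2604.72 GLUPS over ~6.8e9 lattice cells implies roughly 2.6 ms of wall-clock
    # time per lattice Boltzmann time step (derived from the reported numbers).
    cells = 6.8e9
    glups = 2604.72
    seconds_per_step = cells / (glups * 1e9)
    print(f"{seconds_per_step * 1e3:.2f} ms per time step")   # ~2.61 ms
    ```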
  • Understanding Hardware and Software Metrics with respect to Power Consumption (Julian Kunkel, Manuel F. Dolz), In Sustainable Computing: Informatics and Systems, Series: Sustainable Computing, (Editors: Ishfaq Ahmad), Elsevier, ISSN: 2210-5379, 2017-11-04
    BibTeX URL DOI
    Abstract: Analyzing and understanding the energy consumption of applications is an important task which allows researchers to develop novel strategies for optimizing and conserving energy. A typical methodology is to reduce the complexity of real systems and applications by developing a simplified performance model from observed behavior. Many such models are known in the literature; however, inherent to any simplification is that some measured data cannot be explained well. While analyzing a model's accuracy, it is highly important to identify the properties of such prediction errors. Such knowledge can then be used to improve the model or to optimize the benchmarks used for training the model parameters. For such a benchmark suite, it is important that the benchmarks cover all aspects of system behavior to avoid overfitting of the model for certain scenarios. It is not trivial to identify the overlap between the benchmarks and to answer the question whether a benchmark causes different hardware behavior. Inspection of all the available hardware and software counters by humans is a tedious task given the large amount of real-time data they produce. In this paper, we utilize statistical techniques to foster understanding and to investigate hardware counters as potential indicators of energy behavior. We capture hardware and software counters, including power, with a fixed frequency and analyze the resulting timelines of these measurements. The concepts introduced can be applied to any set of measurements in order to compare them to another set of measurements. We demonstrate how these techniques can aid in identifying interesting behavior and in significantly reducing the number of features that must be inspected. Next, we propose counters that can potentially be used for building linear models that predict power consumption with a relative accuracy of 3%. Finally, we validate the completeness of a benchmark suite, from the point of view of using the available architectural components, for generating accurate models.
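    A minimal sketch of the kind of linear counter-to-power model discussed above, fitted by ordinary least squares; the counters, coefficients, and data below are synthetic placeholders, not measurements from the paper.

    ```python
    import numpy as np

    # Fit a linear power model P ~ w*c + b from counter timelines (synthetic data).
    rng = np.random.default_rng(0)
    n = 500
    counters = rng.random((n, 3))           # e.g. instructions, cache misses, memory traffic
    true_w, true_b = np.array([40.0, 15.0, 25.0]), 60.0
    power = counters @ true_w + true_b + rng.normal(0.0, 1.0, n)   # watts, with noise

    X = np.hstack([counters, np.ones((n, 1))])        # add intercept column
    coef, *_ = np.linalg.lstsq(X, power, rcond=None)  # least-squares fit
    pred = X @ coef
    rel_err = np.mean(np.abs(pred - power) / power)
    print("weights:", coef[:-1], "intercept:", coef[-1])
    print(f"mean relative error: {rel_err:.2%}")
    ```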
  • JULEA: A Flexible Storage Framework for HPC (Michael Kuhn), In High Performance Computing, Lecture Notes in Computer Science (10524), (Editors: Julian Kunkel, Rio Yokota, Michela Taufer, John Shalf), Springer International Publishing, ISC High Performance 2017, Frankfurt, Germany, ISBN: 978-3-319-67629-6, 2017-11
    BibTeX DOI
  • Poster: The use of ICON data within SAGA GIS for decision support in agricultural crop land utilisation (Jannek Squar, Michael Bock, Olaf Conrad, Christoph Geck, Tobias Kawohl, Michael Kuhn, Lars Landschreiber, Hermann Lenhart, Sandra Wendland, Thomas Ludwig, Jürgen Böhner), Hamburg, Germany, DKRZ user workshop 2017, 2017-10-09
    BibTeX
  • Poster: Icosahedral Modeling with GGDML (Nabeeh Jumah, Julian Kunkel, Günther Zängl, Hisashi Yashiro, Thomas Dubos, Yann Meurdesoif), Hamburg, Germany, DKRZ user workshop 2017, 2017-10-09
    BibTeX Publication
    Abstract: The atmospheric and climate sciences, and the natural sciences in general, have an increasing demand for high-performance computing. Unfortunately, the gap between the diversity of hardware architectures that manufacturers provide to fulfill this need for performance and the scientific modeling cannot be filled by general-purpose languages and compilers. Scientists need to manually optimize their models to exploit the machine capabilities. This leads to code redundancies when targeting different machines, and it becomes even harder when considering heterogeneous computing as a basis for exascale computing.
    In order to provide performance portability for icosahedral climate modeling, we have developed a set of higher-level language extensions we call GGDML. The extensions provide semantically higher-level constructs that allow scientific problems to be expressed with scientific concepts. This eliminates the need to explicitly provide lower-level machine-dependent code. Scientists still use the general-purpose language. The GGDML code is translated by a source-to-source translation tool that optimizes the generated code for a specific machine. The translation process is driven by configurations that are provided independently from the source code.
    In this poster we review some GGDML extensions and we focus mainly on the configurable code translation of the higher-level code.
  • A mouse model for embryonal tumors with multilayered rosettes uncovers the therapeutic potential of Sonic-hedgehog inhibitors (Julia E. Neumann, Annika K. Wefers, Sander Lambo, Edoardo Bianchi, Marie Bockstaller, Mario M. Dorostkar, Valerie Meister, Pia Schindler, Andrey Korshunov, Katja von Hoff, Johannes Nowak, Monika Warmuth-Metz, Marlon R. Schneider, Ingrid Renner-Müller, Daniel J. Merk, Mehdi Shakarami, Tanvi Sharma, Lukas Chavez, Rainer Glass, Jennifer A. Chan, M. Mark Taketo, Philipp Neumann, Marcel Kool, Ulrich Schüller), In Nature Medicine, Series: 23(10), pp. 1191–1202, Nature Publishing Group, ISSN: 1546-170X, 2017-09-11
    BibTeX DOI
    Abstract: Embryonal tumors with multilayered rosettes (ETMRs) have recently been described as a new entity of rare pediatric brain tumors with a fatal outcome. We show here that ETMRs are characterized by a parallel activation of Shh and Wnt signaling. Co-activation of these pathways in mouse neural precursors is sufficient to induce ETMR-like tumors in vivo that resemble their human counterparts on the basis of histology and global gene-expression analyses, and that point to apical radial glia cells as the possible tumor cell of origin. Overexpression of LIN28A, which is a hallmark of human ETMRs, augments Sonic-hedgehog (Shh) and Wnt signaling in these precursor cells through the downregulation of let7-miRNA, and LIN28A/let7a interaction with the Shh pathway was detected at the level of Gli mRNA. Finally, human ETMR cells that were transplanted into immunocompromised host mice were responsive to the SHH inhibitor arsenic trioxide (ATO). Our work provides a novel mouse model in which to study this tumor type, demonstrates the driving role of Wnt and Shh activation in the growth of ETMRs and proposes downstream inhibition of Shh signaling as a therapeutic option for patients with ETMRs.
  • Could Blobs Fuel Storage-Based Convergence Between HPC and Big Data? (Pierre Matri, Yevhen Alforov, Alvaro Brandon, Michael Kuhn, Philip Carns, Thomas Ludwig), In 2017 IEEE International Conference on Cluster Computing, pp. 81–86, IEEE Computer Society, CLUSTER 2017, Honolulu, USA, ISBN: 978-1-5386-2326-8, 2017-09
    BibTeX URL DOI
  • MaMiCo: Transient Multi-Instance Molecular-Continuum Flow Simulation on Supercomputers (Philipp Neumann, Xin Bian), In Computer Physics Communications, Series: 220, pp. 390–402, Elsevier, ISSN: 0010-4655, 2017-07-27
    BibTeX DOI
    Abstract: We present extensions of the macro-micro-coupling tool MaMiCo, which was designed to couple continuum fluid dynamics solvers with discrete particle dynamics. To enable local extraction of smooth flow field quantities especially on rather short time scales, sampling over an ensemble of molecular dynamics simulations is introduced. We provide details on these extensions including the transient coupling algorithm, open boundary forcing, and multi-instance sampling. Furthermore, we validate the coupling in Couette flow using different particle simulation software packages and particle models, i.e. molecular dynamics and dissipative particle dynamics. Finally we demonstrate the parallel scalability of the molecular-continuum simulations by using up to 65 536 compute cores of the supercomputer Shaheen II located at KAUST.
  • GGDML: Icosahedral Models Language Extensions (Nabeeh Jumah, Julian Kunkel, Günther Zängl, Hisashi Yashiro, Thomas Dubos, Yann Meurdesoif), In Journal of Computer Science Technology Updates, Series: Volume 4, Number 1, pp. 1–10, Cosmos Scholars Publishing House, 2017-06-21
    BibTeX URL DOI
    Abstract: The optimization opportunities of a code base are not completely exploited by compilers. In fact, some optimizations must be done within the source code, and if the code developers skip some details, performance is lost. Thus, using a general-purpose language to develop performance-demanding software, e.g. climate models, requires extra care from the developers, who have to take into account hardware details of the target machine.
    Besides, code that performs well on one machine will usually show lower performance on another. Developers therefore write multiple optimized sections, or even code versions, for the different target machines. Such codes are complex and hard to maintain.
    In this article we introduce a higher-level code development approach, where we develop a set of extensions to the language that is used to write a model's code. Our extensions form a domain-specific language (DSL) that abstracts domain concepts and leaves the lower-level details to a configurable source-to-source translation process.
    The purpose of the developed extensions is to support icosahedral climate/atmospheric model development. We started with three icosahedral models: DYNAMICO, ICON, and NICAM. The collaboration with scientists from the weather/climate sciences enabled agreed-upon extensions. Each suggested extension was designed to represent a higher-level, domain-based concept and to carry no lower-level details.
    The introduced DSL (GGDML – General Grid Definition and Manipulation Language) hides optimization details like memory layout. It reduces the code size of a model to less than one third of its original size in terms of lines of code. The development costs of a model are therefore reduced significantly with GGDML.
  • Poster: Towards Performance Portability for Atmospheric and Climate Models with the GGDML DSL (Nabeeh Jumah, Julian Kunkel, Günther Zängl, Hisashi Yashiro, Thomas Dubos, Yann Meurdesoif), Germany, Frankfurt, ISC 2017, 2017-06-20
    BibTeX URL Publication
    Abstract: Demand for high-performance computing is increasing in atmospheric and climate sciences, and in natural sciences in general. Unfortunately, automatic optimizations done by compilers are not enough to make use of target machines' capabilities; manual code adjustments are mandatory to exploit hardware capabilities. However, optimizing for one architecture may degrade performance for other architectures. This loss of portability is a challenge. With GGDML we examine an approach for icosahedral-grid-based climate and atmospheric models that is based on a domain-specific language (DSL) which fosters separation of concerns between domain scientists and computer scientists. Our DSL extends the Fortran language with concepts from domain science, free of technical descriptions such as hardware-based optimizations. The approach aims to achieve high performance, portability and maintainability through a compilation infrastructure principally built upon configurations from computer scientists. Fortran code extended with the novel semantics of the DSL goes through a meta-DSL-based compilation procedure, which generates high-performance code that is aware of platform features, based on the provided configurations. We show that our approach reduces code significantly (to 40%) and improves readability for the models DYNAMICO, ICON and NICAM. We also show that the whole approach is viable in terms of performance portability, as it allows platform-optimized code to be generated with minimal configuration changes. With a few lines, we are able to switch between two different memory representations during compilation and achieve double the performance. In addition, applying inlining and loop fusion yields a further 10% performance improvement.
  • Poster: FortranTestGenerator: Automatic and Flexible Unit Test Generation for Legacy HPC Code (Christian Hovy, Julian Kunkel), Frankfurt, ISC High Performance 2017, 2017-06-20
    BibTeX Publication
    Abstract: Unit testing is an established practice in professional software development. However, in high-performance computing (HPC) with its scientific applications, it is not widely applied. Besides general problems regarding the testing of scientific software, for many HPC applications the effort of creating small test cases with a consistent set of test data is high. We have created a tool called FortranTestGenerator to reduce the effort of creating unit tests for subroutines of an existing Fortran application. It is based on Capture & Replay (C&R), that is, it extracts data while running the original application and uses the extracted data as test input data. The tool automatically generates code for capturing the input data and a basic test driver which can be extended by the developer into a meaningful unit test. A static source code analysis is conducted to reduce the number of captured variables. Code is generated based on flexibly customizable templates. Thus, both the capturing process and the unit tests can easily be integrated into an existing software ecosystem.
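    FortranTestGenerator targets Fortran code; purely as an illustration of the Capture & Replay idea, the Python sketch below records a routine's inputs during a "production" run and replays them later as unit-test input. The file name and the routine are invented for this example.

    ```python
    import functools
    import json

    CAPTURE_FILE = "captured_inputs.json"   # hypothetical capture target

    def capture(func):
        """Record the call arguments so they can later serve as unit-test input."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with open(CAPTURE_FILE, "w") as f:
                json.dump({"args": args, "kwargs": kwargs}, f)
            return func(*args, **kwargs)
        return wrapper

    @capture
    def smooth(values, weight=0.5):
        return [weight * v for v in values]

    def replay():
        """Re-run the routine on the captured data inside a test driver."""
        with open(CAPTURE_FILE) as f:
            rec = json.load(f)
        return smooth(*rec["args"], **rec["kwargs"])

    if __name__ == "__main__":
        original = smooth([1.0, 2.0, 3.0], weight=0.25)   # "production" run, inputs captured
        assert replay() == original                       # replayed result matches
    ```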
  • Poster: The Virtual Institute for I/O and the IO-500 (Julian Kunkel, Jay Lofstead, John Bent), Frankfurt, Germany, ISC High Performance 2017, 2017-06-20
    BibTeX Publication
  • Poster: i_SSS – integrated Support System for Sustainability (Jannek Squar, Michael Bock, Olaf Conrad, Christoph Geck, Tobias Kawohl, Michael Kuhn, Lars Landschreiber, Hermann Lenhart, Sandra Wendland, Thomas Ludwig, Jürgen Böhner), Frankfurt, Germany, ISC High Performance 2017, 2017-06-20
    BibTeX URL
  • Poster: Advanced Computation and I/O Methods for Earth-System Simulations (AIMES) (Julian Kunkel, Thomas Ludwig, Thomas Dubos, Naoya Maruyama, Takayuki Aoki, Günther Zängl, Hisashi Yashiro, Ryuji Yoshida, Hirofumi Tomita, Masaki Satoh, Yann Meurdesoif, Nabeeh Jumah, Anastasiia Novikova), Germany, Frankfurt, ISC 2017, 2017-06-20
    BibTeX URL Publication
    Abstract: The Advanced Computation and I/O Methods for Earth-System Simulations (AIMES) project addresses the key issues of programmability, computational efficiency and I/O limitations that are common in next-generation icosahedral earth-system models. Ultimately, the project is intended to foster development of best-practices and useful norms by cooperating on shared ideas and components. During the project, we ensure that the developed concepts and tools are not only applicable for earth-science but for other scientific domains as well.
  • SFS: A Tool for Large Scale Analysis of Compression Characteristics (Julian Kunkel), Research Papers (4), Research Group: Scientific Computing, University of Hamburg (Deutsches Klimarechenzentrum GmbH, Bundesstraße 45a, D-20146 Hamburg), 2017-05-05
    BibTeX Publication
    Abstract: Data centers manage petabytes of storage. Identifying a fast lossless compression algorithm that is enabled on the storage system and potentially reduces data by an additional 10% is significant. However, it is not trivial to evaluate algorithms on huge data pools, as this evaluation requires running the algorithms and is therefore costly. Hence, there is a need for tools to optimize such an analysis. In this paper, the open-source tool SFS is described, which performs these scans efficiently. While based on an existing open-source tool, SFS builds on a proven method to scan huge quantities of data using statistical sampling. Additionally, we present results of 162 variants of various algorithms conducted on three data pools with scientific data and one more general-purpose data pool. Based on this analysis, promising classes of algorithms are identified.
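    The sampling idea behind such scans can be sketched as follows: estimate a file's compression ratio from a few random chunks instead of compressing everything. This is not the SFS tool itself; the chunk size, sample count, and the zlib back-end are assumptions for this example.

    ```python
    import os
    import random
    import sys
    import zlib

    def estimate_ratio(path, sample_size=1 << 20, samples=8, level=6, seed=0):
        """Estimate the zlib compression ratio of a file from random samples."""
        size = os.path.getsize(path)
        rng = random.Random(seed)
        raw = compressed = 0
        with open(path, "rb") as f:
            for _ in range(samples):
                f.seek(rng.randrange(max(1, size - sample_size)))
                chunk = f.read(sample_size)
                raw += len(chunk)
                compressed += len(zlib.compress(chunk, level))
        return raw / compressed if compressed else 1.0

    if __name__ == "__main__":
        for filename in sys.argv[1:]:
            print(f"{filename}: estimated compression ratio {estimate_ratio(filename):.2f}")
    ```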
  • Interaktiver C-Programmierkurs, ICP (Julian Kunkel, Jakob Lüttgau), In HOOU Content Projekte der Vorprojektphase 2015/16 – Sonderband zum Fachmagazin Synergie (Kerstin Mayrberger), pp. 182–186, Universität Hamburg (Universität Hamburg, Mittelweg 177, 20148 Hamburg), ISBN: 978-3-924330-57-6, 2017-04-10
    BibTeX URL
    Abstract: Programming languages form the basis for automated data processing in the digital world. Although the basic concepts are easy to understand, only a small share of people masters these tools. The reasons for this are deficits in education and the high entry barrier of providing a productive programming environment. In particular, learning a programming language requires practical application of the language, comparable to learning a foreign language. The goal of the project is the creation of an interactive course for teaching the C programming language. The interactivity and the automatic feedback offered are oriented towards the participants' needs and provide the opportunity to build up and extend knowledge autodidactically. The lessons contain both introductions to specific subtopics and more demanding exercises that foster academic problem-solving skills. This serves different academic target groups and introduces people from various parts of civil society to computer science. The programming course and the programming platform developed in this project can be used freely worldwide, and the source code and the lessons are available under open-source licenses and can therefore be adapted to individual needs. In particular, this enables participation and the contribution of new lessons to the platform.
  • Poster: Intelligent Selection of Compiler Options to Optimize Compile Time and Performance (Anja Gerbes, Julian Kunkel, Nabeeh Jumah), Saarbrücken, Euro LLVM, 2017-03-27
    BibTeX URL Publication
    Abstract: The efficiency of the optimization process during compilation is crucial for the later execution behavior of the code. The achieved performance depends on the hardware architecture and the compiler's capabilities to extract this performance. Code optimization can be a CPU- and memory-intensive process which – for large codes – can lead to high compilation times during development. Optimization also influences the debuggability of the resulting binary; for example, by storing data in registers. During development, it would be interesting to compile files individually with appropriate flags that enable debugging and provide high (near-production) performance during testing, but with moderate compile times. We are exploring the creation of a tool to identify code regions that are candidates for higher optimization levels. We follow two different approaches to identify the most efficient code optimization: 1) compiling different files with different options by brute force; 2) using profilers to identify the relevant code regions that should be optimized. Since big projects comprise hundreds of files, brute force is not efficient. The problem in, e.g., climate applications is that codes have too many files to test them individually. Improving this strategy using a profiler, we can identify the time-consuming regions (and files) and then repeatedly refine our selection. The relevant files are then evaluated with different compiler flags to determine a good compromise. Once the appropriate flags are determined, this information could be retained across builds and shared between users. In our poster, we motivate and demonstrate this strategy on a stencil code derived from climate applications. The experiments in this work are carried out on a recent Intel Skylake (i7-6700 CPU @ 3.40GHz) machine. We compare the performance of the compilers clang (version 3.9.1) and gcc (version 6.3.0) for various optimization flags and using profile-guided optimization (PGO), both with the traditional compile-with-instrumentation/run/compile phases and when using the perf tool for dynamic instrumentation. The results show that, in general, more time (2x) is spent compiling code with higher optimization levels, though gcc takes a little less time than clang. Yet the performance of the application after compiling the whole code with O3 was comparable to that of applying O3 optimization only to the right subset of files. Thus, the approach proves to be effective for repositories where compilation is analyzed to guide subsequent compilations. Based on these results, we are building a prototype tool that can be embedded into build systems and that realizes the aforementioned strategies of brute-force testing and profile-guided analysis of relevant compilation flags.
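    A minimal sketch of the brute-force side of this strategy: compile one file at several optimization levels, time compilation and execution, and report the trade-off. The source file name is hypothetical; only standard gcc optimization flags are used.

    ```python
    import subprocess
    import time

    SOURCE = "stencil.c"            # hypothetical kernel file
    FLAGS = ["-O0", "-O1", "-O2", "-O3"]

    def measure(flag, runs=3):
        """Compile with one optimization level; time compilation and the fastest run."""
        t0 = time.perf_counter()
        subprocess.run(["gcc", flag, SOURCE, "-o", "stencil"], check=True)
        compile_time = time.perf_counter() - t0
        run_times = []
        for _ in range(runs):
            t0 = time.perf_counter()
            subprocess.run(["./stencil"], check=True)
            run_times.append(time.perf_counter() - t0)
        return compile_time, min(run_times)

    if __name__ == "__main__":
        for flag in FLAGS:
            c, r = measure(flag)
            print(f"{flag}: compile {c:.2f}s, best run {r:.3f}s")
    ```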
  • ESiWACE: Auf dem Weg zu wolkenauflösenden Klimamodellen (Philipp Neumann, Joachim Biercamp), Jahrbuch 2017, Max-Planck-Gesellschaft, 2017-01-01
    BibTeX DOI
    Abstract: Quantitative estimates of expected changes in weather extremes are of great importance. Indispensable for this is the development of models for the simulation of convection and clouds as well as small-scale eddies in the ocean. Such models require a resolution of one kilometer and must be able to simulate several months per day. ESiWACE unites the weather and climate sciences and addresses the optimization of simulation workflows on supercomputers. One goal is to explore the technical possibilities, but also the limits, of building cloud-resolving models.
  • Performance and Power Optimization (Michael Kuhn, Konstantinos Chasapis, Manuela Kuhn, Janusz Malka, Thomas Stibor, Gvozden Nešković), In Helmholtz Portfolio Theme Large-Scale Data Management and Analysis (LSDMA) (Christopher Jung, Jörg Meyer, Achim Streit), pp. 141–160, KIT Scientific Publishing (Karlsruhe, Germany), ISBN: 978-3-7315-0695-9, 2017
    BibTeX URL DOI
    Abstract: The Helmholtz Association funded the “Large-Scale Data Management and Analysis” portfolio theme from 2012-2016. Four Helmholtz centres, six universities and another research institution in Germany joined to enable data-intensive science by optimising data life cycles in selected scientific communities. In our Data Life cycle Labs, data experts performed joint R&D together with scientific communities. The Data Services Integration Team focused on generic solutions applied by several communities.
  • A Novel String Representation and Kernel Function for the Comparison of I/O Access Patterns (Raul Torres, Julian Kunkel, Manuel Dolz, Thomas Ludwig), In International Conference on Parallel Computing Technologies, Lecture Notes in Computer Science (10421), pp. 500–512, (Editors: Victor Malyshkin), Springer, PaCT, Nizhni Novgorod, Russia, ISBN: 978-3-319-62932-2, 2017
    BibTeX DOI
    Abstract: Parallel I/O access patterns act as fingerprints of a parallel program. In order to extract meaningful information from these patterns, they have to be represented appropriately. Due to the fact that string objects can be easily compared using Kernel Methods, a conversion to a weighted string representation is proposed in this paper, together with a novel string kernel function called Kast Spectrum Kernel. The similarity matrices, obtained after applying the mentioned kernel over a set of examples from a real application, were analyzed using Kernel Principal Component Analysis (Kernel PCA) and Hierarchical Clustering. The evaluation showed that 2 out of 4 I/O access pattern groups were completely identified, while the other 2 conformed a single cluster due to the intrinsic similarity of their members. The proposed strategy can be promisingly applied to other similarity problems involving tree-like structured data.
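    The Kast Spectrum Kernel itself is the paper's contribution; as a simplified stand-in, the sketch below computes a plain (unweighted) spectrum kernel over symbolized access patterns and normalizes the resulting similarity matrix, which could then feed kernel PCA or hierarchical clustering. The pattern encoding is invented for this example.

    ```python
    from collections import Counter
    import numpy as np

    def spectrum_kernel(a: str, b: str, k: int = 2) -> float:
        """Plain spectrum kernel: inner product of k-mer counts (not the weighted
        Kast Spectrum Kernel proposed in the paper)."""
        ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
        cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
        return float(sum(ca[m] * cb[m] for m in ca))

    # I/O access patterns encoded as strings, e.g. R = read, W = write, S = seek
    patterns = ["RRRRWWWW", "RRRWRWWW", "SSSSRRRR", "WWWWWWWW"]
    K = np.array([[spectrum_kernel(p, q) for q in patterns] for p in patterns])
    # Normalize to a cosine-like similarity in [0, 1] before clustering / kernel PCA
    K_norm = K / np.sqrt(np.outer(np.diag(K), np.diag(K)))
    print(np.round(K_norm, 2))
    ```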
  • Simulation of Hierarchical Storage Systems for TCO and QoS (Jakob Lüttgau, Julian Kunkel), In High Performance Computing: ISC High Performance 2017 International Workshops, DRBSD, ExaComm, HCPM, HPC-IODC, IWOPH, IXPUG, P^3MA, VHPC, Visualization at Scale, WOPSSS, Lecture Notes in Computer Science (10524), pp. 116–128, (Editors: Julian Kunkel, Rio Yokota, Michela Taufer, John Shalf), Springer, ISC High Performance, Frankfurt, Germany, ISBN: 978-3-319-67629-6, 2017
    BibTeX DOI
    Abstract: Due to the variety of storage technologies deep storage hierarchies turn out to be the most feasible choice to meet performance and cost requirements when handling vast amounts of data. Long-term archives employed by scientific users are mainly reliant on tape storage, as it remains the most cost-efficient option. Archival systems are often loosely integrated into the HPC storage infrastructure. In expectation of exascale systems and in situ analysis also burst buffers will require integration with the archive. Exploring new strategies and developing open software for tape systems is a hurdle due to the lack of affordable storage silos and availability outside of large organizations and due to increased wariness requirements when dealing with ultra-durable data. Lessening these problems by providing virtual storage silos should enable community-driven innovation and enable site operators to add features where they see fit while being able to verify strategies before deploying on production systems. Different models for the individual components in tape systems are developed. The models are then implemented in a prototype simulation using discrete event simulation. The work shows that the simulations can be used to approximate the behavior of tape systems deployed in the real world and to conduct experiments without requiring a physical tape system.
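    A toy discrete-event skeleton in the spirit of the simulation described above: a single tape drive serves requests that each need a mount, a seek, and a transfer phase. All timing parameters are invented for illustration and unrelated to the models in the paper.

    ```python
    import heapq

    # Invented service-time parameters for a single tape drive.
    MOUNT_S, SEEK_S, BANDWIDTH_MBPS = 90.0, 40.0, 250.0

    def simulate(requests):
        """requests: list of (arrival_time_s, size_mb); one drive, served in arrival order."""
        events = list(requests)
        heapq.heapify(events)                            # event queue ordered by arrival time
        clock = 0.0
        while events:
            arrival, size = heapq.heappop(events)
            start = max(clock, arrival)                  # wait until the drive is free
            clock = start + MOUNT_S + SEEK_S + size / BANDWIDTH_MBPS
            print(f"request of {size:8.1f} MB arrived at {arrival:7.1f}s, finished at {clock:8.1f}s")
        return clock

    if __name__ == "__main__":
        simulate([(0.0, 5000.0), (10.0, 200.0), (400.0, 120000.0)])
    ```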
  • An MPI-IO In-Memory Driver for Non-Volatile Pooled Memory of the Kove XPD (Julian Kunkel, Eugen Betke), In High Performance Computing: ISC High Performance 2017 International Workshops, DRBSD, ExaComm, HCPM, HPC-IODC, IWOPH, IXPUG, P^3MA, VHPC, Visualization at Scale, WOPSSS, Lecture Notes in Computer Science (10524), pp. 644–655, (Editors: Julian Kunkel, Rio Yokota, Michela Taufer, John Shalf), Springer, ISC High Performance, Frankfurt, Germany, ISBN: 978-3-319-67629-6, 2017
    BibTeX
    Abstract: Many scientific applications are limited by the performance offered by parallel file systems. SSD-based burst buffers provide significantly better performance than HDD-backed storage, but at the expense of capacity. Clearly, achieving wire speed of the interconnect and predictable low-latency I/O is the holy grail of storage. In-memory storage promises to provide optimal performance exceeding SSD-based solutions. Kove's XPD offers pooled memory for cluster systems. This remote memory is asynchronously backed up to storage devices of the XPDs and considered to be non-volatile. Although the system offers various APIs to access this memory, such as treating it as a block device, it does not allow exposing it as a file system that offers POSIX or MPI-IO semantics. In this paper, we 1) describe the XPD-MPIIO-driver which supports the scale-out architecture of the XPDs. This MPI-agnostic driver enables high-level libraries to utilize the XPD's memory as storage. 2) A thorough performance evaluation of the XPD is conducted. This includes scale-out testing of the infrastructure and metadata operations, but also performance variability. We show that the driver and storage architecture are able to nearly saturate the wire speed of InfiniBand (60+ GiB/s with 14 FDR links) while providing low latency and little performance variability.
  • Toward Decoupling the Selection of Compression Algorithms from Quality Constraints (Julian Kunkel, Anastasiia Novikova, Eugen Betke, Armin Schaare), In High Performance Computing: ISC High Performance 2017 International Workshops, DRBSD, ExaComm, HCPM, HPC-IODC, IWOPH, IXPUG, P^3MA, VHPC, Visualization at Scale, WOPSSS, Lecture Notes in Computer Science (10524), pp. 1–12, (Editors: Julian Kunkel, Rio Yokota, Michela Taufer, John Shalf), Springer, ISC High Performance, Frankfurt, Germany, ISBN: 978-3-319-67629-6, 2017
    BibTeX DOI
    Abstract: Data-intensive scientific domains use data compression to reduce the storage space needed. Lossless data compression preserves the original information accurately but, on the domain of climate data, usually yields a compression factor of only 2:1. Lossy data compression can achieve much higher compression rates depending on the tolerable error/precision needed. Therefore, the field of lossy compression is still subject to active research. From the perspective of a scientist, the compression algorithm does not matter, but the qualitative information about the implied loss of data precision is a concern. With the Scientific Compression Library (SCIL), we are developing a meta-compressor that allows users to set various quantities that define the acceptable error and the expected performance behavior. The ongoing work is a preliminary stage for the design of an automatic compression algorithm selector. The task of this missing key component is the construction of appropriate chains of algorithms to meet the user's requirements. This approach is a crucial step towards a scientifically safe use of much-needed lossy data compression, because it disentangles the task of determining scientific ground characteristics of tolerable noise from the task of determining an optimal compression strategy given target noise levels and constraints. Future algorithms can be used without change in the application code, once they are integrated into SCIL. In this paper, we describe the user interfaces and quantities as well as two compression algorithms, and evaluate SCIL's ability to compress climate data. We show that the novel algorithms are competitive with the state-of-the-art compressors ZFP and SZ and illustrate that the best algorithm depends on user settings and data properties.
  • Real-Time I/O-Monitoring of HPC Applications with SIOX, Elasticsearch, Grafana and FUSE (Eugen Betke, Julian Kunkel), In High Performance Computing: ISC High Performance 2017 International Workshops, DRBSD, ExaComm, HCPM, HPC-IODC, IWOPH, IXPUG, P^3MA, VHPC, Visualization at Scale, WOPSSS, Lecture Notes in Computer Science (10524), pp. 158–170, (Editors: Julian Kunkel, Rio Yokota, Michela Taufer, John Shalf), Springer, ISC High Performance, Frankfurt, Germany, ISBN: 978-3-319-67629-6, 2017
    BibTeX DOI
    Abstract: The starting point for our work was the demand for an overview of applications' I/O behavior that provides information about the usage of our HPC system "Mistral". We suspect that some applications are running with inefficient I/O patterns and are probably wasting a significant amount of machine hours. To tackle the problem, we focus on the detection of poor I/O performance, the identification of these applications, and the description of their I/O behavior. Instead of gathering I/O statistics from global system variables, as many other monitoring tools do, in our approach the statistics come directly from the I/O interfaces POSIX, MPI, HDF5 and NetCDF. For the interception of I/O calls we use an instrumentation library that is dynamically linked with LD_PRELOAD at program startup. The HPC on-line monitoring framework is built on top of open-source software: Grafana, SIOX, Elasticsearch and FUSE. This framework collects I/O statistics from applications and mount points. The latter is used for non-intrusive monitoring of virtual memory allocated with mmap(), i.e., no code adaptation is necessary. The framework is evaluated, showing its effectiveness, and critically discussed.
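    The framework described above intercepts POSIX and MPI-IO calls at the C level via LD_PRELOAD; the Python sketch below only illustrates the interposition idea at the interpreter level by wrapping open() and counting bytes per file. It is an analogy, not part of the monitoring stack.

    ```python
    import atexit
    import builtins
    from collections import defaultdict

    _stats = defaultdict(lambda: {"read": 0, "written": 0})
    _real_open = builtins.open

    class _CountingFile:
        """Thin wrapper that forwards everything but counts read/written bytes."""
        def __init__(self, f, name):
            self._f, self._name = f, name
        def read(self, *args):
            data = self._f.read(*args)
            _stats[self._name]["read"] += len(data)
            return data
        def write(self, data):
            n = self._f.write(data)
            _stats[self._name]["written"] += n if isinstance(n, int) else len(data)
            return n
        def __getattr__(self, attr):
            return getattr(self._f, attr)
        def __enter__(self):
            return self
        def __exit__(self, *exc):
            self._f.close()

    def _counting_open(name, *args, **kwargs):
        return _CountingFile(_real_open(name, *args, **kwargs), name)

    builtins.open = _counting_open                     # interpose transparently
    atexit.register(lambda: print(dict(_stats)))       # report statistics at exit

    if __name__ == "__main__":
        with open("demo.txt", "w") as f:               # intercepted call
            f.write("hello I/O monitoring\n")
    ```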
  • A Novel Modeling Approach to Quantify the Influence of Nitrogen Inputs on the Oxygen Dynamics of the North Sea (Fabian Große, Markus Kreus, Hermann Lenhart, Johannes Pätsch, Thomas Pohlmann), In Frontiers in Marine Science, Series: 4, pp. 383, (Editors: Christophe Rabouille), Frontiers (Avenue du Tribunal Fédéral 34, CH-1005 Lausanne, Switzerland), 2017
    BibTeX URL DOI
    Abstract: Oxygen (O2) deficiency, i.e., dissolved O2 concentrations below 6 mg O2 L⁻¹, is a common feature in the southern North Sea. Its evolution is governed mainly by the presence of seasonal stratification and production of organic matter, which is subsequently degraded under O2 consumption. The latter is strongly influenced by riverine nutrient loads, i.e., nitrogen (N) and phosphorus (P). As riverine P loads have been reduced significantly over the past decades, this study aims for the quantification of the influence of riverine and non-riverine N inputs on the O2 dynamics in the southern North Sea. For this purpose, we present an approach to expand a nutrient-tagging technique for physical-biogeochemical models – often referred to as 'trans-boundary nutrient transports' (TBNT) – by introducing a direct link to the O2 dynamics. We apply the expanded TBNT to the physical-biogeochemical model system HAMSOM-ECOHAM and focus our analysis on N-related O2 consumption in the southern North Sea during 2000–2014. The analysis reveals that near-bottom O2 consumption in the southern North Sea is strongly influenced by the N supply from the North Atlantic across the northern shelf edge. However, riverine N sources — especially the Dutch, German and British rivers — as well as the atmosphere also play an important role. In the region with the lowest simulated O2 concentrations (around 56°N, 6.5°E), riverine N on average contributes 39% to overall near-bottom O2 consumption during seasonal stratification. Here, the German and the large Dutch rivers constitute the highest riverine contributions (11% and 10%, respectively). At a site in the Oyster Grounds (around 54.5°N, 4°E), the average riverine contribution adds up to 41%, even exceeding that of the North Atlantic. Here, the highest riverine contributions can be attributed to the Dutch and British rivers, adding up to almost 28% on average. The atmospheric contribution amounts to 13%. Our results emphasize the importance of anthropogenic N inputs and seasonal stratification for the O2 conditions in the southern North Sea. They further suggest that reductions in riverine and atmospheric N inputs may have a relevant positive effect on the O2 levels in this region.
  • Lattice Boltzmann Flow Simulation on Android Devices for Interactive Mobile-Based Learning (Philipp Neumann, Michael Zellner), In Euro-Par 2016: Parallel Processing Workshops, Lecture Notes in Computer Science (10104), pp. 3–15, Springer (Berlin, Heidelberg), Euro-Par 2016, Grenoble, ISBN: 978-3-319-58943-5, 2017
    BibTeX DOI
    Abstract: Interactive tools and learning environments have a high potential to facilitate learning. We developed the app LB2M for two-dimensional Lattice Boltzmann-based flow simulation on Android devices. The software enables interactive simulation and visualization of various flow scenarios. We detail the software with regard to design, simulation kernel, and visualization. In particular, we demonstrate how the app can be used to teach basics of fluid dynamics in beginners' courses, using the example of cavity flow.
  • Interdisciplinary Teamwork in HPC Education: Challenges, Concepts, and Outcomes (Philipp Neumann, Christoph Kowitz, Felix Schranner, Dmitrii Azarnykh), In Journal of Parallel and Distributed Computing, Series: 105, pp. 83–91, (Editors: Sushil Prasad), Elsevier, ISSN: 0743-7315, 2017
    BibTeX DOI
    Abstract: We present our concept “Teamwork Across Disciplines” which enables interdisciplinary teamwork and soft skill training at course level. The concept is realized in the scope of the course “Turbulent Flow Simulation on HPC-Systems”. We describe the course curriculum and detail various additional aspects of the course with regard to student feedback, continuous course development techniques, and the student team projects.
  • High Performance Shallow Water Kernels for Parallel Overland Flow Simulations Based on FullSWOF2D (Roland Wittmann, Hans-Joachim Bungartz, Philipp Neumann), In Computers and Mathematics with Applications, Series: 74(1), pp. 110–125, (Editors: Jose Galan-Garcia), Elsevier, ISSN: 0898-1221, 2017
    BibTeX DOI
    Abstract: We describe code optimization and parallelization procedures applied to the sequential overland flow solver FullSWOF2D. Major difficulties when simulating overland flows comprise dealing with high-resolution datasets of large-scale areas which cannot be computed on a single node, either due to the limited amount of memory or due to too many (time step) iterations resulting from the CFL condition. We address these issues in terms of two major contributions. First, we demonstrate a generic step-by-step transformation of the second order finite volume scheme in FullSWOF2D towards MPI parallelization. Second, the computational kernels are optimized by the use of templates and a portable vectorization approach. We discuss the load imbalance of the flux computation due to dry and wet cells and propose a solution using an efficient cell counting approach. Finally, scalability results are shown for different test scenarios along with a flood simulation benchmark using the Shaheen II supercomputer.

2016

  • Poster: Predicting I/O-performance in HPC using Artificial Neural Networks (Jan Fabian Schmid, Julian Kunkel), Frankfurt, ISC High Performance 2016, 2016-06-21
    BibTeX Publication
    Abstract: Tools are needed that help users of HPC facilities to implement efficient input/output (I/O) in their programs. It is difficult to find the best access parameters and patterns due to complex parallel storage systems. To develop tools which support the implementation of efficient I/O, a computational model of the storage system is key. For single hard disk systems such a model can be derived analytically [1]; however, for the complex storage system of a supercomputer these models become too difficult to configure [2]. Therefore, we searched for good predictors of I/O performance using a machine learning approach with artificial neural networks (ANNs). A hypothesis was then proposed: the I/O path significantly influences the time needed to access a file. In our analysis we used ANNs with different input information for the prediction of access times. To use I/O paths as input for the ANNs, we developed a method that approximates the different I/O paths the storage system used during a benchmark test. This method utilizes error classes.
  • Poster: Analyzing Data Properties using Statistical Sampling Techniques – Illustrated on Scientific File Formats and Compression Features (Julian Kunkel), Frankfurt, ISC High Performance 2016, 2016-06-21 – Awards: Best Poster
    BibTeX Publication
    Abstract: Understanding the characteristics of data stored in data centers helps computer scientists in identifying the most suitable storage infrastructure to deal with these workloads. For example, knowing the relevance of file formats allows optimizing the relevant file formats but also helps in a procurement to define useful benchmarks. Existing studies that investigate performance improvements and techniques for data reduction such as deduplication and compression operate on a small set of data. Some of those studies claim the selected data is representative and scale their results to the scale of the data center. One hurdle to evaluating novel schemes on the complete data is the vast amount of data stored and, thus, the resources required to analyze the complete data set. Even if this were feasible, the costs of running many of those experiments must be justified. This poster investigates stochastic sampling methods to compute and analyze quantities of interest on file numbers but also on the occupied storage space. It is demonstrated that scanning 1% of files and data volume is sufficient on DKRZ's supercomputer to obtain accurate results. This not only speeds up the analysis process but also reduces the costs of such studies significantly. Contributions of this poster are: (1) investigation of the inherent error when operating only on a subset of data, (2) presentation of methods that help future studies to mitigate this error, and (3) illustration of the approach with a study of scientific file types and compression.
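    Code sketch: a small, hypothetical Python illustration of the sampling idea, not the tooling used in the poster. It draws a 1% random sample of a directory tree, estimates the per-extension share of file count and occupied bytes, and reports a binomial standard error for the count share; the path in the usage line is a placeholder.
      import collections
      import math
      import os
      import random

      def sample_file_share(root, fraction=0.01, seed=0):
          """Estimate the per-extension share of file count and occupied bytes
          from a small random sample of the file population under 'root'."""
          paths = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]
          if not paths:
              return
          random.seed(seed)
          sample = random.sample(paths, max(1, int(len(paths) * fraction)))
          counts, sizes = collections.Counter(), collections.Counter()
          for path in sample:
              ext = os.path.splitext(path)[1].lower() or "<none>"
              counts[ext] += 1
              try:
                  sizes[ext] += os.path.getsize(path)
              except OSError:
                  pass                      # file vanished between walk and stat
          n, total = sum(counts.values()), sum(sizes.values()) or 1
          for ext, c in counts.most_common():
              share = c / n
              err = math.sqrt(share * (1 - share) / n)   # binomial standard error
              print(f"{ext:10s} files {share:6.1%} +/- {err:.1%}   bytes {sizes[ext] / total:6.1%}")

      # usage (placeholder path): sample_file_share("/path/to/project/data", fraction=0.01)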
  • Interaktiver C-Programmierkurs, ICP (Julian Kunkel, Jakob Lüttgau), In Synergie, Fachmagazin für Digitalisierung in der Lehre (2), pp. 74–75, 2016-11-16
    BibTeX URL
    Abstract: Programming languages form the basis for automated data processing in the digital world. Although the basic concepts are easy to understand, only a small fraction of people masters these tools. The reasons for this are deficits in education and the high entry barrier of providing a productive programming environment. In particular, learning a programming language requires the practical application of the language, comparable to learning a foreign language. The goal of the project is the creation of an interactive course for teaching the C programming language. The interactivity and the automatic feedback offered are oriented towards the needs of the participants and provide the opportunity to acquire and extend knowledge autodidactically. The lessons include both introductions to specific subtopics and more demanding exercises that foster academic problem-solving skills. In this way, different academic target groups are served and people from various parts of civil society are introduced to computer science. The programming course and the programming platform developed in this project can be used freely worldwide; the source code and the lessons are available under open-source licenses and can therefore be adapted to individual needs. In particular, this enables participation and the contribution of new lessons to the platform.
  • Analyzing Data Properties using Statistical Sampling – Illustrated on Scientific File Formats (Julian Kunkel), In Supercomputing Frontiers and Innovations, Series: Volume 3, Number 3, pp. 19–33, (Editors: Jack Dongarra, Vladimir Voevodin), Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia), 2016-10
    BibTeX URL DOI
    Abstract: Understanding the characteristics of data stored in data centers helps computer scientists in identifying the most suitable storage infrastructure to deal with these workloads. For example, knowing the relevance of file formats allows optimizing the relevant formats but also helps in a procurement to define benchmarks that cover these formats. Existing studies that investigate performance improvements and techniques for data reduction such as deduplication and compression operate on a subset of data. Some of those studies claim the selected data is representative and scale their results to the scale of the data center. One hurdle to running novel schemes on the complete data is the vast amount of data stored and, thus, the resources required to analyze the complete data set. Even if this were feasible, the costs for running many of those experiments must be justified. This paper investigates stochastic sampling methods to compute and analyze quantities of interest on file numbers but also on the occupied storage space. It will be demonstrated that on our production system, scanning 1% of files and data volume is sufficient to draw conclusions. This speeds up the analysis process and reduces costs of such studies significantly.
  • Predicting I/O Performance in HPC Using Artificial Neural Networks (Jan Fabian Schmid, Julian Kunkel), In Supercomputing Frontiers and Innovations, Series: Volume 3, Number 3, pp. 34–39, (Editors: Jack Dongarra, Vladimir Voevodin), Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia), 2016-10
    BibTeX URL DOI
    Abstract: The prediction of file access times is an important part of modeling a supercomputer's storage system. These models can be used to develop analysis tools which support users in implementing efficient I/O behavior. In this paper, we analyze and predict the access times of a Lustre file system from the client perspective. To this end, we measured file access times in various test series and developed different models for predicting access times. The evaluation shows that in models utilizing artificial neural networks the average prediction error is about 30% smaller than in linear models. A phenomenon in the distribution of file access times is of particular interest: file accesses with identical parameters show several typical access times. The typical access times usually differ by orders of magnitude and can be explained by different processing of the file accesses in the storage system, an alternative I/O path. We investigate a method to automatically determine the alternative I/O path and quantify the significance of knowledge about the internal processing. It is shown that the prediction error is improved significantly with this approach.
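    Code sketch: an illustrative comparison of a linear model and a small neural network on synthetic access-time data, assuming scikit-learn and numpy are available. The data generator (cache hit flag, sequential flag, access size) and all numbers are invented to mimic the bimodal "typical access time" effect described in the abstract; this is not the models or data of the paper.
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPRegressor
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      # synthetic samples: cached accesses are orders of magnitude faster,
      # uncached random accesses pay an additional seek penalty
      rng = np.random.default_rng(0)
      n = 5000
      size = rng.uniform(4e3, 4e6, n)              # access size in bytes
      sequential = rng.integers(0, 2, n)           # 1 = sequential offsets
      cached = rng.integers(0, 2, n)               # 1 = served from client cache
      latency = np.where(cached == 1, 1e-5, 5e-3 + (1 - sequential) * 8e-3)
      bandwidth = np.where(cached == 1, 3e9, 4e8)
      access_time = (latency + size / bandwidth) * rng.lognormal(0.0, 0.05, n)

      X = np.column_stack([np.log10(size), sequential, cached])
      y = np.log10(access_time)                    # predict the order of magnitude
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

      models = [("linear", LinearRegression()),
                ("ANN", make_pipeline(StandardScaler(),
                                      MLPRegressor(hidden_layer_sizes=(32, 32),
                                                   max_iter=3000, random_state=0)))]
      for name, model in models:
          model.fit(X_tr, y_tr)
          rel_err = np.mean(np.abs(10 ** model.predict(X_te) - 10 ** y_te) / 10 ** y_te)
          print(f"{name}: mean relative error {rel_err:.1%}")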
  • Analyzing Data Properties using Statistical Sampling Techniques – Illustrated on Scientific File Formats and Compression Features (Julian Kunkel), In High Performance Computing: ISC High Performance 2016 International Workshops, ExaComm, E-MuCoCoS, HPC-IODC, IXPUG, IWOPH, P3MA, VHPC, WOPSSS, Lecture Notes in Computer Science (9945 2016), pp. 130–141, (Editors: Michela Taufer, Bernd Mohr, Julian Kunkel), Springer, ISC-HPC 2016, Frankfurt, Germany, ISBN: 978-3-319-46079-6, 2016-06
    BibTeX DOI
    Abstract: Understanding the characteristics of data stored in data centers helps computer scientists in identifying the most suitable storage infrastructure to deal with these workloads. For example, knowing the relevance of file formats allows optimizing the relevant formats but also helps in a procurement to define benchmarks that cover these formats. Existing studies that investigate performance improvements and techniques for data reduction such as deduplication and compression operate on a small set of data. Some of those studies claim the selected data is representative and scale their results to the scale of the data center. One hurdle to running novel schemes on the complete data is the vast amount of data stored and, thus, the resources required to analyze the complete data set. Even if this were feasible, the costs for running many of those experiments must be justified. This paper investigates stochastic sampling methods to compute and analyze quantities of interest on file numbers but also on the occupied storage space. It will be demonstrated that on our production system, scanning 1% of files and data volume is sufficient to draw conclusions. This speeds up the analysis process and reduces costs of such studies significantly. The contributions of this paper are: (1) the systematic investigation of the inherent analysis error when operating only on a subset of data, (2) the demonstration of methods that help future studies to mitigate this error, and (3) the illustration of the approach on a study for scientific file types and compression for a data center.
  • Data Compression for Climate Data (Michael Kuhn, Julian Kunkel, Thomas Ludwig), In Supercomputing Frontiers and Innovations, Series: Volume 3, Number 1, pp. 75–94, (Editors: Jack Dongarra, Vladimir Voevodin), Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia), 2016-06
    BibTeX URL DOI
  • Steady-State Anderson Accelerated Coupling of Lattice Boltzmann and Navier-Stokes Solvers (Atanas Atanasov, Benjamin Uekermann, Carlos A. Pachajoa Mejia, Hans-Joachim Bungartz, Philipp Neumann), In Computation, Series: 4(4), pp. 19, (Editors: Karlheinz Schwarz), MDPI, ISSN: 2079-3197, 2016
    BibTeX DOI
    Abstract: We present an Anderson acceleration-based approach to spatially couple three-dimensional Lattice Boltzmann and Navier-Stokes (LBNS) flow simulations. This allows us to locally exploit the computational features of both fluid flow solver approaches to the fullest extent and yields enhanced control to match the LB and NS degrees of freedom within the LBNS overlap layer. Designed for parallel Schwarz coupling, the Anderson acceleration allows for the simultaneous execution of both the Lattice Boltzmann and the Navier-Stokes solver. We detail our coupling methodology, validate it, and study convergence and accuracy of the Anderson accelerated coupling, considering three steady-state scenarios: plane channel flow, flow around a sphere, and channel flow across a porous structure. We find that the Anderson accelerated coupling yields a speed-up (in terms of iteration steps) of up to 40% in the considered scenarios, compared to strictly sequential Schwarz coupling.
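    Code sketch: Anderson acceleration itself, shown on a generic fixed-point iteration with numpy (a least-squares formulation in the style of Walker and Ni). The paper applies the scheme to couple LB and NS solvers in an overlap layer; the example below (x = cos(x)) only illustrates how the acceleration reuses previous iterates, and the function names are ours.
      import numpy as np

      def anderson(g, x0, m=5, tol=1e-10, max_iter=200):
          """Anderson acceleration of the fixed-point iteration x_{k+1} = g(x_k).
          Combines the last m iterates so that the linearized residual is
          minimized in the least-squares sense."""
          x = np.asarray(x0, dtype=float)
          G_hist, F_hist = [], []                   # g-evaluations and residuals
          for k in range(max_iter):
              gx = g(x)
              f = gx - x
              G_hist.append(gx)
              F_hist.append(f)
              if np.linalg.norm(f) < tol:
                  return gx, k + 1
              mk = min(m, len(F_hist) - 1)
              if mk == 0:
                  x = gx                            # plain fixed-point step first
                  continue
              dF = np.column_stack([F_hist[-i] - F_hist[-i - 1] for i in range(1, mk + 1)])
              dG = np.column_stack([G_hist[-i] - G_hist[-i - 1] for i in range(1, mk + 1)])
              gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
              x = gx - dG @ gamma                   # accelerated next iterate
              G_hist, F_hist = G_hist[-(m + 1):], F_hist[-(m + 1):]
          return x, max_iter

      # example: x = cos(x), solved componentwise; the accelerated iteration
      # needs far fewer steps than the plain fixed-point iteration
      x_acc, it_acc = anderson(np.cos, np.full(4, 0.5))
      print("Anderson converged in", it_acc, "iterations:", x_acc)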
  • Software for Exascale Computing - SPPEXA 2013-2015 (Hans-Joachim Bungartz, Philipp Neumann, Wolfgang E. Nagel), Springer (Berlin, Heidelberg), ISBN: 978-3-319-40528-5, 2016
    BibTeX
    Abstract: The research and its outcomes presented in this collection focus on various aspects of high-performance computing (HPC) software and its development which is confronted with various challenges as today's supercomputer technology heads towards exascale computing. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The collection thereby highlights pioneering research findings as well as innovative concepts in exascale software development that have been conducted under the umbrella of the priority programme “Software for Exascale Computing” (SPPEXA) of the German Research Foundation (DFG) and that have been presented at the SPPEXA Symposium, Jan 25-27 2016, in Munich. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.
  • Coupling 4 Molecular Dynamics Codes in a Massively Parallel Molecular-Continuum Fluid Dynamics Framework (Hans-Joachim Bungartz, Philipp Neumann, Nikola Tchipev, Wolfgang Eckhardt, Piet Jarmatz), In High Performance Computing in Science and Engineering Garching/Munich 2016 (Siegfried Wagner, Arndt Bode, Helmut Brüchle, Matthias Brehm), pp. 156–157, Bayerische Akademie der Wissenschaften (München), 2016
    BibTeX
  • Analyzing the energy consumption of the storage data path (Pablo Llopis, Manuel F. Dolz, Javier Garcia Blas, Florin Isaila, Mohammad Reza Heidari, Michael Kuhn), In The Journal of Supercomputing, Series: Number 11227, pp. 1–18, (Editors: Hamid Arabnia), Springer US, ISSN: 0920-8542, 2016
    BibTeX DOI
  • On transient hybrid Lattice Boltzmann-Navier-Stokes flow simulations (Philipp Neumann), In Journal of Computational Science, Series: 17(2), pp. 482–490, (Editors: Peter Sloot), Elsevier, ISSN: 1877-7503, 2016
    BibTeX DOI
    Abstract: We investigate one- and two-way coupled schemes combining Lattice Boltzmann (LB) and incompressible Navier-Stokes (NS) solvers. The one-way coupled simulation maps information from a coarse-grained NS system onto LB boundaries, which allows for arbitrarily complex fluid flow boundary conditions on the LB side. We find that this produces accurate velocity, pressure and stress predictions in Couette, Taylor-Green and Karman vortex scenarios. The two-way coupled simulation decomposes the computational domain into overlapping LB and NS domains. We point out that the weak compressibility of LB can have a major impact on the coupled system. Although very good agreement is found for Couette scenarios, this is not achieved to the same extent in Taylor-Green flows.
  • MaMiCo: Software design for parallel molecular-continuum flow simulations (Philipp Neumann, Hanno Flohr, Rahul Arora, Piet Jarmatz, Nikola Tchipev, Hans-Joachim Bungartz), In Computer Physics Communications, Series: 200, pp. 324–335, (Editors: N. Stanley Scott), Elsevier, ISSN: 0010-4655, 2016
    BibTeX DOI
    Abstract: The macro–micro-coupling tool (MaMiCo) was developed to ease the development of and modularize molecular-continuum simulations, retaining sequential and parallel performance. We demonstrate the functionality and performance of MaMiCo by coupling the spatially adaptive Lattice Boltzmann framework waLBerla with four molecular dynamics (MD) codes: the light-weight Lennard-Jones-based implementation SimpleMD, the node-level optimized software ls1 mardyn, and the community codes ESPResSo and LAMMPS. We detail interface implementations to connect each solver with MaMiCo. The coupling for each waLBerla-MD setup is validated in three-dimensional channel flow simulations which are solved by means of a state-based coupling method. We provide sequential and strong scaling measurements for the four molecular-continuum simulations. The overhead of MaMiCo is found to amount to 10-20% of the total (MD) runtime. The measurements further show that scalability of the hybrid simulations is reached on up to 500 Intel Sandy Bridge and more than 1000 AMD Bulldozer compute cores.
  • Towards Automatic and Flexible Unit Test Generation for Legacy HPC Code (Christian Hovy, Julian Kunkel), In Proceedings of the Fourth International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering, SEHPCCSE16, Salt Lake City, Utah, USA, 2016
    BibTeX DOI
    Abstract: Unit testing is an established practice in professional software development. However, in high-performance computing (HPC) with its scientific applications, it is not widely applied. Besides general problems regarding testing of scientific software, for many HPC applications the effort of creating small test cases with a consistent set of test data is high. We have created a tool called FortranTestGenerator that significantly reduces the effort of creating unit tests for subroutines of an existing Fortran application. It is based on Capture & Replay (C&R), that is, it extracts data while running the original application and uses the extracted data as test input data. The tool automatically generates code for capturing the input data and a basic test driver which can be extended by the developer to an appropriate unit test. A static source code analysis is conducted to reduce the number of captured variables. Code is generated based on flexibly customizable templates. Thus, both the capturing process and the unit tests can easily be integrated into an existing software ecosystem. Since most HPC applications use message passing for parallel processing, we also present an approach to extend our C&R model to MPI communication. This allows the extraction of unit tests from massively parallel applications that can be run with a single process.
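    Code sketch: the Capture & Replay idea in miniature, written in Python rather than Fortran and therefore only a conceptual stand-in for FortranTestGenerator. A hypothetical capture decorator pickles the inputs and output of each call during a production run; replay later re-executes a recorded case as a regression check.
      import functools
      import pathlib
      import pickle

      CAPTURE_DIR = pathlib.Path("captures")

      def capture(func):
          """Record inputs and output of every call so they can later be replayed
          as unit-test fixtures (a toy stand-in for Capture & Replay)."""
          counter = {"n": 0}
          @functools.wraps(func)
          def wrapper(*args, **kwargs):
              result = func(*args, **kwargs)
              CAPTURE_DIR.mkdir(exist_ok=True)
              case = {"args": args, "kwargs": kwargs, "result": result}
              with open(CAPTURE_DIR / f"{func.__name__}_{counter['n']}.pkl", "wb") as fh:
                  pickle.dump(case, fh)
              counter["n"] += 1
              return result
          return wrapper

      def replay(func, case_file):
          """Re-run a captured call and check the result: the body of a generated test."""
          with open(case_file, "rb") as fh:
              case = pickle.load(fh)
          assert func(*case["args"], **case["kwargs"]) == case["result"]

      # usage: decorate the routine during a production run ...
      @capture
      def smooth(values, weight=0.5):
          return [weight * v for v in values]

      smooth([1.0, 2.0, 3.0])
      # ... and later replay the recorded case as a regression test
      replay(smooth.__wrapped__, "captures/smooth_0.pkl")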
  • Load Balancing for Molecular Dynamics Simulations on Heterogeneous Architectures (Steffen Seckler, Nikola Tchipev, Hans-Joachim Bungartz, Philipp Neumann), In 2016 IEEE 23rd International Conference on High Performance Computing, pp. 101–110, IEEE, HiPC 2016, Hyderabad, ISBN: 978-1-5090-5411-4, 2016
    BibTeX DOI
    Abstract: Upcoming exascale compute systems are expected to be built from heterogeneous hardware architectures. According to this trend, there exist various methods to handle clusters composed of CPUs, GPUs or other accelerators. Most of these assume that each node has the same structure, for example a dual-socket system with an accelerator (GPU or Xeon Phi). The workload is then distributed homogeneously among the nodes. However, not all clusters fulfill this requirement. They might contain different partitions with and without accelerators. Furthermore, depending on the underlying problem to be solved, accelerator cards may perform better in native mode compared to offloading. Besides, various aspects such as cooling may influence the performance of individual nodes. It therefore cannot always be assumed that the structure and performance of each node, and hence the performance of every MPI rank, is the same. In this contribution, we apply a k-d tree decomposition method to balance load on heterogeneous compute clusters. The algorithm is incorporated into the molecular dynamics simulation program ls1 mardyn. We present performance results for simulations executed on hybrid AMD Bulldozer-Intel Sandy Bridge, Intel Westmere-Intel Sandy Bridge and Intel Xeon-Intel Xeon Phi architectures. The only prerequisite for the proposed algorithm is a cost estimation for different decompositions. It is hence expected to be applicable to a variety of n-body scenarios for which a domain decomposition is possible.
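    Code sketch: a simplified k-d-tree-style cost decomposition in Python/numpy. It repeatedly splits the currently most expensive partition at the cost median along its widest axis; the paper additionally weights partitions by measured per-rank performance on heterogeneous nodes, which is omitted here. Data and function names are made up for illustration.
      import numpy as np

      def kd_partition(coords, cost, n_parts):
          """Recursively split cells into n_parts groups of roughly equal summed
          cost by cutting along the widest coordinate axis (k-d-tree style)."""
          parts = [np.arange(len(cost))]
          while len(parts) < n_parts:
              parts.sort(key=lambda idx: cost[idx].sum(), reverse=True)
              idx = parts.pop(0)                              # split the most expensive part
              axis = np.argmax(coords[idx].max(axis=0) - coords[idx].min(axis=0))
              order = idx[np.argsort(coords[idx, axis])]
              cut = np.searchsorted(np.cumsum(cost[order]), cost[order].sum() / 2.0)
              cut = int(np.clip(cut, 0, len(order) - 2))      # keep both halves non-empty
              parts += [order[:cut + 1], order[cut + 1:]]
          return parts

      # toy input: random cell centres with per-cell cost estimates (e.g. particle counts)
      rng = np.random.default_rng(1)
      coords = rng.uniform(0.0, 10.0, size=(1000, 3))
      cost = rng.integers(1, 50, size=1000).astype(float)
      for rank, cells in enumerate(kd_partition(coords, cost, 4)):
          print(f"rank {rank}: {len(cells)} cells, cost {cost[cells].sum():.0f}")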
  • Looking beyond stratification: a model-based analysis of the biological drivers of oxygen deficiency in the North Sea (Fabian Große, Naomi Greenwood, Markus Kreus, Hermann Lenhart, Detlev Machoczek, Johannes Pätsch, Lesley A. Salt, Helmuth Thomas), In Biogeosciences, Series: 13, pp. 2511–2535, (Editors: Veronique Garçon), Copernicus Publications (Bahnhofsallee 1e, 37081 Göttingen, Germany), 2016
    BibTeX URL DOI
    Abstract: Low oxygen conditions, often referred to as oxygen deficiency, occur regularly in the North Sea, a temperate European shelf sea. Stratification represents a major process regulating the seasonal dynamics of bottom oxygen, yet, lowest oxygen conditions in the North Sea do not occur in the regions of strongest stratification. This suggests that stratification is an important prerequisite for oxygen deficiency, but that the complex interaction between hydrodynamics and the biological processes drives its evolution. In this study we use the ecosystem model HAMSOM-ECOHAM to provide a general characterisation of the different zones of the North Sea with respect to oxygen, and to quantify the impact of the different physical and biological factors driving the oxygen dynamics inside the entire sub-thermocline volume and directly above the bottom. With respect to oxygen dynamics, the North Sea can be subdivided into three different zones: (1) a highly productive, non-stratified coastal zone, (2) a productive, seasonally stratified zone with a small sub-thermocline volume, and (3) a productive, seasonally stratified zone with a large sub-thermocline volume. Type 2 reveals the highest susceptibility to oxygen deficiency due to sufficiently long stratification periods (>  60 days) accompanied by high surface productivity resulting in high biological consumption, and a small sub-thermocline volume implying both a small initial oxygen inventory and a strong influence of the biological consumption on the oxygen concentration. Year-to-year variations in the oxygen conditions are caused by variations in primary production, while spatial differences can be attributed to differences in stratification and water depth. The large sub-thermocline volume dominates the oxygen dynamics in the northern central and northern North Sea and makes this region insusceptible to oxygen deficiency. In the southern North Sea the strong tidal mixing inhibits the development of seasonal stratification which protects this area from the evolution of low oxygen conditions. In contrast, the southern central North Sea is highly susceptible to low oxygen conditions (type 2). We furthermore show that benthic diagenetic processes represent the main oxygen consumers in the bottom layer, consistently accounting for more than 50 % of the overall consumption. Thus, primary production followed by remineralisation of organic matter under stratified conditions constitutes the main driver for the evolution of oxygen deficiency in the southern central North Sea. By providing these valuable insights, we show that ecosystem models can be a useful tool for the interpretation of observations and the estimation of the impact of anthropogenic drivers on the North Sea oxygen conditions.

2015

  • ArduPower: A low-cost wattmeter to improve energy efficiency of HPC applications (Manuel F. Dolz, Mohammad Reza Heidari, Michael Kuhn, Thomas Ludwig, Germán Fabregat), In Sixth International Green Computing Conference and Sustainable Computing Conference (IGSC) 2015, pp. 1–8, IEEE, IGSC 2015, Las Vegas, USA, 2015-12
    BibTeX DOI
  • Poster: Interaktiver C Kurs (ICP) (Julian Kunkel, Thomas Ludwig, Jakob Lüttgau, Dion Timmermann, Christian Kautz, Volker Skwarek), Hamburg, Campus Innovation 2015, 2015-11-27
    BibTeX URL
    Abstract: Programming languages form the basis for automated data processing in the digital world. Although the basic concepts are easy to understand, only a small fraction of people masters these tools. The reasons for this are deficits in education and the entry barrier of providing a productive programming environment. In particular, learning a programming language requires the practical application of the language. Integrating programming courses into the Hamburg Open Online University not only improves the offering for students, but also opens up access to computer science for people from outside the field.
  • MPI-Checker - Static Analysis for MPI (Alexander Droste, Michael Kuhn, Thomas Ludwig), In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, LLVM '15, ACM (New York, USA), SC15, Austin, Texas, USA, ISBN: 978-1-4503-4005-2, 2015-11
    BibTeX URL DOI
  • Analyzing Power Consumption of I/O Operations in HPC Applications (Pablo Llopis, Manuel F. Dolz, Javier García-Blas, Florin Isaila, Jesús Carretero, Mohammad Reza Heidari, Michael Kuhn), In Proceedings of the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015), pp. 107–116, (Editors: Jesus Carretero, Javier Garcia Blas, Roman Wyrzykowski, Emmanuel Jeannot), Computer Architecture, Communications and Systems Group (ARCOS) (Madrid, Spain), NESUS 2015, Jesus Carretero, Krakow, Poland, ISBN: 978-84-608-2581-4, 2015-10
    BibTeX URL
  • Poster: Advanced Data Sieving for Non-Contiguous I/O (Enno Zickler, Julian Kunkel), Frankfurt, Germany, 2015-07-13
    BibTeX URL
  • A Best Practice Analysis of HDF5 and NetCDF-4 Using Lustre (Christopher Bartz, Konstantinos Chasapis, Michael Kuhn, Petra Nerge, Thomas Ludwig), In High Performance Computing, Lecture Notes in Computer Science (9137), pp. 274–281, (Editors: Julian Martin Kunkel, Thomas Ludwig), Springer International Publishing (Switzerland), ISC 2015, Frankfurt, Germany, ISBN: 978-3-319-20118-4, ISSN: 0302-9743, 2015-06
    BibTeX DOI
  • Dynamically Adaptable I/O Semantics for High Performance Computing (Michael Kuhn), In High Performance Computing, Lecture Notes in Computer Science (9137), pp. 240–256, (Editors: Julian Martin Kunkel, Thomas Ludwig), Springer International Publishing (Switzerland), ISC 2015, Frankfurt, Germany, ISBN: 978-3-319-20118-4, ISSN: 0302-9743, 2015-06
    BibTeX DOI
  • Monitoring energy consumption with SIOX (Julian Kunkel, Alvaro Aguilera, Nathanael Hübbe, Marc Wiedemann, Michaela Zimmer), In Computer Science – Research and Development, Series: Volume 30, Number 2, pp. 125–133, Springer, ISSN: 1865-2034, 2015-05
    BibTeX URL DOI
    Abstract: In the face of the growing complexity of HPC systems, their growing energy costs, and the increasing difficulty of running applications efficiently, a number of monitoring tools have been developed in recent years. SIOX is one such endeavor, with a uniquely holistic approach: not only does it aim to record a certain kind of data, but to make all relevant data available for analysis and optimization. Among other sources, this encompasses data from hardware energy counters and trace data from different hardware/software layers. However, not all data that can be recorded should be recorded. As such, SIOX needs good heuristics to determine when and what data needs to be collected, and the energy consumption can provide an important signal about when the system is in a state that deserves closer attention. In this paper, we show that SIOX can use Likwid to collect and report the energy consumption of applications, and present how this data can be visualized using SIOX's web interface. Furthermore, we outline how SIOX can use this information to intelligently adjust the amount of data it collects, allowing it to reduce the monitoring overhead while still providing complete information about critical situations.
  • Identifying Relevant Factors in the I/O-Path using Statistical Methods (Julian Kunkel), Research Papers (3), Research Group: Scientific Computing, University of Hamburg (Deutsches Klimarechenzentrum GmbH, Bundesstraße 45a, D-20146 Hamburg), 2015-03-14
    BibTeX Publication
    Abstract: File systems of supercomputers are complex systems of hardware and software. They utilize many optimization techniques such as the cache hierarchy to speed up data access. Unfortunately, this complexity makes assessing I/O difficult. It is impossible to predict the performance of a single I/O operation without knowing the exact system state, as optimizations such as client-side caching of the parallel file system may speed up performance significantly. I/O tracing and characterization tools help to capture the application workload and to quantitatively assess the performance. However, users have to decide themselves whether the obtained performance is acceptable. In this paper, a density-based method from statistics is investigated to build a model which assists administrators in identifying relevant causes (performance factors). Additionally, the model can be applied to purge unexpectedly slow operations that are caused by significant congestion on a shared resource. It is sketched how this could be used in the long term to automatically assess performance and identify the likely cause. The main contribution of the paper is the presentation of a novel methodology to identify relevant performance factors by inspecting the observed execution time on the client side. Starting from a black box model, the methodology is applicable without fully understanding all hardware and software components of the complex system. It then guides the analysis from observations and fosters identification of the most significant performance factors in the I/O path. To evaluate the approach, a model is trained on DKRZ's supercomputer Mistral and validated on synthetic benchmarks. It is demonstrated that the methodology is currently able to distinguish between several client-side storage cases such as sequential and random memory layout, and cached or uncached data, but this will be extended in the future to include server-side I/O factors as well.
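    Code sketch: a toy density-based screening of client-side operation times, assuming numpy and scipy are available. A kernel density estimate over log10(time) is used to flag operations that sit in low-density regions above the dominant mode as candidates for congestion-induced slow-downs; this is only loosely inspired by, and not identical to, the methodology of the paper.
      import numpy as np
      from scipy.stats import gaussian_kde

      def flag_slow_outliers(times_s, rel_density=0.05):
          """Flag operations in low-density regions above the dominant mode
          of the log10 execution-time distribution."""
          logt = np.log10(times_s)
          density = gaussian_kde(logt)(logt)
          mode = logt[np.argmax(density)]
          return (density < rel_density * density.max()) & (logt > mode)

      # toy data: a fast cached mode, a normal disk mode, and a few congested stragglers
      rng = np.random.default_rng(2)
      times = np.concatenate([10 ** rng.normal(-4.0, 0.10, 800),   # ~0.1 ms
                              10 ** rng.normal(-2.5, 0.15, 200),   # ~3 ms
                              10 ** rng.normal(-0.5, 0.20, 5)])    # ~0.3 s stragglers
      mask = flag_slow_outliers(times)
      print(f"flagged {mask.sum()} of {len(times)} operations as unexpectedly slow")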
  • Looking beyond stratification: a model-based analysis of the biological drivers of oxygen depletion in the North Sea (Fabian Große, Naomi Greenwood, Markus Kreus, Hermann Lenhart, Detlev Machoczek, Johannes Pätsch, Lesley A. Salt, Helmuth Thomas), In Biogeosciences Discussions, Series: 12, pp. 12543–12610, Copernicus Publications (Bahnhofsallee 1e, 37081 Göttingen, Germany), 2015
    BibTeX URL DOI
    Abstract: The problem of low oxygen conditions, often referred to as hypoxia, occurs regularly in the North Sea, a temperate European shelf sea. Stratification represents a major process regulating the seasonal dynamics of bottom oxygen. However, lowest oxygen conditions in the North Sea do not occur in the regions of strongest stratification. This suggests that stratification is an important prerequisite for hypoxia, but that the complex interaction between hydrodynamics and the biological processes drives its development. In this study we use the ecosystem model HAMSOM-ECOHAM5 to provide a general characteristic of the different North Sea oxygen regimes, and to quantify the impact of the different physical and biological factors driving the oxygen dynamics below the thermocline and in the bottom layer. We show that the North Sea can be subdivided into three different regimes in terms of oxygen dynamics: (1) a highly productive, non-stratified coastal regime, (2) a productive, seasonally stratified regime with a small sub-thermocline volume, and (3) a productive, seasonally stratified regime with a large sub-thermocline volume, with regime 2 being highly susceptible to hypoxic conditions. Our analysis of the different processes driving the oxygen development reveals that inter-annual variations in the oxygen conditions are caused by variations in primary production, while spatial differences can be attributed to differences in stratification and water depth. In addition, we show that benthic bacteria represent the main oxygen consumers in the bottom layer, consistently accounting for more than 50 % of the overall consumption. By providing these valuable insights, we show that ecosystem models can be a useful tool for the interpretation of observations and the estimation of the impact of anthropogenic drivers on the North Sea oxygen conditions.
  • Predicting Performance of Non-Contiguous I/O with Machine Learning (Julian Kunkel, Eugen Betke, Michaela Zimmer), In High Performance Computing, 30th International Conference, ISC High Performance 2015, Lecture Notes in Computer Science (9137), pp. 257–273, (Editors: Julian Martin Kunkel, Thomas Ludwig), ISC High Performance, Frankfurt, ISSN: 0302-9743, 2015
    BibTeX
  • Poster: A new post-processing tool for the source-related element tracing in biogeochemical models: A case study for the North Sea (Fabian Große, Markus Kreus, Johannes Pätsch), Vienna, EGU General Assembly 2015, 2015
    BibTeX
    Abstract: The efficient management of marine ecosystems with respect to river load reductions inevitably requires information about the source of organic and inorganic matter in the considered area. Obtaining this information at high temporal and spatial resolution from observations is challenging and cost-intensive, but it can be done with significantly less effort by biogeochemical models. Ménesguen et al. (2006) and Wijsman et al. (2004) developed a method often referred to as ‘trans-boundary nutrient transports’ (TBNT) to mark an element according to its source (e.g. phosphorus from a specific river) when it enters the system and thereby trace it through the whole biogeochemical cycle. Consequently, the results of this method can be used to quantify the distribution of these tracers in different areas of interest. In the meantime, the TBNT method has been implemented in a number of models applied to different marine areas. However, all these applications required the implementation of TBNT in the underlying model, which increases model overhead and computation time drastically. Radtke et al. (2012) designed a ‘code generation tool’ which is capable of creating model code including TBNT to avoid the manual implementation. Nonetheless, their tool still requires exact information about the process formulations and the model architecture. Our work presents a technically new post-processing approach to TBNT which avoids the implementation in a model by using standard model output and basic information about the model’s processes and grid.
  • Big Data Research at DKRZ – Climate Model Data Production Workflow (Michael Lautenschlager, Panagiotis Adamidis, Michael Kuhn), In Big Data and High Performance Computing (Lucio Grandinetti, Gerhard Joubert, Marcel Kunze, Valerio Pascucci), Series: Advances in Parallel Computing, Edition: 26, pp. 133–155, IOS Press, ISBN: 978-1-61499-582-1, 2015
    BibTeX URL DOI
    Abstract: The paper starts with a classification of climate modeling in Big Data and presents research activities in DKRZ's two basic climate modeling workflows: climate model development and climate model data production. Research emphasis in climate model development is on code optimization for the efficient use of modern and future multi-core high performance computing architectures. Complementary research is related to increasing the I/O bandwidth between compute nodes and hard disks as well as to the efficient use of storage resources. Research emphasis in climate model data production is on optimization of the end-to-end workflow in its different stages, starting from climate model calculations, over the generation and storage of climate data products, and ending in long-term archiving, interdisciplinary data utilization, and research data publication for the integration of citable data entities in scientific literature articles.
  • Integration of FULLSWOF2D and PeanoClaw: Adaptivity and Local Time-Stepping for Complex Overland Flows (Kristof Unterweger, Roland Wittmann, Philipp Neumann, Tobias Weinzierl, Hans-Joachim Bungartz), In Recent Trends in Computational Engineering - CE2014, Lecture Notes in Computational Science and Engineering (105), pp. 181–195, Springer (Berlin, Heidelberg), CE2014, Stuttgart, ISBN: 978-3-319-22996-6, 2015
    BibTeX DOI
    Abstract: We propose to couple our adaptive mesh refinement software PeanoClaw with existing solvers for complex overland flows that are tailored to regular Cartesian meshes. This allows us to augment them with spatial adaptivity and local time-stepping without altering the computational kernels. FullSWOF2D (Full Shallow Water Overland Flows) is our software of choice here, though all paradigms hold for other solvers as well. We validate our hybrid simulation software in an artificial test scenario before we provide results for a large-scale flooding scenario of the Mecca region. The latter demonstrates that our coupling approach enables the simulation of complex “real-world” scenarios.
  • An analytical methodology to derive power models based on hardware and software metrics (Manuel F. Dolz, Julian Kunkel, Konstantinos Chasapis, Sandra Catalan), In Computer Science - Research and Development, pp. 1–10, Springer US, ISSN: 1865-2042, 2015
    BibTeX DOI
    Abstract: The use of models to predict the power consumption of a system is an appealing alternative to wattmeters since they avoid hardware costs and are easy to deploy. In this paper, we present an analytical methodology to build models with a reduced number of features in order to estimate power consumption at node level. We aim at building simple power models by performing a per-component analysis (CPU, memory, network, I/O) through the execution of four standard benchmarks. While they are executed, information from all the available hardware counters and resource utilization metrics provided by the system is collected. Based on correlations among the recorded metrics and their correlation with the instantaneous power, our methodology allows us (i) to identify the significant metrics and (ii) to assign weights to the selected metrics in order to derive reduced models. The reduction also aims at extracting models that are based on a set of hardware counters and utilization metrics that can be obtained simultaneously and, thus, can be gathered and computed on-line. The utility of our procedure is validated using real-life applications on an Intel Sandy Bridge architecture.
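    Code sketch: a minimal version of the correlation-then-reduce idea on synthetic data, assuming numpy. Metrics are ranked by their absolute correlation with the measured power, the top k are kept, and a reduced linear model is fitted by least squares; the metric names and coefficients are invented for illustration.
      import numpy as np

      def reduced_power_model(metrics, power, k=3):
          """Select the k metrics most correlated with the measured power and fit
          a reduced linear model  P ~ w0 + sum_i w_i * metric_i  (least squares)."""
          corr = np.array([abs(np.corrcoef(metrics[:, i], power)[0, 1])
                           for i in range(metrics.shape[1])])
          keep = np.argsort(corr)[::-1][:k]
          A = np.column_stack([np.ones(len(power)), metrics[:, keep]])
          weights, *_ = np.linalg.lstsq(A, power, rcond=None)
          return keep, weights

      # synthetic per-interval samples of utilization metrics and node power (watts)
      rng = np.random.default_rng(3)
      n = 2000
      cpu, mem, net, io = (rng.uniform(0.0, 1.0, n) for _ in range(4))
      power = 80 + 90 * cpu + 25 * mem + 5 * io + rng.normal(0.0, 2.0, n)
      metrics = np.column_stack([cpu, mem, net, io])
      names = ["cpu_util", "mem_bw", "net_bw", "io_bw"]

      keep, weights = reduced_power_model(metrics, power, k=2)
      print("selected metrics:", [names[i] for i in keep])
      print("model weights   :", np.round(weights, 1))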
  • The influence of winter convection on primary production: A parameterisation using a hydrostatic three-dimensional biogeochemical model (Fabian Große, Christian Lindemann, Johannes Pätsch, Jan O. Backhaus), In Journal of Marine Systems, Series: 147, pp. 138–152, Elsevier Science Publishers B. V. (Amsterdam, The Netherlands), ISSN: 0924-7963, 2015
    BibTeX URL DOI
    Abstract: In the recent past observational and modelling studies have shown that the vertical displacement of water parcels, and therefore, phytoplankton particles in regions of deep-reaching convection plays a key role in late winter/early spring primary production. The underlying mechanism describes how convection cells capture living phytoplankton cells and recurrently expose them to sunlight. This study presents a parameterisation called ‘phytoconvection’ which focusses on the influence of convection on primary production. This parameterisation was implemented into a three-dimensional physical–biogeochemical model and applied to the Northwestern European Continental Shelf and areas of the adjacent Northeast Atlantic. The simulation was compared to a ‘conventional’ parameterisation with respect to its influence on phytoplankton concentrations during the annual cycle and its effect on the carbon cycle. The simulation using the new parameterisation showed good agreement with observation data recorded during winter, whereas the reference simulation did not capture the observed phytoplankton concentrations. The new parameterisation had a strong influence on the carbon export through the sinking of particulate organic carbon. The carbon export during late winter/early spring significantly exceeded the export of the reference run. Furthermore, a non-hydrostatic convection model was used to evaluate the major assumption of the presented parameterisation which implies the matching of the mixed layer depth with the convective mixing depth. The applied mixed layer depth criterion principally overestimates the actual convective mixing depth. However, the results showed that this assumption is reasonable during late winter, while indicating a mismatch during spring.
  • Dynamically adaptive Lattice Boltzmann simulation of shallow water flows with the Peano framework (Philipp Neumann, Hans-Joachim Bungartz), In Applied Mathematics and Computation, Series: 267, pp. 795–804, (Editors: Theodore Simos), Elsevier, ISSN: 0096-3003, 2015
    BibTeX DOI
    Abstract: We present a dynamically adaptive Lattice Boltzmann (LB) implementation for solving the shallow water equations (SWEs). Our implementation extends an existing LB component of the Peano framework. We revise the modular design with respect to the incorporation of new simulation aspects and LB models. The basic SWE-LB implementation is validated in different breaking dam scenarios. We further provide a numerical study on stability of the MRT collision operator used in our simulations.
  • Optimized Force Calculation in Molecular Dynamics Simulations for the Intel Xeon Phi (Nikola Tchipev, Amer Wafai, Colin W. Glass, Wolfgang Eckhardt, Alexander Heinecke, Hans-Joachim Bungartz, Philipp Neumann), In Euro-Par 2015: Parallel Processing Workshops, Lecture Notes in Computer Science (9523), pp. 774–785, Springer (Berlin, Heidelberg), Euro-Par 2015, Vienna, ISBN: 978-3-319-27307-5, 2015
    BibTeX DOI
    Abstract: We provide details on the shared-memory parallelization for manycore architectures of the molecular dynamics framework ls1-mardyn, including an optimization of the SIMD vectorization for multi-centered molecules. The novel shared-memory parallelization scheme allows us to retain Newton's third law optimization and exhibits very good scaling on many-core devices such as a full Xeon Phi card running 240 threads. The Xeon Phi can thus be exploited and delivers performance comparable to Ivy Bridge nodes in our experiments.
  • Teamwork Across Disciplines: High-Performance Computing Meets Engineering (Philipp Neumann, Christoph Kowitz, Felix Schranner, Dmitrii Azarnykh), In Euro-Par 2015: Parallel Processing Workshops, Lecture Notes in Computer Science (9523), pp. 125–134, Springer (Berlin, Heidelberg), Euro-Par 2015, Vienna, ISBN: 978-3-319-27307-5, 2015
    BibTeX DOI
    Abstract: We present a general methodology to combine interdisciplinary teamwork experience with classical lecture and lab course concepts, enabling supervised team-based learning among students. The concept is exemplarily applied in a course on high-performance computing (HPC) and computational fluid dynamics (CFD). Evaluation and student feedback suggest that competences in both teamwork and the lecture material (CFD and HPC) are acquired.
  • The North Sea – A shelf sea in the Anthropocene (Kay-Christian Emeis, Justus van Beusekom, Ulrich Callies, Ralf Ebinghaus, Andreas Kannen, Gerd Kraus, Ingrid Kröncke, Hermann Lenhart, Ina Lorkowski, Volker Matthias, Christian Möllmann, Johannes Pätsch, Mirco Scharfe, Helmuth Thomas, Ralf Weisse, Eduardo Zorita), In Journal of Marine Systems, Series: 141, pp. 18–33, Elsevier Science Publishers B. V. (Amsterdam, The Netherlands), 2015
    BibTeX URL DOI
    Abstract: Global and regional change clearly affects the structure and functioning of ecosystems in shelf seas. However, complex interactions within the shelf seas hinder the identification and unambiguous attribution of observed changes to drivers. These include variability in the climate system, in ocean dynamics, in biogeochemistry, and in shelf sea resource exploitation in the widest sense by societies. Observational time series are commonly too short, and resolution, integration time, and complexity of models are often insufficient to unravel natural variability from anthropogenic perturbation. The North Sea is a shelf sea of the North Atlantic and is impacted by virtually all global and regional developments. Natural variability (from interannual to multidecadal time scales) as response to forcing in the North Atlantic is overlain by global trends (sea level, temperature, acidification) and alternating phases of direct human impacts and attempts to remedy those. Human intervention started some 1000 years ago (diking and associated loss of wetlands), expanded to near-coastal parts in the industrial revolution of the mid-19th century (river management, waste disposal in rivers), and greatly accelerated in the mid-1950s (eutrophication, pollution, fisheries). The North Sea is now a heavily regulated shelf sea, yet societal goals (good environmental status versus increased uses), demands for benefits and policies diverge increasingly. Likely, the southern North Sea will be re-zoned as riparian countries dedicate increasing sea space for offshore wind energy generation – with uncertain consequences for the system's environmental status. We review available observational and model data (predominantly from the southeastern North Sea region) to identify and describe effects of natural variability, of secular changes, and of human impacts on the North Sea ecosystem, and outline developments in the next decades in response to environmental legislation, and in response to increased use of shelf sea space.

2014

  • Poster: SIOX: An Infrastructure for Monitoring and Optimization of HPC-I/O (Julian Kunkel, Michaela Zimmer, Marc Wiedemann, Nathanael Hübbe, Alvaro Aguilera, Holger Mickler, Xuan Wang, Andrij Chut, Thomas Bönisch), ISC'14 Leipzig, 2014-06-23
    BibTeX URL
    Abstract: Performance analysis and optimization of high-performance I/O systems is a daunting task. Mainly, this is due to the overwhelmingly complex interplay of the involved hardware and software layers. The Scalable I/O for Extreme Performance (SIOX) project provides a versatile environment for monitoring I/O activities and learning from this information. The goal of SIOX is to automatically suggest and apply performance optimizations, and to assist in locating and diagnosing performance problems. In this poster, we present the current status of SIOX. Our modular architecture covers instrumentation of POSIX, MPI and other high-level I/O libraries; the monitoring data is recorded asynchronously into a global database, and recorded traces can be visualized. Furthermore, we offer a set of primitive plug-ins with additional features to demonstrate the flexibility of our architecture: a surveyor plug-in to keep track of the observed spatial access patterns; an fadvise plug-in for injecting hints to achieve read-ahead for strided access patterns; and an optimizer plug-in which monitors the performance achieved with different MPI-IO hints, automatically supplying the best known hint-set when no hints were explicitly set. The presentation of the technical status is accompanied by a demonstration of some of these features on our 20-node cluster. In additional experiments, we analyze the overhead for concurrent access, for MPI-IO's 4 levels of access, and for an instrumented climate application. While our prototype is not yet full-featured, it demonstrates the potential and feasibility of our approach.
  • Compression By Default - Reducing Total Cost of Ownership of Storage Systems (Michael Kuhn, Konstantinos Chasapis, Manuel Dolz, Thomas Ludwig), In Supercomputing, Lecture Notes in Computer Science (8488), (Editors: Julian Martin Kunkel, Thomas Ludwig, Hans Werner Meuer), Springer International Publishing (Berlin, Heidelberg), ISC 2014, Leipzig, Germany, ISBN: 978-3-319-07517-4, ISSN: 0302-9743, 2014-06
    BibTeX DOI
  • Exascale Storage Systems – An Analytical Study of Expenses (Julian Kunkel, Michael Kuhn, Thomas Ludwig), In Supercomputing Frontiers and Innovations, Series: Volume 1, Number 1, pp. 116–134, (Editors: Jack Dongarra, Vladimir Voevodin), Publishing Center of South Ural State University (454080, Lenin prospekt, 76, Chelyabinsk, Russia), 2014-06
    BibTeX URL
  • Whitepaper: E10 – Exascale IO (Andre Brinkmann, Toni Cortes, Hugo Falter, Julian Kunkel, Sai Narasimhamurthy), 2014-06
    BibTeX URL
  • Evaluating Power-Performance Benefits of Data Compression in HPC Storage Servers (Konstantinos Chasapis, Manuel Dolz, Michael Kuhn, Thomas Ludwig), In IARIA Conference, pp. 29–34, (Editors: Steffen Fries, Petre Dini), IARIA XPS Press, ENERGY 2014, Chamonix, France, ISBN: 978-1-61208-332-2, ISSN: 2308-412X, 2014-04-20 – Awards: Best Paper
    BibTeX
    Abstract: Both energy and storage are becoming key issues in high-performance (HPC) systems, especially when thinking about upcoming Exascale systems. The amount of energy consumption and storage capacity needed to solve future problems is growing in a marked curve that the HPC community must face in cost-/energy-efficient ways. In this paper we provide a power-performance evaluation of HPC storage servers that take over tasks other than simply storing the data to disk. We use the Lustre parallel distributed file system with its ZFS back-end, which natively supports compression, to show that data compression can help to alleviate capacity and energy problems. In the first step of our analysis we study different compression algorithms with regards to their CPU and power overhead with a real dataset. Then, we use a modified version of the IOR benchmark to verify our claims for the HPC environment. The results demonstrate that the energy consumption can be reduced by up to 30% in the write phase of the application and 7% for write-intensive applications. At the same time, the required storage capacity can be reduced by approximately 50%. These savings can help in designing more power-efficient and leaner storage systems.
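    Code sketch: a small, hypothetical experiment in Python that contrasts compression ratio and throughput for standard-library compressors on a synthetic float field; it illustrates the capacity/CPU trade-off discussed above but does not reproduce the paper's Lustre/ZFS setup or its power measurements.
      import lzma
      import time
      import zlib
      import numpy as np

      def evaluate(name, compress, data):
          """Report compression ratio and single-threaded throughput for one codec."""
          start = time.perf_counter()
          out = compress(data)
          elapsed = time.perf_counter() - start
          print(f"{name:10s} ratio {len(data) / len(out):5.2f}  "
                f"throughput {len(data) / elapsed / 1e6:7.1f} MB/s")

      # stand-in for climate output: a smooth float32 field plus small-scale noise
      rng = np.random.default_rng(4)
      x = np.linspace(0.0, 20.0 * np.pi, 2_000_000)
      field = np.sin(x) + 0.01 * rng.standard_normal(x.size)
      data = field.astype(np.float32).tobytes()

      evaluate("zlib-1", lambda d: zlib.compress(d, 1), data)
      evaluate("zlib-6", lambda d: zlib.compress(d, 6), data)
      evaluate("lzma-0", lambda d: lzma.compress(d, preset=0), data)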
  • PaTriG - Particle Transport Simulation in Grids (Tobias Weinzierl, Philipp Neumann, Kristof Unterweger, Bart Verleye, Roland Wittmann), In High Performance Computing in Science and Engineering Garching/Munich 2014 (Siegfried Wagner, Arndt Bode, Helmut Satzger, Matthias Brehm), pp. 128–129, Bayerische Akademie der Wissenschaften (München), 2014
    BibTeX
  • Genomic and transcriptomic analyses match medulloblastoma mouse models to their human counterparts (Julia Pöschl, Sebastian Stark, Philipp Neumann, Susanne Gröbner, Daisuke Kawauchi, David T.W. Jones, Paul A. Northcott, Peter Lichter, Stefan M. Pfister, Marcel Kool, Ulrich Schüller), In Acta Neuropathologica, Series: 128(1), pp. 123–136, (Editors: Werner Paulus), Springer, ISSN: 0001-6322, 2014
    BibTeX DOI
    Abstract: Medulloblastoma is a malignant embryonal brain tumor with highly variable outcome. In order to study the biology of this tumor and to perform preclinical treatment studies, a lot of effort has been put into the generation of appropriate mouse models. The usage of these models, however, has become debatable with the advances in human medulloblastoma subgrouping. This study brings together multiple relevant mouse models and matches genetic alterations and gene expression data of 140 murine tumors with 423 human medulloblastomas in a global way. Using AGDEX analysis and k-means clustering, we show that the Blbp-cre::Ctnnb1(ex3)Fl/+Trp53Fl/Fl mouse model fits well to human WNT medulloblastoma, and that, among various Myc- or Mycn-based mouse medulloblastomas, tumors in Glt1-tTA::TRE-MYCN/Luc mice proved to be most specific for human group 3 medulloblastoma. None of the analyzed models displayed a significant match to group 4 tumors. Intriguingly, mice with Ptch1 or Smo mutations selectively modeled SHH medulloblastomas of adulthood, although such mutations occur in all human age groups. We therefore suggest that the infantile or adult gene expression pattern of SHH MBs are not solely determined by specific mutations. This is supported by the observation that human medulloblastomas with PTCH1 mutations displayed more similarities to PTCH1 wild-type tumors of the same age group than to PTCH1-mutated tumors of the other age group. Together, we provide novel insights into previously unrecognized specificity of distinct models and suggest these findings as a solid basis to choose the appropriate model for preclinical studies on medulloblastoma.
  • Hybrid molecular-continuum methods: From prototypes to coupling software (Philipp Neumann, Wolfgang Eckhardt, Hans-Joachim Bungartz), In Computers & Mathematics with Applications, Series: 67(2), pp. 272–281, (Editors: Leszek Demkowicz), Elsevier, ISSN: 0898-1221, 2014
    BibTeX DOI
    Abstract: In this contribution, we review software requirements in hybrid molecular-continuum simulations. For this purpose, we analyze a prototype implementation which combines two frameworks (the Molecular Dynamics framework MarDyn and the framework Peano for spatially adaptive mesh-based simulations) and point out particular challenges of a general coupling software. Based on this analysis, we discuss the software design of our recently published coupling tool. We explain details on its overall structure and show how the challenges that arise in respective couplings are resolved by the software.
  • Influence of large offshore wind farms on North German climate (Marita Boettcher, Peter Hoffmann, Hermann Lenhart, Heinke Schlünzen, Robert Schoetter), In Meteorologische Zeitschrift, Series: 24, pp. 465–480, Borntraeger Science Publishers (Stuttgart), ISSN: 0941-2948, 2014
    BibTeX DOI
    Abstract: Wind farms impact the local meteorology by taking up kinetic energy from the wind field and by creating a large wake. The wake influences mean flow, turbulent fluxes and vertical mixing. In the present study, the influences of large offshore wind farms on the local summer climate are investigated by employing the mesoscale numerical model METRAS with and without wind farm scenarios. For this purpose, a parametrisation for wind turbines is implemented in METRAS. Simulations are done for a domain covering the northern part of Germany with focus on the urban summer climate of Hamburg. A statistical-dynamical downscaling is applied using a skill score to determine the required number of days to simulate the climate and the influence of large wind farms situated in the German Bight, about 100 km away from Hamburg. Depending on the weather situation, the impact of large offshore wind farms varies from nearly no influence up to cloud cover changes over land. The decrease in the wind speed is most pronounced in the local areas in and around the wind farms. Inside the wind farms, the sensible heat flux is reduced. This results in cooling of the climate summer mean for a large area in the northern part of Germany. Due to smaller momentum fluxes the latent heat flux is also reduced. Therefore, the specific humidity is lower but because of the cooling, the relative humidity has no clear signal. The changes in temperature and relative humidity are more widespread than the decrease of wind speed. Hamburg is located in the margins of the influenced region. Even if the influences are small, the urban effects of Hamburg become more relevant than at present and the offshore wind farms slightly intensify the summer urban heat island.
  • Rendering of Feature-rich Dynamically Changing Volumetric Datasets on GPU (Martin Schreiber, Atanas Atanasov, Philipp Neumann, Hans-Joachim Bungartz), In 2014 International Conference on Computational Science, Procedia Computer Science (29), pp. 648–658, Elsevier, ICCS 2014, Cairns, 2014
    BibTeX DOI
    Abstract: Interactive photo-realistic representation of dynamic liquid volumes is a challenging task for today's GPUs and state-of-the-art visualization algorithms. Methods of the last two decades consider either static volumetric datasets applying several optimizations for volume casting, or dynamic volumetric datasets with rough approximations to realistic rendering. Nevertheless, accurate real-time visualization of dynamic datasets is crucial in areas of scientific visualization as well as areas that demand accurate rendering of feature-rich datasets. An accurate and thus realistic visualization of such datasets leads to new challenges: due to restrictions given by computational performance, the datasets may be relatively small compared to the screen resolution, and thus each voxel has to be rendered with high oversampling. With our volumetric datasets based on a real-time lattice Boltzmann fluid simulation creating dynamic cavities and small droplets, existing real-time implementations are not applicable for a realistic surface extraction. This work presents a volume tracing algorithm capable of producing multiple refractions which is also robust to small droplets and cavities. Furthermore, we show advantages of our volume tracing algorithm compared to other implementations.
  • Feign: In-Silico Laboratory for Researching I/O Strategies (Jakob Lüttgau, Julian Kunkel), In Parallel Data Storage Workshop (PDSW), 2014 9th, pp. 43–48, SC14, New Orleans, 2014
    BibTeX
  • A Comparison of Trace Compression Methods for Massively Parallel Applications in Context of the SIOX Project (Alvaro Aguilera, Holger Mickler, Julian Kunkel, Michaela Zimmer, Marc Wiedemann, Ralph Müller-Pfefferkorn), In Tools for High Performance Computing 2013, pp. 91–105, ISBN: 978-3-319-08143-4, 2014
    BibTeX
  • Evaluating Lustre's Metadata Server on a Multi-socket Platform (Konstantinos Chasapis, Manuel Dolz, Michael Kuhn, Thomas Ludwig), In Proceedings of the 9th Parallel Data Storage Workshop, PDSW (2014), pp. 13–18, IEEE Press (Piscataway, NJ, USA), SC14, New Orleans, Louisiana, ISBN: 978-1-4799-7025-4, 2014
    BibTeX DOI
    Abstract: With the emergence of multi-core and multi-socket non-uniform memory access (NUMA) platforms in recent years, new software challenges have arisen to use them efficiently. In the field of high performance computing (HPC), parallel programming has always been the key factor in improving application performance. However, the implications of parallel architectures in the system software have been overlooked until recently. In this work, we examine the implications of such platforms for the performance scalability of the Lustre parallel distributed file system's metadata server (MDS). We run our experiments on a four-socket NUMA platform that has 48 cores. We leverage the mdtest benchmark to generate appropriate metadata workloads and include configurations with varying numbers of active cores and mount points. Additionally, we compare Lustre's metadata scalability with the local file systems ext4 and XFS. The results demonstrate that Lustre's metadata performance is limited to a single socket and decreases when more sockets are used. We also observe that the MDS's back-end device is not a limiting factor regarding the performance.
  • The SIOX Architecture – Coupling Automatic Monitoring and Optimization of Parallel I/O (Julian Kunkel, Michaela Zimmer, Nathanael Hübbe, Alvaro Aguilera, Holger Mickler, Xuan Wang, Andrij Chut, Thomas Bönisch, Jakob Lüttgau, Roman Michel, Johann Weging), In Supercomputing, Supercomputing, pp. 245–260, (Editors: Julian Kunkel, Thomas Ludwig, Hans Meuer), Springer International Publishing, ISC'14, ISC events, Leipzig, ISBN: 978-3-319-07517-4, 2014
    BibTeX DOI
    Abstract: Performance analysis and optimization of high-performance I/O systems is a daunting task. Mainly, this is due to the overwhelmingly complex interplay of the involved hardware and software layers. The Scalable I/O for Extreme Performance (SIOX) project provides a versatile environment for monitoring I/O activities and learning from this information. The goal of SIOX is to automatically suggest and apply performance optimizations, and to assist in locating and diagnosing performance problems. In this paper, we present the current status of SIOX. Our modular architecture covers instrumentation of POSIX, MPI and other high-level I/O libraries; the monitoring data is recorded asynchronously into a global database, and recorded traces can be visualized. Furthermore, we offer a set of primitive plug-ins with additional features to demonstrate the flexibility of our architecture: A surveyor plug-in to keep track of the observed spatial access patterns; an fadvise plug-in for injecting hints to achieve read-ahead for strided access patterns; and an optimizer plug-in which monitors the performance achieved with different MPI-IO hints, automatically supplying the best known hint-set when no hints were explicitly set. The presentation of the technical status is accompanied by a demonstration of some of these features on our 20 node cluster. In additional experiments, we analyze the overhead for concurrent access, for MPI-IO’s 4-levels of access, and for an instrumented climate application. While our prototype is not yet full-featured, it demonstrates the potential and feasibility of our approach.
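    The fadvise plug-in described in the abstract above injects read-ahead hints for strided access patterns. As a rough illustration of the underlying OS mechanism (not the SIOX plug-in API itself; file name, stride and block size are made-up values), such a hint can be issued with posix_fadvise:

      /* Sketch: announce the next block of a strided read pattern so the
       * kernel can prefetch it. All concrete values are assumptions. */
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>

      int main(void) {
          int fd = open("data.bin", O_RDONLY);          /* hypothetical file */
          if (fd < 0) { perror("open"); return 1; }

          off_t next_offset = 4 * 1024 * 1024;          /* assumed stride */
          off_t next_length = 1 * 1024 * 1024;          /* assumed block size */
          if (posix_fadvise(fd, next_offset, next_length, POSIX_FADV_WILLNEED) != 0)
              fprintf(stderr, "posix_fadvise failed\n");

          /* ... the subsequent read of this region can now hit the page cache ... */
          close(fd);
          return 0;
      }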
  • Monitoring Energy Consumption With SIOX Autonomous Monitoring Triggered by Abnormal Energy Consumption (Julian M. Kunkel, Alvaro Aguilera, Nathanael Hübbe, Marc Wiedemann, Michaela Zimmer), pp. 8, Springer, EnA-HPC 2014, Technische Universität Dresden, Dresden, 2014
    BibTeX
    Abstract: In the face of the growing complexity of HPC systems, their growing energy costs, and the increasing difficulty to run applications efficiently, a number of monitoring tools have been developed during the last years. SIOX is one such endeavor, with a uniquely holistic approach: Not only does it aim to record a certain kind of data, but to make all relevant data available for analysis and optimization. Among other sources, this encompasses data from hardware energy counters and trace data from different hardware/software layers. However, not all data that can be recorded should be recorded. As such, SIOX needs good heuristics to determine when and what data needs to be collected, and the energy consumption can provide an important signal about when the system is in a state that deserves closer attention. In this paper, we show that SIOX can use Likwid to collect and report the energy consumption of applications, and present how this data can be visualized using SIOX’s web-interface. Furthermore, we outline how SIOX can use this information to intelligently adjust the amount of data it collects, allowing it to reduce the monitoring overhead while still providing complete information about critical situations.
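    The paper above obtains energy readings through Likwid. As a minimal sketch of what such a reading boils down to on recent Intel/Linux systems (an assumption about the platform, not the code path used in the paper), the cumulative package energy counter exposed by the powercap interface can be sampled before and after a code section:

      /* Sketch: sample the RAPL package energy counter via sysfs.
       * Path and availability depend on the platform (assumption). */
      #include <stdio.h>

      static unsigned long long read_energy_uj(const char *path) {
          unsigned long long uj = 0;
          FILE *f = fopen(path, "r");
          if (f) {
              if (fscanf(f, "%llu", &uj) != 1) uj = 0;
              fclose(f);
          }
          return uj;
      }

      int main(void) {
          const char *counter = "/sys/class/powercap/intel-rapl:0/energy_uj";
          unsigned long long before = read_energy_uj(counter);
          /* ... run the instrumented code section here ... */
          unsigned long long after = read_energy_uj(counter);
          printf("consumed energy: %.3f J\n", (after - before) / 1e6);
          return 0;
      }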

2013

  • ICON DSL: A Domain-Specific Language for climate modeling (Raul Torres, Leonidas Lindarkis, Julian Kunkel, Thomas Ludwig), In WOLFHPC 2013 Third International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, SC13, Denver, 2013-11-18
    BibTeX URL
  • Poster: Source-to-Source Translation for Climate Models (Raul Torres, Leonidas Lindarkis, Julian Kunkel), Leipzig, Germany, International Supercomputing Conference 2013, 2013-06-17
    BibTeX URL
  • A Semantics-Aware I/O Interface for High Performance Computing (Michael Kuhn), In Supercomputing, Lecture Notes in Computer Science (7905), pp. 408–421, (Editors: Julian Martin Kunkel, Thomas Ludwig, Hans Werner Meuer), Springer (Berlin, Heidelberg), ISC 2013, Leipzig, Germany, ISBN: 978-3-642-38749-4, ISSN: 0302-9743, 2013-06
    BibTeX DOI
  • Using Simulation to Validate Performance of MPI(-IO) Implementations (Julian Kunkel), In Supercomputing, Lecture Notes in Computer Science (7905), pp. 181–195, (Editors: Julian Martin Kunkel, Thomas Ludwig, Hans Werner Meuer), Springer (Berlin, Heidelberg), ISC 2013, Leipzig, Germany, ISBN: 978-3-642-38749-4, ISSN: 0302-9743, 2013-06
    BibTeX DOI
    Abstract: Parallel file systems and MPI implementations aim to exploit available hardware resources in order to achieve optimal performance. Since performance is influenced by many hardware and software factors, achieving optimal performance is a daunting task. For these reasons, optimized communication and I/O algorithms are still subject to research. While the complexity of collective MPI operations is occasionally discussed in the literature, a theoretic assessment of the measurements is de facto non-existent. Instead, the conducted analysis is typically limited to performance comparisons to previous algorithms. However, observable performance is not only determined by the quality of an algorithm. At run-time, performance can be degraded due to unexpected implementation issues and triggered hardware and software exceptions. By applying a model that resembles the system, simulation allows us to estimate the performance. With this approach, the non-functional requirement for performance of an implementation can be validated and run-time inefficiencies can be localized. In this paper we demonstrate how simulation can be applied to assess observed performance of collective MPI calls and parallel I/O. PIOsimHD, an event-driven simulator, is applied to validate observed performance on our 10-node cluster. The simulator replays recorded application activity and point-to-point operations of collective operations. It also offers the option to record trace files for visual comparison to recorded behavior. With the innovative introspection into behavior, several bottlenecks in system and implementation are localized.
  • Evaluating Lossy Compression on Climate Data (Nathanael Hübbe, Al Wegener, Julian Kunkel, Yi Ling, Thomas Ludwig), In Supercomputing, Lecture Notes in Computer Science (7905), pp. 343–356, (Editors: Julian Martin Kunkel, Thomas Ludwig, Hans Werner Meuer), Springer (Berlin, Heidelberg), ISC 2013, Leipzig, Germany, ISBN: 978-3-642-38749-4, ISSN: 0302-9743, 2013-06
    BibTeX DOI
    Abstract: While the amount of data used by today’s high-performance computing (HPC) codes is huge, HPC users have not broadly adopted data compression techniques, apparently because of a fear that compression will either unacceptably degrade data quality or be too slow to be worth the effort. In this paper, we examine the effects of three lossy compression methods (GRIB2 encoding, GRIB2 using JPEG 2000 and LZMA, and the commercial Samplify APAX algorithm) on decompressed data quality, compression ratio, and processing time. A careful evaluation of selected lossy and lossless compression methods is conducted, assessing their influence on data quality, storage requirements and performance. The differences between input and decoded datasets are described and compared for the GRIB2 and APAX compression methods. Performance is measured using the compressed file sizes and the time spent on compression and decompression. The test data consists of both 9 synthetic datasets exposing compression behavior and 123 climate variables output from a climate model. The benefits of lossy compression for HPC systems are described and are related to our findings on data quality.
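    To make the evaluated quantities concrete, the sketch below (with synthetic placeholder data, not the datasets or codecs from the paper) computes two of the basic figures discussed above: the compression ratio and the maximum absolute deviation between the original and the decoded field.

      /* Sketch: compression ratio and maximum absolute error between an
       * original and a decoded field; data and sizes are placeholders.
       * Compile with -lm. */
      #include <math.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(void) {
          const size_t n = 1000;                          /* assumed field size */
          double *orig = malloc(n * sizeof *orig);
          double *decoded = malloc(n * sizeof *decoded);
          for (size_t i = 0; i < n; i++) {                /* synthetic test data */
              orig[i] = sin(0.01 * (double)i);
              decoded[i] = orig[i] + 1e-6 * ((double)rand() / RAND_MAX - 0.5);
          }

          double max_abs_err = 0.0;
          for (size_t i = 0; i < n; i++) {
              double err = fabs(orig[i] - decoded[i]);
              if (err > max_abs_err) max_abs_err = err;
          }

          /* Ratio of uncompressed to compressed size; the compressed size
           * here is just an assumed placeholder value. */
          size_t uncompressed_bytes = n * sizeof(double);
          size_t compressed_bytes = 3000;
          printf("compression ratio: %.2f, max abs error: %g\n",
                 (double)uncompressed_bytes / compressed_bytes, max_abs_err);

          free(orig);
          free(decoded);
          return 0;
      }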
  • Towards Self-optimization in HPC I/O (Michaela Zimmer, Julian Kunkel, Thomas Ludwig), In Supercomputing, Lecture Notes in Computer Science (7905), pp. 422–434, (Editors: Julian Martin Kunkel, Thomas Ludwig, Hans Werner Meuer), Springer (Berlin, Heidelberg), ISC 2013, Leipzig, Germany, ISBN: 978-3-642-38749-4, ISSN: 0302-9743, 2013-06
    BibTeX DOI
    Abstract: Performance analysis and optimization of high-performance I/O systems is a daunting task. Mainly, this is due to the overwhelmingly complex interplay of internal processes while executing application programs. Unfortunately, there is a lack of monitoring tools to reduce this complexity to a bearable level. For these reasons, the project Scalable I/O for Extreme Performance (SIOX) aims to provide a versatile environment for recording system activities and learning from this information. While still under development, SIOX will ultimately assist in locating and diagnosing performance problems and automatically suggest and apply performance optimizations. The SIOX knowledge path is concerned with the analysis and utilization of data describing the cause-and-effect chain recorded via the monitoring path. In this paper, we present our refined modular design of the knowledge path. This includes a description of logical components and their interfaces, details about extracting, storing and retrieving abstract activity patterns, a concept for tying knowledge to these patterns, and the integration of machine learning. Each of these tasks is illustrated through examples. The feasibility of our design is further demonstrated with an internal component for anomaly detection, permitting intelligent monitoring to limit the SIOX system’s impact on system resources.
  • Performance-optimized clinical IMRT planning on modern CPUs (Peter Ziegenhein, Cornelis Ph Kamerling, Mark Bangert, Julian Kunkel, Uwe Oelfke), In Physics in Medicine and Biology, Series: Volume 58 Number 11, IOP Publishing, ISSN: 1361-6560, 2013-05-08
    BibTeX URL DOI
    Abstract: Intensity modulated treatment plan optimization is a computationally expensive task. The feasibility of advanced applications in intensity modulated radiation therapy as every day treatment planning, frequent re-planning for adaptive radiation therapy and large-scale planning research severely depends on the runtime of the plan optimization implementation. Modern computational systems are built as parallel architectures to yield high performance. The use of GPUs, as one class of parallel systems, has become very popular in the field of medical physics. In contrast we utilize the multi-core central processing unit (CPU), which is the heart of every modern computer and does not have to be purchased additionally. In this work we present an ultra-fast, high precision implementation of the inverse plan optimization problem using a quasi-Newton method on pre-calculated dose influence data sets. We redefined the classical optimization algorithm to achieve a minimal runtime and high scalability on CPUs. Using the proposed methods in this work, a total plan optimization process can be carried out in only a few seconds on a low-cost CPU-based desktop computer at clinical resolution and quality. We have shown that our implementation uses the CPU hardware resources efficiently with runtimes comparable to GPU implementations, at lower costs.
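    For reference, the generic quasi-Newton iteration on which such an optimizer is built has the form below; the paper's concrete objective, preconditioning and stopping criteria are not reproduced here, and the symbols are the usual textbook ones.

      % Generic quasi-Newton step for minimizing an objective F built from the
      % pre-calculated dose influence data (a sketch, not the paper's formulation):
      \[
        x_{k+1} = x_k - \alpha_k \, H_k \, \nabla F(x_k)
      \]
      % H_k approximates the inverse Hessian (e.g. via L-BFGS updates) and
      % \alpha_k is a step length determined by a line search.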
  • Reducing the HPC-Datastorage Footprint with MAFISC – Multidimensional Adaptive Filtering Improved Scientific data Compression (Nathanael Hübbe, Julian Kunkel), In Computer Science - Research and Development, Series: Volume 28, Issue 2-3, pp. 231–239, Springer, 2013-05
    BibTeX URL
    Abstract: Large HPC installations today also include large data storage installations. Data compression can significantly reduce the amount of data, and it was one of our goals to find out how much compression can do for climate data. The price of compression is, of course, the need for additional computational resources, so our second goal was to relate the savings of compression to the costs it necessitates. In this paper we present the results of our analysis of typical climate data. A lossless algorithm based on these insights is developed and its compression ratio is compared to that of standard compression tools. As it turns out, this algorithm is general enough to be useful for a large class of scientific data, which is the reason we speak of MAFISC as a method for scientific data compression. A numeric problem for lossless compression of scientific data is identified and a possible solution is given. Finally, we discuss the economics of data compression in HPC environments using the example of the German Climate Computing Center.
  • Towards I/O Analysis of HPC Systems and a Generic Architecture to Collect Access Patterns (Marc Wiedemann, Julian Kunkel, Michaela Zimmer, Thomas Ludwig, Michael Resch, Thomas Bönisch, Xuan Wang, Andriy Chut, Alvaro Aguilera, Wolfgang E. Nagel, Michael Kluge, Holger Mickler), In Computer Science - Research and Development, Series: 28, pp. 241–251, Springer New York Inc. (Hamburg, Berlin, Heidelberg), ISSN: 1865-2034, 2013-05
    BibTeX URL
    Abstract: In high-performance computing applications, a high-level I/O call will trigger activities on a multitude of hardware components. These are massively parallel systems supported by huge storage systems and internal software layers. Their complex interplay currently makes it impossible to identify the causes for and the locations of I/O bottlenecks. Existing tools indicate when a bottleneck occurs but provide little guidance in identifying the cause or improving the situation. We have thus initiated Scalable I/O for Extreme Performance to find solutions for this problem. To achieve this goal in SIOX, we will build a system to record access information on all layers and components, to recognize access patterns, and to characterize the I/O system. The system will ultimately be able to recognize the causes of the I/O bottlenecks and propose optimizations for the I/O middleware that can improve I/O performance, such as throughput rate and latency. Furthermore, the SIOX system will be able to support decision making while planning new I/O systems. In this paper, we introduce the SIOX system and describe its current status: We first outline our approach for collecting the required access information. We then provide the architectural concept, the methods for reconstructing the I/O path and an excerpt of the interface for data collection. This paper focuses especially on the architecture, which collects and combines the relevant access information along the I/O path, and which is responsible for the efficient transfer of this information. An abstract modelling approach allows us to better understand the complexity of the analysis of the I/O activities on parallel computing systems, and an abstract interface allows us to adapt the SIOX system to various HPC file systems.
  • Simulating parallel programs on application and system level (Julian Kunkel), In Computer Science - Research and Development, Series: Volume 28, Issue 2-3, pp. 167–174, Springer, 2013-05
    BibTeX URL
    Abstract: Understanding the measured performance of parallel applications in real systems is difficult: with the aim to utilize the resources available, optimizations deployed in hardware and software layers build up to complex systems. However, in order to identify bottlenecks the performance must be assessed. This paper introduces PIOsimHD, an event-driven simulator for MPI-IO applications and the underlying (heterogeneous) cluster computers. With the help of the simulator, runs of MPI-IO applications can be conducted in silico; this includes detailed simulation of collective communication patterns as well as simulation of parallel I/O. The simulation estimates upper bounds for expected performance and helps to assess observed performance. Together with HDTrace, an environment which allows tracing the behavior of MPI programs and internals of MPI and PVFS, PIOsimHD enables us to localize inefficiencies, to conduct research on optimizations for communication algorithms, and to evaluate arbitrary and future systems. In this paper the simulator is introduced and an excerpt of the conducted validation is presented, which demonstrates the accuracy of the models for our cluster.
  • A dynamic mesh refinement technique for Lattice Boltzmann simulations on octree-like grids (Philipp Neumann, Tobias Neckel), In Computational Mechanics, Series: 51(2), pp. 237–253, (Editors: Peter Wriggers), Springer, ISSN: 0178-7675, 2013
    BibTeX DOI
    Abstract: In this contribution, we present our new adaptive Lattice Boltzmann implementation within the Peano framework, with special focus on nanoscale particle transport problems. With the continuum hypothesis not holding anymore on these small scales, new physical effects, such as Brownian fluctuations, need to be incorporated. We explain the overall layout of the application, including memory layout and access, and briefly review the adaptive algorithm. The scheme is validated by different benchmark computations in two and three dimensions. An extension to dynamically changing grids and a spatially adaptive approach to fluctuating hydrodynamics, allowing for the thermalisation of the fluid in particular regions of interest, is proposed. Both dynamic adaptivity and adaptive fluctuating hydrodynamics are validated separately in simulations of particle transport problems. The application of this scheme to an oscillating particle in a nanopore illustrates the importance of Brownian fluctuations in such setups.
  • Massively parallel molecular-continuum simulations with the macro-micro-coupling tool (Philipp Neumann, Jens Harting), In Hybrid Particle-Continuum Methods in Computational Materials Physics, NIC Series (46), pp. 211–216, Forschungszentrum Jülich GmbH, HYBRID 2013, Jülich, ISBN: 978-3-89336-849-5, 2013
    BibTeX
    Abstract: Efficient implementations of hybrid molecular-continuum flow solvers are required to allow for fast and massively parallel simulations of large complex systems. Several coupling strategies have been proposed over the last years for 2D/3D, time-dependent/steady-state or compressible/incompressible scenarios. Despite their different application areas, most of these schemes comprise the same or similar building blocks. Still, to the authors' knowledge, no common implementation of these building blocks is available yet. In this contribution, the Macro-Micro Coupling tool is presented which is meant to support developers in coupling mesh-based methods with molecular dynamics. It is written in C++ and supports two- and three-dimensional scenarios. Its design is reviewed, and aspects for massively parallel coupled scenarios are addressed. Afterwards, scaling results are presented for a hybrid simulation which couples a molecular dynamics code to the Lattice Boltzmann application of the Peano framework.
  • A CEP Technology Stack for Situation Recognition on the Gumstix Embedded Controller (Stephan Grimm, Thomas Hubauer, Thomas Runkler, Carlos Pachajoa, Felix Rempe, Marco Seravalli, Philipp Neumann), In GI-Jahrestagung, LNI (220), pp. 1925–1930, GI, ISBN: 978-3-88579-614-5, 2013
    BibTeX
    Abstract: Semantic technologies, especially for symbolic reasoning and complex event processing, are particularly interesting to be employed on embedded controllers for various industrial applications such as diagnostics of technical devices to reason about sensor events. However, these technologies are typically tailored towards common PC infrastructure and are thus not readily available on embedded platforms. In this paper, we present a proof-of-concept implementation of a technology stack for semantic complex event processing on the Gumstix embedded platform and report on first experimental results about memory consumption.
  • A radial distribution function-based open boundary force model for multi-centered molecules (Philipp Neumann, Wolfgang Eckhardt, Hans-Joachim Bungartz), In International Journal of Modern Physics C, Series: 25(6), pp. 1450008, (Editors: Hans J. Herrmann), World Scientific, ISSN: 0129-1831, 2013
    BibTeX DOI
    Abstract: We derive an expression for radial distribution function (RDF)-based open boundary forcing for molecules with multiple interaction sites. Due to the high-dimensionality of the molecule configuration space and missing rotational invariance, a computationally cheap, 1D approximation of the arising integral expressions as in the single-centered case is not possible anymore. We propose a simple, yet accurate model invoking standard molecule- and site-based RDFs to approximate the respective integral equation. The new open boundary force model is validated for ethane in different scenarios and shows very good agreement with data from periodic simulations.
  • Hybrid Multiscale Simulation Approaches for Micro- and Nanoflows (Philipp Neumann), Dr. Hut (Munich), ISBN: 978-3-8439-1178-8, 2013
    BibTeX
    Abstract: The simulation of flows over a wide range of spatial or temporal scales has turned out to be one of the most challenging and important fields in computational fluid dynamics. In order to study flow phenomena whose characteristics evolve on different scales or in the transition regime between the continuum, the statistical or the molecular scale, coupled multiscale methods are required. These hybrid methods represent a compromise between physical accuracy and computational complexity. Examples comprise molecular dynamics–Lattice Boltzmann simulations for nanoflows or hybrid continuum-statistical methods for rarefied gas flows where parts of the respective domains are solved by either coarse- or fine-scale simulation methods. For the development of these scale-coupling algorithms, accurate mathematical and physical models of the scale transition regime are required. Efficient sequential and parallel implementations of the single-scale components are necessary to solve the underlying flow problem in reasonable time. Besides, a well-fitting software environment needs to be chosen for the development of the single-scale solvers. One particular environment is given by Peano, a framework for spatially adaptive mesh-based simulations. Peano already contains a sophisticated Navier-Stokes solver for the study of continuum phenomena. Fine-scale simulation components-such as Lattice Boltzmann or molecular dynamics solvers-and respective coupled simulations, however, have not been integrated in the framework yet. Finally, the simulation software for the coupled multiscale system needs to provide a flexible and modular environment for the further development of new coupling strategies as well as an efficient and parallel treatment of the different coupling steps. In this thesis, a spatially adaptive Lattice Boltzmann scheme is incorporated into Peano and extends the applicability of the framework from the continuum to the statistical scale. A modular development of coupled algorithms is guaranteed via the design principles of Peano. The software is validated in benchmark computations and applied to micro- and nanoflow scenarios such as rarefied gas flows in microreactors or particle transport in nanopores. For the latter, an adaptive mesh refinement technique has been established which allows for the dynamic spatial refinement of particular flow regions. Besides, a new hybrid Lattice Boltzmann-Navier-Stokes method is presented and applied to the particle transport scenarios. In order to go beyond the statistical scale, a coupling tool for massively parallel molecular dynamics-Lattice Boltzmann simulations has been developed. Based on the analysis of existing coupling schemes, it encapsulates all coupling steps in different modules; this reduces the efforts in setting up new coupling schemes to the exchange of one or several available module implementations. To the author’s knowledge, the coupling tool hence provides the first piece of software for molecular dynamics-Lattice Boltzmann simulations with this high level of modularity on the one hand and applicability to massively parallel scenarios on the other hand. The capabilities of the tool are demonstrated in different molecular dynamics-Lattice Boltzmann scenarios.

2012

  • A Study on Data Deduplication in HPC Storage Systems (Dirk Meister, Jürgen Kaiser, Andre Brinkmann, Michael Kuhn, Julian Kunkel, Toni Cortes), In Proceedings of the ACM/IEEE Conference on High Performance Computing (SC), IEEE Computer Society, SC'12, Salt Lake City, USA, 2012-11-10
    BibTeX
  • Simulating parallel programs on application and system level (Julian Kunkel), In Computer Science – Research and Development, Series: Volume 28 Number 2-3, Springer (Berlin, Heidelberg), ISSN: 1865-2042, 2012-06
    BibTeX URL DOI
    Abstract: Understanding the measured performance of parallel applications in real systems is difficult: with the aim to utilize the resources available, optimizations deployed in hardware and software layers build up to complex systems. However, in order to identify bottlenecks the performance must be assessed. This paper introduces PIOsimHD, an event-driven simulator for MPI-IO applications and the underlying (heterogeneous) cluster computers. With the help of the simulator, runs of MPI-IO applications can be conducted in silico; this includes detailed simulation of collective communication patterns as well as simulation of parallel I/O. The simulation estimates upper bounds for expected performance and helps to assess observed performance. Together with HDTrace, an environment which allows tracing the behavior of MPI programs and internals of MPI and PVFS, PIOsimHD enables us to localize inefficiencies, to conduct research on optimizations for communication algorithms, and to evaluate arbitrary and future systems. In this paper the simulator is introduced and an excerpt of the conducted validation is presented, which demonstrates the accuracy of the models for our cluster.
  • AccessAnalysis - A Tool for Measuring the Appropriateness of Access Modifiers in Java Systems (Christian Zoller, Axel Schmolitzky), In Proceedings of the 2012 IEEE 12th International Working Conference on Source Code Analysis and Manipulation, pp. 120–125, IEEE Computer Society (Los Alamitos, CA, USA), SCAM 2012, Riva del Garda, Trento, Italy, 2012
    BibTeX DOI
    Abstract: Access modifiers allow Java developers to define package and class interfaces tailored for different groups of clients. According to the principles of information hiding and encapsulation, the accessibility of types, methods, and fields should be as restrictive as possible. However, in programming practice, the potential of the given possibilities does not always seem to be fully exploited. AccessAnalysis is a plug-in for the Eclipse IDE that measures the usage of access modifiers for types and methods in Java. It calculates two metrics, Inappropriate Generosity with Accessibility of Types (IGAT) and Inappropriate Generosity with Accessibility of Methods (IGAM), which represent the degree of deviation between actual and necessary access modifiers. As an approximation for the necessary access modifier, we introduce the notion of minimal access modifiers. The minimal access modifier is the most restrictive access modifier that allows all existing references to a type or method in the entire source code of a system. AccessAnalysis determines minimal access modifiers by static source code analysis using the built-in Java DOM/AST API of Eclipse.
  • A Coupling Tool for Parallel Molecular Dynamics-Continuum Simulations (Philipp Neumann, Nikola Tchipev), In 2012 11th International Symposium on Parallel and Distributed Computing, pp. 111–118, IEEE, ISPDC 2012, Munich, 2012
    BibTeX DOI
    Abstract: We present a tool for coupling Molecular Dynamics and continuum solvers. It is written in C++ and is meant to support the developers of hybrid molecular-continuum simulations in terms of both realisation of the respective coupling algorithm as well as parallel execution of the hybrid simulation. We describe the implementational concept of the tool and its parallel extensions. We particularly focus on the parallel execution of particle insertions into dense molecular systems and propose a respective parallel algorithm. Our implementations are validated for serial and parallel setups in two and three dimensions.
  • Tool Environments to Measure Power Consumption and Computational Performance (Timo Minartz, Daniel Molka, Julian Kunkel, Michael Knobloch, Michael Kuhn, Thomas Ludwig), In Handbook of Energy-Aware and Green Computing (Ishfaq Ahmad, Sanjay Ranka), Chapters: 31, pp. 709–743, Chapman and Hall/CRC Press Taylor and Francis Group (6000 Broken Sound Parkway NW, Boca Raton, FL 33487), ISBN: 978-1-4398-5040-4, 2012
    BibTeX
  • A Coupled Approach for Fluid Dynamic Problems Using the PDE Framework Peano (Philipp Neumann, Hans-Joachim Bungartz, Miriam Mehl, Tobias Neckel, Tobias Weinzierl), In Communications in Computational Physics, Series: 12 (1), pp. 65–84, (Editors: Xian-Tu He), Global Science Press, ISSN: 1815-2406, 2012
    BibTeX DOI
    Abstract: We couple different flow models, i.e. a finite element solver for the Navier-Stokes equations and a Lattice Boltzmann automaton, using the framework Peano as a common base. The new coupling strategy between the meso- and macroscopic solver is presented and validated in a 2D channel flow scenario. The results are in good agreement with theory and results obtained in similar works by Latt et al. In addition, the test scenarios show an improved stability of the coupled method compared to pure Lattice Boltzmann simulations.
  • Measuring Inappropriate Generosity with Access Modifiers in Java Systems (Christian Zoller, Axel Schmolitzky), In The Joint Conference of the 22nd International Workshop on Software Measurement (IWSM) and the 7th International Conference on Software Process and Product Measurement (Mensura), pp. 43–52, IWSM/MENSURA2012, Assisi, Italy, 2012
    BibTeX DOI
    Abstract: Every element of a software architecture, e.g. a subsystem, package, or class, should have a well-defined interface that exposes or hides its subelements according to the principles of information hiding and encapsulation. Similar to other object-oriented programming languages, Java supports defining interfaces on several levels. The accessibility of types, methods, and fields can be restricted by using access modifiers. With these modifiers, developers are able to define interfaces of packages and classes tailored for different groups of clients. However, in programming practice, types and members seem to be often declared with too generous access modifiers, i.e. they are accessible by more clients than necessary. This can lead to unwanted dependencies and software quality degradation. We developed an approach to measuring the usage of access modifiers for types and methods in Java by defining two new software metrics: Inappropriate Generosity with Accessibility of Types (IGAT) and Inappropriate Generosity with Accessibility of Methods (IGAM). Furthermore, we created a tool called AccessAnalysis that calculates and displays these metrics. Using AccessAnalysis, we conducted a survey on twelve open source Java projects. The results support our assumption that access modifiers are often chosen more generously than necessary. On average, around one third of all type and method access modifiers fall into this category. Especially top-level types are almost always declared as public, so that package interfaces typically expose more types than necessary. Only 2% of all investigated top-level types are encapsulated inside their package.
  • Lattice Boltzmann Simulations in the Slip and Transition Flow Regime with the Peano Framework (Philipp Neumann, Till Rohrmann), In Open Journal of Fluid Dynamics, Series: 2(3), pp. 101–110, (Editors: Heuy-Dong Kim), Scientific Research Publishing, ISSN: 2165-3852, 2012
    BibTeX DOI
    Abstract: We present simulation results of flows in the finite Knudsen range, that is in the slip and transition flow regime. Our implementations are based on the Lattice Boltzmann method and are accomplished within the Peano framework. We validate our code by solving two- and three-dimensional channel flow problems and compare our results with respective experiments from other research groups. We further apply our Lattice Boltzmann solver to the geometrical setup of a microreactor consisting of differently sized channels and a reactor chamber. Here, we apply static adaptive grids to further reduce computational costs. We further investigate the influence of using a simple BGK collision kernel in coarse grid regions which are further away from the slip boundaries. Our results are in good agreement with theory and non-adaptive simulations, demonstrating the validity and the capabilities of our adaptive simulation software for flow problems at finite Knudsen numbers.
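    The simple BGK collision kernel referred to above is the standard single-relaxation-time lattice Boltzmann update; in its usual textbook form (not a detail specific to the Peano implementation) it reads:

      % Single-relaxation-time (BGK) lattice Boltzmann update; \tau is the
      % relaxation time and f_i^{eq} the local equilibrium distribution.
      \[
        f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\ t + \Delta t)
          = f_i(\mathbf{x}, t)
          - \frac{\Delta t}{\tau}\,\bigl(f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t)\bigr)
      \]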
  • Report on “Distance to Target” Modelling Assessment by ICG-EMO (Hermann Lenhart, Xavier Desmit, Fabian Große, David Mills, Geneviève Lacroix, Hans Los, Alain Ménesguen, Johannes Pätsch, Tineke Troost, Johan van der Molen, Sonja van Leeuwen, Sarah Wakelin), Reports of OSPAR ICG-EMO Working group, OSPAR Commission (London, United Kingdom), 2012
    BibTeX
  • Scientific Computing: Performance and Efficiency in Climate Models (Sandra Schröder, Michael Kuhn, Nathanael Hübbe, Julian Kunkel, Timo Minartz, Petra Nerge, Florens Wasserfall, Thomas Ludwig), In Proceedings of the Work in Progress Session, 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, SEA-Publications (31), (Editors: Erwin Grosspietsch, Konrad Klöckner), Institute for Systems Engineering and Automation (Johannes Kepler University Linz), PDP 2012, Munich Network Management Team, Garching, Germany, ISBN: 978-3-902457-31-8, 2012
    BibTeX
  • eeClust: Energy-Efficient Cluster Computing (Timo Minartz, Daniel Molka, Michael Knobloch, Stephan Krempel, Thomas Ludwig, Wolfgang E. Nagel, Bernd Mohr, Hugo Falter), In Competence in High Performance Computing 2010, pp. 111–124, Springer Berlin Heidelberg (Heidelberg), CiHPC 2010, Schwetzingen, Germany, ISBN: 978-3-642-24025-6, 2012
    BibTeX DOI
    Abstract: Energy consumption has become a major topic in high performance computing in the last years. This is first due to the high operational costs for large-scale machines which are almost as high as the acquisition costs of the whole installation. A second factor is the high carbon footprint of HPC-centers, which should be reduced for environmental reasons. We present the eeClust project, which aims at the reduction of energy consumption of applications running on a cluster with as little performance degradation as possible. We outline the concept of the project, present the tools involved in analyzing the energy consumption of the application as well as managing hardware power states. Further we present first results and the ongoing work in the project.
  • Optimizations for Two-Phase Collective I/O (Michael Kuhn, Julian Kunkel, Yuichi Tsujita, Hidetaka Muguruma, Thomas Ludwig), In Applications, Tools and Techniques on the Road to Exascale Computing, Advances in Parallel Computing (22), pp. 455–462, (Editors: Koen De Bosschere, Erik H. D'Hollander, Gerhard R. Joubert, David Padua, Frans Peters), IOS Press (Amsterdam, Berlin, Tokyo, Washington DC), ParCo 2011, University of Ghent, ELIS Department, Ghent, Belgium, ISBN: 978-1-61499-040-6, ISSN: 0927-5452, 2012
    BibTeX
    Abstract: The performance of parallel distributed file systems suffers from many clients executing a large number of operations in parallel, because the I/O subsystem can be easily overwhelmed by the sheer amount of incoming I/O operations. This, in turn, can slow down the whole distributed system. Many optimizations exist that try to alleviate this problem. Client-side optimizations perform preprocessing to minimize the amount of work the file servers have to do. Server-side optimizations use server-internal knowledge to improve performance. This paper provides an overview of existing client-side optimizations and presents new modifications of the Two-Phase protocol. Interleaved Two-Phase is a modification of ROMIO's Two-Phase protocol, which iterates over the file differently to reduce the number of seek operations on disk. Pipelined Two-Phase uses a pipelined scheme which overlaps I/O and communication phases to utilize the network and I/O subsystems concurrently.
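    The optimizations above modify ROMIO's Two-Phase protocol itself; from the application side, a collective write that goes through this protocol looks roughly like the sketch below. The hints, buffer size and file name are assumptions for illustration, not values from the paper.

      /* Sketch: collective MPI-IO write relying on ROMIO's standard Two-Phase
       * (collective buffering) path; all concrete values are assumptions. */
      #include <mpi.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          MPI_Info info;
          MPI_Info_create(&info);
          MPI_Info_set(info, "romio_cb_write", "enable");   /* force collective buffering */
          MPI_Info_set(info, "cb_buffer_size", "16777216"); /* 16 MiB aggregation buffer */

          MPI_File fh;
          MPI_File_open(MPI_COMM_WORLD, "out.dat",
                        MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

          int block[256];
          for (int i = 0; i < 256; i++) block[i] = rank;
          MPI_Offset offset = (MPI_Offset)rank * sizeof(block);
          MPI_File_write_at_all(fh, offset, block, 256, MPI_INT, MPI_STATUS_IGNORE);

          MPI_File_close(&fh);
          MPI_Info_free(&info);
          MPI_Finalize();
          return 0;
      }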
  • IOPm – Modeling the I/O Path with a Functional Representation of Parallel File System and Hardware Architecture (Julian Kunkel, Thomas Ludwig), In 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, pp. 554–561, (Editors: Rainer Stotzka, Michael Schiffers, Yiannis Cotronis), IEEE Computer Society (Los Alamitos, Washington, Tokyo), PDP 2012, Munich Network Management Team, Garching, Germany, ISBN: 978-0-7695-4633-9, ISSN: 1066-6192, 2012
    BibTeX
    Abstract: The I/O path model (IOPm) is a graphical representation of the architecture of parallel file systems and the machine they are deployed on. With the help of IOPm, file system and machine configurations can be quickly analyzed and distinguished from each other. Contrary to typical representations of the machine and file system architecture, the model visualizes the data or meta data path of client access. Abstract functionality of hardware components such as client and server nodes is covered as well as software aspects such as high-level I/O libraries, collective I/O and caches. Redundancy could be represented, too. Besides the advantage of a standardized representation for analysis, IOPm assists in identifying and communicating bottlenecks in the machine and file system configuration by highlighting performance-relevant functionalities. By abstracting functionalities from the components they are hosted on, IOPm will make it possible to build interfaces to monitor file system activity.
  • Simulation-Aided Performance Evaluation of Server-Side Input/Output Optimizations (Michael Kuhn, Julian Kunkel, Thomas Ludwig), In 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, pp. 562–566, (Editors: Rainer Stotzka, Michael Schiffers, Yiannis Cotronis), IEEE Computer Society (Los Alamitos, Washington, Tokyo), PDP 2012, Munich Network Management Team, Garching, Germany, ISBN: 978-0-7695-4633-9, ISSN: 1066-6192, 2012
    BibTeX
    Abstract: The performance of parallel distributed file systems suffers from many clients executing a large number of operations in parallel, because the I/O subsystem can be easily overwhelmed by the sheer amount of incoming I/O operations. Many optimizations exist that try to alleviate this problem. Client-side optimizations perform preprocessing to minimize the amount of work the file servers have to do. Server-side optimizations use server-internal knowledge to improve performance. The HDTrace framework contains components to simulate, trace and visualize applications. It is used as a testbed to evaluate optimizations that could later be implemented in real-life projects. This paper compares existing client-side optimizations and newly implemented server-side optimizations and evaluates their usefulness for I/O patterns commonly found in HPC. Server-directed I/O chooses the order of non-contiguous I/O operations and tries to aggregate as many operations as possible to decrease the load on the I/O subsystem and improve overall performance. The results show that server-side optimizations beat client-side optimizations in terms of performance for many use cases. Integrating such optimizations into parallel distributed file systems could alleviate the need for sophisticated client-side optimizations. Due to their additional knowledge of internal workflows server-side optimizations may be better suited to provide high performance in general.
  • Simulating Application and System Interaction with PIOsimHD (Julian Kunkel, Thomas Ludwig), In Proceedings of the Work in Progress Session, 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, SEA-Publications (31), (Editors: Erwin Grosspietsch, Konrad Klöckner), Institute for Systems Engineering and Automation (Johannes Kepler University Linz), PDP 2012, Munich Network Management Team, Garching, Germany, ISBN: 978-3-902457-31-8, 2012
    BibTeX
  • Evaluating the Influence of File System Interfaces and Semantics on I/O Throughput in High Performance Computing (Christina Janssen, Michael Kuhn, Thomas Ludwig), In Proceedings of the Work in Progress Session, 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, SEA-Publications (31), (Editors: Erwin Grosspietsch, Konrad Klöckner), Institute for Systems Engineering and Automation (Johannes Kepler University Linz), PDP 2012, Munich Network Management Team, Garching, Germany, ISBN: 978-3-902457-31-8, 2012
    BibTeX
  • Visualization of MPI(-IO) Datatypes (Julian Kunkel, Thomas Ludwig), In Applications, Tools and Techniques on the Road to Exascale Computing, Advances in Parallel Computing (22), pp. 473–480, (Editors: Koen De Bosschere, Erik H. D'Hollander, Gerhard R. Joubert, David Padua, Frans Peters), IOS Press (Amsterdam, Berlin, Tokyo, Washington DC), ParCo 2011, University of Ghent, ELIS Department, Ghent, Belgium, ISBN: 978-1-61499-040-6, ISSN: 0927-5452, 2012
    BibTeX
    Abstract: To permit easy and efficient access to non-contiguous regions in memory for communication and I/O, the message passing interface offers nested datatypes. Since nested datatypes can be very complicated, the understanding of non-contiguous access patterns and the debugging of wrongly accessed memory regions is hard for the developer. HDTrace is an environment which allows to trace the behavior of MPI programs and to simulate them for arbitrary virtual cluster configurations. It is designed to record all MPI parameters including MPI datatypes. In this paper we present the capabilities to visualize the usage of derived datatypes for communication and I/O accesses: a simple hierarchical view is introduced which presents them in a compact form and allows to dig into the nested datatypes. File regions accessed in non-contiguous I/O calls can be visualized in terms of the original datatype. The presented feature assists developers in understanding the datatype layout and spatial I/O access patterns of their application.
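    As a small example of the kind of nested, non-contiguous access such a visualization has to untangle (sizes and file name are invented for illustration), a strided derived datatype can be committed and used as an MPI-IO file view:

      /* Sketch: a strided derived datatype used as a file view, so one
       * contiguous write is scattered to non-contiguous file regions. */
      #include <mpi.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);

          /* 64 blocks of 8 doubles, each block separated by a stride of 512 doubles. */
          MPI_Datatype strided;
          MPI_Type_vector(64, 8, 512, MPI_DOUBLE, &strided);
          MPI_Type_commit(&strided);

          MPI_File fh;
          MPI_File_open(MPI_COMM_SELF, "matrix.dat",
                        MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
          MPI_File_set_view(fh, 0, MPI_DOUBLE, strided, "native", MPI_INFO_NULL);

          double buf[64 * 8];
          for (int i = 0; i < 64 * 8; i++) buf[i] = (double)i;
          MPI_File_write(fh, buf, 64 * 8, MPI_DOUBLE, MPI_STATUS_IGNORE);

          MPI_File_close(&fh);
          MPI_Type_free(&strided);
          MPI_Finalize();
          return 0;
      }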
  • Reducing the HPC-Datastorage Footprint with MAFISC - Multidimensional Adaptive Filtering Improved Scientific data Compression (Nathanael Hübbe, Julian Kunkel), In Computer Science - Research and Development, Springer (Hamburg, Berlin, Heidelberg), ISC 2012, Executive Committee, CCH–Congress Center Hamburg, Germany, 2012
    BibTeX DOI
    Abstract: Large HPC installations today also include large data storage installations. Data compression can significantly reduce the amount of data, and it was one of our goals to find out how much compression can do for climate data. The price of compression is, of course, the need for additional computational resources, so our second goal was to relate the savings of compression to the costs it necessitates.
    In this paper we present the results of our analysis of typical climate data. A lossless algorithm based on these insights is developed and its compression ratio is compared to that of standard compression tools. As it turns out, this algorithm is general enough to be useful for a large class of scientific data, which is the reason we speak of MAFISC as a method for scientific data compression. A numeric problem for lossless compression of scientific data is identified and a possible solution is given. Finally, we discuss the economics of data compression in HPC environments using the example of the German Climate Computing Center.
  • Tracing and Visualization of Energy-Related Metrics (Timo Minartz, Julian M. Kunkel, Thomas Ludwig), In 26th IEEE International Parallel & Distributed Processing Symposium Workshops, IEEE Computer Society, HPPAC 2012, Shanghai, China, 2012
    BibTeX
    Abstract: In an effort to reduce the energy consumption of high-performance computing centers, a number of new approaches have been developed in the last few years. One of these approaches is to switch hardware to lower power states in phases of device idleness or low utilization. Even if the concepts are already quite clear, tools to identify these phases in applications and to determine impact on performance and power consumption are still missing. In this paper, we integrate the tracing of energy-related metrics into our existing tracing environment in an effort to correlate them with the application. We implement tracing of performance and sleep states of the processor, the disk and the network device states in addition to the node power consumption. The exemplary energy efficiency analysis visually correlates the application with the energy-related metrics. With this correlation, it is possible to identify and further avoid waiting times caused by mode switches initiated by the user or the system.

2011

  • Poster: eeClust - Energy-Efficient Cluster Computing (Michael Knobloch, Timo Minartz, Daniel Molka, Stephan Krempel, Thomas Ludwig, Bernd Mohr), Seattle, USA, Supercomputing Conference, 2011-11-15
    BibTeX URL
  • HDTrace – A Tracing and Simulation Environment of Application and System Interaction (Julian Kunkel), Research Papers (2), Research Group: Scientific Computing, University of Hamburg (Deutsches Klimarechenzentrum GmbH, Bundesstraße 45a, D-20146 Hamburg), 2011-01-23
    BibTeX Publication
    Abstract: HDTrace is an environment which allows to trace and simulate the behavior of MPI programs on a cluster. It explicitly includes support to trace internals of MPICH2 and the parallel file system PVFS. With this support it enables to localize inefficiencies, to conduct research on new algorithms and to evaluate future systems. Simulation provides upper bounds of expected performance and helps to assess observed performance as potential performance gains of optimizations can be approximated.
    In this paper the environment is introduced and several examples depict how it assists to reveal internal behavior and spot bottlenecks. In an example with PVFS the inefficient write-out of a matrix diagonal could be either identified by inspecting the PVFS server behavior or by simulation. Additionally the simulation showed that in theory the operation should finish 20 times faster on our cluster – by applying correct MPI hints this potential could be exploited.
  • Performance Characteristics of Global High-Resolution Ocean (MPIOM) and Atmosphere (ECHAM6) Models on Large-Scale Multicore Cluster (Panagiotis Adamidis, Irina Fast, Thomas Ludwig), In Parallel Computing Technologies - 11th International Conference, PaCT 2011, Kazan, Russia, September 19-23, 2011. Proceedings, Lecture Notes in Computer Science (6873), pp. 390–403, (Editors: Victor Malyshkin), Springer, PaCT, Kazan, Russia, ISBN: 978-3-642-23177-3, 2011
    BibTeX URL DOI
    Abstract: Providing reliable estimates of possible anthropogenic climate change is the subject of considerable scientific effort in the climate modeling community. Climate model simulations are computationally very intensive and the necessary computing capabilities can be provided by supercomputers only. Although modern high performance computer platforms can deliver a peak performance in the Petaflop/s range, most of the existing Earth System Models (ESMs) are unable to exploit this power. The main bottlenecks are the single-core code performance, the communication overhead, non-parallel code sections, in particular serial I/O, and the static and dynamic load imbalance between model partitions. The pure scalability of ESMs on massively parallel systems has become a major problem in recent years. In this study we present results from the performance and scalability analysis of the high-resolution ocean model MPIOM and the atmosphere model ECHAM6 on the large-scale multicore cluster "Blizzard" located at the German Climate Computing Center (DKRZ). The issues outlined here are common to many currently existing ESMs running on massively parallel computer platforms with distributed memory.
  • Determine Energy-Saving Potential in Wait-States of Large-Scale Parallel Programs (Michael Knobloch, Bernd Mohr, Timo Minartz), In Computer Science - Research and Development, Series: 1, (Editors: Thomas Ludwig), Springer (Berlin / Heidelberg, Germany), 2011
    BibTeX DOI
    Abstract: Energy consumption has been one of the major topics in high performance computing (HPC) in recent years. However, little effort is put into energy analysis by developers of HPC applications. We present our approach of combined performance and energy analysis using the performance analysis tool-set Scalasca. Scalasca's parallel wait-state analysis is extended by a calculation of the energy-saving potential if a lower power-state can be used.
  • Free-Surface Lattice Boltzmann Simulation on Many-Core Architectures (Martin Schreiber, Philipp Neumann, Stefan Zimmer, Hans-Joachim Bungartz), In 2011 International Conference on Computational Science, Procedia Computer Science (4), pp. 984–993, Elsevier, ICCS 2011, Singapore, 2011
    BibTeX DOI
    Abstract: Current advances in many-core technologies demand simulation algorithms suited for the corresponding architectures, while the accompanying increase in computational power makes real-time and interactive simulations possible and desirable. We present an OpenCL implementation of a Lattice-Boltzmann-based free-surface solver for GPU architectures. The massively parallel execution in particular requires special techniques to keep the interface region consistent, which is here addressed by a novel multipass method. We further compare different memory layouts according to their performance for both a basic driven cavity implementation and the free-surface method, pointing out the capabilities of our implementation in real-time and interactive scenarios, and briefly present visualizations of the flow obtained in real time.
  • Navier-Stokes and Lattice Boltzmann on octree-like grids in the Peano framework (Miriam Mehl, Tobias Neckel, Philipp Neumann), In International Journal for Numerical Methods in Fluids, Series: 65(1-3), pp. 67–86, (Editors: Remi Abgrall), John Wiley & Sons, ISSN: 0271-2091, 2011
    BibTeX DOI
    Abstract: The Navier-Stokes equations (NS) and the Lattice-Boltzmann method (LBM) are the main two types of models used in computational fluid dynamics, yielding similar results for certain classes of flow problems, with each having certain advantages and disadvantages. We present the realization of both approaches, laminar incompressible NS and LB, on the same code basis, our PDE framework Peano. Peano offers a highly memory-efficient implementation of all grid and data handling issues for structured adaptive grids. Using a common code basis allows to compare NS and LB without distorting the results by differences in the maturity and degree of hardware-optimality of the technical implementation. In addition to a comparison for some test examples, we briefly present the coupling of NS and LB in one single simulation. Such a coupling might be useful to simulate boundary effects happening on a smaller scale more accurately with LB but still using NS in the main part of the domain.
  • Flexible Workload Generation for HPC Cluster Efficiency Benchmarking (Daniel Molka, Daniel Hackenberg, Robert Schöne, Timo Minartz, Wolfgang E. Nagel), In Computer Science - Research and Development, Series: 1, (Editors: Thomas Ludwig), Springer (Berlin / Heidelberg, Germany), 2011
    BibTeX DOI
    Abstract: The High Performance Computing (HPC) community is well-accustomed to the general idea of benchmarking. In particular, the TOP500 ranking as well as its foundation, the Linpack benchmark, have shaped the field since the early 1990s. Other benchmarks with a larger workload variety such as SPEC MPI2007 are also well-accepted and often used to compare and rate a system's computational capability. However, in a petascale and soon-to-be exascale computing environment, the power consumption of HPC systems and consequently their energy efficiency have been and continue to be of growing importance, often outrivaling all aspects that focus narrowly on raw compute performance. The Green500 list is the first major attempt to rank the energy efficiency of HPC systems. However, its main weakness is again the focus on a single, highly compute-bound algorithm. Moreover, its method of extrapolating a system's power consumption from a single node is inherently error-prone. So far, no benchmark is available that has been developed from the ground up with the explicit focus on measuring the energy efficiency of HPC clusters. We therefore introduce such a benchmark that includes transparent energy measurements with professional power analyzers. Our efforts are based on well-established standards (C, POSIX-IO and MPI) to ensure a broad applicability. Our well-defined and comprehensible workloads can be used to e.g. compare the efficiency of HPC systems or to track the effects of power saving mechanisms that can hardly be understood by running regular applications due to their overwhelming complexity.
  • Managing Hardware Power Saving Modes for High Performance Computing (Timo Minartz, Michael Knobloch, Thomas Ludwig, Bernd Mohr), In Green Computing Conference and Workshops (IGCC), 2011 International, pp. 1–8, IGCC, Orlando, Florida, USA, ISBN: 978-1-4577-1222-7, 2011
    BibTeX DOI
    Abstract: Energy consumption has become a major topic in high performance computing in the last years. We present an approach to efficiently manage the power states of a power-aware cluster, including the processor, the network cards and the disks. To profit from the lower power consumption of these states, we followed the approach of transferring application knowledge (e.g. future hardware use) to a daemon which efficiently manages the hardware device states per cluster node. After introducing our measurement environment, we evaluated the general power saving potential of our AMD and Intel computing nodes. Two example high performance applications serve as showcases for an initial instrumentation which results in a reduction of the Energy-to-Solution of between 4 and 8% with slight increases of the Time-to-Solution.
  • Zwei Metriken zum Messen des Umgangs mit Zugriffsmodifikatoren in Java (Christian Zoller, Axel Schmolitzky), In Software Engineering 2011 – Proceedings, Lecture Notes in Informatics (P-183), pp. 183–194, Gesellschaft für Informatik (Bonn, Germany), SE2011, Karlsruhe, Germany, 2011
    BibTeX URL
    Abstract: Like many object-oriented programming languages, Java offers the possibility to restrict the accessibility of types, methods and fields at several levels via modifiers. This makes it possible to define differentiated interfaces for different groups of clients. In practice, however, the options offered are not fully exploited. We describe two new metrics with which the appropriate use of access modifiers in Java can be measured, as well as a tool that computes these metrics and can be helpful when narrowing interfaces. We have tested our approach on two commercial projects and twelve open-source projects. It became apparent that access modifiers are often chosen more liberally than necessary.
  • Investigating Influence of Data Storage Organization on Structured Code Search Performance (Daniel Bernau, Olga Mordvinova, Jan Karstens, Susan Hickl), In Computing, Control and Industrial Engineering (CCIE), 2011 IEEE 2nd International Conference on, pp. 247–250, IEEE Computer Society (Washington, DC, USA), CCIE 2011, ISBN: 978-1-4244-9599-3, 2011
    BibTeX URL DOI
  • Validation tool for analyzing vertical profile data of one state variable (valpro1var) by comparison to merged observational data sets and their preparation (OBS_prep): description and user guide (Fabian Große, Andreas Moll), Technical report (11(1)), University of Hamburg, Institute of Oceanography (Hamburg, Germany), 2011
    BibTeX
  • Towards an Energy-Aware Scientific I/O Interface – Stretching the ADIOS Interface to Foster Performance Analysis and Energy Awareness (Julian Kunkel, Timo Minartz, Michael Kuhn, Thomas Ludwig), In Computer Science - Research and Development, Series: 1, (Editors: Thomas Ludwig), Springer (Berlin / Heidelberg, Germany), 2011
    BibTeX DOI
    Abstract: Intelligently switching energy saving modes of CPUs, NICs and disks is mandatory to reduce the energy consumption. Hardware and operating system have a limited perspective of future performance demands; thus, automatic control is suboptimal. However, it is tedious for a developer to control the hardware manually. In this paper we propose an extension of an existing I/O interface which on the one hand is easy to use and on the other hand could steer energy saving modes more efficiently. Furthermore, the proposed modifications are beneficial for performance analysis and provide even more information to the I/O library to improve performance. When a user annotates the program with the proposed interface, I/O, communication and computation phases are labeled by the developer. Run-time behavior is then characterized for each phase; this knowledge could then be exploited by the new library.
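    To make the phase-labeling idea concrete, the following C sketch shows how a developer might mark I/O, communication and computation phases so that a library can steer energy saving modes; the functions phase_begin()/phase_end() are hypothetical placeholders and not the interface actually proposed in the paper.
      /* Hypothetical phase-labeling sketch; phase_begin()/phase_end() are
       * placeholder names, not the interface proposed in the paper. */
      #include <stdio.h>
      typedef enum { PHASE_COMPUTE, PHASE_COMM, PHASE_IO } phase_t;
      static const char *phase_names[] = { "compute", "communication", "I/O" };
      static void phase_begin(phase_t p) {
          /* A library could use this hint to, e.g., spin up disks/NICs ahead
           * of an I/O phase or lower their power states during computation. */
          printf("entering %s phase\n", phase_names[p]);
      }
      static void phase_end(phase_t p) {
          printf("leaving %s phase\n", phase_names[p]); /* characterize run-time behavior here */
      }
      int main(void) {
          phase_begin(PHASE_COMPUTE);
          /* ... numerical kernel ... */
          phase_end(PHASE_COMPUTE);
          phase_begin(PHASE_IO);
          /* ... write results through the I/O library ... */
          phase_end(PHASE_IO);
          return 0;
      }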

2010

  • System Performance Comparison of Stencil Operations with the Convey HC-1 (Julian Kunkel, Petra Nerge), Technical Reports (1), Research Group: Scientific Computing, University of Hamburg (Deutsches Klimarechenzentrum GmbH, Bundesstraße 45a, D-20146 Hamburg), 2010-11-16
    BibTeX URL
    Abstract: In this technical report our first experiences with a Convey HC-1 are documented. Several stencil application kernels are evaluated, and related work in the area of CPUs, GPUs and FPGAs is discussed. The performance of the C and Fortran stencil benchmarks in single and double precision is reported. Benchmarks were run on Blizzard – the IBM supercomputer at DKRZ –, the working group's Intel Westmere cluster and the Convey HC-1 provided at KIT.
    With the Vector personality, the performance of the Convey system is not convincing. However, there lies potential in programming custom personalities. The major issue is to approximate the performance of an implementation on an FPGA before the time-consuming implementation is performed.
  • Classification of Network Computers Based on Distribution of ICMP-echo Round-trip Times (Julian Kunkel, Jan C. Neddermeyer, Thomas Ludwig), Research Papers (1), Staats- und Universitätsbibliothek Hamburg (Carl von Ossietzky, Von-Melle-Park 3, 20146 Hamburg), 2010-09-28
    BibTeX URL
    Abstract: Classification of network hosts into groups of similar hosts allows an attacker to transfer knowledge gathered from one host of a group to others. In this paper we demonstrate that it is possible to classify hosts by inspecting the distributions of the response times from ICMP echo requests. In particular, it is shown that the response time of a host is like a fingerprint covering components inside the network, the host software as well as some hardware aspects of the target.
    This makes it possible to identify nodes consisting of similar hardware and OS. Instances of virtual machines hosted on the same physical hardware can be detected in the same way. To understand the influence of hardware and software components, a simple model is built and the quantitative contribution of each component to the round-trip time is briefly evaluated.
    Several experiments show the successful application of the classifier inside an Ethernet LAN and over the Internet.
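    The classification step can be illustrated with a toy example: two hosts are grouped together if their measured round-trip-time samples follow a sufficiently similar distribution. The C sketch below uses a simple mean/standard-deviation distance and an arbitrary threshold as stand-ins; the paper itself works with the full response-time distributions, so this is only an illustrative assumption.
      /* Toy classifier (illustrative assumption): hosts whose RTT samples
       * have similar means relative to their spread are put in one class. */
      #include <math.h>
      #include <stdio.h>
      static void stats(const double *rtt, int n, double *mean, double *sd) {
          double s = 0.0, s2 = 0.0;
          for (int i = 0; i < n; i++) { s += rtt[i]; s2 += rtt[i] * rtt[i]; }
          *mean = s / n;
          *sd = sqrt(s2 / n - (*mean) * (*mean));
      }
      static int same_class(const double *a, int na, const double *b, int nb) {
          double ma, sa, mb, sb;
          stats(a, na, &ma, &sa);
          stats(b, nb, &mb, &sb);
          return fabs(ma - mb) < 0.5 * (sa + sb); /* arbitrary threshold */
      }
      int main(void) {
          double host1[] = { 0.31, 0.33, 0.30, 0.32, 0.31 }; /* RTTs in ms */
          double host2[] = { 0.32, 0.30, 0.33, 0.31, 0.32 };
          printf("same class: %d\n", same_class(host1, 5, host2, 5));
          return 0;
      }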
  • From experimental setup to bioinformatics: an RNAi screening platform to identify host factors involved in HIV-1 replication (Kathleen Börner, Johannes Hermle, Christoph Sommer, Nigel P. Brown, Bettina Knapp, Bärbel Glass, Julian Kunkel, Gloria Torralba, Jürgen Reymann, Nina Beil, Jürgen Beneke, Rainer Pepperkok, Reinhard Schneider, Thomas Ludwig, Michael Hausmann, Fred Hamprecht, Holger Erfle, Lars Kaderali, Hans-Georg Kräusslich, Maik J. Lehmann), In Biotechnology Journal, Series: 5-1, pp. 39–49, WILEY-VCH (Weinheim, Germany), ISSN: 1860-7314, 2010-01
    BibTeX URL DOI
    Abstract: RNA interference (RNAi) has emerged as a powerful technique for studying loss of function phenotypes by specific down-regulation of gene expression, allowing the investigation of virus-host interactions by large scale high-throughput RNAi screens. Here we comprehensively describe a robust and sensitive siRNA screening platform consisting of an experimental setup, single-cell image analysis and statistical as well as bioinformatics analyses. The workflow has been established to elucidate host gene functions exploited by viruses, monitoring both suppression and enhancement of viral replication simultaneously by fluorescence microscopy. The platform comprises a two-stage procedure in which potential host-factors were first identified in a primary screen and afterwards retested in a validation screen to confirm true positive hits. Subsequent bioinformatics analysis allows the identification of cellular genes participating in metabolic pathways and cellular networks utilized by viruses for efficient infection. Our workflow has been used to investigate host factor usage by the human immunodeficiency virus-1 (HIV 1) but can also be adapted to different viruses. Importantly, the provided platform can be used to guide further screening approaches, thus contributing to fill in current gaps in our understanding of virus-host interactions.
  • Predicting the consequences of nutrient reduction on the eutrophication status of the North Sea (Hermann Lenhart, David K. Mills, Hanneke Baretta-Bekker, Sonja M. van Leeuwen, Johan van der Molen, Job W. Baretta, Meinte Blaas, Xavier Desmit, Wilfried Kühn, Geneviève Lacroix, Hans J. Los, Alain Ménesguen, Ramiro Neves, Roger Proctor, Piet Ruardij, Morten D. Skogen, Alice Vanhoutte-Brunier, Monique T. Villars, Sarah L. Wakelin), In Journal of Marine Systems, Series: 81 (1-2), pp. 148–170, Elsevier B.V (Amsterdam, Netherlands), ISSN: 0924-7963, 2010
    BibTeX DOI
    Abstract: In this paper the results from a workshop of the OSPAR Intersessional Correspondence Group on Eutrophication Modelling (ICG-EMO) held in Lowestoft in 2007 are presented. The aim of the workshop was to compare the results of a number of North Sea ecosystem models under different reduction scenarios. In order to achieve comparability of model results the participants were requested to use a minimum spin-up time, common boundary conditions which were derived from a wider-domain model, and a set of common forcing data, with special emphasis on a complete coverage of river nutrient loads. Based on the OSPAR requirements river loads were derived, taking into account the reductions already achieved between 1985 and 2002 for each country. First, for the year 2002, for which the Comprehensive Procedure was applied, the different horizontal distributions of net primary production are compared. Furthermore, the differences in the net primary production between the hindcast run and the 50% nutrient reduction runs are displayed. In order to compare local results, the hindcast and reduction runs are presented for selected target areas and scored against the Comprehensive Procedure assessment levels for the parameters DIN, DIP and chlorophyll. Finally, the temporal development of the assessment parameter bottom oxygen concentration from several models is compared with data from the Dutch monitoring station Terschelling 135. The conclusion from the workshop was that models are useful to support the application of the OSPAR Comprehensive Procedure. The comparative exercise formulated specifically for the workshop required models to be evaluated for pre-defined target areas previously classified as problem areas according to the first application of the Comprehensive Procedure. The responsiveness of the modelled assessment parameters varied between different models but in general the parameter showed a larger response in coastal rather than in offshore waters, which in some cases lead to the goal to achieve a non-problem status. Therefore, the application of the Comprehensive Procedure on model results for parameter assessment opens a new potential in testing eutrophication reduction measures within the North Sea catchment. As a result of the workshop further work was proposed to confirm and bolster confidence in the results. One general field of difficulty appeared to be the model forcing with SPM data in order to achieve realistic levels of light attenuation. Finally, effects of the prescribed spin-up procedure are compared against a long-term run over many years and consequences on the resulting initial nutrient concentrations are highlighted
  • Wake Effects (Petra Nerge, Hermann Lenhart), In Analyzing Coastal and Marine Changes: Offshore Wind Farming as a Case Study. Zukunft Küste - Coastal Futures Synthesis Report. (Marcus Lange, Benjamin Burkhard, Stefan Garthe, Kira Gee, Andreas Kannen, Hermann Lenhart, Wilhelm Windhorst), Series: No. 36, Chapters: 5.3, pp. 68–73, LOICZ Research and Studies (GKSS Research Center, Geesthacht, Germany), 2010
    BibTeX URL
  • Collecting Energy Consumption of Scientific Data (Julian Kunkel, Olga Mordvinova, Michael Kuhn, Thomas Ludwig), In Computer Science - Research and Development, Series: 3, pp. 1–9, (Editors: Thomas Ludwig), Springer (Berlin / Heidelberg, Germany), ISSN: 1865-2034, 2010
    BibTeX URL DOI
    Abstract: In this paper data life cycle management is extended by accounting for energy consumption during the life cycle of files. Information about the energy consumption of data not only makes it possible to account for the correct costs of its life cycle, but also provides feedback to the user and administrator, and improves awareness of the energy consumption of file I/O. Ideas to realize a storage landscape which determines the energy consumption for maintaining and accessing each file are discussed. We propose to add new extended attributes to file metadata which make it possible to compute the energy consumed during the life cycle of each file.
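    As an illustration of the proposed bookkeeping, the Linux-specific C sketch below accumulates an energy value in an extended attribute of a file; the attribute name user.energy_joules and the update scheme are hypothetical examples, not the metadata layout used in the paper.
      /* Linux-specific sketch; the attribute name "user.energy_joules" is a
       * hypothetical example of per-file energy accounting. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/xattr.h>
      static void add_energy(const char *path, double joules) {
          char buf[64] = { 0 };
          double total = 0.0;
          if (getxattr(path, "user.energy_joules", buf, sizeof(buf) - 1) > 0)
              total = atof(buf); /* previously accounted energy */
          total += joules;
          snprintf(buf, sizeof(buf), "%.3f", total);
          if (setxattr(path, "user.energy_joules", buf, strlen(buf), 0) != 0)
              perror("setxattr");
      }
      int main(void) {
          const char *path = "testfile.dat";
          FILE *f = fopen(path, "a"); /* make sure the file exists */
          if (f) fclose(f);
          add_energy(path, 0.7); /* account 0.7 J for an (imagined) access */
          return 0;
      }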
  • I/O Performance Evaluation with Parabench – Programmable I/O Benchmark (Olga Mordvinova, Dennis Runz, Julian Kunkel, Thomas Ludwig), In Procedia Computer Science, Series: 1-1, pp. 2119–2128, Elsevier B.V (Amsterdam, Netherlands), ISSN: 1877-0509, 2010
    BibTeX URL DOI
    Abstract: Choosing an appropriate cluster file system for a specific high performance computing application is challenging and depends mainly on the specific application I/O needs. There is a wide variety of I/O requirements: Some implementations require reading and writing large datasets, others out-of-core data access, or they have database access requirements. Application access patterns reflect different I/O behavior and can be used for performance testing. This paper presents the programmable I/O benchmarking tool Parabench. It has access patterns as input, which can be adapted to mimic behavior for a rich set of applications. Using this benchmarking tool, composed patterns can be automatically tested and easily compared on different local and cluster file systems. Here we introduce the design of the proposed benchmark, focusing on the Parabench programming language, which was developed for flexible pattern creation. We also demonstrate here an exemplary usage of Parabench and its capabilities to handle the POSIX and MPI-IO interfaces.
  • I/O Benchmarking of Data Intensive Applications (Olga Mordvinova, Thomas Ludwig, Christian Bartholomä), In Problems in Programming, Series: 2-3, pp. 107–115, National Academy of Sciences of Ukraine, ISSN: 1727-4907, 2010
    BibTeX
  • Simulation of power consumption of energy efficient cluster hardware (Timo Minartz, Julian Kunkel, Thomas Ludwig), In Computer Science - Research and Development, Series: 3, pp. 165–175, (Editors: Thomas Ludwig), Springer (Berlin / Heidelberg, Germany), ISSN: 1865-2034, 2010
    BibTeX URL DOI
    Abstract: In recent years the power consumption of high-performance computing clusters has become a growing problem because the number and size of cluster installations has been rising. The high power consumption of clusters is a consequence of their design goal: High performance. With low utilization, cluster hardware consumes nearly as much energy as when it is fully utilized. Theoretically, in these low utilization phases cluster hardware can be turned off or switched to a lower power consuming state. We designed a model to estimate power consumption of hardware based on the utilization. Applications are instrumented to create utilization trace files for a simulator realizing this model. Different hardware components can be simulated using multiple estimation strategies. An optimal strategy determines an upper bound of energy savings for existing hardware without affecting the time-to-solution. Additionally, the simulator can estimate the power consumption of efficient hardware which is energy-proportional. This way the minimum power consumption can be determined for a given application. Naturally, this minimal power consumption provides an upper bound for any power saving strategy. After evaluating the correctness of the simulator several different strategies and energy-proportional hardware are compared.
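    The underlying estimation idea can be sketched in a few lines: a component's power draw is interpolated between an idle and a full-load value and integrated over a utilization trace. The C sketch below uses a linear interpolation and made-up wattages as illustrative assumptions; the simulator described in the paper supports multiple components and estimation strategies.
      /* Illustrative power model: linear interpolation between idle and
       * full-load power, integrated over a utilization trace (made-up numbers). */
      #include <stdio.h>
      struct component { double p_idle, p_max; }; /* Watts */
      static double power(const struct component *c, double util) {
          return c->p_idle + util * (c->p_max - c->p_idle);
      }
      int main(void) {
          const struct component cpu = { 60.0, 140.0 };
          const double trace[] = { 1.0, 1.0, 0.2, 0.0, 0.0, 0.9, 1.0 }; /* one sample per second */
          double energy = 0.0; /* Joules */
          for (size_t t = 0; t < sizeof(trace) / sizeof(trace[0]); t++)
              energy += power(&cpu, trace[t]) * 1.0; /* P * dt, dt = 1 s */
          printf("estimated energy: %.1f J\n", energy);
          return 0;
      }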
  • Modellbasierte Bewertung der Auswirkungen von Offshore-Windkraftanlagen auf die ökologische Integrität der Nordsee (Benjamin Burkhard, Silvia Opitz, Hermann Lenhart, Kai Ahrendt, Stefan Garthe, Bettina Mendel, Petra Nerge, Wilhelm Windhorst), In Forschung für ein integriertes Küstenzonenmanagement: Fallbeispiele Odermündungsregion und Offshore-Windkraft in der Nordsee (A. Kannen, G. Schernewski, I. Krämer, M. Lange, H. Janßen, N. Stybel), Series: 15 (2010), pp. 15–29, Coastline Reports (EUCC - Die Küsten Union Deutschland e. V., c/o Leibniz-Institut für Ostseeforschung Warnemünde, Seestr. 15, 18119 Rostock, Germany), ISBN: 978-3-9811839-7-9, 2010
    BibTeX URL
  • Tracing Performance of MPI-I/O with PVFS2: A Case Study of Optimization (Yuichi Tsujita, Julian Kunkel, Stephan Krempel, Thomas Ludwig), In Parallel Computing: From Multicores and GPU's to Petascale, pp. 379–386, IOS Press, PARCO 2009, ISBN: 978-1-60750-530-3, 2010
    BibTeX URL DOI

2009

  • Tracing Internal Communication in MPI and MPI-I/O (Julian Kunkel, Yuichi Tsujita, Olga Mordvinova, Thomas Ludwig), In International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT, pp. 280–286, IEEE Computer Society (Washington, DC, USA), PDCAT-09, Hiroshima University, Higashi Hiroshima, Japan, ISBN: 978-0-7695-3914-0, 2009-12-29
    BibTeX DOI
    Abstract: MPI implementations can realize MPI operations with any algorithm that fulfills the specified semantics. To provide optimal efficiency the MPI implementation might choose the algorithm dynamically, depending on the parameters given to the function call. However, this selection is not transparent to the user. While this abstraction is appropriate for common users, achieving best performance with fixed parameter sets requires knowledge of internal processing. Also, for developers of collective operations it might be useful to understand timing issues inside the communication or I/O call. In this paper we extended the PIOviz environment to trace MPI internal communication. This allows the user to see PVFS server behavior together with the behavior in the MPI application and inside MPI itself. We present some analysis results for these capabilities for MPICH2 on a Beowulf cluster.
  • USB Flash Drives as an Energy Efficiency Storage Alternative (Olga Mordvinova, Julian Kunkel, Christian Baun, Thomas Ludwig, Marcel Kunze), In Proceedings of the 10th IEEE/ACM International Conference on Grid Computing, pp. 175–182, IEEE Computer Society (Washington, DC, USA), GRID-09, IEEE/ACM, Banff, Alberta, Canada, ISBN: 978-1-4244-5148-7, 2009-10
    BibTeX DOI
  • Dynamic file system semantics to enable metadata optimizations in PVFS (Michael Kuhn, Julian Kunkel, Thomas Ludwig), In Concurrency and Computation: Practice and Experience, Series: 21-14, pp. 1775–1788, John Wiley and Sons Ltd. (Chichester, UK), ISSN: 1532-0626, 2009
    BibTeX URL DOI
    Abstract: Modern file systems maintain extensive metadata about stored files. While metadata typically is useful, there are situations when the additional overhead of such a design becomes a problem in terms of performance. This is especially true for parallel and cluster file systems, where every metadata operation is even more expensive due to their architecture. In this paper several changes made to the parallel cluster file system Parallel Virtual File System (PVFS) are presented. The changes target the optimization of workloads with large numbers of small files. To improve the metadata performance, PVFS was modified such that unnecessary metadata is not managed anymore. Several tests with a large quantity of files were performed to measure the benefits of these changes. The tests have shown that common file system operations can be sped up by a factor of two even with relatively few changes.
  • Ecosystem based modeling and indication of ecological integrity in the German North Sea - Case study offshore wind parks (Benjamin Burkhard, Silvia Opitz, Hermann Lenhart, Kai Ahrendt, Stefan Garthe, Bettina Mendel, Wilhelm Windhorst), In Ecological Indicators, Series: 11-1, pp. 168–174, Elsevier B.V (Amsterdam, Netherlands), ISSN: 1470-160X, 2009
    BibTeX DOI
    Abstract: Human exploitation and use of marine and coastal areas are apparent and growing in many regions of the world. For instance, fishery, shipping, military, raw material exploitation, nature protection and the rapidly expanding offshore wind power technology are competing for limited resources and space. The development and implementation of Integrated Coastal Zone Management (ICZM) strategies could help to solve these problems. Therefore, suitable spatial assessment, modeling, planning and management tools are urgently needed. These tools have to deal with data that include complex information on different spatial and temporal scales. A systematic approach based on the development of future scenarios which are assessed by combining different simulation models, GIS methods and an integrating set of ecological integrity indicators, was applied in a case study in the German North Sea. Here, the installation of huge offshore wind parks within the near future is planned. The aim was to model environmental effects of altered sea-use patterns on marine biota. Indicators of ecological integrity were used to assess altering conditions and possible ecosystem shifts ranging from systems' degradations to the development of highly productive and diverse artificial reef systems. The results showed that some ecosystem processes and properties and related indicators are sensitive to changes generated by offshore wind park installations while others did not react as hypothesized
  • Small-file Access in Parallel File Systems (Philip Carns, Sam Lang, Robert Ross, Murali Vilayannur, Julian Kunkel, Thomas Ludwig), In IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel and Distributed Processing, pp. 1–11, IEEE Computer Society (Washington, DC, USA), IPDPS-09, University of Rome, Rome, Italy, ISBN: 978-1-4244-3751-1, 2009
    BibTeX URL DOI
    Abstract: Today's computational science demands have resulted in ever larger parallel computers, and storage systems have grown to match these demands. Parallel file systems used in this environment are increasingly specialized to extract the highest possible performance for large I/O operations, at the expense of other potential workloads. While some applications have adapted to I/O best practices and can obtain good performance on these systems, the natural I/O patterns of many applications result in generation of many small files. These applications are not well served by current parallel file systems at very large scale. This paper describes five techniques for optimizing small-file access in parallel file systems for very large scale systems. These five techniques are all implemented in a single parallel file system (PVFS) and then systematically assessed on two test platforms. A microbenchmark and the mdtest benchmark are used to evaluate the optimizations at an unprecedented scale. We observe as much as a 905% improvement in small-file create rates, 1,106% improvement in small-file stat rates, and 727% improvement in small-file removal rates, compared to a baseline PVFS configuration on a leadership computing platform using 16,384 cores.
  • Poster: Data Storage and Processing for High Throughput RNAi Screening (Julian Kunkel, Thomas Ludwig, M. Hemberger, G. Torralba, E. Schmitt, M. Hausmann, V. Lindenstruth, N. Brown, R. Schneider), Heidelberg, Germany, German Symposium on Systems Biology 2009, 2009
    BibTeX Publication
  • Using Non-blocking I/O Operations in High Performance Computing to Reduce Execution Times (David Buettner, Julian Kunkel, Thomas Ludwig), In Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 134–142, Springer-Verlag (Berlin, Heidelberg), EuroPVM/MPI-09, CSC - IT, Espoo, Finland, ISBN: 978-3-642-03769-6, 2009
    BibTeX URL DOI
    Abstract: As supercomputers become faster, the I/O part of applications can become a real problem in regard to overall execution times. System administrators and developers of hardware or software components reduce execution times by creating new and optimized parts for the supercomputers. While this helps a lot in the struggle to minimize I/O times, adjustment of the execution environment is not the only option to improve overall application behavior. In this paper we examine if the application programmer can also contribute by making use of non-blocking I/O operations. After an analysis of non-blocking I/O operations and their potential for shortening execution times we present a benchmark which was created and run in order to see if the theoretical promises also hold in practice.
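    The overlap the paper examines can be expressed directly with the non-blocking MPI-IO calls of the MPI standard: start the write, perform computation, and wait for completion before reusing the buffer. The C sketch below is a minimal, self-contained example with made-up file names and sizes, not the benchmark used in the paper.
      /* Minimal overlap of computation and file output with non-blocking
       * MPI-IO; file name and sizes are illustrative. Build with mpicc. */
      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>
      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          const int n = 1 << 20;
          double *buf = malloc(n * sizeof(double));
          for (int i = 0; i < n; i++) buf[i] = (double)i;
          MPI_File fh;
          MPI_File_open(MPI_COMM_WORLD, "out.dat",
                        MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
          MPI_File_set_view(fh, (MPI_Offset)rank * n * sizeof(double),
                            MPI_DOUBLE, MPI_DOUBLE, "native", MPI_INFO_NULL);
          MPI_Request req;
          MPI_File_iwrite(fh, buf, n, MPI_DOUBLE, &req); /* start the write */
          double s = 0.0;
          for (int i = 0; i < n; i++) s += 1e-9 * i; /* overlapped computation */
          MPI_Wait(&req, MPI_STATUS_IGNORE); /* complete I/O before reusing buf */
          MPI_File_close(&fh);
          free(buf);
          if (rank == 0) printf("computed %.3f while writing\n", s);
          MPI_Finalize();
          return 0;
      }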
  • A Strategy for Cost Efficient Distributed Data Storage for In-memory OLAP (Olga Mordvinova, Oleksandr Shepil, Thomas Ludwig, Andrew Ross), In Proceedings of the IADIS International Conference Applied Computing 2009 (1), pp. 109–117, IADIS Press (Algarve, Portugal), IADIS-09, International Association for Development of the Information Society, Rome, Italy, ISBN: 978-972-8924-97-3, 2009
    BibTeX

2008

  • Ecological risk as a tool for evaluating the effects of offshore wind farm construction in the North Sea (Corinna Nunneri, Hermann Lenhart, Benjamin Burkhard, Wilhelm Windhorst), In Regional Environmental Change, Series: 8-1, pp. 31–43, Springer (Berlin / Heidelberg, Germany), ISSN: 1436-3798, 2008-03
    BibTeX DOI
    Abstract: Offshore wind power generation represents a chance to supply energy in a more sustainable way; however, the ecological risks associated with the construction and operation of offshore wind farms are still largely unknown. This paper uses the concept of ecological risk for analysing ecological changes during construction of offshore wind farms. “Ecological risk” is defined as the potentially reduced ability of providing ecosystem services. The ERSEM ecosystem model allows assessing ecological risk based on a number of selected variables (integrity indicators) and under the assumption that increased suspended matter concentration during construction of wind farms affects ecosystem functioning. We conclude that ecological risk is adequate to describe the effects of wind farm constructions, although the computation procedure still needs to be refined and the choice of indicators further optimised. In this context, the choice of indicators available in modelling as well as in monitoring time-series may offer the way forward
  • Bottleneck Detection in Parallel File Systems with Trace-Based Performance Monitoring (Julian Kunkel, Thomas Ludwig), In Euro-Par '08: Proceedings of the 14th international Euro-Par conference on Parallel Processing, pp. 212–221, Springer-Verlag (Berlin, Heidelberg), Euro-Par-08, University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain, ISBN: 978-3-540-85450-0, 2008
    BibTeX URL DOI
    Abstract: Today we recognize a high demand for powerful storage. In industry this issue is tackled either with large storage area networks, or by deploying parallel file systems on top of RAID systems or on smaller storage networks. The bigger the system gets, the more important the ability to analyze the performance and to identify bottlenecks in the architecture and the applications becomes. We extended the performance monitor available in the parallel file system PVFS2 by including statistics of the server process and information about the system. Performance monitor data is available during runtime, and the server process was modified to store this data in off-line traces suitable for post-mortem analysis. These values can be used to detect bottlenecks in the system. Some measured results demonstrate how these help to identify bottlenecks and may assist in ranking the servers according to their capabilities.
  • Directory-Based Metadata Optimizations for Small Files in PVFS (Michael Kuhn, Julian Kunkel, Thomas Ludwig), In Euro-Par '08: Proceedings of the 14th international Euro-Par conference on Parallel Processing, pp. 90–99, Springer-Verlag (Berlin, Heidelberg), Euro-Par-08, University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain, ISBN: 978-3-540-85450-0, 2008 – Awards: Best Paper
    BibTeX DOI
    Abstract: Modern file systems maintain extensive metadata about stored files. While this usually is useful, there are situations when the additional overhead of such a design becomes a problem in terms of performance. This is especially true for parallel and cluster file systems, because due to their design every metadata operation is even more expensive. In this paper several changes made to the parallel cluster file system PVFS are presented. The changes are targeted at the optimization of workloads with large numbers of small files. To improve metadata performance, PVFS was modified such that unnecessary metadata is not managed anymore. Several tests with a large quantity of files were done to measure the benefits of these changes. The tests have shown that common file system operations can be sped up by a factor of two even with relatively few changes.
  • The use of 'ecological risk' for assessing effects of human activities: an example including eutrophication and offshore wind farm construction in the North Sea (Corinna Nunneri, Hermann Lenhart, Benjamin Burkhard, Franciscus Colijn, Felix Müller, Wilhelm Windhorst), In Landscape online, Series: 5, ISSN: 1865-1542, 2008
    BibTeX DOI
    Abstract: This paper starts from the uncertainty surrounding ecosystem thresholds and addresses the issue of ecosystem-state assessment by means of ecological integrity indicators and ‘ecological risk’. The concept of ‘ecological risk’ gives a measure of the likelihood of ecosystem failure to provide the level of natural ecological goods and services expected/desired by human societies. As a consequence of human pressures (use of resources and discharge into the environment), ecosystem thresholds can be breached, thus resulting in major threats to human health, safety and well-being. In this study we apply the concept of ‘ecological risk’ to two case studies in the German exclusive economic zone: eutrophication and construction of offshore wind farms. The effects of different future scenarios for single uses upon ecosystem integrity are analysed, as well as the effects of one combined scenario. We conclude that in the short term the construction of offshore wind farms can influence some processes to a much larger degree than eutrophication; however, combined impacts deriving from eutrophication and offshore wind farm construction need a more detailed analysis. Due to non-linear ecosystem processes, effects of combined or multiple uses of marine resources in terms of ‘ecological risk’ cannot be extrapolated from single-use scenarios.

2007

  • Performance Evaluation of the PVFS2 Architecture (Julian Kunkel, Thomas Ludwig), In PDP '07: Proceedings of the 15th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, pp. 509–516, IEEE Computer Society (Washington, DC, USA), PDP-07, Euromicro, Napoli, Italy, ISBN: 0-7695-2784-1, 2007
    BibTeX DOI
    Abstract: As the complexity of parallel file systems' software stacks increases, it gets harder to reveal the reasons for performance bottlenecks in these software layers. This paper introduces a method which eliminates the influence of the physical storage on performance analysis in order to find these bottlenecks. Also, the influence of the hardware components on the performance is modeled to estimate the maximum achievable performance of a parallel file system. The paper focuses on the Parallel Virtual File System 2 (PVFS2) and shows results for file creation, small contiguous I/O requests and large contiguous I/O requests.
  • Analysis of the MPI-IO Optimization Levels with the PIOViz Jumpshot Enhancement (Thomas Ludwig, Stephan Krempel, Michael Kuhn, Julian Kunkel, Christian Lohse), In Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science (4757), pp. 213–222, (Editors: Franck Cappello, Thomas Hérault, Jack Dongarra), Springer (Berlin / Heidelberg, Germany), EuroPVM/MPI-07, Institut national de recherche en informatique et automatique, Paris, France, ISBN: 978-3-540-75415-2, 2007
    BibTeX URL DOI
    Abstract: With MPI-IO we see various alternatives for programming file I/O. The overall program performance depends on many different factors. A new trace analysis environment provides deeper insight into the client/server behavior and visualizes events of both process types. We investigate the influence of making independent vs. collective calls together with access to contiguous and non-contiguous data regions in our MPI-IO program. Combined client and server traces exhibit reasons for observed I/O performance.
  • Smart Carpet: A Footstep Tracking Interface (Domnic Savio, Thomas Ludwig), In Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops (2), pp. 754–760, IEEE Computer Society (Washington, DC, USA), AINAW-07, IEEE, Niagara Falls, Ontario, Canada, ISBN: 978-0-7695-2847-2, 2007
    BibTeX DOI
    Abstract: Distributed computing infrastructures provided by sensor networks support new ways of observing human motion. In this paper, we discuss algorithms that use data from sensor networks for tracking the gait of human walking. Three methods for identifying footsteps and footstep patterns that describe a walk of a subject are discussed. The methods translate the data into a set of coordinates that represents the trajectory of the subject's walk. The results show that the proposed methods are more accurate and cost less than existing methods.
  • Nutrient emission reduction scenarios in the North Sea: An abatement cost and ecosystem integrity analysis (Corinna Nunneri, Wilhelm Windhorst, R. Kerry Turner, Hermann Lenhart), In Ecological Indicators, Series: 7-4, pp. 776–792, Elsevier B.V (Amsterdam, Netherlands), ISSN: 1470-160X, 2007
    BibTeX DOI
    Abstract: Economic cost–benefit appraisal (and its sub-set cost-effectiveness) of ecosystem conservation and/or pollution abatement strategies have proved to be powerful decision-making aids. But the monetary economic valuation of ecosystem goods and services (gains and losses) can only provide a good indication of social welfare impacts under certain conditions and in selective contexts. The values derived through this appraisal process will, for a number of measures, be underestimates of the full total system value [Turner, R.K., Paavola, J., Cooper, P. Farber, S., Jessamy, V., Georgiou, S., 2003. Valuing nature: lessons learned and future research directions. Ecol. Econ. 46, 493–510]. The economic analysis is best suited to assessing the value of ‘marginal’ gains and losses in ecosystem goods/services and not the total destruction of whole systems (including life support systems, the value of which is not commensurate with monetary values and/or is infinitely high). In this study economic costs and what we call 'ecological risk' analysis are used to appraise the implementation costs and ecological benefits of selected measures for combating eutrophication. Ecological risk is expressed in terms of ecosystem integrity and resilience. The paper presents three regional case studies dealing with the issue of nutrient emission reduction to the southern North Sea, namely the catchments/estuaries of the Humber (UK), the Rhine (Germany and The Netherlands) and the Elbe (Czech Republic and Germany). On the basis of these comparative regional examples, wider implications in the light of international management of the North Sea are presented. A range of nutrient reduction scenarios have been deployed within the overall OSPAR target agreement of 50% nitrogen and phosphorous reduction compared with 1985 levels. Each scenario assumes pollution reduction measures, characterised in terms of their overall implementation costs and nutrient-reduction effects. Specific policy instruments analysed were: the creation of more intertidal habitat via managed coastal realignment in the Humber area, farm-based land cover changes in the Rhine catchment and a mix of agricultural regime and wastewater treatment plant (WWTP) improvements in the Elbe area. The ecological consequences associated with each reduction scenario have been modelled [using the ERSEM model, see Baretta, J.W., Ebenhöh, W., Ruardij, P., 1995. An overview over the European Regional Sea Ecosystem Model, a complex marine ecosystem model. Neth. J. Sea Res. 33 (3/4), 233–246] for the coastal zones supplied from the rivers Elbe, Humber and Rhine. The modelled ecological quality indicators, which describe scenario effects on the coastal zone ecosystem, are then aggregated in terms of ecosystem integrity and ecological risk. The results are presented in terms of two selected key-indicators: implementation costs of the abatement measures and changes in ecological risk status, across the different catchments and assuming different scenarios. They thus provide a possible basis for international agreement negotiations at the North Sea scale

2006

  • Tracing the MPI-IO Calls' Disk Accesses (Thomas Ludwig, Stephan Krempel, Julian Kunkel, Frank Panse, Dulip Withanage), In Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science (4192), pp. 322–330, (Editors: Bernd Mohr, Jesper Larsson Träff, Joachim Worringen, Jack Dongarra), Springer (Berlin / Heidelberg, Germany), EuroPVM/MPI-06, C&C Research Labs, NEC Europe Ltd., and the Research Centre Jülich, Bonn, Germany, ISBN: 3-540-39110-X, 2006
    BibTeX URL DOI
    Abstract: With parallel file I/O we are faced with the situation that we do not have appropriate tools to get an insight into the I/O server behavior depending on the I/O calls in the corresponding parallel MPI program. We present an approach that allows us to also get event traces from the I/O server environment and to merge them with the client trace. Corresponding events will be matched and visualized. We integrate this functionality into the parallel file system PVFS2 and the MPICH2 tool Jumpshot. Keywords: Performance Analyzer, Parallel I/O, Visualization, Trace-based Tools, PVFS2.

2005

  • Hint Controlled Distribution with Parallel File Systems (Hipolito Vasquez, Thomas Ludwig), In Recent Advances in Parallel Virtual Machine and Message Passing Interface (3666), pp. 110–118, Springer (Berlin / Heidelberg, Germany), EuroPVM/MPI-05, Sorrento, Italy, ISBN: 978-3-540-29009-4, 2005
    BibTeX URL DOI
    Abstract: The performance of scientific parallel programs with high file-I/O-activity running on top of cluster computers strongly depends on the qualitative and quantitative characteristics of the requested I/O-accesses. It also depends on the corresponding mechanisms and policies being used at the parallel file system level. This paper presents the motivation and design of a set of MPI-IO-hints. These hints are used to select the distribution function with which a parallel file system manipulates an opened file. The implementation of a new physical distribution function called varstrip_dist is also presented in this article. This function is proposed based upon spatial characteristics presented by I/O-access patterns observed at the application level
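    Selecting such a distribution function from the application side goes through the standard MPI_Info hint mechanism at file-open time. The C sketch below illustrates the mechanics; the hint key used here is an assumed placeholder (only the value varstrip_dist is taken from the abstract), and hints that a file system does not recognize are simply ignored by the MPI implementation.
      /* Passing a hint at file-open time; the key "distribution_name" is an
       * assumed placeholder, only the value varstrip_dist comes from the abstract. */
      #include <mpi.h>
      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          MPI_Info info;
          MPI_Info_create(&info);
          MPI_Info_set(info, "distribution_name", "varstrip_dist"); /* assumed key */
          MPI_File fh;
          MPI_File_open(MPI_COMM_WORLD, "data.out",
                        MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
          /* ... parallel I/O whose file layout follows the selected distribution ... */
          MPI_File_close(&fh);
          MPI_Info_free(&info);
          MPI_Finalize();
          return 0;
      }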
  • Defining a good ecological status of coastal waters - a case study for the Elbe plume (Wilhelm Windhorst, Franciscus Colijn, Saa Kabuta, Remi Laane, Hermann Lenhart), In Managing European Coasts (Jan Vermaat, Wim Salomons, Laurens Bouwer, Kerry Turner), pp. 59–74, Springer (Berlin / Heidelberg, Germany), ISBN: 978-3-540-23454-8, 2005
    BibTeX DOI
    Abstract: The definition of a good ecological status of coastal waters requires a close cooperation between sciences (natural and socio-economic) and decision makers. An argument is presented for the use of ecosystem integrity assessment based on indicators of function and state. Ecosystem integrity is understood to be reflected in exergy capture (here expressed as net primary production), storage capacity (as nutrient input/output balances for coastal sediments), cycling (turn-over of winter nutrient stocks), matter losses (into adjacent water), and heterogeneity (here the diatom/non-diatom ratio of planktonic algae is used). Its feasibility is assessed using ERSEM, an ecosystem model of the North Sea, for the Elbe plume, after prior satisfactory calibration. Three scenarios were applied corresponding to 80, 70 and 60% reduction of the riverine nutrient load into the German Bight, compared to a reference situation of 1995. The modelling effort suggested that drastic nutrient load reduction from the Elbe alone would have a limited effect on the larger German Bight: even a 60% reduction scenario would only lead to moderate changes in all five indicators. In conclusion, application of functional integrity indicators appears feasible for coastal seas at larger spatial scales (i.e. the German Bight), and, for the coast, would form a useful addition to the indicators presently proposed in the Water Framework Directive (WFD).
  • Catchment-coastal zone interaction based upon scenario and model analysis: Elbe and the German Bight case study (J. Hofmann, H. Behrendt, A. Gilbert, R. Jannssen, A. Kannen, Hermann Lenhart, W. Lise, Corinna Nunneri, Wilhelm Windhorst), In Regional Environmental Change, Series: 5 (2-3), Springer (Berlin / Heidelberg, Germany), ISSN: 1436-378X, 2005
    BibTeX DOI
    Abstract: This paper presents a holistic strategy on the interaction of activities in the Elbe river basin and their effects on eutrophication in the coastal waters of the German Bight. This catchment–coastal zone interaction is the main target of the EUROCAT (EUROpean CATchments, catchment changes and their impact on the coast) research project, with the Elbe being one of eight case studies. The definition of socio-economic scenarios is linked with the application of models to evaluate measures in the catchment by estimation of nutrient emissions with MONERIS (MOdelling Nutrient Emissions in RIver Systems), and their effects on coastal waters with the ecosystem model ERSEM (European Regional Seas Ecosystem Model). The cost effectiveness of reduction measures will then be evaluated by application of the CENER model (Cost-Effective Nutrient Emission Reduction) and a multi-criteria analysis. Finally, the interpretation of ecological integrity is used as a measure to describe ecological impacts in an aggregated form

2004

  • A Fast Program for Phylogenetic Tree Inference with Maximum Likelihood (Alexandros P. Stamatakis, Thomas Ludwig, Harald Meier), In High Performance Computing in Science and Engineering, pp. 273–283, (Editors: Siegfried Wagner, Werner Hanke, Arndt Bode, Franz Durst), Springer (Berlin / Heidelberg, Germany), ISBN: 978-3-540-26657-0, 2004-05
    BibTeX URL DOI
    Abstract: Inference of large phylogenetic trees using elaborate statistical models is computationally extremely intensive. Thus, progress is primarily achieved via algorithmic innovation rather than by brute-force allocation of all available computational resources. We present simple heuristics which yield accurate trees for synthetic (simulated) as well as real data and improve execution time compared to the currently fastest programs. The new heuristics are implemented in a sequential program (RAxML) which is available as open source code. Furthermore, we present a non-deterministic parallel version of our algorithm which in some cases yielded super-linear speedups for computations with 1000 organisms. We compare sequential RAxML performance with the currently fastest and most accurate programs for phylogenetic tree inference based on statistical methods using 50 synthetic alignments and 9 real-world alignments comprising up to 1000 sequences. RAxML outperforms those programs for real-world data in terms of speed and final likelihood values
  • A Fast Program for Maximum Likelihood-based Inference of Large Phylogenetic Trees (Alexandros P. Stamatakis, Thomas Ludwig, Harald Meier), In Proceedings of the 2004 ACM symposium on Applied computing, pp. 197–201, ACM (New York, USA), SAC-04, University of Cyprus, Nicosia, Cyprus, ISBN: 1-58113-812-1, 2004
    BibTeX DOI
    Abstract: The computation of large phylogenetic trees with maximum likelihood is computationally intensive. In previous work we have introduced and implemented algorithmic optimizations in PAxML. The program shows run time improvements > 25% over parallel fastDNAml, yielding exactly the same results. This paper focuses on computations of large phylogenetic trees (> 100 organisms) with maximum likelihood. We propose a novel, partially randomized algorithm and new parsimony-based rearrangement heuristics, which are implemented in a sequential and parallel program called RAxML. We provide experimental results for real biological data containing 101 up to 1000 sequences and simulated data containing 150 to 500 sequences, which show run time improvements of factor 8 up to 31 over PAxML, yielding equally good trees in terms of likelihood values and RF distance rates at the same time. Finally, we compare the performance of the sequential version of RAxML with a greater variety of available ML codes such as fastDNAml, AxML and MrBayes. RAxML is a freely available open source program.
  • North Sea Hydrodynamic Modelling: A Review (Hermann Lenhart, Thomas Pohlmann), In Senckenbergiana maritima, Series: 34-(1/2), pp. 53–88, (Editors: Ingrid Kröncke, Michael Türkay, Jürgen Sündermann), 2004
    BibTeX DOI
  • Phylogenies with Statistical Methods: Problems & Solutions (Alexandros P. Stamatakis, Thomas Ludwig, Harald Meier), In Proceedings of 4th International Conference on Bioinformatics and Genome Regulation and Structure, pp. 229–233, BGRS-04, Novosibirsk, Russia, 2004
    BibTeX URL
    Abstract: The computation of ever larger as well as more accurate phylogenetic trees with the ultimate goal to compute the “tree of life” represents a major challenge in Bioinformatics. Statistical methods for phylogenetic analysis such as maximum likelihood or bayesian inference, have shown to be the most accurate methods for tree reconstruction. Unfortunately, the size of trees which can be computed in reasonable time is limited by the severe computational complexity induced by these statistical methods. However, the field has witnessed great algorithmic advances over the last 3 years which enable inference of large phylogenetic trees containing 500-1000 sequences on a single CPU within a couple of hours using maximum likelihood programs such as RAxML and PHYML. An additional order of magnitude in terms of computable tree sizes can be obtained by parallelizing these new programs. In this paper we briefly present the MPI-based parallel implementation of RAxML (Randomized Axelerated Maximum Likelihood), as a solution to compute large phylogenies. Within this context, we describe how parallel RAxML has been used to compute the –to the best of our knowledge- first maximum likelihood-based phylogenetic tree containing 10.000 taxa on an inexpensive LINUX PC-Cluster. In addition, we address unresolved problems, which arise when computing large phylogenies for real-world sequence data consisting of more than 1.000 organisms with maximum likelihood, based on our experience with RAxML. Finally, we discuss potential
  • Parallel Inference of a 10.000-taxon Phylogeny with Maximum Likelihood (Alexandros P. Stamatakis, Thomas Ludwig, Harald Meier), In Euro-Par 2004 Parallel Processing, Lecture Notes in Computer Science (3149), pp. 997–1004, (Editors: Marco Danelutto, Marco Vanneschi, Domenico Laforenza), Springer (Berlin / Heidelberg, Germany), Euro-Par-04, University of Pisa and Institute of Information Science and Technologies (ISTI-CNR), Pisa, Italy, 2004
    BibTeX URL DOI
    Abstract: Inference of large phylogenetic trees with statistical methods is computationally intensive. We recently introduced simple heuristics which yield accurate trees for synthetic as well as real data and are implemented in a sequential program called RAxML. We have demonstrated that RAxML outperforms the currently fastest statistical phylogeny programs (MrBayes, PHYML) in terms of speed and likelihood values on real data. In this paper we present a non-deterministic parallel implementation of our algorithm which in some cases yields super-linear speedups for an analysis of 1.000 organisms on a LINUX cluster. In addition, we use RAxML to infer a 10.000-taxon phylogenetic tree containing representative organisms from the three domains: Eukarya, Bacteria and Archaea. Finally, we compare the sequential speed and accuracy of RAxML and PHYML on 8 synthetic alignments comprising 4.000 sequences.
  • Research Trends in High Performance Parallel Input/Output for Cluster Environments (Thomas Ludwig), In Proceedings of the 4th International Scientific and Practical Conference on Programming, pp. 274–281, UkrPROG-04, National Academy of Sciences of Ukraine, Kiev, Ukraine, 2004
    BibTeX URL DOI
    Abstract: Parallel input/output in high performance computing is a field of increasing importance. In particular with compute clusters we see the concept of replicated resources being transferred to I/O issues. Consequently, we find research questions such as how to map data structures to files, which resources to actually use, and how to deal with failures in the environment. The paper will introduce the problem of massive I/O from the user's point of view and illustrate available programming interfaces. After a short description of some available parallel file systems we will concentrate on the research directions in that field. Besides other questions, efficiency is the main issue. It depends on an appropriate mapping of data structures onto file segments which in turn are spread over physical disks. Our own work concentrates on measuring the performance of individual mappings and on changing them dynamically to increase performance and control the sharing of resources.
  • New Fast and Accurate Heuristics for Inference of Large Phylogenetic Trees (Alexandros P. Stamatakis, Thomas Ludwig, Harald Meier), In Proceedings of 18th IEEE/ACM International Parallel and Distributed Processing Symposium, pp. 193, IEEE Computer Society (Washington, DC, USA), IPDPS-04, University of New Mexico, Santa Fe, New Mexico, ISBN: 0-7695-2132-0, 2004
    BibTeX DOI
    Abstract: Inference of phylogenetic trees comprising thousands of taxa using maximum likelihood is computationally extremely expensive. We present simple heuristics which yield accurate trees for simulated as well as real data and reduce execution time. The new heuristics have been implemented in a program called RAxML which is freely available. Furthermore, we present a distributed version of our algorithm which is implemented in an MPI-based prototype. This prototype is being used to implement an http-based seti@home-like version of RAxML. We compare our program with PHYML and MrBayes, which are currently the fastest and most accurate programs for phylogenetic tree inference. Experiments are conducted using 50 simulated 100-taxon alignments as well as real-world alignments with up to 1000 sequences. RAxML outperforms MrBayes for real-world data both in terms of speed and final likelihood values. Furthermore, for real data RAxML outperforms PHYML by factor 2-8 and yields better final trees due to its more exhaustive search strategy. For synthetic data MrBayes is slightly more accurate than RAxML and PHYML but significantly slower.
  • Helics - ein Rechner der Superklasse (Peter Bastian, Thomas Ludwig), In Ruperto Carola, Series: 3, pp. 4–7, Universitätsverlag C. Winter (Heidelberg), ISSN: 0035-998 X, 2004
    BibTeX

2003

  • On-line monitoring systems and computer tool interoperability (Thomas Ludwig, Barton Miller), Nova Science Publishers, Inc. (Commack, NY, USA), ISBN: 1-59033-888-X, 2003
    BibTeX
  • DAxML: A Program for Distributed Computation of Phylogenetic Trees Based on Load Managed CORBA (Alexandros Stamatakis, Markus Lindermeier, Michael Ott, Thomas Ludwig, Harald Meier), In Parallel Computing Technologies, Lecture Notes in Computer Science (2763), pp. 538–548, (Editors: Victor Malyshkin), Springer (Berlin / Heidelberg, Germany), PaCT-03, Nizhni Novgorod State University and Russian Academy of Sciences (Academgorodok, Novosibirsk), Nizhni Novgorod, Russia, 2003
    BibTeX URL DOI
    Abstract: High performance computing in bioinformatics has led to important progress in the field of genome analysis. Due to the huge amount of data and the complexity of the underlying algorithms, many problems can only be solved by using supercomputers. In this paper we present DAxML, a program for the distributed computation of evolutionary trees. In contrast to prior approaches, DAxML runs on a cluster of workstations instead of an expensive supercomputer. For this purpose we transformed PAxML, a fast parallel phylogeny program incorporating novel algorithmic optimizations, into a distributed application. DAxML uses modern object-oriented middleware instead of message-passing communication in order to reduce the development and maintenance costs. Our goal is to provide DAxML to a broad range of users, in particular those who do not have supercomputers at their disposal. We ensure high performance and scalability by applying a high-level load management service called LMC (Load Managed CORBA). LMC provides transparent system-level load management by integrating the load management functionality directly into the ORB. In this paper we demonstrate the simplicity of integrating LMC into a real-world application and how it enhances the performance and scalability of DAxML.
  • Phylogenetic tree inference on PC architectures with AxML/PAxML (Alexandros P. Stamatakis, Thomas Ludwig), In Proceedings of the International Parallel and Distributed Processing Symposium, pp. 8, IEEE Computer Society (Washington, DC, USA), IPDPS-03, University of Nice, Nice, France, ISBN: 0-7695-1926-1, 2003
    BibTeX DOI
    Abstract: Inference of phylogenetic trees comprising hundreds or even thousands of organisms based on the maximum likelihood method is computationally extremely expensive. In previous work, we have introduced subtree equality vectors (SEV) to significantly reduce the number of required floating point operations during topology evaluation and implemented this method in (P)AxML, which is a derivative of (parallel) fastDNAml. Experimental results show that (P)AxML scales particularly well on inexpensive PC-processor architectures obtaining global run time accelerations between 51% and 65% over (parallel) fastDNAml for large data sets, yet rendering exactly the same output. In this paper, we present an additional SEV-based algorithmic optimization which scales well on PC processors and leads to a further improvement of global execution times of 14% to 19% compared to the initial version of AxML. Furthermore, we present novel distance-based heuristics for reducing the number of analyzed tree topologies, which further accelerate the program by 4% up to 8%. Finally, we discuss a novel experimental tree-building algorithm and potential heuristic solutions for inferring large high quality trees, which for some initial tests rendered better trees and accelerated program execution at the same time by a factor greater than 6
  • Monitoring concepts for parallel systems: an evolution towards interoperable tool environments (Roland Wismüller, Thomas Ludwig, Wolfgang Karl, Arndt Bode), In On-line monitoring systems and computer tool interoperability (Thomas Ludwig, Barton Miller), pp. 1–21, Nova Science Publishers, Inc. (Commack, NY, USA), ISBN: 1-59033-888-X, 2003
    BibTeX
    Abstract: For more than 10 years the research group at LRR-TUM has investigated concepts for on-line monitoring techniques and has designed and implemented monitoring systems for various hardware and software architectures. From the early systems in the beginning of the 90s to the sophisticated interface-based approach for interoperable tools, many milestones were reached, each of which represented considerable progress in the field of tool construction. The paper gives an overview of the most important aspects of our work and pinpoints the increase of knowledge in this area.

2002

  • Accelerating Parallel Maximum Likelihood-based Phylogenetic Tree Calculations using Subtree Equality Vectors (Alexandros P. Stamatakis, Thomas Ludwig, Harald Meier, Marty J. Wolf), In Proceedings of the Supercomputing Conference 2002, pp. 40, IEEE Computer Society (Washington, DC, USA), sc-02, IEEE/ACM, Baltimore, Maryland, USA, ISBN: 0-7695-1524-X, 2002
    BibTeX DOI
    Abstract: Heuristics for calculating phylogenetic trees for large sets of aligned rRNA sequences based on the maximum likelihood method are computationally expensive. The core of most parallel algorithms, which accounts for the greatest part of computation time, is the tree evaluation function, which calculates the likelihood value for each tree topology. This paper describes and uses Subtree Equality Vectors (SEVs) to reduce the number of required floating point operations during topology evaluation. We integrated our optimizations into various sequential programs and into parallel fastDNAml, one of the most common and efficient parallel programs for calculating large phylogenetic trees. Experimental results for our parallel program, which renders exactly the same output as parallel fastDNAml, show global runtime improvements of 26% to 65%. The optimization scales best on clusters of PCs, which also implies a substantial cost saving factor for the determination of large trees. [An illustrative SEV sketch appears after this year's entries.]
  • Efficiently building on-line tools for distributed heterogeneous environments (Günther Rackl, Thomas Ludwig, Markus Lindermeier, Alexandros Stamatakis), In Scientific Programming, Series: 10 (1), pp. 67–74, IOS Press (Amsterdam, The Netherlands), ISSN: 1058-9244, 2002
    BibTeX
    Abstract: Software development is getting more and more complex, especially within distributed middleware-based environments. A major drawback during the overall software development process is the lack of on-line tools, i.e. tools applied as soon as there is a running prototype of an application. The MIMO MIddleware MOnitor provides a solution to this problem by implementing a framework for the efficient development of on-line tools. This paper presents a methodology for developing on-line tools with MIMO. As an example scenario, we choose a distributed medical image reconstruction application, which represents a test case with high performance requirements. Our distributed, CORBA-based application is instrumented for being observed with MIMO and related tools. Additionally, load balancing mechanisms are integrated for further performance improvements. As a result, we obtain an integrated tool environment for observing and steering the image reconstruction application. By using our rapid tool development process, the integration of on-line tools proves to be very convenient and enables efficient tool deployment
  • AxML: a fast program for sequential and parallel phylogenetic tree calculations based on the maximum likelihood method (Alexandros P. Stamatakis, Thomas Ludwig, Harald Meier, Marty J. Wolf), In Proceedings of 1. IEEE Bioinformatics Conference, pp. 21–28, IEEE Computer Society (Washington, DC, USA), CSB-02, Stanford University, Palo Alto, California, USA, ISBN: 0-7695-1653-X, 2002
    BibTeX DOI
    Abstract: Heuristics for the NP-complete problem of calculating the optimal phylogenetic tree for a set of aligned rRNA sequences based on the maximum likelihood method are computationally expensive. In most existing algorithms, the tree evaluation and branch length optimization functions, calculating the likelihood value for each tree topology examined in the search space, account for the greatest part of the overall computation time. This paper introduces AxML, a program derived from fastDNAml, incorporating a fast topology evaluation function. The algorithmic optimizations introduced represent a general approach for accelerating this function and are applicable to both sequential and parallel phylogeny programs, irrespective of their search space strategy. Therefore, their integration into three existing phylogeny programs rendered encouraging results. Experimental results on conventional processor architectures show a global run time improvement of 35% up to 47% for the various test sets and program versions we used
  • Efficiently Building On-Line Tools for Distributed Heterogeneous Environments (Günther Rackl, Thomas Ludwig, Markus Lindermeier, Alexandros Stamatakis), In Proceedings of the International Workshop on Performance-Oriented Application Development for Distributed Architectures, Scientific Programming (10-1), pp. 67–74, IOS Press (Amsterdam, The Netherlands), PADDA-01, Technical University Munich, Munich, Germany, ISSN: 1058-9244, 2002
    BibTeX URL
    Abstract: Software development is getting more and more complex, especially within distributed middleware-based environments. A major drawback during the overall software development process is the lack of on-line tools, i.e. tools applied as soon as there is a running prototype of an application. The MIMO MIddleware MOnitor provides a solution to this problem by implementing a framework for the efficient development of on-line tools. This paper presents a methodology for developing on-line tools with MIMO. As an example scenario, we choose a distributed medical image reconstruction application, which represents a test case with high performance requirements. Our distributed, CORBA-based application is instrumented for being observed with MIMO and related tools. Additionally, load balancing mechanisms are integrated for further performance improvements. As a result, we obtain an integrated tool environment for observing and steering the image reconstruction application. By using our rapid tool development process, the integration of on-line tools proves to be very convenient and enables efficient tool deployment
  • Nährstoffe und Eutrophierung (Uwe Brockmann, Hermann Lenhart, Heinke Schlünzen, Dilek Topcu), In Warnsignale aus Nordsee und Wattenmeer - Eine aktuelle Umweltbilanz (Jose L. Lozan, Eike Rachor, Karsten Reise, Jürgen Sündermann, Hein Westernhagen), pp. 61–76, ISBN: 978-3000101663, 2002
    BibTeX
  • Adapting PAxML to the Hitachi SR8000-F1 Supercomputer (Alexandros Stamatakis, Thomas Ludwig, Harald Meier), In High Performance Computing in Science and Engineering (Siegfried Wagner, Werner Hanke, Arndt Bode, Franz Durst), pp. 453–466, Springer (Berlin / Heidelberg, Germany), 2002
    BibTeX
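
Illustrative sketch for the Subtree Equality Vector (SEV) entries above (Supercomputing 2002 and IEEE Bioinformatics Conference 2002): an SEV records, for every inner node and alignment column, whether all tips below that node carry the same character, so that such columns can reuse one precomputed conditional likelihood instead of being re-evaluated. The Python below is only a schematic, bottom-up construction of such vectors under assumed data structures; the Node class and compute_sev function are invented for illustration, and the likelihood computation that (P)AxML actually accelerates is omitted.

    # Schematic sketch (assumed data structures, not (P)AxML code): compute a
    # subtree equality vector bottom-up over a rooted tree. For each alignment
    # column, an inner node stores the shared character if all tips below it
    # are identical in that column, and None otherwise.
    class Node:
        def __init__(self, name=None, children=(), sequence=None):
            self.name = name              # taxon name for tips, None for inner nodes
            self.children = list(children)
            self.sequence = sequence      # aligned characters for tips, e.g. "ACGT"
            self.sev = None               # filled in by compute_sev

    def compute_sev(node):
        """Return the subtree equality vector of node (one entry per column)."""
        if not node.children:             # tip: every column is trivially "equal"
            node.sev = list(node.sequence)
            return node.sev
        child_sevs = [compute_sev(child) for child in node.children]
        sev = []
        for column in zip(*child_sevs):   # one tuple of child entries per column
            first = column[0]
            if first is not None and all(entry == first for entry in column):
                sev.append(first)         # the whole subtree shows one character
            else:
                sev.append(None)          # the subtree is heterogeneous here
        node.sev = sev
        return sev

    # Toy usage: four taxa, four alignment columns.
    tips = [Node("t1", sequence="AAGT"), Node("t2", sequence="AAGC"),
            Node("t3", sequence="AAGT"), Node("t4", sequence="ACGT")]
    root = Node(children=[Node(children=tips[:2]), Node(children=tips[2:])])
    print(compute_sev(root))              # ['A', None, 'G', None] for this toy alignment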

2001

  • EveMan - a Mobile Time and Space Organisation System for the Palm Computing Platform (Andreas Krause, Thomas Ludwig), In ITG-Fachbericht 168, VDE Verlag (Berlin, Germany), APC-01, Munich, Germany, 2001
    BibTeX
  • Tool Environments in CORBA-Based Medical High Performance Computing (Thomas Ludwig, Markus Lindermeier, Alexandros Stamatakis, Günther Rackl), In Parallel Computing Technologies, Lecture Notes in Computer Science (2127), pp. 447–455, (Editors: Victor Malyshkin), Springer (Berlin / Heidelberg, Germany), PaCT-01, Institute of Computational Mathematics and Mathematical Geophysics of the Russian Academy of Sciences (Novosibirsk), Novosibirsk, Russia, 2001
    BibTeX URL DOI
    Abstract: High performance computing in medical science has led to important progress in the field of computer tomography. A fast calculation of various types of images is a precondition for statistical comparison of big sets of input data. With our current research we adapted parallel programs from PVM to CORBA. CORBA makes the integration into clinical environments much easier. In order to improve the efficiency and maintainability we added load balancing and graphical on-line tools to our CORBA-based application program
  • A methodology for efficiently developing on-line tools for heterogeneous middleware (Thomas Ludwig, Günther Rackl), In Proceedings of the 34th Annual Hawaii International Conference on System Sciences, pp. 10, IEEE Computer Society (Washington, DC, USA), HICSS-34, University of Hawaii, Island of Maui, Hawaii, USA, ISBN: 0-7695-0981-9, 2001
    BibTeX DOI
    Abstract: Software development is getting more and more complex, especially within distributed and heterogeneous environments based on middleware enabling the interaction of distributed components. A major drawback during the overall software development process is the lack of online tools, i.e. tools applied as soon as there is a running prototype of an application. Online tools can either be used for development tasks like debugging or visualisation of programs, or for deployment tasks like application management. For various middleware platforms, online tools have been developed, but most of them suffer from the drawbacks of being tailored to specific middleware, offering only a small set of tool functions, and not being extensible. The MIMO MIddleware MOnitor project proposes a solution to this problem. MIMO is based on a clear separation of tools, the monitoring system which collects data and controls an observed application, and the applications. The MIMO core consists of a lightweight infrastructure that makes it possible to integrate heterogeneous middleware in a flexible way, to monitor applications that use several middleware platforms simultaneously, and to build interoperable development and deployment tools. This paper presents a methodology for developing online tools with MIMO

2000

  • Airport simulation using CORBA and DIS (Günther Rackl, Filippo de Stefani, Francois Héran, Antonello Pasquarelli, Thomas Ludwig), In Future Generation Computer Systems, Series: 16 (5), pp. 465–472, Elsevier Science Publishers B. V. (Amsterdam, The Netherlands), ISSN: 0167-739X, 2000
    BibTeX DOI
    Abstract: This paper presents the SEEDS simulation environment for the evaluation of distributed traffic control systems. Starting with an overview of the general simulator architecture, performance measurements of the simulation environment carried out with a prototype for airport ground-traffic simulation are described. The main aspects of the performance analysis are the attained application performance using CORBA and DIS as communication middleware, and the scalability of the overall approach. The evaluation shows that CORBA and DIS are well suited for distributed interactive simulation purposes because of their adequate performance, high scalability, and the high-level programming model, which allows complex distributed applications to be developed and maintained rapidly
  • Effects of river nutrient load reduction on the eutrophication of the North Sea, simulated with the ecosystem model ERSEM (Hermann Lenhart), In Senckenbergiana maritima, Series: 31-2, pp. 299–311, (Editors: Ingrid Kröncke, Michael Türkay, Jürgen Sündermann), 2000
    BibTeX DOI
    Abstract: The results of the ecosystem model ERSEM showed that a reduction in the nutrient load by 50% for N and P cannot be linearly transferred to a similar reduction in primary production in comparison to the standard run for the year 1988. While the reduction scenario results in decreased winter concentrations of nitrogen and phosphorus of up to 40%, the decrease in net primary production reached only up to 20% in small areas in the coastal zone. The phytoplankton groups indicated different reactions to the changed nutrient availability. Generally, there were significant changes in the strength and timing of the nutrient limitation in all phytoplankton groups in the model, but the diatom concentration did not change much. Differences did occur for the flagellates, with sporadically higher flagellate concentrations in comparison to the standard run. This result is important, because the increase in algal biomass due to eutrophication was related mainly to an increase in flagellates, which do not decrease accordingly in the reduction scenario. The reduction scenarios demonstrated that changes in the discharges of the major rivers hardly affect the central North Sea, but lead to significant regional differences in the net primary production. The greatest differences with regard to primary production were found downstream of the rivers Rhine and Elbe. This leads to changes in the mass flows in the coastal area with an increased importance of the microbial loop. One possible reason for the muted reaction of primary production to decreasing nutrient inputs can be seen in the temporal coincidence of maximum river inputs and the phytoplankton spring bloom. Due to the high nutrient uptake during the spring bloom, inorganic nutrients are bound in the phytoplankton and form a potential for remineralisation. With a more efficient microbial loop, the system becomes less dependent on riverine nutrient inputs in summer
  • Performance assessment of parallel spectral analysis: Towards a practical performance model for parallel medical applications (Frank Munz, Thomas Ludwig, Sibylle Ziegler, Peter Bartenstein, Markus Schwaiger, Arndt Bode), In Future Generation Computer Systems, Series: 16 (5), pp. 553–562, Elsevier Science Publishers B. V. (Amsterdam, The Netherlands), ISSN: 0167-739X, 2000
    BibTeX DOI
    Abstract: We present a parallel medical application for the analysis of dynamic positron emission tomography (PET) images together with a practical performance model. The parallel application improves the diagnosis for a patient (e.g. in epilepsy surgery) because it enables the fast computation of parametric images on a pixel level in contrast to the traditionally used region of interest (ROI) approach. We derive a simple performance model from the application context and demonstrate the accuracy of the model in predicting the runtime of the application on a network of workstations (NOW). The model is used to determine an optimal message length with regard to the per-message overhead and the load imbalance. [An illustrative cost-model sketch appears after this year's entries.]
  • Interoperable Run Time Tools for Distributed Systems - A Case Study (Roland Wismüller, Thomas Ludwig), In The Journal of Supercomputing, Series: 17-3, pp. 277–289, Springer (Berlin / Heidelberg, Germany), ISSN: 0920-8542, 2000
    BibTeX URL DOI
    Abstract: Tools that observe and manipulate the run-time behavior of parallel and distributed systems are essential for developing and maintaining these systems. Sometimes users would even need to use several tools at the same time in order to have a higher functionality at their disposal. Today, tools developed independently by different vendors are, however, not able to interoperate. Interoperability not only allows concurrent use of tools, but also can lead to an added value for the user. A debugger interoperating with a checkpointing system, for example, can provide a debugging environment where the debugged program can be reset to any previous state, thus speeding up cyclic debugging for long running programs. Using this example scenario, we derive requirements that should be met by the tools' software infrastructure in order to enable interoperability. A review of existing infrastructures shows that these requirements are only partially met today. In an ongoing research effort, support for all of the requirements is built into the OMIS compliant on-line monitoring system OCM
  • Interoperability Support in Distributed On-Line Monitoring Systems (Jörg Trinitis, Vaidy Sunderam, Thomas Ludwig, Roland Wismüller), In Proceedings of High Performance Computing and Networking - 8th International Conference, Lecture Notes in Computer Science (1823), pp. 261–269, (Editors: Marian Bubak, Hamideh Afsarmanesh, Bob Hertzberger, Roy Williams), Springer (Berlin / Heidelberg, Germany), HPCN-00, University of Amsterdam, Amsterdam, The Netherlands, 2000
    BibTeX URL DOI
    Abstract: Sophisticated on-line tools play an important role in the software life-cycle, by decreasing software development and maintenance effort without sacrificing software quality. Using multiple tools simultaneously would be very beneficial; however, with most contemporary tools, this is impossible since they are often based on incompatible methods of data acquisition and control. This is due largely to their relative independence, and could be overcome by an appropriately designed common on-line monitoring system. We consider three possible platforms that might be potentially capable of addressing this issue, and discuss the relative merits and demerits of each
  • CORBA-basierte verteilte Berechnung medizinischer Bilddaten mit SPM (Marcel May, Frank Munz, Thomas Ludwig), In Bildverarbeitung für die Medizin, Informatik aktuell, pp. 213–217, (Editors: Alexander Horsch, Thomas Lehmann), Springer (Berlin / Heidelberg, Germany), BVM-00, Technical University Munich, Munich, Germany, ISBN: 3-540-67123-4, 2000
    BibTeX
  • Effiziente Scheduling-Algorithmen für datenparallele Anwendungen der funktionellen medizinischen Bildgebung auf NOWs (Frank Munz, Thomas Ludwig, Arndt Bode, Sibylle Ziegler, Markus Schwaiger), In Bildverarbeitung für die Medizin, Informatik aktuell, pp. 403–407, (Editors: Alexander Horsch, Thomas Lehmann), Springer (Berlin / Heidelberg, Germany), BVM-00, Technical University Munich, Munich, Germany, ISBN: 3-540-67123-4, 2000
    BibTeX
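
Illustrative sketch for the performance-model entry above (Munz et al., Future Generation Computer Systems): the abstract mentions a simple model that trades per-message overhead against load imbalance to pick a message length. The Python below illustrates that kind of trade-off with a generic linear cost model; the functional form and all constants (t_pixel, t_msg, t_byte, bytes_per_pixel) are assumptions chosen for illustration and are not the model or parameters published in the paper.

    # Generic cost-model sketch (assumed constants, not the published model):
    # predicted runtime when total_pixels of independent pixel work are handed
    # to `workers` in chunks of `chunk` pixels. Larger chunks mean fewer
    # messages (less per-message overhead) but a coarser granularity and hence
    # a larger load-imbalance tail at the end of the run.
    import math

    def predicted_runtime(total_pixels, workers, chunk,
                          t_pixel=2.0e-4,        # seconds of compute per pixel (assumed)
                          t_msg=1.0e-3,          # per-message overhead in seconds (assumed)
                          t_byte=1.0e-7,         # transfer time per byte (assumed)
                          bytes_per_pixel=128):  # payload size per pixel (assumed)
        n_msgs = math.ceil(total_pixels / chunk)
        compute = total_pixels * t_pixel / workers
        communicate = n_msgs * (t_msg + chunk * bytes_per_pixel * t_byte)
        imbalance = chunk * t_pixel            # worst case: one full chunk left for one worker
        return compute + communicate + imbalance

    def best_chunk(total_pixels, workers, candidates):
        return min(candidates, key=lambda c: predicted_runtime(total_pixels, workers, c))

    print(best_chunk(total_pixels=128 * 128 * 47, workers=16,
                     candidates=[64, 128, 256, 512, 1024, 2048, 4096]))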

1999

  • Synergetic Tool Environments (Thomas Ludwig, Jörg Trinitis, Roland Wismüller), In Parallel Computing Technologies, Lecture Notes in Computer Science (1662), pp. 754–754, (Editors: Victor Malyshkin), Springer (Berlin / Heidelberg, Germany), PaCT-99, Institute of Computational Mathematics and Mathematical Geophysics of the Russian Academy of Sciences (Novosibirsk) and the Electrotechnical University of St.Petersburg, St. Petersburg, Russia, 1999
    BibTeX URL DOI
    Abstract: In the field of parallel programming we notice a considerable lack of efficient on-line tools for debugging, performance analysis etc. This is due to the fact that the construction of those tools must be based on a complicated software infrastructure. In the case of such software being available, tools from different vendors are almost always incompatible as they use proprietary implementations for it. We will demonstrate in this paper that only a common infrastructure will ease the construction of on-line tools and that it is a necessary precondition for eventually having interoperable tools. Interoperable tools form the basis for synergetic tool environments and yield an added value over just integrated environments
  • Kinetic analysis of functional images: The case for a practical approach to performance prediction (Frank Munz, Thomas Ludwig, Sibylle Ziegler, Peter Bartenstein, Markus Schwaiger, Arndt Bode), In High Performance Computing, Lecture Notes in Computer Science (1615), pp. 169–180, (Editors: Constantine Polychronopoulos, Kazuki Fukuda, Shinji Tomita), Springer (Berlin / Heidelberg, Germany), ISHPC-99, Kyoto, Japan, 1999
    BibTeX URL DOI
    Abstract: We present the first parallel medical application for the analysis of dynamic positron emission tomography (PET) images together with a practical performance model. The parallel application may improve the diagnosis for a patient (e.g. in epilepsy surgery) because it enables the fast computation of parametric images on a pixel level as opposed to the traditionally used region of interest (ROI) approach, which is applied to determine an average parametric value for a particular anatomic region of the brain. We derive the performance model from the application context and show its relation to abstract machine models. We demonstrate the accuracy of the model to predict the runtime of the application on a network of workstations and use it to determine an optimal value in the message frequency-size relationship
  • Towards Monitoring in Parallel and Distributed Environments (Ivan Zoraja, Günther Rackl, Thomas Ludwig), In Proceedings of the International Conference on Software in Telecommunications and Computer Networks, pp. 133–141, SoftCom-99, Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture, FESB Split, Trieste, Venice, Italy, 1999
    BibTeX URL DOI
    Abstract: Rapid technology transitions and growing complexity in parallel and distributed systems make software development in these environments increasingly difficult. Therefore, among other CASE tools, software developers and users need powerful tools that are able to gather information from running applications as well as to dynamically manipulate their execution. To connect multiple tools to the running application, online monitoring systems that integrate common tool functionality and provide synchronization among multiple requests are developed. This paper presents and compares three monitoring systems developed at LRR-TUM München which address different programming paradigms. OMIS/OCM is aimed at message passing systems, CORAL at the distributed shared memory (DSM) paradigm and MIMO at distributed object computing (DOC). Although our monitoring systems are aimed at different programming paradigms they are based on many similar concepts and solutions. In addition, monitoring featur…

1997

  • The effects of river input on the ecosystem dynamics in the continental coastal zone of the North Sea using ERSEM (Hermann Lenhart, Günther Radach, Piet Ruardij), In Journal of Sea Research, Series: 38 (3-4), pp. 249–274, Elsevier B.V (Amsterdam, Netherlands), ISSN: 1385-1101, 1997
    BibTeX DOI
    Abstract: The general characteristics of the continental coastal zone, with nutrient concentrations, primary production and biomass high near the coast but decreasing with distance from the coast, are simulated by a box-refined version of the ecosystem model ERSEM. Aggregated model results compared to the literature as well as to two different three-dimensional models show a good agreement in the coastal region. The dynamical interactions as simulated by the ecosystem model are presented in the form of N/P ratios, the limitation by various nutrients and changes in the pathways of the flow of matter in the boxes; e.g. while the silicate limitation stops the spring bloom offshore, near the coast it is terminated by zooplankton grazing. When the river load was reduced by 50%, the largest effect was observed in the coastal boxes with 15% reduction of the net primary production. The discharges of the major rivers hardly affect the central North Sea, but lead to significant changes in nutrient limitations and mass flows in the coastal area. The realistic forcing, which was adopted for this setup, allows a higher net primary production in the southern North Sea in 1989 than in 1988, even though the nutrient river loads in 1989 were lower. The reason appears to be a higher solar energy input in 1989, by about 10 W m^−2 d^−1, compared to 1988
  • The ICES-boxes approach in relation to results of a North Sea circulation model (Hermann Lenhart, Thomas Pohlmann), In Tellus, Series: 49 A, pp. 139–160, Blackwell (Oxford, United Kingdom), ISSN: 0280-6495, 1997
    BibTeX
    Abstract: Based on the division of the North Sea into the ICES boxes, often used as a tool to present data aggregated from measurements, the results of a 3-D baroclinic model for 11 years of simulation are presented. Since the information on the ICES boxes from the 3-D model is available as a synoptic property, a comprehensive description of the water budget, the transport through the boxes as well as the flushing times is attained. Moreover, the results of the 3-D model allow for a quantification of the horizontal diffusion acting between the ICES boxes. Integrated properties of the boxes are obtained from comparing the depth of the thermocline derived by vertical diffusion and by the gradient in the temperature profile. Considerable changes can be observed for the period of stratification within the boxes from the gradient in the temperature profile compared to potential energy calculations. The properties of the ICES boxes as the basis for a box model are presented by means of the ecosystem model ERSEM. A tracer study, using freshwater as a conservative tracer, compares the results of the ERSEM box model with the results aggregated from a gridded dispersion model, using the same physical forcing. From this comparison, an appropriate transport representation for the box model is derived. Furthermore, the results of ERSEM demonstrate that a modification of the boxes, i.e. by introducing a vertical separation for those boxes which are stratified in summer, can improve the representation of the biological processes influenced by the thermocline dynamics

1996

  • Influence of variability in water transport on phytoplankton biomass and composition in the southern North Sea: a modelling approach (FYFY) (A. J. Van Den Berg, H. Ridderinkhof, R. Riegman, P. Ruardij, Hermann Lenhart), In Continental Shelf Research, Series: 16-7, pp. 907–917, Elsevier B.V (Amsterdam, Netherlands), ISSN: 0278-4343, 1996
    BibTeX DOI
    Abstract: A model for phytoplankton composition and succession coupled to a transport model for the southern North Sea is presented. This model is used to examine the time and spatial variability in phytoplankton biomass and succession. Long term time variability due to the variability in horizontal water transport is studied by using daily varying transport fields for the period 1970-1981. These transport fields result from simulations with a circulation model driven by realistic wind fields for this period. Selective factors for phytoplankton are resource competition and zooplankton grazing. This leads to a general abundance of edible phytoplankton groups in the whole southern North Sea, while poorly edible groups mainly occur in the eutrophicated coastal areas. Apart from this, phytoplankton groups which are specialized in growth under nitrogen-limited conditions are selected in open sea while, near the Dutch coast and the German Bight, phosphate-specialized groups are selected. From a comparison of simulations with yearly averaged and daily varying transport fields, it is concluded that differences with respect to the annual mean phytoplankton biomass are negligible. However, large differences are found for the distribution and abundance of specific phytoplankton groups. A simulation for the period 1970-1981 shows that part of the observed variability in spring biomass as well as the variability in the duration of dominance and abundance of species near the Dutch coast can be attributed to the variability in the horizontal water transport

1995

  • Nutrient dynamics in the North Sea: Fluxes and budgets in the water column derived from ERSEM (Günther Radach, Hermann Lenhart), In Netherlands Journal of Sea Research, Series: 33 (3-4), Elsevier B.V (Amsterdam, Netherlands), ISSN: 0077-7579, 1995
    BibTeX DOI
    Abstract: Nutrient dynamics for phosphate, nitrate, ammonium and silicate have been simulated with ERSEM, the European Regional Seas Ecosystem Model. From the model results budgets for the dissolved inorganic nutrients and the corresponding particulate fractions have been calculated. The annual cycles of the nutrients phosphate and silicate compare quite well with the observed ranges of variability. This does not hold for ammonium and nitrate. Biologically mediated transformations such as nutrient uptake and pelagic and benthic mineralization are the dominant processes in changing the nutrient concentrations with the horizontal advective contributions playing a minor role during the productive season. Vertical advection and vertical diffusion have a clear seasonal signal, with a maximum in February. The decay of the advective nutrient transport in summer is caused by the depletion of the upper layer of dissolved inorganic nutrients by algal uptake. The inflow of nutrients in the northwest is almost balanced by the outflow in the northeast, without causing large nutrient transports into the shallower areas from the north. However, from the coastal areas there is a nutrient flow towards the central North Sea, enhancing primary production in the central area
  • Simulations of the North Sea circulation, its variability, and its implementation as hydrodynamical forcing in ERSEM (Hermann Lenhart, Günther Radach, Jan O. Backhaus, Thomas Pohlmann), In Netherlands Journal of Sea Research, Series: 33 (3-4), pp. 271–299, Elsevier B.V (Amsterdam, Netherlands), ISSN: 0077-7579, 1995
    BibTeX DOI
    Abstract: The rationale is given of how the gross physical features of the circulation and the stratification of the North Sea have been aggregated for inclusion in the ecosystem box model ERSEM. As the ecosystem dynamics are to a large extent determined by small-scale physical events, the ecosystem model is forced with the circulation of a specific year rather than using the long-term mean circulation field. Especially the vertical exchange processes have been explicitly included, because the primary production strongly depends on them. Simulations with a general circulation model (GCM), forced by three-hourly meteorological fields, have been utilized to derive daily horizontal transport values driving ERSEM on boxes of sizes of a few 100 km. The daily vertical transports across a fixed 30-m interface provide the necessary short-term event character of the vertical exchange. For the years 1988 and 1989 the properties of the hydrodynamic flow fields are presented in terms of trajectories of the flow, thermocline depths, water budgets, flushing times and diffusion rates. The results of the standard simulation with ERSEM show that the daily variability of the circulation, being smoothed by the box integration procedure, is transferred to the chemical and biological state variables to a very limited degree only

1990

  • Nährstoffe in der Nordsee - Eutrophierung, Hypertrophierung und deren Auswirkungen (G. Radach, W. Schönfeld, Hermann Lenhart), In Warnsignale aus der Nordsee (José .L. Lozan, Walter Lenz, Eike Rachor, Burkard Watermann, Heinz v. Westernhagen), pp. 48–65, Paul Parey Verlagsbuchhandlung (Hamburg / Berlin), 1990
    BibTeX
