Theses

2017

Dynamic decision-making for efficient compression in parallel distributed file systems

Author: Janosch Hirsch
Type: Master's Thesis
Advisors: Dr. Michael Kuhn, Anna Fuchs
Date: 2017-08-12
Abstract: The technology gap between computational speed, storage capacity and storage speed poses big problems, especially for the HPC field. A promising technique to bridge this gap is data reduction through compression. Compression algorithms like LZ4 can reach compression speeds high enough to be applicable in the HPC field. Consequently, efforts to integrate compression into the Lustre file system are in progress. Client-side compression also brings the potential to increase network throughput. But to fully exploit the compression potential, the compression configuration has to be adapted to its environment: the more the configuration is adapted to the data's structure and the machine's condition, the more effective compression will be. The objective of this thesis is to design a decision logic that dynamically adapts the compression configuration to maximize a desired trade-off between application speed and compression. Different compression algorithms and the conditions for compression on the client side of a distributed file system are examined to identify possibilities to apply compression. Finally, an implemented prototype of the decision and adaptation logic is evaluated at different network speeds, and starting points for further improving the concept are given.
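
As a minimal sketch of such a decision logic (an illustration, not the thesis prototype; the measured rates and the speed-up heuristic are assumptions), LZ4's real acceleration parameter can be raised until compression keeps pace with the network:

```c
/* Minimal sketch: pick an LZ4 acceleration level so that compression
 * keeps up with the measured network rate. choose_acceleration() and
 * the assumed speed-up factor are illustrative, not from the thesis. */
#include <lz4.h>

/* Higher acceleration means faster but weaker compression; raise it
 * when compression would otherwise be the bottleneck. */
static int choose_acceleration(double net_mbps, double comp_mbps)
{
    int accel = 1;
    while (comp_mbps < net_mbps && accel < 64) {
        accel *= 2;          /* LZ4 gets faster with higher acceleration */
        comp_mbps *= 1.5;    /* crude assumed speed-up per step */
    }
    return accel;
}

int compress_chunk(const char *src, int src_size, char *dst,
                   int dst_capacity, double net_mbps, double comp_mbps)
{
    int accel = choose_acceleration(net_mbps, comp_mbps);
    /* LZ4_compress_fast returns the compressed size, or 0 on failure */
    return LZ4_compress_fast(src, dst, src_size, dst_capacity, accel);
}
```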

Thesis BibTeX

Static Code Analysis for HPC Use Cases

Author: Frank Röder
Type: Bachelor's Thesis
Advisors: Alexander Droste, Dr. Michael Kuhn
Date: 2017-07-26
Abstract: The major objective of this thesis is to approach the procedure of getting into compiler-based checks, with a focus on high-performance computing use cases. In particular, the Message Passing Interface (MPI), which is used to execute parallel tasks via inter-process communication, including parallel reading and writing of files, is taken into account. A motivation explains why static analysis is worthwhile. Following this, techniques and tools to improve software development with static analysis are introduced. Nowadays parallel software has large code bases, and with rising complexity the likelihood of introducing bugs grows; tools that reduce error-proneness are therefore important for efficiency. The infrastructure of LLVM as well as the Clang Static Analyzer (CSA) are introduced to explain static analysis and how to capture information from the relevant compile phases. Based on this, the utility of an existing check is explained. Problems that only show up at runtime are observed through code simulation in the frontend, known as symbolic execution. This understanding is then transferred to the use cases at hand. Commonly overlooked mistakes, such as readability issues and bad code style, are checked through analysis of the abstract syntax tree; to this end, the LLVM tool Clang-Tidy has been extended with new checks. The checks regarding symbolic execution involve MPI-IO-related double closes and operations concerning file access; the routines to find these bugs have been added to the CSA. This thesis makes use of the already existing infrastructure named MPI-Checker, which provides the MPI handling. As a summary, the benefits of working on checks that detect serious bugs are discussed.
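
The following is an illustrative instance of the MPI-IO double-close defect such a check targets (a made-up example, not code from the thesis or its test suite):

```c
/* Bug pattern: a double close of an MPI-IO file handle, the kind of
 * defect a symbolic-execution check in the CSA can flag. */
#include <mpi.h>

void write_results(const char *path, const char *buf, int len)
{
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, path,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write(fh, buf, len, MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_File_close(&fh); /* BUG: fh is already MPI_FILE_NULL here */
}
```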

Thesis BibTeX

Database VOL-plugin for HDF5

Author: Olga Perevalova
Type: Bachelor's Thesis
Advisors: Dr. Michael Kuhn, Eugen Betke
Date: 2017-07-05
Abstract: HDF5 is an open source, hierarchical, and self-describing format for flexible and efficient I/O on high-volume and complex data that combines data and metadata. The advantages of this format make it widely used by many scientific applications. In a parallel HDF5 application, when a large number of processes access a shared file simultaneously, the synchronization mechanisms used by many file systems may significantly degrade I/O performance. Separating metadata and data is the first step towards solving this problem. The main contribution of this thesis is a prototype of an HDF5 VOL plugin that separates metadata and data: metadata are stored in an SQLite3 database and data in a shared file. It uses MPI for synchronization of metadata when several processes access the SQLite3 database. In the context of this work a benchmark has been developed that measures access times for each metadata operation and the overall I/O performance. The execution time of the database VOL plugin is compared to the native solution. The test results show that the database plugin consistently demonstrates good performance. The thesis concludes with a critical discussion of the approach by looking at the metadata from different perspectives: scientific applications vs. HDF5.
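
A minimal sketch of the metadata-separation idea (the schema and function names are assumptions for illustration, not the plugin's actual design): dataset metadata goes into SQLite3 while the raw data lives in a shared file.

```c
/* Store dataset metadata in SQLite3; raw data is written elsewhere.
 * Illustrative only; real code should use prepared statements. */
#include <sqlite3.h>
#include <stdio.h>

int store_dataset_meta(sqlite3 *db, const char *name,
                       long long offset, long long size)
{
    char sql[256];
    snprintf(sql, sizeof(sql),
             "INSERT INTO datasets(name, offset, size) "
             "VALUES('%s', %lld, %lld);", name, offset, size);
    return sqlite3_exec(db, sql, NULL, NULL, NULL); /* SQLITE_OK on success */
}

int main(void)
{
    sqlite3 *db;
    if (sqlite3_open("metadata.db", &db) != SQLITE_OK)
        return 1;
    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS datasets("
                     "name TEXT, offset INTEGER, size INTEGER);",
                 NULL, NULL, NULL);
    store_dataset_meta(db, "/temperature", 0, 4096);
    sqlite3_close(db);
    return 0;
}
```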

Thesis BibTeX

A Numerical Approach to Nonlinear Regression Analysis by Evolving Parameters

Author: Christopher Gerlach
Type: Master's Thesis
Advisors: Dr. Michael Kuhn
Date: 2017-06-29
Abstract: Nonlinear regression analysis is an important statistical method and poses many challenges to the user. While linear models are analytically solvable, nonlinear models can in most cases only be solved numerically. What many numeric methods have in common is that they require a proper starting point to reach satisfactory results. A poor choice of starting values can greatly reduce the convergence speed or, in many cases, even result in the algorithm not converging at all. This thesis proposes a genetic-numerical hybrid method to approach the problem from a nontraditional angle. The approach combines genetic algorithms with traditional numeric methods and proposes a design suitable for massive parallelization with GPGPU computing. It is shown that the approach can solve a large set of practical test problems without having to specify any starting values and that it is fast enough for practical use, utilizing only consumer-grade hardware.
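
The underlying optimization problem can be stated compactly (standard textbook formulation, not quoted from the thesis): the hybrid evolves candidate parameter vectors with a genetic algorithm and refines promising ones numerically, minimizing the sum of squared residuals

```latex
S(\beta) = \sum_{i=1}^{n} \bigl(y_i - f(x_i, \beta)\bigr)^2 \;\longrightarrow\; \min_{\beta}
```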

Thesis BibTeX

Quality of service improvement in ZFS through compression

Author: Niklas Bunge
Type: Master's Thesis
Advisors: Dr. Michael Kuhn, Anna Fuchs
Date: 2017-05-31
Abstract: This thesis evaluates the improved use of data compression to reduce storage space, increase throughput and reduce bandwidth requirements. The latter is an interesting field of application, not only for telecommunication but also for local data transfer between the CPU and the storage device. The choice of the compression algorithm is crucial for the overall performance; for this reason, part of this work considers which algorithm fits best in a particular situation. The goal of this thesis comprises the implementation of three different features. First, updating the existing lz4 algorithm enables support for the “acceleration” feature called lz4fast; trading compression ratio for compression speed increases write speed on fast storage devices such as SSDs. Second, an automatic decision procedure adapts the compression algorithms gzip (levels 1-9) and the newly updated lz4 to the current environment in order to maximize utilization of the CPU and the storage device. Performance is improved compared to no compression but depends highly on the hardware setup; on powerful hardware the algorithm successfully adapts to the optimum. The third and last feature enables the user to select a desired file-write throughput. Scheduling is implemented by delaying and prioritizing incoming requests; compression is thereby adjusted so as not to impair the selected requirements while still reducing storage space and bandwidth demand. By preferring “fast” files over “slow” files - high throughput over low throughput - the average turnaround time is reduced while maintaining the average compression ratio.
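
A simplified way to reason about when which algorithm wins (an illustration, not a formula from the thesis): with compression throughput T_comp, raw device throughput T_dev, and compression ratio r (uncompressed over compressed size), the effective write throughput is roughly

```latex
T_\text{write} \approx \min\left(T_\text{comp},\; r \cdot T_\text{dev}\right)
```

so a fast algorithm like lz4fast wins on fast devices, where T_comp is the bottleneck, while gzip's higher r wins on slow devices.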

Thesis BibTeX

Support for external data transformation in ZFS

Author: Niklas Behrmann
Type: Master's Thesis
Advisors: Dr. Michael Kuhn, Anna Fuchs
Date: 2017-04-06
Abstract: While the computational power of high-performance computing systems doubled every two years over the last 50 years, as predicted by Moore's law, the same was not true for storage speed and capacity. Compression has become a useful technique to bridge the increasing performance and scalability gap between computation and input/output (I/O). For that reason some local file systems like ZFS support transparent compression of data. For parallel distributed file systems like Lustre, which is frequently used in supercomputers, this approach does not exist. The Intel Parallel Computing Centers (IPCC) for Lustre file system project is aiming for compression support in Lustre at multiple levels. The IPCCs are universities, institutions, and labs whose primary focus is to modernize applications to increase parallelism and scalability. A prior thesis started the implementation of online compression with the compression algorithm LZ4 in Lustre, focusing on increasing throughput performance. The data is compressed on the client side and sent compressed to the server. However, this compression potentially leads to bad read performance. This problem might be solved by modifying the ZFS file system, which is utilized by Lustre servers as a backend file system. ZFS already has compression functionality integrated, which provides good read performance for compressed data. The idea is to make use of this and store Lustre's data in ZFS as if it had been compressed by ZFS. Therefore a new interface that takes the necessary information has to be created; implementing it is the purpose of this thesis. The goal is to enable the Lustre compression to save space on disk and, most importantly, fix the bad read performance. Throughout this thesis the necessary modifications to ZFS are described. The main task is to provide ZFS with information about the compressed size and the uncompressed size of the data. Afterwards a possible implementation of the specified feature is presented. First tests indicate that data which is compressed by Lustre can be read efficiently by ZFS if provided with the necessary metadata.

Thesis BibTeX

Suitability analysis of Object Storage for HPC workloads

Author: Lars Thoms
Type: Bachelor's Thesis
Advisors: Dr. Michael Kuhn
Date: 2017-03-23
Abstract: This bachelor's thesis reviews the possibility of using an object storage system like Ceph Object Storage (RADOS), especially regarding its performance and its support for partial rewrites. Scientific high-performance computing produces large file objects whose metadata has to be quickly searchable. Object storage is a good solution here because it stores data efficiently with simple API calls, without the requirement to comply with the POSIX specification, whose interfaces are overloaded and not performant. Moreover, storing objects while separating their metadata into a search-efficient database increases search performance. Furthermore, objects are by definition supposed to be immutable, but when the RADOS API calls are used they are mutable and can be rewritten like on other file systems. In this thesis, I investigate whether objects can be rewritten segment-wise. Accordingly, I program a FUSE driver as a proof of concept and prepare a series of measurements to show performance and issues. Because objects are mutable, it is thereby possible to use Ceph as a normal file system. Unfortunately, the write performance of this driver was low (around 3 MiB/s). At the end, a design concept is given for an HPC application using a Ceph cluster in combination with a document-oriented database to store metadata.
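
The partial-rewrite capability rests on librados allowing writes at an offset inside an existing object. A minimal sketch (assuming a configured Ceph cluster and a pool named "hpc"; not the thesis driver):

```c
/* Overwrite a segment of an existing RADOS object in place. */
#include <rados/librados.h>

int rewrite_segment(const char *oid, const char *buf,
                    size_t len, uint64_t off)
{
    rados_t cluster;
    rados_ioctx_t io;
    int ret;

    rados_create(&cluster, NULL);          /* connect as default client */
    rados_conf_read_file(cluster, NULL);   /* default ceph.conf search */
    if ((ret = rados_connect(cluster)) < 0)
        goto out;
    if ((ret = rados_ioctx_create(cluster, "hpc", &io)) < 0)
        goto out;

    /* write len bytes at offset off inside the existing object */
    ret = rados_write(io, oid, buf, len, off);

    rados_ioctx_destroy(io);
out:
    rados_shutdown(cluster);
    return ret;
}
```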

Thesis BibTeX

Extracting Semantic Relations from Wikipedia using Spark

Author: Hans Ole Hatzel
Type: Bachelor's Thesis
Advisors: Dr. Julian Kunkel
Date: 2017-02-02
Abstract: In this work, the full text of both the German and the English Wikipedia was used for two subtasks: finding compound words and finding semantic associations of words. The approach to the first task was to find all nouns in the Wikipedia and evaluate which of them form compounds with any other nouns that were found. PySpark was used to work through the whole Wikipedia dataset, and the performance of the part-of-speech tagging operation on the whole dataset was good. In this way, a huge list of nouns was created which could then be checked for compound words. As this involved checking each noun against every other noun, the performance was not acceptable, with the analysis of the whole English Wikipedia taking over 200 hours. The data generated from the first subtask was then used for both generating and solving CRA tasks. CRA tasks could be generated at a large scale and were solved with an accuracy of up to 33%. The second subtask was able to cluster words based on their semantics. It was established that this clustering works to some extent and that the vectors representing the words therefore have some legitimacy. The second subtask's results could be used to perform further analysis on how the difficulty of CRA tasks depends on how words are related to each other.

Thesis BibTeX

2016

Energy usage analysis of HPC applications

Author: Tim Jammer
Type: Bachelor's Thesis
Advisors: Dr. Hermann Lenhart, Dr. Michael Kuhn
Date: 2016-12-06
Abstract: The importance of the energy consumption of large-scale computer systems will grow in the future, as it is not only a huge cost factor but also makes cooling these systems harder. In order to gain experience with the energy consumed by model simulations, I analyze the energy consumed by the ECOHAM North Sea ecosystem model to deduce which parts of the application use the most energy. First, the influence of the energy measurement on the application will be discussed. It is important to keep this influence in mind, as one wants to know the energy usage of the unchanged application, so that the gathered insights are transferable to the application when it is running without the energy measurement. Furthermore, my thesis will provide an overview of the energy needed by the different phases of the application. A focus is placed on the serial section where the output is written. The busy waiting implemented by the MPI implementation leads to an increased energy consumption; without this busy waiting the application needs about 4 percent less energy. Therefore, I propose that the programmer of an MPI application should be able to choose which MPI calls should perform non-busy waiting.
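
The non-busy-waiting idea can be emulated in application code today. A small sketch (illustrative, not from the thesis): replace a spinning MPI_Wait with a polling loop that yields the CPU, trading a little latency for lower energy use during long waits.

```c
#include <mpi.h>
#include <unistd.h>

/* Complete a pending request without busy waiting: poll with MPI_Test
 * and sleep between polls instead of spinning at full CPU load. */
void wait_lazily(MPI_Request *req)
{
    int done = 0;
    while (!done) {
        MPI_Test(req, &done, MPI_STATUS_IGNORE);
        if (!done)
            usleep(1000); /* sleep 1 ms instead of spinning */
    }
}
```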

BibTeX

Adaptive Selection of Lossy Compression Algorithms Using Machine Learning

Author: Armin Schaare
Type: Bachelor's Thesis
Advisors: Dr. Julian Kunkel, Anastasiia Novikova
Date: 2016-11-29
Abstract: The goal of this thesis was to evaluate machine learning models' suitability as an automatic decision mechanism for compression algorithms. Their task would be to predict which compression algorithms perform best on what kind of data. For this, artificially generated data and its compression behavior were analyzed, producing a benchmark of different features upon which machine learning models could be trained. The models' goal was to predict the compression and decompression throughput of the algorithms. Additionally, models had to correctly attribute data to the algorithm producing the best compression ratios. The machine learning approaches under consideration were Linear Models, Decision Trees and the trivial Mean Value Model as a comparison baseline. It was found that Decision Trees performed significantly better than Linear Models, which in turn were slightly better than the Mean Value approach. Nevertheless, even Decision Trees did not produce a satisfying result which could be reliably used for practical applications.

Thesis BibTeX

Evaluation von alternativen Speicherszenarien für hierarchische Speichersysteme

Author: Marc Perzborn
Type: Bachelor's Thesis
Advisors: Dr. Julian Kunkel
Date: 2016-10-31
Abstract: The goal of this bachelor's thesis was to verify the correctness of the simulation program FeO and to improve it. To this end, various scenarios were simulated. The results largely confirm the assumptions. Information stored in the cache can be delivered faster than information that is not cached. With few installed drives, read requests for non-cached information have to wait when every drive is busy. The storage management of a full cache works flawlessly; a cache with free space, however, does not behave as a real system would. The processing times for requests for non-cached information vary when different components of the tape archive are changed, for example the generation of the drives, the number of drives in the tape archive, or the bandwidth of components.

Thesis BibTeX

Quality Control of Meteorological Time-Series with the Aid of Data Mining

Author: Jennifer Truong
Type: Master's Thesis
Advisors: Dr. Julian Kunkel
Date: 2016-10-30
Abstract: This thesis discusses quality control in the meteorological field and, in particular, optimizes it through the adjustment and construction of an automated pipeline for the quality checks. Three different kinds of pipelines are developed in this thesis: the most general one focuses on high error detection with a low false-positive rate; a categorizing pipeline is also designed, which classifies the data as “good”, “bad” and “doubtful”; furthermore, a fast fault-detection pipeline is derived from the general pipeline to make it possible to react nearline to hardware failures. The thesis describes general fundamentals of meteorological relationships, statistical analysis and quality control for meteorology. After that, the approach is guided by the development of the automated pipeline. Meteorological measurements and their corresponding quality controls were explored in order to optimize them. Besides optimizing existing quality controls, new automated tests are developed within this thesis. The evaluation of the designed pipeline shows that its quality depends on the input parameters: the more information the input provides, the better the pipeline works. The specialty of the pipeline, however, is that it works with any kind of input, so it is not limited to strict input parameters.
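
As a generic illustration of the kind of automated check such a pipeline chains together (not a test from the thesis; the thresholds are invented), consider a simple range and spike test:

```c
#include <math.h>

typedef enum { QC_GOOD, QC_DOUBTFUL, QC_BAD } qc_flag_t;

/* Flag a temperature sample using its neighbors in the time series. */
qc_flag_t check_temperature(double prev, double cur, double next)
{
    if (cur < -90.0 || cur > 60.0)        /* physically implausible (°C) */
        return QC_BAD;
    double spike = fabs(cur - (prev + next) / 2.0);
    if (spike > 10.0)                     /* sudden jump: flag for review */
        return QC_DOUBTFUL;
    return QC_GOOD;
}
```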

Thesis BibTeX

MPI-3 algorithms for 3D radiative transfer on Intel Xeon Phi coprocessors

Author: Jannek Squar
Type: Master's Thesis
Advisors: Peter Hauschildt, Dr. Michael Kuhn
Date: 2016-10-20
Abstract: One-sided communication was added to the MPI standard with MPI-2 in 1997 and has been greatly extended with the introduction of MPI-3 in 2012. Even though one-sided communication offers many use cases from which an application could benefit, it has so far only been used sporadically in HPC. The objective of this thesis is to examine its potential for replacing an OpenMP section with equivalent code that only makes use of MPI. This is done based on an existing application named PHOENIX, which is currently developed at Hamburg Observatory and has been designed to be executed on HPC systems. Its purpose is, among other things, to numerically solve the equations of 3D radiative transfer for stellar objects. To utilize HPC hardware at its full capacity, PHOENIX makes use of MPI and OpenMP. In the course of this thesis a test application has been constructed which mimics the OpenMP sections and allows benchmarking diverse combinations of MPI one-sided communication operations. The benchmarks are performed on an Intel Xeon Phi Knights Corner and an Intel Xeon Phi Knights Landing to estimate whether a certain approach is suitable for HPC hardware in general. In the end, each approach is discussed, and it is assessed which kind of communication pattern might benefit most from MPI one-sided communication.
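
A minimal example of the communication style being benchmarked (generic MPI-3 usage, not PHOENIX code; run with at least two ranks): one rank updates another rank's memory through a window, without the target posting a receive.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, buf[4] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Win win;
    MPI_Win_create(buf, sizeof(buf), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);          /* open access epoch */
    if (rank == 0) {
        int data[4] = {1, 2, 3, 4};
        /* write into rank 1's window; rank 1 makes no matching call */
        MPI_Put(data, 4, MPI_INT, 1, 0, 4, MPI_INT, win);
    }
    MPI_Win_fence(0, win);          /* close epoch: put is now visible */

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```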

Thesis BibTeX URL

Suitability Analysis of GPUs and CPUs for Graph Algorithms

Author: Kristina Tesch
Type: Bachelor's Thesis
Advisors: Dr. Michael Kuhn
Date: 2016-09-27
Abstract: Throughout the last years, the trend in HPC has been towards heterogeneous cluster architectures that make use of accelerators to speed up computations. For this purpose, many current HPC systems are equipped with Graphics Processing Units (GPUs), which deliver a high floating-point performance that is important to accelerate compute-intensive applications. This thesis aims to analyze the suitability of CPUs and GPUs for graph algorithms, which can be classified as data-intensive applications. These types of applications perform fewer computations per data element and strongly rely on fast memory access. The analysis is based on two multi-node implementations of the Graph500 benchmark, which execute a number of breadth-first searches (BFS) on a large-scale graph. To enable a fair comparison, the same parallel BFS algorithm has been implemented for both the CPU and the GPU version. The final evaluation includes not only the performance results but also the programming effort that was necessary to achieve them, as well as cost and energy efficiency. Comparable performance results have been found for both versions of Graph500, but a significant difference in the programming effort has been detected. The main reason for the high programming effort of the GPU implementation is that complex optimizations are necessary to achieve acceptable performance in the first place; these require detailed knowledge of the GPU hardware architecture. All in all, the results of this thesis lead to the conclusion that the higher energy efficiency and, depending on the point of view, cost efficiency of the GPUs do not outweigh the lower programming effort for the implementation of graph algorithms on CPUs.
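
For reference, one level of a level-synchronous BFS over a CSR graph, the traversal pattern Graph500 exercises (a compact illustration, not the thesis implementation):

```c
/* CSR graph: neighbors of v are col_idx[row_ptr[v] .. row_ptr[v+1]-1].
 * depth[] must be initialized to -1 for unvisited vertices. */
int bfs_level(const int *row_ptr, const int *col_idx,
              const int *frontier, int frontier_len,
              int *next, int *depth, int level)
{
    int next_len = 0;
    for (int i = 0; i < frontier_len; i++) {
        int v = frontier[i];
        for (int e = row_ptr[v]; e < row_ptr[v + 1]; e++) {
            int w = col_idx[e];
            if (depth[w] < 0) {        /* unvisited */
                depth[w] = level + 1;
                next[next_len++] = w;  /* becomes next frontier */
            }
        }
    }
    return next_len; /* caller swaps frontier and next, repeats */
}
```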

Thesis BibTeX

Leistungs- und Genauigkeitsanalyse numerischer Löser für Differentialgleichungen

Author: Joel Graef
Type: Bachelor's Thesis
Advisors: Fabian Große, Dr. Michael Kuhn
Date: 2016-09-12
Abstract: This bachelor's thesis addresses the question of whether higher-order methods for solving differential equations are always better suited for use in numerical models than lower-order ones. The question is examined using four solvers, applied to two different differential equations and to an NPD (nutrient-phytoplankton-detritus) model, which describes a simplified marine ecosystem. First, some background on solvers for differential equations is presented, covering single-step and multi-step methods; in particular, the methods used here are treated: Euler, Heun, second-order Adams-Bashforth (AB2) and fourth-order Runge-Kutta (RK4). In the performance analysis, the methods are compared with regard to their accuracy and runtime. Additionally, a step-size control is presented that reduces the step size when the approximation deviates from the analytical solution and increases it again after a certain interval. Both with and without step-size control, the method of highest order (RK4) achieved the best runtime. Using the NPD model, the methods, with the exception of AB2, are analyzed as well. It turns out that using the Heun, AB2 and RK4 methods instead of the Euler method does not pay off for the model. The decisive factor is the choice of the step size, which depends on the accuracy of the methods: accuracy is increased through additional computation steps, which in turn allows choosing a coarser time step. With the NPD model, however, the computation time for these additional steps exceeds the computation time saved by the coarser time step. Since, for example, no step-size control was implemented in the model, there remain further starting points for improving the runtime.
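
For reference, the classical fourth-order Runge-Kutta step (standard textbook form) for y' = f(t, y) with step size h, whose four evaluations of f per step are the "additional computation steps" traded against a coarser time step:

```latex
\begin{aligned}
k_1 &= f(t_n,\, y_n) \\
k_2 &= f\bigl(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2} k_1\bigr) \\
k_3 &= f\bigl(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2} k_2\bigr) \\
k_4 &= f\bigl(t_n + h,\; y_n + h\, k_3\bigr) \\
y_{n+1} &= y_n + \tfrac{h}{6}\,(k_1 + 2k_2 + 2k_3 + k_4)
\end{aligned}
```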

Thesis BibTeX

Performanceanalyse der Ein-/Ausgabe des Ökologiemodells ECOHAM5

Author: Simon Kostede
Type: Master's Thesis
Advisors: Dr. Michael Kuhn, Fabian Große, Dr. Hermann Lenhart
Date: 2016-08-22
Abstract: The goal of this thesis is the analysis of the input/output (I/O) of the ecosystem model ECOHAM5. ECOHAM5 is a parallel HPC program parallelized with MPI; it writes NetCDF files as simulation results. As with many earth system and climate models, ECOHAM5 only performs serial I/O, which severely limits scaling. For the analysis, parallel I/O was implemented in ECOHAM5, and its performance was measured and analyzed. ECOHAM5 is an earth system model that simulates the ecology of the North Sea. The model is used to investigate questions of carbon flux in the North Sea in the context of climate change, as well as the effects of different loads on the North Sea ecosystem through nutrient inputs of nitrogen and phosphorus. For this purpose, the North Sea is divided into a three-dimensional grid, and numerical differential equations are solved for a set of state variables in every grid cell. The model domain of the ECOHAM grid covers the northwest European continental shelf (NECS) and parts of the adjacent northeast Atlantic. ECOHAM5 is implemented in Fortran and uses MPI for parallel execution with multiple processes, each of which participates in computing the simulation. In the original version of ECOHAM5, the simulation results are stored with NetCDF by a single process/compute node, the master node. This serial I/O was examined in several ways in this thesis: the implementation was analyzed statically based on the source code, and the execution was measured and evaluated with the tracing tool Vampir/Score-P. For its I/O, ECOHAM5 uses the libraries MPI, MPI-IO, HDF5 and NetCDF. On the test system with 10 compute nodes, the new version of ECOHAM5 with parallel I/O could not outperform the version with serial I/O; instead, it was roughly 15% to 25% slower.
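
The parallel-output pattern replacing gather-to-master can be sketched as follows (in C for brevity; ECOHAM5 itself is Fortran, and this is generic MPI-IO, not the thesis code): every rank writes its own slice collectively.

```c
#include <mpi.h>

#define LOCAL_N 1024

int main(int argc, char **argv)
{
    int rank;
    double slice[LOCAL_N];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < LOCAL_N; i++)
        slice[i] = rank;             /* dummy simulation result */

    MPI_File_open(MPI_COMM_WORLD, "result.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* each rank writes at its own offset; all ranks participate */
    MPI_Offset off = (MPI_Offset)rank * LOCAL_N * sizeof(double);
    MPI_File_write_at_all(fh, off, slice, LOCAL_N, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```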

Thesis BibTeX

Untersuchung von Interaktiven Analyse- und Visualisierungsumgebungen im Browser für NetCDF-Daten

Author: Sebastian Rothe
Type: Master's Thesis
Advisors: Dr. Julian Kunkel
Date: 2016-07-21
Abstract: Simulation and measurement results of climate models nowadays often comprise large amounts of data, which can be stored, for example, in NetCDF files as special data structures. Analyzing these results usually requires complex and powerful systems that allow the user to present the mass of simulation results clearly, for example in tabular form or as graphical representations. Modern cloud systems offer the user the possibility to store results and make them available worldwide, for example over the internet. However, this approach has the drawback that the entire result file must first be retrieved from the cloud system before it can be analyzed. This thesis examines an alternative approach in which the user can run initial analyses through a web application on server-side tools, whose results can then be visualized in the web browser. This application, called ReDaVis (Remote Data Visualizer), is based on the software systems OpenCPU and h5serv. The preliminary analyses operate on small subsets of the data; they are meant to indicate whether more detailed analyses of the complete dataset are worthwhile. It is examined to what extent existing tools can already implement this approach. Some of these components are then used and complemented with custom components to develop a software prototype of the presented approach. To this end, theoretical foundations are first explained in more detail and then used to combine the employed components into a web application. Besides visualization techniques for the graphical representation of the datasets, the application also supports applying several consecutive functions to a dataset in the form of a pipeline. It is shown to what extent the different combinations of components can work together, or are unsuitable due to limitations on the software and hardware level, or do not perform well enough compared to widely used alternatives.

Thesis BibTeX

Automation of manual code optimization via DSL-directed AST-manipulation

Author: Jonas Gresens
Type: Bachelor's Thesis
Advisors: Dr. Julian Kunkel
Date: 2016-06-27
Abstract: Program optimization is a crucial step in the development of performance-critical applications, but due to its complexity it can in most cases only be carried out manually. The substantial structural changes to the source code reduce readability and maintainability and complicate the ongoing development of the applications. The objective of this thesis is to examine the advantages and disadvantages of an AST-based solution to the conflicting relationship between performance and structural code quality of a program. For this purpose a prototype is developed that automates usually manual optimizations based on instructions by the user. The thesis covers the design and implementation as well as the evaluation of the prototype for use as a tool in software development. As a result, this thesis shows the general usability of the AST-based approach and the need for further investigation.
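
An example of the kind of manual optimization such a tool is meant to automate (illustrative only; not the prototype's DSL or output):

```c
#include <stddef.h>

/* before: straightforward reduction */
double sum_plain(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* after: 4x unrolled variant (n assumed divisible by 4), the kind of
 * structural change that hurts readability when done by hand */
double sum_unrolled(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i += 4)
        sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    return sum;
}
```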

Thesis BibTeX

Client-Side Data Transformation in Lustre

Author: Anna Fuchs
Type: Master's Thesis
Advisors: Dr. Michael Kuhn
Date: 2016-05-25
Abstract: Due to the increasing gap between computation power and storage speed and capacity, compression techniques for compensating the I/O bottleneck become more urgent than ever. Although some file systems already support compression, none of the distributed ones do. Lustre is a widely used parallel distributed file system in the HPC area, which so far can only profit from ZFS backend compression. Along with archival needs to reduce storage space, network throughput can also benefit from compression on the client side. Userspace benchmarks showed that compression can increase throughput by a factor of up to 1.2 while halving the required storage space. This thesis primarily aims to analyze the suitability of compression for the Lustre client and to introduce online compression based on stripes. This purpose places certain demands on the compression algorithm to be used: slow algorithms can have adverse effects and decrease the system's overall performance. A higher compression ratio at the expense of lower speed can nevertheless be worthwhile due to the sharply reduced amount of data to be transferred. LZ4 is one of the fastest compression algorithms and a good candidate to be used on the fly. A prototype of LZ4 fast compression within a Lustre client is presented for a limited number of use cases. In the course of the design, different approaches are discussed with regard to transparency and avoidance of code duplication. Finally, some ideas for adaptive compression, client hints and server-side support are presented.

BibTeX

Modeling and Simulation of Tape Libraries for Hierarchical Storage Management Systems

Author: Jakob Lüttgau
Type: Master's Thesis
Advisors: Dr. Julian Kunkel
Date: 2016-04-09
Abstract: The wide variety of storage technologies (SRAM, NVRAM, NAND, disk, tape, etc.) results in deep storage hierarchies being the only feasible choice to meet performance and cost requirements when dealing with vast amounts of data. In particular, long-term storage systems employed by scientific users mainly rely on tape storage, as it is still the most cost-efficient option even 40 years after its invention in the mid-seventies. Current archival systems are often only loosely integrated into the remaining HPC storage infrastructure. However, data analysis tasks require integration into the scratch storage systems, and with the rise of exascale systems and in-situ analysis, burst buffers are also likely to require integration with the archive. Unfortunately, exploring new strategies and developing open software for tape archive systems is a hurdle due to the lack of affordable storage silos, the resulting lack of availability outside of large organizations, and the increased caution required when dealing with ultra-durable data. Eliminating some of these problems by providing virtual storage silos should enable community-driven innovation and enable site operators to add features where they see fit, while being able to verify strategies before deploying them on test or production systems. The thesis assesses modern tape systems and puts their development over time into perspective. Subsequently, different models for the individual components in tape systems are developed. The models are then implemented in a prototype using discrete event simulation. It is shown that the simulation can be used to approximate the behavior of tape systems deployed in the real world and to conduct experiments without requiring a physical tape system.
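
The core of any discrete event simulation is a time-ordered event queue. A minimal skeleton (far simpler than the thesis prototype; the event names and latencies are invented):

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct event {
    double time;
    const char *what;
    struct event *next;
} event_t;

static event_t *queue = NULL;

/* insert an event, keeping the queue sorted by time */
static void schedule(double time, const char *what)
{
    event_t *e = malloc(sizeof(*e)), **p = &queue;
    e->time = time;
    e->what = what;
    while (*p && (*p)->time <= time)
        p = &(*p)->next;
    e->next = *p;
    *p = e;
}

int main(void)
{
    schedule(0.0, "mount request");
    schedule(12.5, "robot moves tape");          /* assumed latency */
    schedule(47.0, "drive loaded, I/O starts");  /* assumed latency */

    while (queue) {              /* main simulation loop */
        event_t *e = queue;
        queue = e->next;
        printf("t=%6.1fs %s\n", e->time, e->what);
        free(e);                 /* handlers could schedule follow-ups */
    }
    return 0;
}
```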

Thesis Presentation BibTeX

2015

Vorhersage von E/A-Leistung im Hochleistungsrechnen unter der Verwendung von neuronalen Netzen

Author: Jan Fabian Schmid
Type: Bachelor's Thesis
Advisors: Dr. Julian Kunkel
Date: 2015-12-17
Abstract: Predicting the runtime of file accesses on a high-performance computer is important for the development of analysis tools that support scientists in using the given resources efficiently. In this bachelor's thesis, the parallel file system of a high-performance computer is analyzed, and several approaches to modeling input/output performance are developed and tested using artificial neural networks. The developed neural networks achieve smaller deviations from the actual access times than linear models when predicting access times. It turns out that the decisive factor for a good model of the I/O system is the ability to distinguish between similar file accesses that nevertheless lead to different access times. The runtime differences between file accesses with identical call parameters can be explained by different processing inside the system. Since these processing paths are neither known nor derivable from directly measurable attributes, predicting access times proves to be a non-trivial task. One approach is to exploit periodic behavior patterns of the system in order to predict the processing path of an access; however, using this periodic behavior for more accurate predictions turns out to be difficult. To approximate the processing paths, this bachelor's thesis introduces a method in which the residuals of a model are used to create classes which, in turn, should correlate with the processing paths. The analysis of these classes yields evidence of their connection to the processing paths, and models using these class assignments are able to make considerably more accurate predictions than other models.

Thesis Presentation BibTeX

Advanced Data Transformation and Reduction Techniques in ADIOS

Author: Tim Alexander Dobert
Type: Bachelor's Thesis
Advisors: Dr. Michael Kuhn
Date: 2015-10-07
Abstract: Because of the slow improvement of storage hardware, compression has become very important for high-performance computing. Efficient strategies that provide a good compromise between computational overhead and compression ratio have been developed in recent years. However, when data reduction is used, usually a single strategy is applied to the whole system. These solutions generally do not take advantage of the structure within files, which is often known beforehand. This thesis explores several data transformation techniques that can take advantage of patterns within certain types of data to improve compression results. Specific examples are developed, and their applications, strengths and weaknesses are discussed. With an array of transformations to choose from, users can make the best choice for each file type, leading to an overall reduction of space. To make this usable in an HPC environment, the transforms are implemented in an I/O library. ADIOS is chosen for this as it provides an easy way to configure I/O parameters and metadata, as well as an extensible framework for transparent on-the-fly data transformations. The prototyping and implementation process of the transformations is detailed, and their effectiveness is tested and evaluated on scientific climate data. The results show that the transforms are quite powerful in theory but do not have a great effect on real data. While not improving compression results, the discrete cosine transformation is worthwhile on its own, providing an option to sacrifice accuracy for size reduction.
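
For reference, the discrete cosine transform mentioned above, in its standard DCT-II form applied per data block; dropping small high-frequency coefficients is what trades accuracy for size:

```latex
X_k = \sum_{n=0}^{N-1} x_n \cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right],
\qquad k = 0, \dots, N-1
```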

BibTeX

Static Code Analysis of MPI Schemas in C with LLVM

Author: Alexander Droste
Type: Bachelor's Thesis
Advisors: Dr. Michael Kuhn
Date: 2015-09-25
Abstract: This thesis presents MPI-Checker, a static analysis checker for MPI code written in C, based on Clang's Static Analyzer. The checker works with path-sensitive as well as non-path-sensitive analysis, the latter being purely based on information provided by the abstract syntax tree representation of the source code. MPI-Checker's AST-based checks verify correct type usage in MPI functions and the utilization of collective communication operations, and provide experimental support for verifying whether point-to-point function calls have a matching partner. Its path-sensitive checks verify aspects of nonblocking communication based on the usage of MPI requests, which are tracked by a symbolic representation of their memory region in the course of symbolic execution. The thesis elucidates the parts of the LLVM/Clang API relevant for MPI-Checker and shows how the implementation is integrated into the architecture. Furthermore, the basics of MPI are explained. MPI-Checker introduces only negligible overhead on top of the Clang Static Analyzer core and is able to detect critical bugs in real-world codebases, which is shown by evaluating analysis results for the open source projects AMG2013 and OpenFFT.
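
An illustrative instance of the request-tracking defect class (a made-up example, not from the thesis or its evaluation): a request is reused before the first nonblocking operation completed.

```c
#include <mpi.h>

void exchange(double *buf, int n, int peer)
{
    MPI_Request req;
    MPI_Isend(buf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req);
    /* BUG: req is overwritten while the send may still be in flight */
    MPI_Irecv(buf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE); /* only the recv is ever waited on */
}
```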

BibTeX

Automatisches Lernen der Leistungscharakteristika von Paralleler Ein-/Ausgabe

Author: Eugen Betke
Type: Master's Thesis
Advisors: Dr. Julian Kunkel
Date: 2015-06-27
Abstract: Performance analysis and optimization have been necessary steps in quality assurance and optimization cycles since the beginning of electronic data processing; they help to create high-quality, performant software. Especially in the HPC field this topic is highly relevant because of rising software complexity. Performance analysis tools simplify and accelerate this process considerably: they present the internal processes understandably and provide hints for possible improvements. Their further development, and the development of new methods, is therefore essential for this field. The goal of this thesis is to investigate whether I/O operations can be assigned automatically to the correct cache type with the help of machine learning. For this purpose, methods based on CART decision trees and the k-means algorithm are developed and examined. The hoped-for results were not achieved this way, so finally the causes are identified and discussed.

Thesis Presentation BibTeX

Evaluation of performance and productivity metrics of potential programming languages in the HPC environment

Author: Florian Wilkens
Type: Bachelor's Thesis
Advisors: Dr. Michael Kuhn, Sandra Schröder
Date: 2015-04-28
Abstract: This thesis aims to analyze new programming languages in the context of high-performance computing (HPC). In contrast to many other evaluations, the focus is not only on performance but also on developer productivity metrics. The two new languages Go and Rust are compared with C, as it is one of the two languages commonly used in HPC next to Fortran. The basis for the evaluation is a shortest-path calculation on real-world geographical data, parallelized for shared-memory concurrency. An implementation of this concept was written in all three languages to compare multiple productivity and performance metrics like execution time, tooling support, memory consumption and development time across different phases. Although the results are not comprehensive enough to invalidate C as a leading language in HPC, they clearly show that both Rust and Go offer tremendous productivity gains compared to C with similar performance. Additional work is required to further validate these results; future research topics are listed at the end of the thesis.

BibTeX

Dynamically Adaptable I/O Semantics for High Performance Computing

Author: Michael Kuhn
Type: PhD Thesis
Advisors: Prof. Dr. Thomas Ludwig
Date: 2015-04-27
Abstract: File systems as well as libraries for input/output (I/O) offer interfaces that are used to interact with them, albeit on different levels of abstraction. While an interface's syntax simply describes the available operations, its semantics determines how these operations behave and which assumptions developers can make about them. There are several different interface standards in existence, some of them dating back decades and having been designed for local file systems; one such representative is POSIX. Many parallel distributed file systems implement a POSIX-compliant interface to improve portability. Its strict semantics is often relaxed to reach maximum performance which can lead to subtly different behavior on different file systems. This, in turn, can cause application misbehavior that is hard to track down. All currently available interfaces follow a fixed approach regarding semantics, making them only suitable for a subset of use cases and workloads. While the interfaces do not allow application developers to influence the I/O semantics, applications could benefit greatly from the possibility of being able to adapt them to their requirements. The work presented in this thesis includes the design of a novel I/O interface called JULEA. It offers support for dynamically adaptable semantics and is suited specifically for HPC applications. The introduced concept allows applications to adapt the file system behavior to their exact I/O requirements instead of the other way around. The general goal is an interface that allows developers to specify what operations should do and how they should behave - leaving the actual realization and possible optimizations to the underlying file system. Due to the unique requirements of the proposed interface, a prototypical file system is designed and developed from scratch. The new I/O interface and file system prototype are evaluated using both synthetic benchmarks and real-world applications. This ensures covering both specific optimizations made possible by the file system's additional knowledge as well as the applicability for existing software. Overall, JULEA provides data and metadata performance comparable to that of other established parallel distributed file systems. However, in contrast to the existing solutions, its flexible semantics allows it to cover a wider range of use cases in an efficient way. The results demonstrate that there is need for I/O interfaces that can adapt to the requirements of applications. Even though POSIX facilitates portability, it does not seem to be suited for contemporary HPC demands. JULEA presents a first approach of how application-provided semantical information can be used to dynamically adapt the file system's behavior to the applications' I/O requirements.

Thesis BibTeX URL

Adaptive Compression for the Zettabyte File System

Author: Florian Ehmke
Type: Master's Thesis
Advisors: Dr. Michael Kuhn
Date: 2015-02-24
Abstract: Although many file systems nowadays support compression, lots of data is still written to disks uncompressed. The reason for this is the overhead created when compressing the data, a CPU-intensive task. Storing uncompressed data is expensive, as it requires more disks which have to be purchased and subsequently consume more energy. Recent advances in compression algorithms have yielded algorithms that meet all requirements for a compression-by-default scenario (LZ4, LZJB). The new algorithms are so fast that it is indeed faster to compress-and-write than to just write data uncompressed. However, algorithms such as gzip still yield much higher compression ratios at the cost of a higher overhead. In many use cases the compression speed is not as important as saving disk space: on an archive used for backups, the (de-)compression speed does not matter as much as in a folder where some calculation stores intermediate results which will be used again in the next iteration of the calculation. The perfect solution would know what the user wants and choose the best algorithm for every file individually. The Zettabyte File System (ZFS) is a modern file system with built-in compression support. It supports four different compression algorithms by default (LZ4, LZJB, gzip and ZLE). ZFS already offers some flexibility regarding compression, as different algorithms can be selected for different datasets (mountable, nested file systems). The major purpose of this thesis is to demonstrate how adaptive compression in the file system can be used to benefit from strong compression algorithms like gzip while avoiding, if possible, the performance penalties they bring along. Therefore, in the course of this thesis ZFS's compression capabilities will be extended to allow more flexibility when selecting a compression algorithm. The user will be able to choose a use case for a dataset, such as archive, performance or energy. In addition to that, two features will be implemented. The first feature will allow the user to select a compression algorithm for a specific file type and use case; file types will be identified by the extension of the file name. The second feature will regularly test blocks for compressibility with different algorithms; the winning algorithm of that test will be used until the next test is scheduled. Depending on the selected use case, parameters during the tests are weighted differently.

Thesis BibTeX

Optimization of non-contiguous MPI-I/O Operations

Author: Enno David Zickler
Type: Bachelor's Thesis
Advisors: Dr. Julian Kunkel
Date: 2015-01-29
Abstract: High-performance computing is essential for most science departments, and the possibilities expand with increasing computing resources. Lately, data storage has become more and more important, but the development of storage devices cannot keep up with that of processing units. Data rates and latencies in particular improve only slowly, so efficiency has become an important topic of research. Programs using MPI can become more efficient by using more information about the file system. In this thesis, advanced algorithms for the optimization of non-contiguous MPI-I/O operations are developed by considering well-known system specifications like data rate, latency, block and stripe alignment, maximum buffer size, and the impact of read-ahead mechanisms. Access patterns combined with these parameters lead to an adaptive data sieving for non-contiguous I/O operations. The parametrization can be done with machine learning concepts, which provide the best parameters even for unknown access patterns. The result is a new library called NCT, which provides view-based access to non-contiguous data at POSIX level. The access can be optimized by data sieving algorithms whose behavior can easily be modified thanks to the modular design of NCT. Existing data sieving algorithms were implemented and evaluated with this modular design. Hence, the user is able to create new advanced data sieving algorithms using any parameters considered useful. The evaluation shows many cases in which such an algorithm improves performance.
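
For context, the standard MPI-IO way of expressing a non-contiguous access, which is where optimizations like data sieving apply (generic MPI-IO, not the NCT library):

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Datatype pattern;
    double buf[10];

    MPI_Init(&argc, &argv);
    /* 10 blocks of one double, each 4 doubles apart: a strided view */
    MPI_Type_vector(10, 1, 4, MPI_DOUBLE, &pattern);
    MPI_Type_commit(&pattern);

    MPI_File_open(MPI_COMM_SELF, "data.bin", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, pattern, "native", MPI_INFO_NULL);
    /* reads the 10 strided doubles as if they were contiguous; the MPI
     * library may read one large block and sieve out the holes */
    MPI_File_read(fh, buf, 10, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Type_free(&pattern);
    MPI_Finalize();
    return 0;
}
```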

Thesis Presentation BibTeX

2014

Performance Evaluation of Data Encryption in File Systems -- Benchmarking ext4, ZFS, LUKS and eCryptfs

Author: Hajo Möller
Type: Bachelor's Thesis
Advisors: Dr. Michael Kuhn, Konstantinos Chasapis
Date: 2014-12-16
Abstract: It has become important to reliably protect stored digital data, both against becoming inaccessible and against becoming available to third parties. Using a file system which guarantees data integrity protects against data loss; disk encryption protects against data breaches. Encryption is still thought to incur a large performance penalty when accessing the data. This thesis evaluates different approaches to data encryption using low-power hardware and open-source software, with a focus on the advanced file system OpenZFS, which features excellent protection against data loss but does not include encryption. It is shown that encryption using LUKS beneath ZFS is a viable method of gaining data protection, especially when using hardware-accelerated encryption algorithms. Using a low-power server CPU with native AES instructions, ZFS as the file system and LUKS for encryption of the block device permits ensuring data integrity and protection at a low cost.

BibTeX

Implementierung und Leistungsanalyse numerischer Algorithmen zur Methode der kleinsten Quadrate

Author: Niklas Behrmann
Type: Bachelor's Thesis
Advisors: Petra Nerge, Dr. Michael Kuhn
Date: 2014-12-16
Abstract: The method of least squares is the standard approach for solving fitting problems: an overdetermined system of equations is solved in order to determine the unknown parameters of a function as accurately as possible. In this bachelor's thesis, the method is considered within the harmonic analysis of tides. A program is available in which the least squares problem has so far been solved with the help of a library. This thesis aims to provide a custom implementation, since the existing one relies on IBM's ESSL library, which is not available on all systems. Specifically, two approaches are considered: solving the Gaussian normal equations via the Cholesky decomposition, and the QR decomposition via Householder transformations. Both are implemented and, together with the LAPACK software library, integrated into the program. The performance analysis shows that the Cholesky-based implementation achieves the better runtimes while preserving the results.
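
For reference, the two standard solution routes compared here (textbook formulation, not thesis-specific): for an overdetermined system Ax ≈ b, the least squares solution minimizes the residual norm and satisfies the normal equations,

```latex
\min_x \|Ax - b\|_2
\quad\Longleftrightarrow\quad
A^{\mathsf{T}} A\, x = A^{\mathsf{T}} b ,
```

which the Cholesky approach solves by factoring the symmetric matrix as A^T A = L L^T, while the QR approach factors A = QR and solves R x = Q^T b.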

BibTeX

Einsatz von Beschleunigerkarten für das Postprocessing großer Datensätze

Author: Jannek Squar
Type: Bachelor's Thesis
Advisors: Petra Nerge, Dr. Michael Kuhn
Date: 2014-12-03
Abstract: This bachelor's thesis addresses the question of whether the use of accelerator cards is advantageous for the post-processing of large datasets. This question is examined using a Xeon Phi card and the program Harmonic Analysis, which performs a harmonic analysis on the output data of an ocean simulation. First, the distinguishing features and the different operating modes of the Xeon Phi card (native mode, offload mode and symmetric mode) are presented; the structure of Harmonic Analysis is also described in order to clarify where the Xeon Phi card can be employed. Initial problems already emerge here, since the various operating modes require adaptations of the libraries used. Harmonic Analysis is then reworked so that parts of the program are loaded onto the card in offload mode and executed there; the possibility of vectorization is also examined, since each core of the Xeon Phi card has a large vector unit. In the performance analysis, the program runtime is compared for different start parameters. In the end, however, it has to be concluded that using the Xeon Phi card did not pay off for Harmonic Analysis, since the achieved performance regarding efficiency, cost and absolute runtime improvement is worse with the Xeon Phi card than without it. Since this bachelor's thesis could not yet exhaust all possibilities, possible starting points for future work are listed.
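
The offload mode works roughly as follows (a minimal sketch using Intel's offload pragmas; it requires the Intel compiler and a Knights Corner card, and is not the actual Harmonic Analysis code):

```c
#include <stdio.h>

#define N 1000000

int main(void)
{
    static float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0f * i; }

    /* transfer a and b to the coprocessor, run the loop there,
     * and copy c back to the host afterwards */
    #pragma offload target(mic) in(a, b) out(c)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    printf("c[42] = %f\n", c[42]);
    return 0;
}
```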

Thesis BibTeX URL

Comparison of kernel and user space file systems

Author: Kira Isabel Duwe
Type: Bachelor's Thesis
Advisors: Dr. Michael Kuhn
Date: 2014-08-28
Abstract: A file system is part of the operating system and defines an interface between the OS and the computer's storage devices. It is used to control how the computer names, stores and organizes files and directories. Due to many different requirements, such as efficient usage of the storage, a great variety of approaches arose. The most important ones run in the kernel, as this was the only way for a long time. In 1994, developers came up with an idea that would allow mounting a file system in user space. The FUSE (Filesystem in Userspace) project was started in 2004 and implemented in the Linux kernel by 2005. This gives users the opportunity to write their own file system without editing kernel code, thereby avoiding licence problems; additionally, FUSE offers a stable library interface. It is implemented as a loadable kernel module, and due to its design all operations have to pass through the kernel multiple times. The additional data transfer and the context switches cause some overhead, which is analysed in this thesis. A basic overview is given of how exactly a file system operation takes place and which mount options for a FUSE-based system result in better performance. To this end, the relevant operating system internals are explained, along with a detailed presentation of kernel file system mechanisms such as the system call. This enables a comparison of kernel file systems, such as tmpfs and ZFS, with user space file systems, such as memfs and ZFS-FUSE. This thesis shows that kernel version 3.16 offers great improvements for every file system analysed: the metadata operations even of a file system like tmpfs improved by up to 25%. The write-back cache has an enormous impact, increasing the write performance of memfs from about 220 MB/s to 2,600 MB/s, a factor of 12. All in all, the performance of the FUSE-based file systems improved dramatically, making user space file systems an alternative to native kernel file systems, although they still cannot keep up in every aspect.
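
A minimal FUSE file system, to illustrate the path every operation takes through the kernel to a userspace process (a standard hello-world style sketch using FUSE's high-level C API, not code from the thesis):

```c
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <sys/stat.h>
#include <string.h>
#include <errno.h>

static const char *msg = "hello from userspace\n";

static int hello_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
    } else if (strcmp(path, "/hello") == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = strlen(msg);
    } else {
        return -ENOENT;
    }
    return 0;
}

static int hello_read(const char *path, char *buf, size_t size,
                      off_t off, struct fuse_file_info *fi)
{
    (void)fi;
    if (strcmp(path, "/hello") != 0)
        return -ENOENT;
    size_t len = strlen(msg);
    if ((size_t)off >= len)
        return 0;
    if (off + size > len)
        size = len - off;
    memcpy(buf, msg + off, size);  /* serve the read from userspace */
    return (int)size;
}

static struct fuse_operations ops = {
    .getattr = hello_getattr,
    .read    = hello_read,
};

int main(int argc, char **argv)
{
    return fuse_main(argc, argv, &ops, NULL); /* mounts and runs the loop */
}
```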

Thesis BibTeX URL

Optimization and parallelization of the post-processing of a tidal simulation

AuthorDominik Rupp
TypeBachelor's Thesis
AdvisorsPetra Nerge, Dr. Michael Kuhn
Date2014-04-25
AbstractThe fields of oceanography and climate simulation in general strongly rely on information technology, and high performance computing provides the hardware and software needed to process complex climate computations. An employee of the work group Scientific Computing of the University of Hamburg implemented an application that performs post-processing on input data from a simulation of global ocean tides. The post-processing gains further insight into that data by performing complex calculations: it uses large NetCDF input files to execute a demanding harmonic analysis and finally produces visualizable output. The program is analyzed and evaluated for its suitability for use on a cluster computer. This is achieved by examining it with tracing tools and finding routines that exhibit great potential for parallel execution. An initial estimate of the program's maximum speed-up is also determined by applying Amdahl's law. Further, a parallelization approach is chosen and implemented, and the results are analyzed and compared to prior expectations and evaluations. It turns out that a hybrid parallelization, which speeds up calculations and input/output using OpenMP and MPI, achieves a speed-up of 13.0 compared to the original serial program on the cluster computer of the work group Scientific Computing. Finally, open issues resulting from this thesis are highlighted as future work. The mathematical background in the appendix discusses and differentiates the closely related terms harmonic analysis and Fourier analysis.
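
For reference, the initial speed-up estimate mentioned above follows Amdahl's law (the numbers below are illustrative, not the thesis's measured values):

    S(n) = \frac{1}{(1 - p) + p/n}

where p is the parallelizable fraction of the runtime and n the number of processing units. For example, p = 0.95 caps the achievable speed-up at 1/(1 - p) = 20 no matter how many cores are used.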

BibTeX

Halbautomatische Überprüfung von kollektiven MPI-Operationen zur Identifikation von Leistungsinkonsistenzen

AuthorSebastian Rothe
TypeBachelor's Thesis
AdvisorsDr. Julian Kunkel
Date2014-04-09
AbstractComputer simulations are increasingly used today to carry out scientific experiments in virtual environments. To reduce execution times, parallel programs are developed and run on compute clusters. Programs distributed across several computer systems typically use the MPI standard (Message Passing Interface) to exchange messages between the machines. Due to the complex structure of compute clusters, however, the available hardware is often not used optimally, leaving optimization potential that can be exploited to further reduce application runtimes. Performance analyses form the basis for uncovering weak spots in the system or in the MPI implementations used, so that they can later be optimized. This thesis covers the development of the analysis tool pervm (performance validator for MPI), which concentrates on examining MPI's collective operations in order to uncover performance inconsistencies. The theoretical foundations are explained and then used to describe the interplay of the tool's components. The execution of pervm is divided into a measurement phase and an evaluation phase. The tool can measure the execution times of the actual MPI operation as well as of various algorithms that describe differently efficient ways of realising a collective operation. Besides analysing these measurements, the evaluation phase can also simulate the theoretical execution time of an algorithm on a given system based on its performance characteristics. These capabilities provide numerous starting points for identifying performance bottlenecks. It is shown to what extent conclusions about the algorithm used can be drawn from the behavior of a collective MPI operation. Reference algorithms with shorter execution times than the MPI operation point to further inconsistencies in the implementation of the MPI library used.
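
To make the idea of a reference algorithm concrete, the sketch below times the library's MPI_Bcast against a hand-rolled binomial-tree broadcast, the kind of comparison the measurement phase performs. This is illustrative code, not the pervm sources.

    /* Illustrative sketch (not the pervm sources): time the library's
     * MPI_Bcast against a hand-rolled binomial-tree broadcast, the kind
     * of reference algorithm the evaluation phase compares against. */
    #include <mpi.h>
    #include <stdio.h>

    static void binomial_bcast(void *buf, int count, MPI_Datatype type,
                               int root, MPI_Comm comm)
    {
        int rank, size, mask;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int rel = (rank - root + size) % size;   /* rank relative to root */

        for (mask = 1; mask < size; mask <<= 1)  /* receive once ... */
            if (rel & mask) {
                MPI_Recv(buf, count, type, (rel - mask + root) % size, 0,
                         comm, MPI_STATUS_IGNORE);
                break;
            }
        for (mask >>= 1; mask > 0; mask >>= 1)   /* ... then forward */
            if (rel + mask < size)
                MPI_Send(buf, count, type, (rel + mask + root) % size, 0,
                         comm);
    }

    int main(int argc, char **argv)
    {
        double buf[1024] = { 0 };
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        MPI_Bcast(buf, 1024, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        MPI_Barrier(MPI_COMM_WORLD);
        double t2 = MPI_Wtime();
        binomial_bcast(buf, 1024, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        double t3 = MPI_Wtime();

        if (rank == 0)
            printf("library bcast: %g s, binomial reference: %g s\n",
                   t1 - t0, t3 - t2);
        MPI_Finalize();
        return 0;
    }

If the hand-rolled reference consistently beats the library call, that hints at an inconsistency in the MPI implementation's algorithm selection.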

Thesis BibTeX

An in-depth analysis of parallel high level I/O interfaces using HDF5 and NetCDF-4

AuthorChristopher Bartz
TypeMaster's Thesis
AdvisorsKonstantinos Chasapis, Dr. Michael Kuhn, Petra Nerge
Date2014-04-07
AbstractScientific applications store data in various formats. HDF5 and NetCDF-4 are data formats which are widely used in the scientific community. They are surrounded by high-level I/O interfaces which provide retrieval and manipulation of data. The parallel execution of applications is a key factor regarding performance. Previous evaluations have shown that high-level I/O interfaces such as NetCDF-4 and HDF5 can exhibit suboptimal I/O performance depending on the application's access patterns. In this thesis we investigate how the parallel versions of the HDF5 and NetCDF-4 interfaces behave when using Lustre as the underlying parallel file system. The I/O is performed in a layered manner: NetCDF-4 uses HDF5, and HDF5 uses MPI-IO, which itself uses POSIX to perform the I/O. To discover inefficiencies and bottlenecks, we analyse the complete I/O path while using different access patterns and I/O configurations. We use IOR for our analysis, a configurable benchmark that generates I/O patterns and is well known in the parallel I/O community; we modify IOR in order to fulfil our analysis needs. We distinguish between two general access patterns in our evaluation: disjoint and interleaved. Disjoint means that each process accesses a contiguous region in the file, whereas interleaved means access to a non-contiguous region. The results show that neither the disjoint nor the interleaved access outperforms the other in every case, but when using the interleaved access in a certain configuration, results near the theoretical maximum are realised. We provide best practices for choosing the right I/O configuration depending on the needs of the application in the last chapter. The NetCDF-4 interface does not provide a feature to align the data section to particular address boundaries, which is a significant disadvantage regarding performance. We provide an implementation and re-evaluation of this feature and observe a clear performance improvement. When using NetCDF-4 or HDF5, the data can be broken into pieces called chunks which are stored at independent locations in the file. We present and evaluate an optimised implementation for determining the default chunk size in the NetCDF-4 interface. Beyond that, we reveal an error in the NetCDF-4 implementation and provide the correct solution.
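
The two tuning knobs discussed above, alignment and chunking, look as follows in the plain HDF5 C API. This is a minimal sketch; the file name, alignment values and chunk sizes are made up for illustration.

    /* Sketch of the two tuning knobs discussed above, via the HDF5 C API:
     * aligning data sections in the file and choosing an explicit chunk
     * size. Values are illustrative, not the thesis's configuration. */
    #include <hdf5.h>

    int main(void)
    {
        /* Align every object >= 1 MiB at 1 MiB boundaries (e.g. a Lustre
         * stripe size); NetCDF-4 did not expose this knob at the time. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_alignment(fapl, 1 << 20, 1 << 20);
        hid_t file = H5Fcreate("out.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* Store the dataset in independently located 256x256 chunks. */
        hsize_t dims[2] = { 4096, 4096 }, chunk[2] = { 256, 256 };
        hid_t space = H5Screate_simple(2, dims, NULL);
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);
        hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_FLOAT, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space);
        H5Fclose(file); H5Pclose(fapl);
        return 0;
    }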

Thesis BibTeX

Analyse und Optimierung von nicht-zusammenhängende Ein-/Ausgabe in MPI

AuthorDaniel Schmidtke
TypeBachelor's Thesis
AdvisorsDr. Julian Kunkel, Michaela Zimmer
Date2014-04-07
AbstractThe goal of this thesis is to evaluate the potential of data sieving and to make it usable in optimizations. To this end, the following objectives are defined: 1. systematic analysis of the achievable performance, 2. transparent optimization, 3. context-sensitive optimization.
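
Data sieving in ROMIO, the MPI-IO implementation used by most MPI libraries, is controlled through hints. The sketch below shows how such an optimization can be toggled for an evaluation; the file name and buffer size are illustrative.

    /* Illustrative sketch: toggling ROMIO's data sieving for
     * non-contiguous I/O via MPI-IO hints. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Info info;
        MPI_Info_create(&info);
        /* ROMIO-specific hints: enable data sieving for reads and bound
         * the size of the intermediate sieving buffer. */
        MPI_Info_set(info, "romio_ds_read", "enable");
        MPI_Info_set(info, "ind_rd_buffer_size", "4194304");

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "data.bin",
                      MPI_MODE_RDONLY, info, &fh);
        /* ... non-contiguous accesses via a derived datatype go here ... */
        MPI_File_close(&fh);
        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }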

Thesis BibTeX

Automatic Analysis of a Supercomputer's Topology and Performance Characteristics

AuthorAlexander Bufe
TypeBachelor's Thesis
AdvisorsDr. Julian Kunkel
Date2014-03-18
AbstractAlthough knowing the topology and performance characteristics of a supercomputer is very important, as it allows for optimisations and helps to detect bottlenecks, no universal tool to determine topology and performance characteristics is available yet. Existing tools are often specialised to analyse either the behaviour of a single node or the network topology. Furthermore, existing tools are unable to detect switches despite their importance. This thesis introduces a universal method to determine the topology (including switches) and an efficient way to measure the performance characteristics of the connections. The approach of the developed tool is to measure the latencies first and then to compute the topology by analysing the data. In the next step, the gained knowledge of the topology is used to parallelise the measurement of the throughput in order to decrease the required time or to allow for more accurate measurements. A general approach, based on linear regression, to calculate latencies of connections that cannot be measured directly is introduced as well. Finally, the developed algorithm and measurement techniques are validated on several test cases and a perspective on future work is given.
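
A minimal sketch of the first step, measuring the pairwise latency matrix with a ping-pong between every pair of processes, from which the topology is then inferred. This is illustrative code, not the thesis's tool.

    /* Illustrative sketch: measure one-way latency for every process pair
     * with a simple ping-pong; the resulting matrix is the input for the
     * topology inference described above. */
    #include <mpi.h>
    #include <stdio.h>

    #define REPEATS 1000

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        char byte = 0;

        for (int a = 0; a < size; a++) {
            for (int b = a + 1; b < size; b++) {
                MPI_Barrier(MPI_COMM_WORLD); /* measure one pair at a time */
                if (rank == a) {
                    double t0 = MPI_Wtime();
                    for (int i = 0; i < REPEATS; i++) {
                        MPI_Send(&byte, 1, MPI_CHAR, b, 0, MPI_COMM_WORLD);
                        MPI_Recv(&byte, 1, MPI_CHAR, b, 0, MPI_COMM_WORLD,
                                 MPI_STATUS_IGNORE);
                    }
                    /* one-way latency = half the round-trip time */
                    printf("%d <-> %d: %g us\n", a, b,
                           (MPI_Wtime() - t0) / REPEATS / 2 * 1e6);
                } else if (rank == b) {
                    for (int i = 0; i < REPEATS; i++) {
                        MPI_Recv(&byte, 1, MPI_CHAR, a, 0, MPI_COMM_WORLD,
                                 MPI_STATUS_IGNORE);
                        MPI_Send(&byte, 1, MPI_CHAR, a, 0, MPI_COMM_WORLD);
                    }
                }
            }
        }
        MPI_Finalize();
        return 0;
    }

Measuring pairs one at a time keeps the network quiet during each sample; the parallelisation described in the abstract then exploits the inferred topology to measure non-interfering pairs concurrently.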

Thesis BibTeX

Flexible Event Imitation Engine for Parallel Workloads

AuthorJakob Lüttgau
TypeBachelor's Thesis
AdvisorsDr. Julian Kunkel
Date2014-03-18
AbstractEvaluating systems and optimizing applications in high-performance computing (HPC) is a tedious task. Trace files, which are already commonly used to analyse and tune applications, also serve as a good approximation to reproduce the workloads of scientific applications. The thesis presents design considerations and discusses a prototype implementation of a flexible tool to mimic the behavior of parallel applications by replaying trace files. In the end it is shown that a plugin-based replay engine is able to replay parallel workloads that use MPI and POSIX I/O. It is further demonstrated how automatic trace manipulation in combination with the replay engine can be used as a virtual laboratory.
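
The plugin-based replay idea can be sketched as a dispatch table mapping traced event types to handler functions. The toy trace, event names and handlers below are invented for illustration and are not the thesis's engine.

    /* Illustrative sketch of a plugin-based replay loop: each traced event
     * type is dispatched to a registered handler that reproduces it. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* One simplified trace record: an event type plus one parameter. */
    struct record { const char *op; long arg; };

    typedef void (*handler_t)(const struct record *);

    static int out_fd = -1;

    static void replay_write(const struct record *r)
    {
        static char buf[1 << 16];
        long left = r->arg;             /* reproduce the traced I/O volume */
        while (left > 0) {
            ssize_t w = write(out_fd, buf, left < (long)sizeof(buf)
                                           ? (size_t)left : sizeof(buf));
            if (w <= 0) break;
            left -= w;
        }
    }

    static void replay_compute(const struct record *r)
    {
        usleep((useconds_t)r->arg);     /* preserve the traced compute time */
    }

    /* The "plugins": handlers registered per event type. */
    static const struct { const char *op; handler_t fn; } plugins[] = {
        { "write",   replay_write },
        { "compute", replay_compute },
    };

    int main(void)
    {
        out_fd = open("/dev/null", O_WRONLY);
        /* A two-event toy trace; a real engine would parse trace files. */
        struct record trace[] = { { "compute", 1000 }, { "write", 4096 } };
        for (size_t i = 0; i < sizeof(trace) / sizeof(*trace); i++)
            for (size_t p = 0; p < sizeof(plugins) / sizeof(*plugins); p++)
                if (strcmp(trace[i].op, plugins[p].op) == 0)
                    plugins[p].fn(&trace[i]);
        close(out_fd);
        return 0;
    }

Manipulating the trace records before replay (scaling sizes, reordering events) is what turns such an engine into the "virtual lab" mentioned above.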

Thesis BibTeX

2013

Design, Implementation, and Evaluation of a Low-Level Extent-Based Object Store

AuthorSandra Schröder
TypeMaster's Thesis
AdvisorsDr. Michael Kuhn
Date2013-12-18
AbstractAn object store is a low-level abstraction of storage. Instead of providing a block-level view of a storage device, an object store allows access in a more abstract way, namely via objects. Sitting on top of a storage device, it is responsible for storage management. It can be used as a stand-alone light-weight file system when only basic storage management is necessary; moreover, it can act as a supporting layer for full-featured file systems in order to lighten their management overhead. Only a few object store solutions exist, and they are not well suited for these use cases: for example, no user interface is provided, or they are too difficult to use. The development of some object stores has ceased, so that the code of the implementation is no longer available. That is why a new object store addressing these problems is needed. In this thesis a low-level, extent-based object store is designed and implemented. It is able to perform fully functional storage management. For this, appropriate data structures are designed, for example so-called inodes and extents; these are file system concepts adapted to the object store design and implementation. The object store uses the memory-mapping technique for memory management. This technique maps a device into the virtual address space of a process, promising efficient access to the data. An application programming interface is designed to allow easy use and integration of the object store. This interface provides two features, namely synchronization and transactions. Transactions allow batching several input/output requests into one operation; synchronization ensures that data is written immediately after the write request. The object store implementation is object-oriented: each data structure constitutes a programming unit consisting of a set of data types and methods. The performance of the object store is evaluated and compared with well-known file systems. It shows excellent performance results, although it is only a prototype. The transaction feature is found to be efficient: it increases the write performance by a factor of 50 when synchronization of data is activated. It especially outperforms the other file systems concerning metadata performance. A high metadata performance is a crucial criterion when the object store is used as a supporting storage layer in the context of parallel file systems.
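
A minimal sketch of the memory-mapping technique described above, with an extent as an (offset, length) pair inside the mapping and msync standing in for the synchronization feature. This is illustrative code, not the thesis's object store; note that msync requires a page-aligned address.

    /* Illustrative sketch: map a backing file, treat an extent as an
     * (offset, length) pair inside the mapping, and flush it explicitly. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 1 << 20;              /* 1 MiB demo store */
        int fd = open("store.img", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, (off_t)len) != 0) return 1;

        /* Map the whole store; the page cache now backs all accesses. */
        char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) return 1;

        /* Write an object's payload into an extent at offset 4096
         * (page-aligned, as msync requires an aligned address). */
        size_t ext_off = 4096, ext_len = 512;
        memcpy(base + ext_off, "object payload", 15);

        /* Synchronization feature: flush this extent's dirty pages now. */
        msync(base + ext_off, ext_len, MS_SYNC);

        munmap(base, len);
        close(fd);
        return 0;
    }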

Thesis BibTeX

Automated File System Correctness and Performance Regression Tests

AuthorAnna Fuchs
TypeBachelor's Thesis
AdvisorsDr. Michael Kuhn
Date2013-09-23
AbstractTo successfully manage big software projects with many developers involved, every development step should be continuously verified. Automated and integrated test procedures save much effort, reduce the risk of errors and enable a much more efficient development process, since the effects of every development step are continuously available. In this thesis the testing and analysis of the parallel file system JULEA are automated. All of these processes are integrated and linked to the version control system Git, so that every committed change triggers a test run. For this, the concept of Git hooks is used, by means of which testing can be included in the common development workflow. Scientific projects in particular suffer from a lack of careful, high-quality test mechanisms. Not only correctness is relevant, which forms the base for any further changes, but also the performance trend: a significant criterion for the quality of a parallel file system is its efficiency. Performance regressions caused by changes can crucially affect the further course of development, so it is important to draw conclusions about the temporal behavior immediately after every considerable development step. The trends and results have to be evaluated and analyzed carefully, and the best way to convey this kind of information is graphically. The goal is to generate simple but meaningful graphics from test results, which help to improve the quality of the development process and the final product; ideally the visualization is available on a web site for more comfortable use. Moreover, to abstract from the specific project, the test system is portable and universal enough to be integrated into any project versioned with Git. Finally, some tests in need of improvement were located using this framework.

Thesis BibTeX URL

Evaluating Distributed Database Systems for Use in File Systems

AuthorRoman Michel
TypeBachelor's Thesis
AdvisorsDr. Michael Kuhn
Date2013-09-18
AbstractAfter a short introduction to NoSQL and file systems, this thesis looks at several popular NoSQL database management systems as well as some younger approaches, contrasting their similarities and differences. Based on those analogies and recent developments, a selection of those databases is analysed further, with a focus on scalability and performance. The main part of this analysis is the setup of multiple instances of these databases, reviewing and comparing the setup process, as well as developing and evaluating benchmarks. The benchmarks focus on access patterns commonly found in file systems, which are identified in the course of this thesis.

BibTeX

Simulation of Parallel Programs on Application and System Level

AuthorJulian Kunkel
TypePhD Thesis
AdvisorsProf. Dr. Thomas Ludwig
Date2013-07-30
AbstractComputer simulation revolutionizes traditional experimentation by providing a virtual laboratory. The goal of high-performance computing is the fast execution of applications, since this enables rapid experimentation. The performance of parallel applications can be improved by increasing either the capability of the hardware or the execution efficiency. In order to increase the utilization of hardware resources, a rich variety of optimization strategies is implemented in both hardware and software layers. The interactions of these strategies, however, result in very complex systems. This complexity makes assessing and understanding the measured performance of parallel applications in real systems exceedingly difficult.
To help in this task, in this thesis an innovative event-driven simulator for MPI-IO applications and underlying heterogeneous cluster computers is developed which can help us to assess measured performance. The simulator allows conducting MPI-IO application runs in silico, including the detailed simulations of collective communication patterns, parallel I/O and cluster hardware configurations. The simulation estimates the upper bounds for expected performance and therewith facilitates the evaluation of observed performance.
In addition to the simulator, the comprehensive tracing environment HDTrace is presented. HDTrace offers novel capabilities in analyzing parallel I/O. For example, it allows the internal behavior of MPI and the parallel file system PVFS to be traced. While PIOsimHD replays traced behavior of applications on arbitrary virtual cluster environments, in conjunction with HDTrace it is a powerful tool for localizing inefficiencies, conducting research on optimizations for communication algorithms, and evaluating arbitrary and future systems.
This thesis is organized according to a systematic methodology which aims at increasing insight into complex systems: the background and related-work sections offer valuable analyses of parallel file systems, performance factors of parallel applications, the Message Passing Interface, and the state of the art in optimization and discrete-event simulation. The behavior of the memory, network and I/O subsystems is assessed for our working group's cluster system, demonstrating the problems of characterizing hardware. One important insight of this analysis is that, due to interactions between hardware characteristics and existing optimizations, performance does not follow common probability distributions, leading to unpredictable behavior of individual operations.
The hardware models developed for the simulator rely on just a handful of characteristics and implement only a few optimizations. However, a careful qualification and validation demonstrates that the developed models explain real-world phenomena with high accuracy. Comprehensive experiments illustrate how simulation aids in localizing bottlenecks in the parallel file system, MPI and hardware, and how it fosters understanding of system behavior. Additional experiments demonstrate the suitability of the novel tools for developing and evaluating alternative MPI and I/O algorithms. With its power to assess the performance of clusters running up to 1,000 processes, PIOsimHD serves as a virtual laboratory for studying system internals.
In summary, the combination of the enhanced tracing environment and a novel simulator offers unprecedented insights into interactions between application, communication library, file system and hardware.

Thesis BibTeX URL

Design and Evaluation of Tool Extensions for Power Consumption Measurement in Parallel Systems

AuthorTimo Minartz
TypePhD Thesis
AdvisorsProf. Dr. Thomas Ludwig
Date2013-07-03
AbstractIn an effort to reduce the energy consumption of high performance computing centers, a number of new approaches have been developed in the last few years. One of these approaches is to switch hardware to lower power states in promising parallel application phases. A test cluster is designed with high performance computing nodes supporting multiple power saving mechanisms comparable to mobile devices. Each of the nodes is connected to power measurement equipment to investigate the power saving potential of the specific hardware under different load scenarios. However, statically switching the power saving mechanisms usually increases the application runtime; as a consequence, no energy is saved. Contrary to static switching strategies, dynamic switching strategies consider the hardware usage in the application phases to switch between the different modes without increasing the application runtime. Even though the concepts are already quite clear, tools to identify application phases and to determine the impact on performance, power and energy are still rare. This thesis designs and evaluates tool extensions for power consumption measurement in parallel systems, with the final goal of characterizing and identifying energy-efficiency hot spots in scientific applications. Using offline tracing, the metrics are collected in trace files and can be visualized or post-processed after the application run. The timeline-based visualization tools Sunshot and Vampir are used to correlate parallel applications with the energy-related metrics. With these tracing and visualization capabilities, it is possible to evaluate the quality of energy-saving mechanisms, since waiting times in the application can be related to hardware power states. Using the energy-efficiency benchmark eeMark, typical hardware usage patterns are identified to characterize the workload, the impact on the node power consumption and finally the potential for energy saving. To exploit the developed extensions, four scientific applications are analyzed to evaluate the whole approach. Appropriate phases of the parallel applications are manually instrumented to reduce the power consumption, with the final goal of saving energy for the whole application run on the test cluster. This thesis provides a software interface for the efficient management of the power saving modes per compute node, to be exploited by application programmers. All analyzed applications consist of several different calculation-intensive compute phases and have a considerable power and energy-saving potential which cannot be exhausted by traditional, utilization-based mechanisms implemented in the operating system. Reducing the processor frequency in communication and I/O phases can also yield remarkable savings for the presented applications.

Thesis BibTeX URL

Evaluation of Different Storage Backends and Technologies for MongoDB

AuthorJohann Weging
TypeBachelor's Thesis
AdvisorsDr. Michael Kuhn
Date2013-02-28
AbstractToday's database management systems store their data in conventional file systems. Some of them allocate files of multiple gigabytes in size and handle data alignment themselves; in theory, these database management systems can work with just a contiguous region of storage for their database files. This thesis attempts to reduce the overhead produced by file operations by implementing an object store backend for a database management system: MongoDB as the database management system and JZFS as the object store, which works on top of the ZFS file system. The main question is whether an object store is really capable of reducing the I/O overhead of MongoDB. While developing the new storage backend, however, it was discovered that the implementation is too extensive for a bachelor's thesis; the development is documented up to this point, and finishing and evaluating the backend remains future work. After the implementation was deemed too extensive, the focus moved to file system benchmarking using the metadata benchmark mdtest. It covers the file systems ext4, XFS, btrfs and ZFS on different hardware setups: every file system was benchmarked on an HDD and an SSD, and ZFS was additionally benchmarked on an HDD using an SSD as read and write cache. It turns out that ZFS still suffers from serious metadata performance bottlenecks. Surprisingly, the HDD with the SSD cache performs nearly as well as ZFS on a pure SSD setup. Btrfs performs quite well; oddly, it sometimes performs better on the HDD than on the SSD, and when creating files or directories it outperformed the other file systems by far. Ext4 does not seem to scale with multiple threads accessing shared data: the performance mostly stays the same or sometimes even drops, and only with two threads does the performance increase for some operations. XFS performed quite well in most test cases; the only odd case was reading directory stats, where one thread on the HDD was faster than one thread on the SSD, and performance dropped rapidly when increasing the thread count on the HDD. Future work would be to identify the bottlenecks that slow ZFS down in every case except file and directory removal.

BibTeX

2012

Effiziente Verarbeitung von Klimadaten mit ParStream

AuthorMoritz Lahn
TypeBachelor's Thesis
AdvisorsDr. Julian Kunkel
Date2012-06-28
AbstractIn cooperation with ParStream GmbH, this thesis investigates to what extent the database developed by ParStream can be used for more efficient processing of climate data. For the evaluation of climate data, scientists often use the Climate Data Operators (CDO) program, a collection of many operators for analysing data produced by climate simulations and earth system models. Evaluation with this program is very time-consuming. This motivates the use of the ParStream database, which can process queries against a large data set in parallel and very efficiently, using its own column-oriented bitmap index and a compressed index structure. Faster data retrieval opens up new possibilities in the area of real-time analysis, which are helpful for the interactive visualization of climate data. This thesis examines which CDO operators can be implemented with the ParStream database, and some operators are implemented for demonstration purposes. The performance benefits are verified through tests and show a more efficient processing of climate data with ParStream: for some operators, ParStream delivers results between 2x and 20x faster than the CDO program. A further result of classifying the CDO operators is that most operations can be mapped directly to SQL.
The industry partner does not consent to the publication of the PDF.

BibTeX

Energy-Aware Instrumentation of Parallel MPI Applications

AuthorFlorian Ehmke
TypeBachelor's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Timo Minartz
Date2012-06-25
AbstractEnergy consumption in High Performance Computing has become a major topic, and various approaches to improve performance per watt have been developed. One way is to instrument an application with instructions that change the idle and performance states of the hardware. The major purpose of this thesis is to demonstrate the potential savings by instrumenting parallel message passing applications. For successful instrumentation, critical regions in terms of performance and power consumption have to be identified. Most scientific applications can be divided into phases that utilize different parts of the hardware. The goal is to conserve energy by switching the hardware to different states depending on the workload in a specific phase. To identify those phases, two tracing tools are used. Two examples are instrumented: a parallel earth simulation model written in Fortran and a parallel partial differential equation solver written in C. Instrumented applications should consume less energy but may also show an increase in runtime; it is discussed whether such a compromise is worthwhile. The applications are analyzed and instrumented on two x64 architectures, and differences concerning runtime and power consumption are investigated.
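
A minimal sketch of such an instrumentation, assuming the Linux cpufreq sysfs interface (requires root). The paths and governor names are standard Linux; the thesis's actual instrumentation mechanism may differ.

    /* Illustrative sketch: lower the processor frequency around a
     * communication/I/O phase via the Linux cpufreq sysfs files. */
    #include <stdio.h>

    static void set_governor(int cpu, const char *gov)
    {
        char path[128];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor",
                 cpu);
        FILE *f = fopen(path, "w");
        if (f) { fprintf(f, "%s\n", gov); fclose(f); }
    }

    int main(void)
    {
        set_governor(0, "powersave");    /* entering an I/O-bound phase */
        /* ... communication or I/O phase of the application ... */
        set_governor(0, "performance");  /* back to full speed to compute */
        return 0;
    }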

BibTeX

Replay Engine for Application Specific Workloads

AuthorJörn Ahlers
TypeBachelor's Thesis
AdvisorsDr. Julian Kunkel
Date2012-04-12
AbstractToday many tools exist which are related to the processing of workloads. Each has its specific area of use, yet despite their differences they have functions for creating and executing workloads in common. To create a new tool, all of these functions have to be implemented again, even when they have been implemented before in another tool. In this thesis a framework is designed and implemented that allows replaying application-specific workloads. This is realized through a modular system that allows existing modules to be reused when creating new tools, reducing development work. Additionally, a function is designed to generate parts of the modules from their function headers to further reduce this work; to improve the generation, semantic information can be added through comments to enable advanced behavior. To show that this approach works, examples are given that demonstrate the functionality and evaluate the overhead introduced by the software. Finally, additional work that could further improve this tool is outlined.

Thesis BibTeX

2011

Evaluation of File Systems and I/O Optimization Techniques in High Performance Computing

AuthorChristina Janssen
TypeBachelor's Thesis
AdvisorsDr. Michael Kuhn
Date2011-12-05
AbstractHigh performance computers are able to process huge datasets in a short period of time by allowing work to be done on many compute nodes concurrently. This workload often poses several challenges to the underlying storage devices. When possibly hundreds of clients from multiple nodes try to access the same files, those storage devices become bottlenecks and therefore a threat to performance. In order to make I/O as efficient as possible, it is important to make the best use of the given resources in a system. The I/O performance that can be achieved results from the cooperation of several factors: the underlying file system, the interface that connects application and file system, and the implementation; the best I/O performance is achieved when all of these factors work well together. In this thesis, an overview is given of how different file systems work, which access semantics and I/O interfaces exist, and how their cooperation, together with suitable I/O optimization techniques, can result in the best possible performance.

Thesis BibTeX URL

Energieeffizienz und Nachhaltigkeit im Hochleistungsrechnen am Beispiel des Deutschen Klimarechenzentrums

AuthorYavuz Selim Cetinkaya
TypeBachelor's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Timo Minartz
Date2011-11-21

BibTeX

Estimation of Power Consumption of DVFS-Enabled Processors

AuthorChristian Seyda
TypeBachelor's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Timo Minartz
Date2011-03-28
AbstractSaving energy is nowadays a critical factor, especially for data centers and high performance clusters, which have a power consumption of several megawatts. Simply using the components' energy saving mechanisms is not always possible, because this can lead to performance degradation; for this reason, high performance clusters mostly do not use them, even in low-utilization phases. Modelling the power consumption of a component based on specific recordable values can help to find ways of saving energy or to predict the power consumption after replacing the component with a more efficient one. One of the main power consumers in a recent system is the processor. This thesis presents a model of the power consumption of a processor based on its frequency and voltage. Comparisons with real-world power consumption were made to evaluate the model. Furthermore, a tracing library was extended to log the processor frequency and idle states where available. Using the presented model and the trace files, a power estimator has been implemented that is able to estimate the power consumption of the processor in a given trace file (or of a more energy-efficient processor), helping to motivate the use of power saving mechanisms and energy-efficient processors and showing the long-term potential for energy saving.
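
For context, such frequency/voltage models are commonly built around the CMOS dynamic-power relation, shown here in its generic textbook form (not necessarily the exact model of the thesis):

    P \approx P_{\text{static}} + C_{\text{eff}} \cdot V^2 \cdot f

where C_eff is the effective switched capacitance, V the supply voltage and f the clock frequency. Because the minimum stable voltage itself grows with frequency, the dynamic term scales roughly cubically in f, which is what makes frequency scaling attractive for saving energy.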

BibTeX

2010

Crossmedia File System MetaFS -- Exploiting Performance Characteristics from Flash Storage and HDD

AuthorLeszek Kattinger
TypeBachelor's Thesis
AdvisorsDr. Julian Kunkel, Olga Mordvinova
Date2010-03-23
AbstractUntil recently, the decision which storage device is most suitable in terms of cost, capacity, performance and reliability was an easy choice: only hard disk drives offered the requested properties. Nowadays, the rapid development of flash storage technology makes these devices competitive or even more attractive. The great advantage of flash storage is, apart from lower energy consumption and insensitivity to mechanical shocks, the much lower access time: compared with hard disks, flash devices can access data about a hundred times faster, which enables a significant performance benefit for random I/O operations. Unfortunately, HDDs at present provide much larger capacity at considerably lower prices than flash storage devices, and this does not seem likely to change in the near future. Considering also the widespread use of HDDs, the continuing increase in storage density and the associated increase in sequential I/O performance, the incentive to use HDDs will continue. For this reason, a way to combine both storage technologies seems beneficial. From the view of a file system, metadata is often small and accessed randomly, whereas a logical file might be large and is often accessed sequentially. Therefore, in this thesis a file system is designed and implemented which places metadata on a USB flash device and data on an HDD. The design also considers how metadata operations can be optimized for a cheap low-end USB flash device, which provides the fast access times typical of flash media but also characteristically low write rates, caused by the block-wise erase-before-write operating principle. All measured file systems show a performance drop for metadata updates on this kind of flash device compared with their behavior on an HDD. Therefore, the design focuses on mapping coherent logical name space structures (directories) close to physical media characteristics (blocks). To also check the impact of writes of data sizes equal to or smaller than the selected block size, the option to write only full blocks or current block fill rates was added. The file system was implemented in user space and operates through the FUSE interface. Despite the overhead this causes, the performance of write-associated metadata operations (like create/remove) was better than or equal to that of the file systems used for benchmark comparison.

Thesis BibTeX Sources

2009

Analyzing Metadata Performance in Distributed File Systems

AuthorChristoph Biardzki
TypePhD Thesis
AdvisorsProf. Dr. Thomas Ludwig
Date2009-01-19

Thesis BibTeX URL

Tracing Internal Behavior in PVFS

AuthorTien Duc Tien
TypeBachelor's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Dr. Julian Kunkel
Date2009-10-05
AbstractNowadays scientific computations are often performed on large cluster systems because of the high performance they deliver. In such systems there are many possible causes of bottlenecks, related to both hardware and software. This thesis defines and implements metrics and information for tracing events in MPI applications in conjunction with the parallel file system PVFS, in order to localize bottlenecks and determine system behavior. They are useful for optimizing the system or applications. After tracing, data is stored in trace files and can be analyzed via the visualization tool Sunshot.
Two experiments are made in this thesis. The first is made on a balanced system: here Sunshot shows a balanced visualization across nodes, i.e. the load between nodes looks similar. In connection with this experiment, the new metrics and tracing information are discussed in detail using Sunshot. In contrast, the second experiment is made on an unbalanced system: here Sunshot shows where bottlenecks occurred and which components are involved.

Thesis BibTeX

Simulation-Aided Performance Evaluation of Input/Output Optimizations for Distributed Systems

AuthorMichael Kuhn
TypeMaster's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Dr. Julian Kunkel
Date2009-09-30

Thesis BibTeX URL

Design and Implementation of a Profiling Environment for Trace Based Analysis of Energy Efficiency Benchmarks in High Performance Computing

AuthorStephan Krempel
TypeMaster's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Dr. Julian Kunkel
Date2009-08-31

Thesis BibTeX

Model and simulation of power consumption and power saving potential of energy efficient cluster hardware

AuthorTimo Minartz
TypeMaster's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Dr. Julian Kunkel
Date2009-08-27

Thesis BibTeX URL

2008

Ergebnisvisualisierung paralleler Ein/Ausgabe Simulation im Hochleistungsrechnen

AuthorAnton Ruff
TypeBachelor's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Dr. Julian Kunkel
Date2008-05-31

BibTeX

2007

Container-Archiv-Format für wahlfreien effizienten Zugriff auf Dateien

AuthorHendrik Heinrich
TypeBachelor's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Dr. Julian Kunkel
Date2007-09-30

Thesis BibTeX

Directory-Based Metadata Optimizations for Small Files in PVFS

AuthorMichael Kuhn
TypeBachelor's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Dr. Julian Kunkel
Date2007-09-03

Thesis BibTeX URL

Towards Automatic Load Balancing of a Parallel File System with Subfile Based Migration

AuthorJulian Kunkel
TypeMaster's Thesis
AdvisorsProf. Dr. Thomas Ludwig
Date2007-08-02

Thesis BibTeX URL

Benchmarking of Non-Blocking Input/Output on Compute Clusters

AuthorDavid Büttner
TypeBachelor's Thesis
AdvisorsProf. Dr. Thomas Ludwig, Dr. Julian Kunkel
Date2007-04-24

Thesis BibTeX URL

2006

Tracing the Connections Between MPI-IO Calls and their Corresponding PVFS2 Disk Operations

AuthorStephan Krempel
TypeBachelor's Thesis
AdvisorsProf. Dr. Thomas Ludwig
Date2006-03-29

Thesis BibTeX URL

Performance Analysis of the PVFS2 Persistency Layer

AuthorJulian Kunkel
TypeBachelor's Thesis
AdvisorsProf. Dr. Thomas Ludwig
Date2006-02-15

Thesis BibTeX URL

2012

Parameterising primary production and convection in a 3D model

AuthorFabian Große
TypeDiploma Thesis
AdvisorsJan O. Backhaus, Johannes Pätsch
Date2012-05-16

BibTeX