Publication details
- Database VOL-plugin for HDF5 (Olga Perevalova), Bachelor's Thesis, School: Universität Hamburg, 2017-07-05
Publication details
Abstract
HDF5 is an open source, hierarchical, and self-describing format for flexible and efficient I/O for high volume and complex data, that combines data and metadata. Advantages of this format make it widely used by many scientific applications. In a parallel HDF5 application when a large number of processes access a shared file simultaneously synchronization mechanism used by many file systems may significantly degrade I/O performance. Separation of metadata and data is the first step to solve this problem. The main contribution of this thesis is a prototype of an HDF5-VOL-Plugin that separates metadata and data. To this end, metadata are stored in an SQLite3 database and data in a shared file. It uses MPI for synchronization of metadata when several processes access the SQLite3 database. In the context of this work a benchmark test has been developed. It measures access times for each metadata operation and the overall I/O performance. The execution time of the Database VOL-plugin is compared to the native solution. The test results show that the database plugin consistently demonstrates good performance. The thesis concludes with a critical discussion of the approach by looking at the metadata from different perspectives: scientific applications vs. HDF5.
BibTeX
@misc{DVFHP17, author = {Olga Perevalova}, title = {{Database VOL-plugin for HDF5}}, advisors = {Michael Kuhn and Eugen Betke}, year = {2017}, month = {07}, school = {Universität Hamburg}, howpublished = {{Online \url{https://wr.informatik.uni-hamburg.de/_media/research:theses:olga_perevalova_database_vol_plugin_for_hdf5.pdf}}}, type = {Bachelor's Thesis}, abstract = {HDF5 is an open source, hierarchical, and self-describing format for flexible and efficient I/O for high volume and complex data, that combines data and metadata. Advantages of this format make it widely used by many scientific applications. In a parallel HDF5 application when a large number of processes access a shared file simultaneously synchronization mechanism used by many file systems may significantly degrade I/O performance. Separation of metadata and data is the first step to solve this problem. The main contribution of this thesis is a prototype of an HDF5-VOL-Plugin that separates metadata and data. To this end, metadata are stored in an SQLite3 database and data in a shared file. It uses MPI for synchronization of metadata when several processes access the SQLite3 database. In the context of this work a benchmark test has been developed. It measures access times for each metadata operation and the overall I/O performance. The execution time of the Database VOL-plugin is compared to the native solution. The test results show that the database plugin consistently demonstrates good performance. The thesis concludes with a critical discussion of the approach by looking at the metadata from different perspectives: scientific applications vs. HDF5.}, }