Publication details
- Suitability analysis of Object Storage for HPC workloads (Lars Thoms), Bachelor's Thesis, School: Universität Hamburg, 2017-03-23
Abstract
This bachelor thesis examines whether an object storage system such as Ceph Object Storage (RADOS) is suitable for HPC workloads, focusing on its performance and its support for partial rewrites. Scientific high-performance computing produces large files, and their metadata must be quickly searchable. Object storage is a promising fit because it stores data efficiently through simple API calls without having to comply with the POSIX specification, whose semantics are overloaded and perform poorly. In addition, storing the data as objects while keeping the metadata separately in a search-efficient database should improve search performance. Although objects are by definition immutable, the RADOS API makes them mutable, so they can be rewritten as in other file systems. In this thesis, I investigate whether objects can be rewritten in segments. To this end, I implement a FUSE driver as a proof of concept and carry out a series of measurements to assess its performance and identify issues. Because the objects are mutable, the driver makes it possible to use Ceph as a regular file system. However, its write performance was low (around 3 MiB/s). Finally, the thesis presents a design concept for an HPC application that uses a Ceph cluster in combination with a document-oriented database to store metadata.
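To make the two central ideas concrete (rewriting part of a mutable object in place, and keeping metadata apart from object data), the following Python sketch models them in memory. It is a hypothetical toy, not the thesis's FUSE driver and not the librados API: `ToyObjectStore`, `write`, `read`, `set_metadata`, and `find` are illustrative names. `write(key, data, offset)` overwrites only the addressed byte range, similar in spirit to an offset-based RADOS write, while a plain dictionary stands in for the document-oriented metadata database.

```python
"""Hypothetical in-memory model of segment-wise object rewrites plus separate metadata.

Nothing here talks to Ceph; all class and method names are illustrative only.
"""

from typing import Any, Dict, List


class ToyObjectStore:
    """Flat key -> bytes store whose objects can be rewritten in segments."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytearray] = {}
        # Metadata is kept apart from the object data, standing in for a separate database.
        self._metadata: Dict[str, Dict[str, Any]] = {}

    def write(self, key: str, data: bytes, offset: int = 0) -> None:
        """Overwrite len(data) bytes starting at offset, extending the object if needed."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        obj = self._objects.setdefault(key, bytearray())
        end = offset + len(data)
        if end > len(obj):
            # Zero-fill the object up to the new end (covers any gap before offset).
            obj.extend(b"\x00" * (end - len(obj)))
        # Only the addressed segment changes; the rest of the object is untouched.
        obj[offset:end] = data

    def read(self, key: str, length: int, offset: int = 0) -> bytes:
        """Return up to length bytes starting at offset."""
        return bytes(self._objects[key][offset:offset + length])

    def set_metadata(self, key: str, **fields: Any) -> None:
        """Attach or update searchable metadata fields for an object."""
        self._metadata.setdefault(key, {}).update(fields)

    def find(self, **criteria: Any) -> List[str]:
        """Return keys whose metadata matches every given field (a linear scan here)."""
        return sorted(
            key for key, doc in self._metadata.items()
            if all(doc.get(k) == v for k, v in criteria.items())
        )


if __name__ == "__main__":
    store = ToyObjectStore()
    store.write("run42/temperature", b"AAAAAAAAAA")
    store.write("run42/temperature", b"BBB", offset=4)  # rewrite one segment in place
    print(store.read("run42/temperature", 10))          # b'AAAABBBAAA'
    store.set_metadata("run42/temperature", experiment="climate", step=42)
    print(store.find(experiment="climate"))             # ['run42/temperature']
```

A real deployment would replace the linear scan in `find` with indexed queries in the database; that separation is what the abstract expects to speed up metadata search.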
BibTeX
@misc{SAOOSFHWT17,
  author       = {Lars Thoms},
  title        = {{Suitability analysis of Object Storage for HPC workloads}},
  advisors     = {Michael Kuhn},
  year         = {2017},
  month        = mar,
  school       = {Universität Hamburg},
  howpublished = {{Online \url{https://wr.informatik.uni-hamburg.de/_media/research:theses:lars_thoms_suitability_analysis_of_object_storage_for_hpc_workloads.pdf}}},
  type         = {Bachelor's Thesis},
  abstract     = {This bachelor thesis examines whether an object storage system such as Ceph Object Storage (RADOS) is suitable for HPC workloads, focusing on its performance and its support for partial rewrites. Scientific high-performance computing produces large files, and their metadata must be quickly searchable. Object storage is a promising fit because it stores data efficiently through simple API calls without having to comply with the POSIX specification, whose semantics are overloaded and perform poorly. In addition, storing the data as objects while keeping the metadata separately in a search-efficient database should improve search performance. Although objects are by definition immutable, the RADOS API makes them mutable, so they can be rewritten as in other file systems. In this thesis, I investigate whether objects can be rewritten in segments. To this end, I implement a FUSE driver as a proof of concept and carry out a series of measurements to assess its performance and identify issues. Because the objects are mutable, the driver makes it possible to use Ceph as a regular file system. However, its write performance was low (around 3 MiB/s). Finally, the thesis presents a design concept for an HPC application that uses a Ceph cluster in combination with a document-oriented database to store metadata.},
}