Motivation
Data volumes in the e-sciences are already reaching petabyte scale and continue to grow exponentially. A scalable, distributed infrastructure for data management and analysis is essential in such an environment.
In this project, we focus on the analysis of structured data, such as trees and graphs, using cloud computing techniques.
Research Topics
- Efficient, massively parallel processing on modern commodity systems
- Distributed mining of structured data (trees, graphs)
- Distributed similarity search
Current Status
One current research focus is a high-level scripting interface that makes distributed tree processing easy and convenient. We put special emphasis on exploiting modern infrastructure such as multi-core CPUs. Moreover, we aim to integrate optimization techniques from relational database systems as well as adaptive reoptimization of workflows at runtime.
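To make the idea of parallel tree processing concrete, here is a minimal sketch in Python. It is not the project's scripting interface, which is not described here. The names Node, tree_size, add, and map_reduce_trees are hypothetical and chosen only for illustration. The sketch applies a per-tree function to a collection of trees through an executor and then combines the partial results.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Iterable, List, Optional, TypeVar

R = TypeVar("R")


@dataclass
class Node:
    """A labeled tree node (hypothetical data model for this sketch)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def tree_size(root: Node) -> int:
    """Count the nodes of one tree iteratively, avoiding deep recursion."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(node.children)
    return count


def add(a: int, b: int) -> int:
    """Reducer that sums partial results."""
    return a + b


def map_reduce_trees(
    trees: Iterable[Node],
    mapper: Callable[[Node], R],
    reducer: Callable[[R, R], R],
    initial: R,
    executor: Optional[Executor] = None,
) -> R:
    """Apply mapper to every tree via the executor, then fold the results.

    If no executor is given, a temporary thread pool is used (an assumption
    of this sketch, not a statement about the project's design).
    """
    if executor is not None:
        return reduce(reducer, executor.map(mapper, trees), initial)
    with ThreadPoolExecutor() as pool:
        return reduce(reducer, pool.map(mapper, trees), initial)
```

Passing a concurrent.futures.ProcessPoolExecutor as the executor would spread the per-tree work across CPU cores, provided the trees and the mapper can be pickled. A distributed setting would additionally need data partitioning and fault handling, which this sketch leaves out.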
Documents
- Benjamin Gufler, Jessica Müller, Tobias Scholl, Angelika Reiser, and Alfons Kemper
Scalable Scientific Data Processing
Annual Fall Meeting and 82nd General Assembly of the Astronomische Gesellschaft (AG 2009),
Splinter Meeting on eScience: New tools for research in Astronomy
September 21-25, 2009, Potsdam, Germany


