Data vaults: a database welcome to scientific file repositories
Efficient management and exploration of high-volume scientific file repositories have become pivotal for advancement in science. We propose to demonstrate the Data Vault, an extension of the database system architecture that transparently opens scientific file repositories for efficient in-database processing and exploration. The Data Vault facilitates science data analysis using high-level declarative languages, such as the traditional SQL and the novel array-oriented SciQL. Data of interest are loaded from the attached repository in a just-in-time manner without need for up-front data ingestion. The demo is built around concrete implementations of the Data Vault for two scientific use cases: seismic time series and Earth observation images. The seismic Data Vault uses the queries submitted by the audience to illustrate the internals of Data Vault functioning by revealing the mechanisms of dynamic query plan generation and on-demand external data ingestion. The image Data Vault shows an application view from the perspective of data mining researchers.