SRBench - A Benchmark For Streaming RDF Storage Engines / Benchmarking Linked Open Data technology

Authors: 
Ying Zhang, Peter Boncz
Year: 
2012
Presentation Date: 
Thursday, 7 June, 2012
Presented at: 
European Data Forum (EDF) 2012

In this talk, we present SRBench, the first benchmark for Streaming RDF Storage Engines, which is completely based on real-world datasets. With the increasing problem of too much streaming data but not enough tools to gain and even derive knowledge from those data, researchers have set out for solutions in which Semantic Web technologies are adapted and extended for the publishing, sharing, analysing and understanding of such data. Various approaches are emerging, e.g., C-SPARQL, SPARQLStream, StreamSPARQL and CQELS. To help researchers and users compare streaming RDF engines in a standardised application scenario, we propose SRBench, with which one can assess the abilities of a streaming RDF engine to cope with a broad range of use cases typically encountered in real-world scenarios. The design of SRBench is based on an extensive study of the state-of-the-art techniques in both data stream management systems and streaming RDF processing engines, as well as of the existing RDF/SPARQL benchmarks. This ensures that we capture all important aspects of streaming RDF processing in the benchmark.
The first goal of SRBench is to evaluate the functional completeness of a streaming RDF engine. The benchmark contains a concise yet comprehensive set of queries that covers the major aspects of streaming SPARQL query processing, ranging from simple pattern matching queries to queries with complex reasoning tasks. The main advantages of applying Semantic Web technologies to streaming data include providing better search facilities by adding semantics to the data, reasoning through ontologies, and integration with other data sets. The ability of a streaming RDF engine to process these distinctive features is assessed by the benchmark with queries that apply reasoning not only over the streaming sensor data, but also over the metadata and even other data sets in the Linked Open Data (LOD) cloud.
To give a first baseline and illustrate the state of the art, we show results obtained from implementing SRBench using the SPARQLStream engine developed at the Universidad Politécnica de Madrid (UPM). The engine supports the streaming RDF query language, which is also called SPARQLStream. The evaluation shows that the functionality supported by SPARQLStream is fairly complete. At the language level, it is able to express all benchmark queries easily and concisely. At the query processing level, some missing features have been discovered, for all of which preliminary code has been added for further development.