A semantically enabled architecture for crowdsourced Linked Data management
This paper is published in the Proceedings of the First International Workshop on Crowdsourcing Web Search (CrowdSearch 2012), co-located with World Wide Web 2012, held on 17 April 2012 in Lyon, France.
Increasing amounts of structured data are exposed on the Web using graph-based representation models and protocols such as RDF and SPARQL. Although the overall volume of such open, or easily accessible, data sources is reaching critical mass, the ability of potential consumers to use them in novel applications and services depends on the availability of effective means to query and manage the data. These means must account for the essential features of the data: decentralization, heterogeneity of schema, varying quality, and scale. Many aspects of these challenges can only be tackled through a combination of algorithmic techniques and manual effort. In the literature on traditional data management, the theoretical and technical groundwork to realize and manage such combinations is being established. In this paper we build upon these ideas and propose a semantically enabled architecture for crowdsourced data management systems, which uses formal representations of tasks and data to automatically design and optimize the operation and outcomes of human computation projects. We apply the architecture to Linked Data management to address specific challenges of Linked Data query processing, such as identity resolution and ontological classification. Starting from a motivational scenario, we explain how query-processing tasks can be decomposed and translated into Amazon Mechanical Turk (MTurk) projects using our semantic approach, and we outline a roadmap of the extensions to graph-based data management technology that are required to achieve this vision.