Declarative Repairing Policies for Curated KBs
This paper was published at the 10th Hellenic Data Management Symposium (HDMS-11), 17-18 June 2011 in Athens, Greece.
Curated ontologies and semantic annotations are increasingly used in e-science to reflect the current terminology and conceptualization of scientific domains. Such curated Knowledge Bases (KBs) are usually backed by relational databases with suitable schemas (generic or application/domain specific) and are expected to satisfy a wide range of integrity constraints. Because curated KBs evolve continuously, these constraints are often violated, so the KBs need frequent repair. Motivated by the fact that consistency is mostly enforced manually by the scientists acting as curators, we propose a generic and personalized repairing framework to assist them in this arduous task. Our framework supports a variety of useful integrity constraints, expressed as Disjunctive Embedded Dependencies (DEDs), as well as complex curator preferences over interesting features of the resulting repairs (e.g., their size and type), which can capture diverse notions of minimality in repairs. Moreover, we propose a novel exhaustive repair-finding algorithm which, unlike existing greedy frameworks, is not sensitive to the resolution order and syntax of violated constraints and can correctly compute globally optimal repairs for different kinds of constraints and preferences. Despite its exponential nature, the performance and memory requirements of the exhaustive algorithm are experimentally demonstrated to be satisfactory for real-world curation cases, thanks to a series of optimizations.
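To make the contrast between exhaustive and greedy repairing concrete, the following is a minimal illustrative sketch, not the algorithm or the optimizations of the paper. It assumes a toy KB of tuples and a single hypothetical constraint, subclass(a, b) -> class(a) and class(b), whose violations can be resolved either by deleting the subclass fact or by adding the missing class facts. All names (first_violation, apply_delta, all_repairs, best_repairs) are invented for illustration, and the curator preference is reduced to a cost function that defaults to repair size.

```python
def first_violation(kb):
    """Return the resolution options for the first violated rule instance, or None.

    Toy rule (an assumption for this sketch): subclass(a, b) -> class(a) and class(b).
    """
    for fact in sorted(kb):
        if fact[0] != "subclass":
            continue
        _, sub, sup = fact
        missing = {("class", c) for c in (sub, sup)} - kb
        if missing:
            delete_body = frozenset({("-", fact)})           # drop the offending fact
            add_head = frozenset(("+", m) for m in missing)  # add what the rule requires
            return [delete_body, add_head]
    return None


def apply_delta(kb, delta):
    """Apply a set of ("+", fact) / ("-", fact) updates and return the new KB."""
    facts = set(kb)
    for op, fact in delta:
        if op == "+":
            facts.add(fact)
        else:
            facts.discard(fact)
    return frozenset(facts)


def all_repairs(kb, delta=frozenset()):
    """Exhaustively explore every resolution choice and yield each complete repair."""
    options = first_violation(kb)
    if options is None:
        yield delta  # KB is consistent: the accumulated delta is a repair
        return
    for option in options:
        yield from all_repairs(apply_delta(kb, option), delta | option)


def best_repairs(kb, cost=len):
    """Keep only the repairs that are optimal under the given preference (cost)."""
    repairs = set(all_repairs(kb))
    lowest = min(cost(r) for r in repairs)
    return {r for r in repairs if cost(r) == lowest}


kb = frozenset({
    ("class", "A"),
    ("class", "C"),
    ("subclass", "A", "B"),
    ("subclass", "C", "B"),
})
best = best_repairs(kb)
print(best)
```

In this toy case a greedy strategy that always prefers deletions would remove both subclass facts (two updates), whereas the exhaustive search also considers adding class B, which resolves both violations with a single update. Passing a different cost function to best_repairs stands in, very loosely, for a different curator preference.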