Abdelhamid Abdelnaby, Moaz Mohammed Reyad (2015) Provenance in Open Data Entity-Centric Aggregation. PhD thesis, University of Trento.
| PDF - Doctoral Thesis 3435Kb |
Abstract
An increasing number of web services require combining data from several data providers into an aggregated database. Usually this aggregation is based on the linked data approach. The entity-centric model, by contrast, is a promising data model that outperforms the linked data approach because it solves the lack of explicit semantics and the semantic heterogeneity problems. However, current open data, which is available on the web as raw datasets, cannot be used in the entity-centric model until an import process extracts the data elements and inserts them correctly into the aggregated entity-centric database. It is essential to certify the quality of these imported data elements, especially the background knowledge part, which acts as input to semantic computations, because the quality of this part directly affects the quality of the web services built on top of it. Furthermore, the aggregation of entities and their attribute values from different sources raises three problems: the need to trace the source of each element, the need to trace the links between entities that can be considered equivalent, and the need to handle possible conflicts between different values when they are imported from various data sources. In this thesis, we introduce a new model to certify the quality of a background knowledge base that separates linguistic and language-independent elements. We also present a pipeline that imports entities from open data repositories, adds the missing implicit semantics, and eliminates the semantic heterogeneity. Finally, we show how to trace the source of attribute values coming from different data providers, how to choose a strategy for handling possible conflicts between these values, and how to keep the links between identical entities that represent the same real-world entity.
| Item Type: | Doctoral Thesis (PhD) |
|---|---|
| Doctoral School: | Information and Communication Technology |
| PhD Cycle: | 26 |
| Subjects: | Area 01 - Scienze matematiche e informatiche > INF/01 INFORMATICA |
| Uncontrolled Keywords: | provenance, entity, data aggregation, knowledge base |
| Repository Staff approval on: | 13 May 2015 14:23 |