Dealing with Semantic Heterogeneity in Classifications

Maltese, Vincenzo (2012) Dealing with Semantic Heterogeneity in Classifications. PhD thesis, University of Trento.

PDF - Doctoral Thesis
Available under License Creative Commons Attribution Non-commercial No Derivatives.



Many projects in both the computer science and digital library communities have dealt with mappings between classifications. The adopted solutions range from fully manual to fully automatic approaches. Manual approaches are very precise, but automation becomes unavoidable when classifications contain thousands of nodes with millions of candidate correspondences. As a fundamental preliminary step towards automation, S-Match converts classifications into formal ontologies, i.e. lightweight ontologies. Although many solutions to the problem have been offered, with S-Match representing a state-of-the-art matcher with good accuracy and run-time performance, several problems remain open. In particular, the problems addressed in this thesis include: (a) Run-time performance. Due to the high number of calls to the SAT reasoning engine, semantic matching may require exponential time; (b) Maintenance. Current matching tools offer poor support to users for the creation, validation and maintenance of the correspondences; (c) Lack of background knowledge. The lack of domain-specific background knowledge is one important cause of low recall. As significant progress on (a) and (b), we describe MinSMatch, a semantic matching tool that we developed by evolving S-Match and that computes the minimal mapping between two lightweight ontologies. The minimal mapping is the minimal subset of correspondences from which all the others can be efficiently computed; those others are therefore said to be redundant. We provide a formal definition of minimal and, dually, redundant mappings, evidence that the minimal mapping always exists and is unique, and a correct and complete algorithm for computing it. Our experiments demonstrate a substantial improvement in run-time. Based on this, we also developed a method to support users in the validation task that allows saving up to 99% of the time.
We address problem (c) by creating and using an extensible diversity-aware knowledge base that provides a continuously growing quantity of properly organized knowledge. Our approach is centered on the fundamental notions of domain and context. Domains, developed by adapting the faceted approach from library science, are the main means by which diversity is captured, and they allow scaling because they make it possible to add new knowledge as needed. Context allows better disambiguation of the terms used and reduces the complexity of reasoning at run-time. As proof of the applicability of the approach, we developed the Space domain and applied it in the Semantic Geo-Catalogue (SGC) project.
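
The notion of redundancy used for the minimal mapping in the abstract can be illustrated with a small sketch. This is a toy example, not the MinSMatch algorithm from the thesis. It assumes that each classification is a tree given as a hypothetical child-to-parent dictionary, that every node is less general than its parent, and that only "less general" correspondences are considered. The function names and the node labels are invented for illustration.

```python
# Toy illustration only; NOT the MinSMatch algorithm from the thesis.
# Assumptions: each tree is a child -> parent dict, every node is
# less general than its parent, and a pair (a, b) means "a is less
# general than b".

def ancestors_or_self(node, parent):
    """Return the node together with all of its ancestors."""
    result = {node}
    while node in parent:
        node = parent[node]
        result.add(node)
    return result


def is_redundant(pair, mapping, parent_src, parent_tgt):
    """(a, b) is redundant if another pair (a2, b2) exists where a2 is
    a or an ancestor of a, and b is b2 or an ancestor of b2: then
    a <= a2 <= b2 <= b follows by transitivity."""
    a, b = pair
    for a2, b2 in mapping:
        if (a2, b2) == (a, b):
            continue
        if (a2 in ancestors_or_self(a, parent_src)
                and b in ancestors_or_self(b2, parent_tgt)):
            return True
    return False


def minimal_subset(mapping, parent_src, parent_tgt):
    """Keep only the pairs that no other pair makes redundant."""
    return {p for p in mapping
            if not is_redundant(p, mapping, parent_src, parent_tgt)}


# Source tree: A > B > C.  Target tree: X > Y.
parent_src = {"C": "B", "B": "A"}
parent_tgt = {"Y": "X"}

# B is less general than Y; the other three pairs follow from it.
mapping = {("B", "Y"), ("C", "Y"), ("B", "X"), ("C", "X")}

minimal = minimal_subset(mapping, parent_src, parent_tgt)
print(minimal)
```

In this example only the pair `("B", "Y")` survives, and the other three correspondences can be recomputed from it by walking the two trees. That is the sense in which validating the minimal mapping alone can save a user most of the work.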

Item Type: Doctoral Thesis (PhD)
Doctoral School: Information and Communication Technology
Subjects: Area 01 - Scienze matematiche e informatiche > INF/01 INFORMATICA
Area 01 - Scienze matematiche e informatiche > MAT/01 LOGICA MATEMATICA
Uncontrolled Keywords: Semantic matching; minimal mappings; mapping validation; diversity-aware knowledge base; domains; context
Repository Staff approval on: 29 Mar 2012 11:59
