Fossati, Marco (2017) Automatic Population of Structured Knowledge Bases via Natural Language Processing. PhD thesis, University of Trento.
| PDF - Doctoral Thesis Available under License Creative Commons Attribution Share Alike. 3022Kb | |
PDF - Disclaimer Restricted to Repository staff only until 9999. 297Kb |
Abstract
The Web has evolved into a huge mine of knowledge carved in different forms, the predominant one still being the free-text document. This motivates the need for Intelligent Web-reading Agents: hypothetically, they would skim through disparate Web sources corpora and generate meaningful structured assertions to fuel Knowledge Bases (KBs). Ultimately, comprehensive KBs, like Wikidata and DBpedia, play a fundamental role to cope with the issue of information overload. On account of such vision, this thesis depicts a set of systems based on Natural Language Processing (NLP), which take as input unstructured or semi-structured information sources and produce machine-readable statements for a target KB. We implement four main research contributions: (1) a one-step methodology for crowdsourcing the Frame Semantics annotation; (2) a NLP technique implementing the above contribution to perform N-ary Relation Extraction from Wikipedia, thus enriching the target KB with properties; (3) a taxonomy learning strategy to produce an intuitive and exhaustive class hierarchy from the Wikipedia category graph, thus augmenting the target KB with classes; (4) a recommender system that leverages a KB network to yield atypical suggestions with detailed explanations, serving as a proof of work for real-world end users. The outcomes are incorporated into the Italian DBpedia chapter, can be queried through its public endpoint, and/or downloaded as standalone data dumps.
Item Type: | Doctoral Thesis (PhD) |
---|---|
Doctoral School: | Information and Communication Technology |
PhD Cycle: | 27 |
Subjects: | Area 01 - Scienze matematiche e informatiche > INF/01 INFORMATICA |
Uncontrolled Keywords: | Natural Language Processing, Information Extraction, Machine Learning, Frame Semantics, Crowdsourcing, Recommender Systems, Wikipedia |
Funders: | Fondazione Bruno Kessler |
Repository Staff approval on: | 14 Apr 2017 10:07 |
Repository Staff Only: item control page