Real-Time Event Centric Data Integration

Ayyad, Majed (2014) Real-Time Event Centric Data Integration. PhD thesis, University of Trento.



A vital step in integrating data from multiple sources is detecting and handling duplicate records that refer to the same real-life entity. Events are spatio-temporal entities that reflect changes in the real world and are received or captured from different sources (sensors, mobile phones, social network services, etc.). In many real-world situations, events are mostly detected through multiple observations by different observers. The local view of each observer reflects only partial knowledge at a certain granularity of time and space. Observations occur at a particular place and time; events, however, are inferred from observations and range over time and space. In this thesis, we address the problem of event matching, which is the task of detecting similar events in the recent past from their observations. We focus on detecting hyperlocal events, which are an integral part of any dynamic human decision-making process and are useful to multi-tier response agencies such as emergency medical services, public safety and law enforcement agencies, to organizations that fuse news from different sources, and to citizens. In an environment where continuous monitoring and processing are required, the matching task poses several challenges. In particular, the matching task is decomposed into four separate tasks, each requiring a different computational method: event-type similarity, location similarity, temporal similarity, and thematic-role similarity, which handles the similarity of participants. We refer to these four tasks as local similarities. A global similarity measure then combines the four local similarities so that events can be clustered and handled in a robust near-real-time system. We address the local similarities by thoroughly studying existing similarity measures and proposing a suitable measure for each task.
We draw on ideas from the Semantic Web, qualitative spatial reasoning, fuzzy sets, and structural alignment to define the local similarity measures. We then address the global similarity by treating it as a relational learning problem and using machine learning to learn the weight of each local similarity. To learn the weights, we combine the features of each pair of events into one object and train logistic regression and support vector machine models. The learned weighted function is tested and evaluated on a real dataset and is used to predict the similarity class of each newly streamed event.
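The global-similarity step described in the abstract can be illustrated with a minimal sketch. It assumes each pair of events has already been reduced to four local similarity scores in [0, 1] (event type, location, time, thematic role) and learns one weight per score with plain logistic regression. The function names, the gradient-descent settings, and the training pairs are hypothetical and synthetic; they are not taken from the thesis or its dataset.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def global_similarity(local_sims, weights, bias):
    """Weighted combination of local similarities, squashed to (0, 1)."""
    z = bias + sum(w * s for w, s in zip(weights, local_sims))
    return sigmoid(z)


def fit_weights(pairs, labels, lr=0.5, epochs=2000):
    """Learn one weight per local similarity by batch gradient descent
    on the logistic loss (an illustrative stand-in for any solver)."""
    n_features = len(pairs[0])
    n = len(pairs)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(pairs, labels):
            err = global_similarity(x, weights, bias) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic pairs (assumption): each tuple holds
# (event-type, location, time, thematic-role) similarity for one event pair.
PAIRS = [
    (0.9, 0.8, 0.9, 0.7),
    (1.0, 0.9, 0.8, 0.9),
    (0.8, 0.7, 0.9, 0.6),
    (0.2, 0.1, 0.3, 0.2),
    (0.9, 0.1, 0.2, 0.1),  # same event type, but different place and time
    (0.1, 0.3, 0.1, 0.4),
]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = same real-world event, 0 = different

WEIGHTS, BIAS = fit_weights(PAIRS, LABELS)

# Score a newly streamed event against a stored one.
new_pair = (0.9, 0.9, 0.8, 0.8)
is_match = global_similarity(new_pair, WEIGHTS, BIAS) >= 0.5
```

The 0.5 threshold is likewise an assumption; a support vector machine, which the abstract also names, could replace the logistic model without changing the pairwise feature representation.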

Item Type: Doctoral Thesis (PhD)
Doctoral School: Information and Communication Technology
PhD Cycle: 26
Subjects: Area 02 - Physical Sciences
Repository Staff approval on: 07 Jan 2015 12:04
