Understanding Visual Information: from Unsupervised Discovery to Minimal Effort Domain Adaptation

Zen, Gloria (2015) Understanding Visual Information: from Unsupervised Discovery to Minimal Effort Domain Adaptation. PhD thesis, University of Trento.

PDF - Doctoral Thesis


Visual data interpretation is a fascinating problem which has received an increasing attention in the last decades. Reasons for this growing trend can be found within multiple interconnected factors, such as the exponential growth of visual data (e.g. images and videos) availability, the consequent demand for an automatic way to interpret these data and the increase of computational power. In a supervised machine learning approach, a large effort within the research community has been devoted to the collection of training samples to be provided to the learning system, resulting in the generation of very large scale datasets. This has lead to remarkable performance advances in tasks such as scene recognition or object detection, however, at a considerable high cost in terms of human labeling effort. In light of the labeling cost issue, together with the dataset bias one, another significant research direction was headed towards developing methods for learning without or with a limited amount of training data, by leveraging instead on data properties like intrinsic redundancy, time constancy or commonalities shared among different domains. Our work is in line with this last type of approach. In particular, by covering different case scenarios - from dynamic crowded scenes to facial expression analysis - we propose a novel approach to overcome some of the state-of-the-art limitations. Based on the renowned bag of words (BoW) approach, we propose a novel method which achieves higher performances in tasks such as learning typical patterns of behaviors and anomalies discovery from complex scenes, by considering the similarity among visual words in the learning phase. We also show that including sparsity constraints can help dealing with noise which is intrinsic to low level cues extracted from complex dynamic scenes. Facing the so called dataset bias issue, we propose a novel method for adapting a classifier to a new unseen target user without the need of acquiring additional labeled samples. We prove the effectiveness of this method in the context of facial expression analysis showing that our method achieves higher or comparable performance to the state of the art, at a drastically reduced time cost.

Item Type:Doctoral Thesis (PhD)
Doctoral School:Information and Communication Technology
PhD Cycle:26
Subjects:Area 01 - Scienze matematiche e informatiche > INF/01 INFORMATICA
Uncontrolled Keywords:computer vision; unsupervised learning; domain adaptation; video surveillance; anomaly detection; earth mover's distance; convex learning; facial expression recognition; personalization
Repository Staff approval on:18 Jun 2015 09:18

Related URLs:

Repository Staff Only: item control page