Exploring Multi-Modal and Structured Representation Learning for Visual Image and Video Understanding

Xu, Dan (2018) Exploring Multi-Modal and Structured Representation Learning for Visual Image and Video Understanding. PhD thesis, University of Trento.

Doctoral Thesis (PDF, 18 MB), available under License Creative Commons Attribution.

Abstract

With the explosive growth of visual data, it is particularly important to develop intelligent visual understanding techniques that can handle large amounts of data. Many efforts have been made in recent years to build highly effective, large-scale visual processing algorithms and systems. A core question in this line of research is how to learn robust representations that better describe the data. In this thesis we study visual image and video understanding; specifically, we address the problem by designing and implementing novel multi-modal and structured representation learning approaches, both of which are active research topics in machine learning. Multi-modal representation learning relates information from multiple input sources, while structured representation learning exploits the rich structural information hidden in the data to learn robust features. We investigate both shallow representation learning frameworks, such as dictionary learning, and deep representation learning frameworks, such as deep neural networks, and present the modules devised in our work: cross-paced representation learning, cross-modal feature learning and transferring, multi-scale structured prediction and fusion, and multi-modal prediction and distillation. These techniques are applied to several visual understanding tasks, namely sketch-based image retrieval (SBIR), video pedestrian detection, monocular depth estimation and scene parsing, where they show superior performance.

Item Type: Doctoral Thesis (PhD)
Doctoral School: Information and Communication Technology
PhD Cycle: 30
Subjects: Area 01 - Mathematical and Computer Sciences > INF/01 Computer Science
Repository staff approval on: 10 May 2018 09:48
