Learning from noisy data through robust feature selection, ensembles and simulation-based optimization

Mariello, Andrea (2019) Learning from noisy data through robust feature selection, ensembles and simulation-based optimization. PhD thesis, University of Trento.

[img]
Preview
PDF - Doctoral Thesis
1376Kb
[img]PDF - Disclaimer
Restricted to Repository staff only until 9999.

484Kb

Abstract

The presence of noise and uncertainty in real scenarios makes machine learning a challenging task. Acquisition errors or missing values can lead to models that do not generalize well on new data. Under-fitting and over-fitting can occur because of feature redundancy in high-dimensional problems as well as data scarcity. In these contexts the learning task can show difficulties in extracting relevant and stable information from noisy features or from a limited set of samples with high variance. In some extreme cases, the presence of only aggregated data instead of individual samples prevents the use of instance-based learning. In these contexts, parametric models can be learned through simulations to take into account the inherent stochastic nature of the processes involved. This dissertation includes contributions to different learning problems characterized by noise and uncertainty. In particular, we propose i) a novel approach for robust feature selection based on the neighborhood entropy, ii) an approach based on ensembles for robust salary prediction in the IT job market, and iii) a parametric simulation-based approach for dynamic pricing and what-if analyses in hotel revenue management when only aggregated data are available.

Item Type:Doctoral Thesis (PhD)
Doctoral School:Information and Communication Technology
PhD Cycle:31
Subjects:Area 01 - Scienze matematiche e informatiche > INF/01 INFORMATICA
Area 09 - Ingegneria industriale e dell'informazione > ING-INF/05 SISTEMI DI ELABORAZIONE DELLE INFORMAZIONI
Repository Staff approval on:02 Apr 2019 11:00

Repository Staff Only: item control page