Statistical and Relational Learning for Understanding Enzyme Function

Cilia, Elisa (2010) Statistical and Relational Learning for Understanding Enzyme Function. PhD thesis, University of Trento.

Abstract

Unravelling how the complex processes of living systems work is a challenging task. Enzymes are involved in almost all of the chemical processes taking place within the cell. They accelerate chemical reactions by forming a complex with the substrate, thereby lowering the activation energy of the reaction. Characterising enzyme function at the molecular level is a fundamental step with several implications and applications in modern biotechnology. This thesis investigates statistical and relational learning techniques for characterising enzyme function. The problem is tackled from two sides: the analysis of the enzyme structure and its interactions with other molecules, and the mining of relevant features from enzyme mutation data. On the first side, a purely statistical learning approach is proposed for directly predicting enzyme functional residues. This approach is shown to improve over the current state of the art on several benchmark datasets. The engineered predictors resulting from this investigation are now available to the research community through the CatANalyst web server. Further improvement is pursued by proposing a supervised clustering technique for collectively predicting all the residues belonging to the same functional site. On the “learning from mutations” side, the focus shifts to the expressivity and interpretability of the learnt models. This thesis proposes novel statistical relational approaches for mining hierarchical features for multiple related tasks. The resistance of viral enzyme mutants to groups of related inhibitors is modelled in a multitask setting. Learnt models are refined on a per-group or per-task basis at different levels of the hierarchy. The proposed hierarchical approach is shown to provide statistically significant improvements over both single-task and multitask alternatives. Moreover, it can provide explanations of the models that are themselves hierarchical. A task clustering approach is also proposed for inferring the structure of tasks when it is unknown. Finally, a relational approach is proposed that exploits the learnt relational rules to generate novel mutations with specific characteristics. This drastically reduces the space of possible mutations to be experimentally assessed. Promising preliminary results are obtained, which highlight the potential of the approach in guiding mutant engineering and in predicting viral enzyme evolution. These findings can pave the way for further research on the functional interpretation of biological data by means of machine learning techniques.

Item Type: Doctoral Thesis (PhD)
Doctoral School: Information and Communication Technology
PhD Cycle: XXII
Subjects: Area 01 - Scienze matematiche e informatiche > INF/01 INFORMATICA
Area 05 - Scienze biologiche > BIO/11 BIOLOGIA MOLECOLARE
Uncontrolled Keywords: machine learning, bioinformatics, protein function identification, inductive logic programming, statistical relational learning, enzyme functional sites
Repository Staff approval on: 03 Jan 2011 10:01
