Theoretical and Algorithmic Solutions for Null models in Network Theory

Gobbi, Andrea (2013) Theoretical and Algorithmic Solutions for Null models in Network Theory. PhD thesis, University of Trento.

[img]
Preview
PDF - Doctoral Thesis
14Mb

Abstract

The graph-theoretical based formulation for the representation of the data-driven structure and the dynamics of complex systems is rapidly imposing as the paramount paradigm [1] across a variety of disciplines, from economics to neuroscience, with biological -omics as a major example. In this framework, the concept of Null Model borrowed from the statistical sciences identifies the elective strategy to obtain a baseline points of modelling comparison [2]. Hereafter, a null model is a graph which matches one specific graph in terms of some structural features, but which is otherwise taken to be generated as an instance of a random network. In this view, the network model introduced by Erdos & Renyi [3], where random edges are generated as independently and identically distributed Bernoulli trials, can be considered the simplest possible null model. In the following years, other null models have been developed in the framework of graph theory, with the detection of the community structure as one of the most important target[4]. In particular, the model described in [5] introduces the concept of a randomized version of the original graph: edges are rewired at random, with each expected vertex degree matching the degree of the vertex in the original graph. Although aimed at building a reference for the community detection, this approach will play a key role in one of the model considered in this thesis. Note that, although being the first problem to be considered, designing null models for the community structures detection is still an open problem [6, 7]. Real world applications of null model in graph theory have also gained popularity in many different scientific areas, with ecology as the first example: see [8] for a comprehensive overview. More recently, interest for network null models arose also in computational biology [9, 10], geosciences [11] and economics [12, 13], just to name a few. In the present work the theoretical design and the practical implementation of a series of algorithms for the construction of null models will be introduced, with applications ranging from functional genomics to game theory for social studies. The four chapters devoted to the presentation of the examples of null model are preceded by an introductory chapter including a quick overview of graph theory, together with all the required notations. The first null model is the topic of the second chapter, where a suite of novel algorithms is shown, aimed at the efficient generation of complex networks under different constraints on the node degrees. Although not the most important example in the thesis, the premiment position dedicated to this topic is due to its strict familiarity with the aforementioned classical null models for random graph construction. Together with the algorithms definition and examples, a thorough theoretical analysis of the proposed solutions is shown, highlighting the improvements with respect to the state-of-the-art and the occurring limitations. Apart from its intrinsic mathematical value, the interest for these algorithms by the community of systems biology lies in the need for benchmark graphs resembling the real biological networks. They are in fact of uttermost importance when testing novel inference methods, and as testbeds for the network reconstruction challenges such as the DREAM series [14, 15, 16]. The following Chapter three includes the most complex application of null models presented in this thesis. The scientific workfield is again functional genomics, namely the combinatorial approach to the modelling of patterns of mutations in cancer as detected by Next Generation Sequencing exome Data. This problem has a natural mathematical representation in terms of rewiring of bipartite networks and mutual-exclusively mutated modules [17, 18], to which Markov chain updates (switching-steps) are applied through a Switching Algorithm SA. Here we show some crucial improvements to the SA, we analytically derive an approximate lower bound for the number of steps required, we introduce BiRewire, an R package implementing the improved SA and we demonstrate the effectiveness of the novel solution on a breast cancer dataset. A novel threshold-selection method for the construction of co-expression net- works based on the Pearson coefficient is the third and last biological example of null model, and it is outlined in Chapter four. Gene co-expression networks inferred by correlation from high-throughput profiling such as microarray data represent a simple but effective technique for discovering and interpreting linear gene relationships. In the last years several approach have been proposed to tackle the problem of deciding when the resulting correlation values are statistically significant. This is mostly crucial when the number of samples is small, yielding a non negligible chance that even high correlation values are due to random effects. Here we introduce a novel hard thresholding solution based on the assumption that a coexpression network inferred by randomly generated data is expected to be empty. The theoretical derivation of the new bound by geometrical methods is shown together with two applications in oncogenomics. The last two chapters of the thesis are devoted to the presentation of null models in non-biological contexts. In Chapter 5 a novel dynamic simulation model is introduced mimicking a random market in which sellers and buyers follow different price distributions and matching functions. The random marked is mathematically formulated by a dynamic bipartite graph, and the analytical formula for the evolution along time of the mean price exchange is derived, together with global likelihood function for retrieving the initial parameters under different assumptions. Finally in Chapter 6 we describe how graph tools can be used to model abstraction and strategy (see [19, 20, 21]) for a class of games in particular the TTT solitaire. We show that in this solitaire it is not possible to build an optimal (in the sense of minimum number of moves) strategy dividing the big problems into smaller subproblems. Nevertheless, we find some subproblems and strategies for solving the TTT solitaire with a negligible increment in the number of moves. Although quite simple and far from simulating highly complex real-world situations of decision making, the TTT solitaire is an important tool for starting the exploration of the social analysis of the trajectories of the implementation of winning strategies through different learning procedures [22].

Item Type:Doctoral Thesis (PhD)
Doctoral School:Mathematics
PhD Cycle:XXVI
Subjects:Area 01 - Scienze matematiche e informatiche > MAT/05 ANALISI MATEMATICA
Area 01 - Scienze matematiche e informatiche > INF/01 INFORMATICA
Area 01 - Scienze matematiche e informatiche > MAT/04 MATEMATICHE COMPLEMENTARI
Area 01 - Scienze matematiche e informatiche > MAT/06 PROBABILITÀ E STATISTICA MATEMATICA
Repository Staff approval on:19 Dec 2013 16:29

Repository Staff Only: item control page