Kefato, Zekarias Tilahun (2019) Network and Cascade Representation Learning: Algorithms based on Information Diffusion Events. PhD thesis, University of Trento.
PDF - Doctoral Thesis: Available under License Creative Commons Attribution.
Network representation learning (NRL) and cascade representation learning (CRL) are fundamental backbones of different kinds of network analysis problems. They are usually carried out in settings where the structure of the network under consideration is known. Motivated by real-world problems, this study presents several algorithms for scenarios where the network structure is partially or completely unknown.

The objective of network representation learning is to identify a mapping function that projects sparse and high-dimensional network graphs into a dense latent representation, which preserves the original information about nodes and their neighborhoods. The notion of neighborhood, however, becomes elusive when the network structure is partially or completely hidden. Inspired by previous results, in our thesis work we have developed novel algorithms that are resilient to such lack of knowledge. These results establish a correlation between the properties of the network and the different kinds of node activities performed over it, information which is generally more available and can be easily observed. In particular, we focus on diffusion events (also called cascades) such as shares, retweets and hashtags.

In the first of our contributions, we have developed a novel NRL algorithm called Mineral, a simple technique that combines the observed cascades with the partially accessible network structure by sampling artificial cascades. Node representations are then learned from the observed and sampled cascades by using the SkipGram model, which is widely used for word representation learning in natural language documents. In our second contribution, called NetTensor, we assume that the network structure is completely hidden and we propose novel techniques that are capable of estimating both the hidden neighborhood (proximity) and the similarity of nodes. These estimated values are then used to learn a unified embedding of nodes using a scalable truncated singular value decomposition and deep autoencoders.

In addition to the NRL algorithms, we have also proposed a novel CRL algorithm called cas2vec for virality (popularity) prediction. Again, we pursue a network-agnostic approach, following the above assumption that the network structure is completely unknown. Unlike prior studies that rely on manual feature extraction, cas2vec uses convolutional neural networks to automatically learn cascade representations that are effective in predicting the virality of cascades. We have carried out extensive experiments using several real-world datasets for all of our methods and compared them against strong state-of-the-art baselines, achieving significantly better results than many of them.
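The idea of learning node representations from cascades with SkipGram can be illustrated with a small sketch. This is not the thesis's Mineral implementation: it assumes a simple random spread over the known part of the network, and the names `sample_cascade`, `skipgram_pairs`, the activation probability `p`, and the toy graph are all hypothetical. It only shows how a cascade, treated as a sequence of nodes, might yield the (center, context) pairs a SkipGram model is trained on.

```python
import random
from collections import deque


def sample_cascade(graph, seed, p, max_len, rng):
    """Sample one artificial cascade starting at `seed`.

    Assumption (not from the thesis): each not-yet-active neighbor of an
    active node is activated independently with probability `p`.
    """
    cascade = [seed]
    active = {seed}
    frontier = deque([seed])
    while frontier and len(cascade) < max_len:
        node = frontier.popleft()
        neighbors = [n for n in graph.get(node, ()) if n not in active]
        rng.shuffle(neighbors)  # randomize the activation order
        for n in neighbors:
            if len(cascade) >= max_len:
                break
            if rng.random() < p:
                cascade.append(n)
                active.add(n)
                frontier.append(n)
    return cascade


def skipgram_pairs(cascade, window):
    """Return (center, context) pairs for nodes within `window` positions."""
    pairs = []
    for i, center in enumerate(cascade):
        lo = max(0, i - window)
        hi = min(len(cascade), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, cascade[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical, partially known network as an adjacency list.
    toy_graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
    rng = random.Random(0)
    cascade = sample_cascade(toy_graph, "a", p=0.8, max_len=10, rng=rng)
    print(cascade)
    print(skipgram_pairs(cascade, window=1))
```

In practice, the observed and sampled cascades would be passed together to an existing SkipGram implementation instead of enumerating the pairs by hand; the explicit pair list here is only meant to make the training signal visible.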
Item Type: Doctoral Thesis (PhD)
Doctoral School: Information and Communication Technology
Subjects: Area 01 - Scienze matematiche e informatiche > INF/01 INFORMATICA
Repository Staff approval on: 30 Apr 2019 10:51