Automated Approaches to Community Question Answering

Uva, Antonio (2019) Automated Approaches to Community Question Answering. PhD thesis, University of Trento.

Social media applications, such as forums and social networks, allow users to pose questions about a given topic to a community of expert users. Although successful, these applications suffer from a major drawback: finding similar questions with traditional keyword-based search is difficult. Thus, Community Question Answering (cQA), a branch of QA, has been developed with the aim of automatically answering new user questions. Generally, cQA systems answer new user questions by (i) first retrieving the questions most similar to the input question and (ii) selecting the best answer for the related questions. Such systems require powerful machine learning algorithms that go beyond traditional feature-based approaches. In recent years, tree kernels and neural networks have established themselves as the state-of-the-art machine learning algorithms for solving such problems. Tree kernels are used to compute the similarity between two sentences encoded as trees that incorporate syntactic and semantic information. Neural networks map words into informative vectors, called embeddings, which are used to learn non-linear transformations of user inputs. In this work, we used these models to solve the classification and ranking tasks needed to build automatic cQA systems. As a first step, we conceived structured input models able to automatically extract discriminative syntactic patterns for classifying relatedness between two questions. Then, we extended the previous work by presenting a new model for question similarity that combines the semantic information of neural networks with the structured information of tree kernels. We assess the performance of the new model on two tasks, i.e., question duplicate detection and question reranking, showing the advantages of injecting syntactic information into neural models.
After that, we focus on more challenging tasks, such as building a neural network architecture for ranking comments on a forum according to their relevance to a new question. We show that neural models can benefit from being trained in a multi-task learning setting, together with auxiliary tasks. This makes it possible to train cQA systems in an end-to-end fashion, which is convenient for industrial applications that need to be easily deployed. Furthermore, we developed a novel intent detection model that combines state-of-the-art methods in relational text matching with the latest techniques in supervised clustering to make inferences over a set of questions and automatically discover intent clusters. The latter can be used to quickly bootstrap Natural Language Understanding pipelines for dialog systems. To conclude, we study the advantages and disadvantages of neural networks and tree kernel models when applied to cQA tasks. We show that neural networks perform effectively when data is abundant. Conversely, tree kernels are more suitable when data is scarce.
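
The two-step cQA pipeline described in the abstract (retrieve the most similar archived question, then select its best answer) can be sketched as follows. This is only an illustrative toy, not the thesis's method: the Jaccard token overlap stands in for the learned similarity models (tree kernels, neural networks), the vote count stands in for a learned answer ranker, and the names `Thread`, `jaccard`, and `answer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    """An archived forum question with its answers (hypothetical structure)."""
    question: str
    answers: list[tuple[str, int]]  # (answer text, community votes)


def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization; a real system would do far more."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, an assumed stand-in for a learned model."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def answer(new_question: str, archive: list[Thread]) -> str:
    # Step (i): find the archived question most similar to the new one.
    best_thread = max(archive, key=lambda t: jaccard(new_question, t.question))
    # Step (ii): select the best answer of that related question
    # (here: the most voted one, purely as an assumption).
    best_answer, _ = max(best_thread.answers, key=lambda pair: pair[1])
    return best_answer


archive = [
    Thread("how to reset router password",
           [("Hold the reset button for ten seconds.", 5),
            ("Buy a new router.", 1)]),
    Thread("best pizza in Trento",
           [("Try the place near the station.", 3)]),
]
print(answer("how do I reset my router password", archive))
```

Keyword overlap of this kind is exactly what the abstract argues is insufficient; the sketch only makes the control flow of steps (i) and (ii) concrete, with each scoring function being the part a learned model would replace.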

Item Type: Doctoral Thesis (PhD)
Doctoral School: Information and Communication Technology
PhD Cycle: 30
Subjects: Area 02 - Physical Sciences
Repository Staff approval on: 30 Apr 2019 10:10
