Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation

Bisazza, Arianna (2013) Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation. PhD thesis, University of Trento.

PDF - Doctoral Thesis
Available under License Creative Commons Attribution.

2567Kb

Abstract

Word reordering is one of the most difficult aspects of Statistical Machine Translation (SMT), and an important factor in its quality and efficiency. While short- and medium-range reordering is handled reasonably well by the phrase-based approach (PSMT), long-range reordering still represents a challenge for state-of-the-art PSMT systems. As a major cause of this problem, we point out the inadequacy of existing reordering constraints and models to cope with the reordering phenomena occurring between distant languages. On the one hand, the reordering constraints used to control translation complexity appear to be too coarse-grained. On the other hand, the reordering models used to score different reordering decisions during translation are not discriminative enough to effectively guide the search over very large sets of hypotheses. In this thesis we propose several techniques to improve the definition of the reordering search space in PSMT by exploiting prior linguistic knowledge, so that long-range reordering may be adequately handled without sacrificing efficiency. In particular, we focus on Arabic-English and German-English: two language pairs characterized by uneven distributions of reordering phenomena, with long-range movements concentrated in a few patterns. All our techniques share the aim of improving the reordering search space, but they pursue it by different means: chunk-based reordering rules and word reordering lattices, modified distortion matrices, and early reordering pruning. Through extensive experiments, we show that our techniques can significantly advance the state of the art in PSMT for these challenging language pairs. When compared with a popular tree-based SMT approach, our best PSMT systems achieve comparable or higher reordering accuracies while being considerably faster.

Item Type: Doctoral Thesis (PhD)
Doctoral School: Information and Communication Technology
PhD Cycle: XXV
Subjects: Area 01 - Scienze matematiche e informatiche > INF/01 INFORMATICA
Uncontrolled Keywords: natural language processing; statistical machine translation; word reordering
Repository Staff approval on: 10 Jun 2013 15:24
