Zeynep Yirmibeşoğlu Defended Her MS Thesis

Morphologically Motivated Input Variations in Turkish-English Neural Machine Translation

The success of neural networks in natural language processing has paved the way for neural machine translation (NMT), which has rapidly become the mainstream approach in machine translation. Breakthroughs such as encoder-decoder networks, the attention mechanism, and the Transformer architecture have brought tremendous improvements in translation performance. However, the need for large amounts of parallel training data and the prevalence of rare words in translation corpora remain open problems. This study addresses neural machine translation of the low-resource Turkish-English language pair. State-of-the-art NMT architectures are employed, together with data augmentation methods that exploit monolingual corpora. The importance of input representation for the morphologically rich Turkish language is highlighted, and a comprehensive analysis of linguistically and non-linguistically motivated input segmentation approaches is presented. Experiments on different input variations demonstrate the importance of morphologically motivated input segmentation for Turkish, and the Transformer architecture is shown to outperform attentional encoder-decoder models on the Turkish-English pair. Among the data augmentation approaches employed, back-translation proves the most effective, confirming the benefit of additional parallel data for translation quality. The thesis presents a comprehensive analysis of NMT architectures with different hyperparameters, data augmentation methods, and input representation techniques, and proposes ways of tackling the low-resource setting of Turkish-English NMT.
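To illustrate the distinction between linguistically and non-linguistically motivated input segmentation that the thesis studies, the minimal sketch below contrasts a hand-labeled morphological split of a Turkish word with a toy greedy BPE-style subword segmentation. The merge table and morpheme boundaries here are illustrative assumptions, not the thesis's actual learned vocabulary.

```python
# Minimal sketch: morphological vs. BPE-style segmentation of a Turkish word.
# The merge table below is a toy example; a real one is learned from corpus
# statistics (e.g., with a subword learner run over the training data).

def bpe_segment(word, merges):
    """Greedy BPE application: start from characters, apply merges in order."""
    symbols = list(word)
    for pair in merges:  # merges are assumed ordered by learned priority
        merged = "".join(pair)
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(merged)  # fuse the matched adjacent pair
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

# "evlerimizde" = ev (house) + ler (plural) + imiz (our) + de (locative)
morph_segmentation = ["ev", "ler", "imiz", "de"]

# Toy merge table (an assumption for illustration only).
merges = [("e", "v"), ("l", "e"), ("le", "r"), ("d", "e"), ("i", "m")]
bpe_segmentation = bpe_segment("evlerimizde", merges)
print(bpe_segmentation)  # statistical splits need not align with morphemes
```

With this toy merge table, the BPE output does not recover the suffix boundaries that the morphological analysis yields, which is exactly the kind of mismatch that makes morphologically motivated segmentation attractive for Turkish.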

Contact us

Department of Computer Engineering, Boğaziçi University,
34342 Bebek, Istanbul, Turkey

  • Phone: +90 212 359 45 23/24
  • Fax: +90 212 287 24 61
