Turkish Preprocessing Operations Using Deep Learning Approaches

The first step in nearly all natural language processing (NLP) applications is to apply preprocessing operations to the text. These operations include tokenization (segmenting the text into tokens), sentence splitting (dividing the text into sentences), normalization (converting the text into a canonical form), and similar steps. In this project, you will develop and implement deep learning algorithms for preprocessing Turkish text. First, a literature review will be conducted, and similar systems for English (e.g. UDPipe, Stanza) will be analyzed. Then, a deep learning model will be built for each preprocessing operation. The models will be adapted to the characteristics of Turkish, an agglutinative language (e.g. by using embeddings for suffixes). Finally, the system will be tested on Turkish corpora, most likely the Turkish treebanks in the Universal Dependencies (UD) framework. A conference or journal paper will be written towards the end of the project.
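
One common way to frame tokenization and sentence splitting for a neural model, used for example by Stanza's tokenizer, is to tag every character of the raw text with a boundary label (Stanza also has a multi-word-token label, omitted here). The sketch below is an illustration under that assumption, not the project's method: the three-label scheme and the names `encode`, `decode`, and `turkish_lower` are hypothetical. It shows how gold sentences and tokens (e.g. from a UD treebank) could be turned into per-character training targets and decoded back, plus one Turkish-specific normalization detail: Python's default `str.lower()` mishandles the dotted/dotless I pair.

```python
from typing import List

# Hypothetical per-character label scheme (inspired by character-level
# tokenizers such as Stanza's); a neural tagger would predict these labels.
INSIDE, END_OF_TOKEN, END_OF_SENTENCE = 0, 1, 2


def encode(text: str, sentences: List[List[str]]) -> List[int]:
    """Convert gold sentences/tokens into one boundary label per character."""
    labels = [INSIDE] * len(text)
    pos = 0
    for sentence in sentences:
        for i, token in enumerate(sentence):
            start = text.index(token, pos)  # tokens must occur in order in the text
            end = start + len(token) - 1    # index of the token's last character
            is_last = i == len(sentence) - 1
            labels[end] = END_OF_SENTENCE if is_last else END_OF_TOKEN
            pos = end + 1
    return labels


def decode(text: str, labels: List[int]) -> List[List[str]]:
    """Rebuild sentences/tokens from per-character boundary labels."""
    sentences: List[List[str]] = []
    tokens: List[str] = []
    current = ""
    for ch, label in zip(text, labels):
        if not ch.isspace():
            current += ch                   # whitespace is never part of a token
        if label != INSIDE and current:
            tokens.append(current)          # close the current token
            current = ""
        if label == END_OF_SENTENCE and tokens:
            sentences.append(tokens)        # close the current sentence
            tokens = []
    if current:
        tokens.append(current)
    if tokens:
        sentences.append(tokens)
    return sentences


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish rules: I -> ı and İ -> i.

    Plain str.lower() maps "I" to "i" and "İ" to "i" + U+0307 (combining dot).
    """
    return text.replace("I", "ı").replace("İ", "i").lower()


if __name__ == "__main__":
    text = "Ali okula gitti. Eve döndü."
    gold = [["Ali", "okula", "gitti", "."], ["Eve", "döndü", "."]]
    labels = encode(text, gold)
    print(labels)
    print(decode(text, labels))
    print(turkish_lower("IŞIK İSTANBUL"))
```

Because the labels are attached to raw characters, one tagger can perform tokenization and sentence splitting jointly in a single pass; whether to train them jointly or as separate models is a design choice the project would still need to evaluate.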

This is a two-semester project.

Project Members: 

Emrah Doğan

Project Advisor: 

Tunga Güngör

Project Status: 

Project Year: 

2021
  • Fall

Contact us

Department of Computer Engineering, Boğaziçi University,
34342 Bebek, Istanbul, Turkey

  • Phone: +90 212 359 45 23/24
  • Fax: +90 212 2872461