Boun NLP: A Morphological Analysis System for Turkish

Morphological analysis is an important sub-task of natural language processing (NLP). It is used for tokenization, stemming, lemmatization and normalization. For NLP tasks in which machine learning plays a crucial role, pre-processing the data is vital, and the success rate depends heavily on the pre-processing methodology. This project proposes a tool for morphologically analyzing words in Turkish, an agglutinative language, to serve as a baseline for further NLP projects.
The project combines rule-based and machine learning methods. Data is gathered from the TDK dictionary, from which around 22,000 Turkish roots are generated. The algorithm uses the dictionary and a list of all Turkish suffixes to propose every possible parse of a word. In the filtering phase, a finite state machine (FSM) that models the rules of Turkish filters out the parses that violate them.
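The generate-then-filter idea can be illustrated with a minimal sketch. It is not the project's implementation: the two-root lexicon, the five surface-form suffixes, the state names and the transition table are all hypothetical stand-ins for the TDK-derived roots and the full suffix list, and the sketch ignores vowel harmony, consonant alternations and the machine learning component entirely.

```python
# Toy lexicon and suffix list (hypothetical; the real system uses ~22,000 roots).
ROOTS = {"ev": "Noun", "gel": "Verb"}
SUFFIXES = {
    "ler": "PLURAL",
    "lar": "PLURAL",
    "im": "POSS_1SG",
    "de": "LOCATIVE",
    "di": "PAST",
}

# Toy FSM: (current state, suffix tag) -> next state.
# Any pair missing from the table is treated as a rule violation.
TRANSITIONS = {
    ("Noun", "PLURAL"): "NounPlural",
    ("Noun", "POSS_1SG"): "NounPoss",
    ("Noun", "LOCATIVE"): "NounCase",
    ("NounPlural", "POSS_1SG"): "NounPoss",
    ("NounPlural", "LOCATIVE"): "NounCase",
    ("NounPoss", "LOCATIVE"): "NounCase",
    ("Verb", "PAST"): "VerbTense",
}


def split_suffixes(rest):
    """Return every way to cut `rest` into a sequence of known suffixes."""
    if not rest:
        return [[]]
    results = []
    for j in range(1, len(rest) + 1):
        head = rest[:j]
        if head in SUFFIXES:
            for tail in split_suffixes(rest[j:]):
                results.append([head] + tail)
    return results


def candidate_parses(word):
    """Propose every root + suffix-sequence segmentation, with no filtering."""
    parses = []
    for i in range(1, len(word) + 1):
        root = word[:i]
        if root in ROOTS:
            for seq in split_suffixes(word[i:]):
                parses.append((root, seq))
    return parses


def fsm_accepts(root, suffixes):
    """Walk the FSM from the root's category; reject on a missing transition."""
    state = ROOTS[root]
    for suffix in suffixes:
        key = (state, SUFFIXES[suffix])
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return True


def analyze(word):
    """Keep only the candidate parses that the FSM accepts."""
    return [p for p in candidate_parses(word) if fsm_accepts(*p)]
```

Under these toy rules, analyze("evlerimde") keeps the parse ev + ler + im + de, while analyze("evimler") returns an empty list: the candidate ev + im + ler is generated, but the table has no transition for a plural suffix after a possessive one.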

Project Poster: 

Project Members: 

Erdem Toraman, Atakan Arıkan

Project Advisor: 

Arzucan Özgür

Project Status: 

Project Year: 

Spring 2016

Contact us

Department of Computer Engineering, Boğaziçi University,
34342 Bebek, Istanbul, Turkey

  • Phone: +90 212 359 45 23/24
  • Fax: +90 212 2872461
 
