Boun NLP: A Morphological Analysis System for Turkish

Morphological analysis is an important sub-task of natural language processing, used in tokenization, stemming, lemmatization, and normalization. In NLP tasks where machine learning plays a crucial role, data pre-processing is vital, and the success rate depends heavily on the pre-processing methods used. This project proposes a tool for the morphological analysis of words in Turkish, an agglutinative language, to serve as a baseline for further NLP projects.
A combination of rule-based and machine learning methods is used in the project. Data is gathered from the TDK dictionary, from which around 22,000 Turkish roots are generated. The algorithm uses this dictionary together with a list of all Turkish suffixes to propose every possible parse of a word. A finite state machine (FSM) is implemented for the filtering phase to model the rules of the Turkish language; it filters out the parses that violate those rules.
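The generate-then-filter idea can be illustrated with a minimal Python sketch. It is not the project's implementation: the tiny root lexicon, the suffix list, the tag names, the FSM states, and all function names are hypothetical and chosen only for illustration. Vowel harmony and suffix allomorphs are ignored for brevity. The sketch first enumerates every segmentation of a word into a known root plus known suffixes, then keeps only the segmentations whose suffix order is accepted by a small finite state machine.

```python
# Toy lexicon (hypothetical; the real system uses ~22,000 roots from TDK).
ROOTS = {"ev": "Noun", "gel": "Verb"}

# (surface form, tag) pairs. One surface form may carry several tags,
# which is what makes the candidate set ambiguous.
SUFFIXES = [
    ("ler", "Plural"),
    ("ler", "Agr3pl"),
    ("im", "Poss1sg"),
    ("m", "Poss1sg"),
    ("m", "Agr1sg"),
    ("de", "Locative"),
    ("di", "Past"),
]

# Hypothetical FSM: (current state, suffix tag) -> next state.
TRANSITIONS = {
    ("Noun", "Plural"): "NounPl",
    ("Noun", "Poss1sg"): "NounPoss",
    ("NounPl", "Poss1sg"): "NounPoss",
    ("Noun", "Locative"): "NounCase",
    ("NounPl", "Locative"): "NounCase",
    ("NounPoss", "Locative"): "NounCase",
    ("Verb", "Past"): "VerbTense",
    ("VerbTense", "Agr1sg"): "VerbAgr",
    ("VerbTense", "Agr3pl"): "VerbAgr",
}
ACCEPTING = {"Noun", "NounPl", "NounPoss", "NounCase",
             "Verb", "VerbTense", "VerbAgr"}


def segment(rest):
    """Return every way to split `rest` into a sequence of known suffixes."""
    if not rest:
        return [[]]
    results = []
    for surface, tag in SUFFIXES:
        if rest.startswith(surface):
            for tail in segment(rest[len(surface):]):
                results.append([(surface, tag)] + tail)
    return results


def candidate_parses(word):
    """Propose every root + suffix-sequence parse, valid or not."""
    parses = []
    for i in range(1, len(word) + 1):
        root = word[:i]
        if root in ROOTS:
            for suffixes in segment(word[i:]):
                parses.append((root, ROOTS[root], suffixes))
    return parses


def fsm_accepts(root_category, suffixes):
    """Run the suffix tags through the FSM, starting at the root's category."""
    state = root_category
    for _surface, tag in suffixes:
        state = TRANSITIONS.get((state, tag))
        if state is None:  # no transition: the suffix order is not allowed
            return False
    return state in ACCEPTING


def analyze(word):
    """Keep only the candidate parses that the FSM accepts."""
    return [p for p in candidate_parses(word) if fsm_accepts(p[1], p[2])]


if __name__ == "__main__":
    for w in ("geldim", "evlerimde"):
        print(w, "candidates:", len(candidate_parses(w)))
        for root, category, suffixes in analyze(w):
            print("  ", root, category, "+".join(tag for _, tag in suffixes))
```

In this toy setup, "geldim" yields two candidates because the surface form "m" is listed both as a possessive and as a verbal agreement marker; the FSM rejects the possessive reading after a tense suffix. A realistic analyzer would also need to handle vowel harmony, consonant alternations, and a much larger state set.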
