Building a word embeddings repository for Turkish

In this project, we aim to build a comprehensive word embedding repository for the Turkish language. Using each of the state-of-the-art word embedding methods, embeddings of all words in the language will be formed from a corpus. First, three commonly used embedding methods (Word2Vec, GloVe, fastText) will be applied, and an embedding dictionary will be built for each. Then we will continue with context-dependent embedding methods such as BERT and ELMo. Each method will be run with varying parameters, such as different corpora and different embedding dimensions. At the end of the project, we will thus obtain an embedding repository for Turkish that should be quite useful for deep learning-based natural language processing applications.
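An embedding dictionary of this kind is commonly distributed as a plain-text file with one word per line followed by its vector components, the format popularized by the original word2vec and GloVe tools. A minimal sketch of loading such a file and comparing two words by cosine similarity (assuming that whitespace-separated format; `load_embeddings` and `cosine` are illustrative names, not part of the project):

```python
import math

def load_embeddings(path):
    """Load a word-embedding file where each line holds a word
    followed by its vector components, separated by spaces."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            word, values = parts[0], parts[1:]
            embeddings[word] = [float(v) for v in values]
    return embeddings

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```

With a repository file loaded this way, semantically related Turkish words (e.g. "kedi" and "köpek") would be expected to have higher cosine similarity than unrelated pairs.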

This is a two-semester project.


Project Poster: 

Project Members: 

İhsan Mert Atalay

Project Advisor: 

Tunga Güngör

Project Status: 

Project Year: 

2021
  • Fall
