Automatic transliteration by YerevaNN
Many languages have their own non-Latin alphabets, but the web is full of content in those languages written in Latin letters, which makes it inaccessible to various NLP tools (e.g. automatic translation). Transliteration is the process of converting the romanized text back to the original writing system. In theory every language has a strict set of romanization rules, but in practice people do not follow them, and most romanized content is hard to transliterate using rule-based algorithms. We believe this problem is solvable using state-of-the-art NLP tools, and we demonstrate a high-quality solution for Armenian based on recurrent neural networks. We invite everyone to adapt our system for more languages.
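To illustrate why rule-based approaches struggle, here is a minimal sketch of the ambiguity involved. It is not part of the YerevaNN system. The mapping table `CANDIDATES` and the function `candidates` are hypothetical and cover only a handful of letters, assuming that informal writers use one Latin letter for several Armenian letters (for example "t" for both տ and թ).

```python
from itertools import product

# Hypothetical, deliberately tiny table (an assumption for illustration only):
# each Latin letter lists the Armenian letters it may stand for in informal text.
CANDIDATES = {
    "t": ["տ", "թ"],
    "r": ["ր", "ռ"],
    "a": ["ա"],
    "u": ["ու"],
    "n": ["ն"],
}

def candidates(romanized):
    """Return every Armenian spelling consistent with the table above."""
    # Letters missing from the table are passed through unchanged.
    options = [CANDIDATES.get(ch, [ch]) for ch in romanized]
    return ["".join(combo) for combo in product(*options)]

print(candidates("tun"))       # two candidate spellings for one short word
print(len(candidates("tar")))  # ambiguity multiplies across letters
```

Even in this toy setting a three-letter word yields several candidates, and a lookup table alone cannot choose between them. Picking the right one needs context, which is what a learned sequence model can provide.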
Many chatbots face this transliteration problem because users often write their messages in romanized text, so we are sharing a link to the framework on GitHub.
YerevaNN is a scientific educational foundation that aims to promote world-class AI research in Armenia and develop high-quality educational programs in machine learning and related disciplines. The core project of the foundation is to support an AI research lab based in Yerevan, Armenia. Inspired by OpenAI, the lab focuses on non-commercial machine learning research and is committed to publishing all obtained results and releasing all the code on GitHub.