A Simple Language Classifier
Jan 6, 2017
This is a toy project I worked on over the summer. I wanted to see if I could build a predictive model that acts as a language detector, like the one Facebook uses to automatically detect what language a post is written in.
I used Python to scrape text from Wikipedia in several different languages, then exported the corpus to R to build a statistical model that predicts what language a text is written in from its distribution of letters. Shockingly, it worked.
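For a rough idea of how a letter-distribution classifier can work, here is a minimal Python sketch. It is not the code from the repository: the model in this project was built in R, and the post does not say which kind of model it was. The sketch swaps in a simple nearest-profile rule instead. The function names (`letter_distribution`, `train`, `classify`), the a-z-only alphabet, and the tiny hand-written sample corpus are all assumptions made for illustration.

```python
from collections import Counter
import math
import string

# Assumption: only the 26 unaccented Latin letters are counted.
# Accented letters and other scripts are ignored in this sketch.
ALPHABET = string.ascii_lowercase


def letter_distribution(text):
    """Return the relative frequency of each letter a-z in `text`."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(ALPHABET)
    return [counts[ch] / total for ch in ALPHABET]


def train(corpus):
    """Build one letter profile per language.

    `corpus` maps a language label to a list of sample texts
    (a hypothetical stand-in for the scraped Wikipedia text).
    """
    return {
        lang: letter_distribution(" ".join(texts))
        for lang, texts in corpus.items()
    }


def classify(text, profiles):
    """Return the language whose profile is closest (Euclidean distance)."""
    dist = letter_distribution(text)
    return min(profiles, key=lambda lang: math.dist(dist, profiles[lang]))


if __name__ == "__main__":
    # Toy corpus, far too small for a real model.
    corpus = {
        "english": ["The weather is nice today and the birds are singing."],
        "spanish": ["El tiempo es agradable hoy y los pájaros cantan."],
        "german": ["Das Wetter ist heute schön und die Vögel singen."],
    }
    profiles = train(corpus)
    print(classify("Where is the nearest train station?", profiles))
```

With only a sentence per language the prediction is unreliable, so no expected output is shown. A usable model needs much more text per language, which is what scraping Wikipedia provides.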
The source code for the project can be found here: https://github.com/thenasfarce/LanguageClassifier