mm delicious python.

A Simple Language Classifier

Michael Chimento
Published in Source Filter
Jan 6, 2017


This is a toy project I worked on over the summer. I wanted to see whether I could build a predictive model that acts as a language detector, like the algorithm Facebook uses to automatically detect what language a post is written in.

I used Python to scrape text from Wikipedia in several different languages, then exported the corpus to R to build a statistical model that predicts what language a text is written in based on its distribution of letters. Shockingly, it worked.
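To give a feel for the idea, here is a minimal Python sketch of classifying text by its letter distribution. It is not the R model from the project: it assumes a much simpler nearest-profile approach (Euclidean distance between relative letter frequencies), and the function names and the tiny `SAMPLE_CORPUS` are made up for illustration, standing in for the scraped Wikipedia text.

```python
from collections import Counter
import math
import string

# Assumption: only the 26 ASCII letters are counted, so accented
# characters such as "ü" or "ó" are ignored in this sketch.
ALPHABET = string.ascii_lowercase


def letter_distribution(text):
    """Return the relative frequency of each letter a-z in the text."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        # No letters at all: return an all-zero profile.
        return [0.0] * len(ALPHABET)
    return [counts[ch] / total for ch in ALPHABET]


def train(corpus):
    """Build one letter-frequency profile per language.

    corpus maps a language name to a list of example texts.
    """
    return {
        lang: letter_distribution(" ".join(texts))
        for lang, texts in corpus.items()
    }


def classify(text, profiles):
    """Return the language whose profile is closest to the text's."""
    dist = letter_distribution(text)
    return min(profiles, key=lambda lang: math.dist(dist, profiles[lang]))


# Hypothetical stand-in data; a real corpus would be far larger.
SAMPLE_CORPUS = {
    "english": ["the quick brown fox jumps over the lazy dog and sleeps in the sun"],
    "german": ["der schnelle braune fuchs springt über den faulen hund und schläft in der sonne"],
    "spanish": ["el rápido zorro marrón salta sobre el perro perezoso y duerme al sol"],
}

if __name__ == "__main__":
    profiles = train(SAMPLE_CORPUS)
    print(classify("the dog sleeps in the sun", profiles))
```

With only one sentence per language the profiles are very noisy, so a sketch like this would need much more text per language before its guesses could be trusted.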

The source code can be found here: https://github.com/thenasfarce/LanguageClassifier
