Language Detection for 150+ Languages

Felix Laumann
Published in NeuralSpace
May 13, 2022 · 4 min read

What is language detection?

If the users of your application are multilingual, you need to detect which language they are speaking or writing in. In Natural Language Processing (NLP), language detection is the computational approach to this problem. It helps you improve the user experience and pick language-specific AI models to process what users say or write.

Let us take an example:

When you ask Amazon’s Alexa to perform an action, for example by saying “por favor pon algo de música”, Alexa automatically detects the language, in this case Spanish, and responds appropriately.

Another example is an email automation agent that detects the language of an email and then picks a language-specific AI model to process the email and classify it as spam or not spam.
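The routing step in the email example can be sketched in a few lines of Python. Everything here is illustrative: the classifier functions, the `SPAM_MODELS` table, and `route_email` are hypothetical stand-ins, and the detected language code is assumed to come from a separate language detection call.

```python
from typing import Callable, Dict


# Placeholder classifiers; real ones would be language-specific AI models.
def is_spam_en(text: str) -> bool:
    return "free money" in text.lower()


def is_spam_es(text: str) -> bool:
    return "dinero gratis" in text.lower()


# Hypothetical mapping from language code to the model for that language.
SPAM_MODELS: Dict[str, Callable[[str], bool]] = {
    "en": is_spam_en,
    "es": is_spam_es,
}


def route_email(text: str, detected_language: str) -> str:
    """Pick the classifier that matches the detected language and run it."""
    model = SPAM_MODELS.get(detected_language)
    if model is None:
        # No model for this language yet; let the caller decide what to do.
        return "unsupported language"
    return "spam" if model(text) else "not spam"


print(route_email("Gana dinero gratis hoy", detected_language="es"))
print(route_email("Meeting notes attached", detected_language="en"))
```

Keeping the mapping in one table means that supporting another language only requires registering one more model.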

Features of NeuralSpace’s Language Detection:

  • State-of-the-art Models: Use our pre-trained state-of-the-art language detection model through APIs and integrate it into any of your applications.
  • Easy to Use: Simply pass the text through the API and get the top N predicted languages along with confidence scores.
  • Language Support: Over 150 languages.
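As a rough illustration of the "top N predicted languages with confidence scores" idea, the sketch below ranks a set of scored predictions and keeps the best N. The response shape (a list of dictionaries with `language` and `confidence` keys), the scores, and the function name `top_n_languages` are assumptions made for this example, not NeuralSpace's documented API. Refer to the official documentation for the real request and response format.

```python
from typing import Dict, List


def top_n_languages(predictions: List[Dict], n: int = 3) -> List[Dict]:
    """Return the n predictions with the highest confidence, best first."""
    ranked = sorted(predictions, key=lambda p: p["confidence"], reverse=True)
    return ranked[:n]


# Hypothetical predictions for a Spanish sentence (made-up scores).
sample = [
    {"language": "pt", "confidence": 0.04},
    {"language": "es", "confidence": 0.93},
    {"language": "gl", "confidence": 0.02},
    {"language": "ca", "confidence": 0.01},
]

for prediction in top_n_languages(sample, n=2):
    print(prediction["language"], prediction["confidence"])
```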

Use-Cases

Language detection has a wide range of applications. Below are a few use cases:

#1 Language mixing

Some people are accustomed to conversing in a combination of two languages. A good example is Hinglish, a blend of Hindi and English spoken in India. In such cases, a language detection model looks at how many words in a sentence belong to each language. The language with the most words becomes the primary language for the interaction, while the secondary language is also recognized and receives a high confidence score in our ranking.
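The word-counting idea can be shown with a toy example. This is only a sketch: the tiny `TOY_LEXICON` and the `rank_languages` function are invented for illustration, and a production model is a trained statistical model rather than a dictionary lookup.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Tiny hand-made lexicon for illustration only (romanized Hindi and English).
TOY_LEXICON: Dict[str, str] = {
    "kal": "hi", "mein": "hi", "hai": "hi",
    "office": "en", "meeting": "en",
}


def rank_languages(sentence: str, lexicon: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank languages by their share of the recognized words in a sentence."""
    words = sentence.lower().split()
    counts = Counter(lexicon[w] for w in words if w in lexicon)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() sorts by count, so the primary language comes first.
    return [(lang, count / total) for lang, count in counts.most_common()]


print(rank_languages("Kal office mein meeting hai", TOY_LEXICON))
```

For this sentence, Hindi accounts for three of the five recognized words and English for two, so Hindi is ranked as the primary language and English as the secondary one.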

#2 Identify the language of business texts like emails and chats

Language detection determines a text’s language and the areas of the text where the language changes, all the way down to the word level. This is useful because business messages (chats, emails, and so on) can be written in a variety of languages.
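To make "areas of the text where the language changes" concrete, the sketch below merges consecutive words with the same language label into segments. The per-word labels are supplied by hand here, and `language_segments` is a hypothetical helper; in practice the labels would come from a word-level detection step.

```python
from itertools import groupby
from typing import List, Tuple


def language_segments(tagged_words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge consecutive (word, language) pairs that share a language."""
    segments = []
    for lang, group in groupby(tagged_words, key=lambda pair: pair[1]):
        segments.append((lang, " ".join(word for word, _ in group)))
    return segments


# Hand-labelled words from a mixed English/German business message.
tagged = [
    ("Hello", "en"), ("team", "en"),
    ("bitte", "de"), ("prüfen", "de"), ("Sie", "de"),
    ("the", "en"), ("invoice", "en"),
]

for lang, text in language_segments(tagged):
    print(lang, "->", text)
```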

#3 Monolingual chatbots

If a bot is not trained to hold a conversation in a given language, it must at least be able to recognize that the user is speaking a different language and identify which one it is. Using language identification, the bot can then respond with something like “Sorry, I don’t speak your language yet, but my colleagues are working on it”.
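A minimal sketch of this fallback behaviour is shown below. The set of supported languages, the `answer_in_english` placeholder, and the `reply` function are assumptions for illustration, and the detected language code is assumed to come from a separate language detection call.

```python
SUPPORTED_LANGUAGES = {"en"}  # languages this hypothetical bot can converse in
FALLBACK_MESSAGE = (
    "Sorry, I don't speak your language yet, "
    "but my colleagues are working on it"
)


def answer_in_english(user_text: str) -> str:
    """Placeholder for the bot's normal English dialogue logic."""
    return f"You said: {user_text}"


def reply(user_text: str, detected_language: str) -> str:
    """Answer normally for supported languages, otherwise apologize."""
    if detected_language not in SUPPORTED_LANGUAGES:
        return FALLBACK_MESSAGE
    return answer_in_english(user_text)


print(reply("Bonjour, j'ai besoin d'aide", detected_language="fr"))
print(reply("Hi, I need help", detected_language="en"))
```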

Language Support

Assamese (as)

Bengali (bn)

Bihari (bh)

Bishnupriya (bpy)

Dotyali (dty)

Dhivehi (dv)

Goan Konkani (gom)

Gujarati (gu)

Hindi (hi)

Kannada (kn)

Malayalam (ml)

Maithili (mai)

Marathi (mr)

Nepali (ne)

Newari (new)

Odia (Oriya) (or)

Punjabi (pa)

Sindhi (sd)

Sinhala (Sinhalese) (si)

Tamil (ta)

Burmese (Myanmar) (my)

Cebuano (ceb)

Central Bikol (bcl)

Chavacano (cbk)

Hmong (hmn)

Iloko (ilo)

Indonesian (id)

Javanese (jv)

Khmer (km)

Malay (ms)

Minangkabau (min)

Lao (lo)

Pampanga (pam)

Sundanese (su)

Tagalog (Filipino) (tl)

Thai (th)

Urdu (ur)

Vietnamese (vi)

Waray (war)

Arabic (ar)

Egyptian Arabic (arz)

Hebrew (he)

Pashto (ps)

Persian (fa)

Uighur (ug)

Turkmen (tk)

Armenian (hy)

Azerbaijani (az)

Central Kurdish (ckb)

Chinese (Simplified) (zh-CN)

Chinese (Traditional) (zh-TW)

Fiji Hindi (hif)

Georgian (ka)

Japanese (ja)

Kalmyk (xal)

Karachay-Balkar (krc)

Kazakh (kk)

Kirghiz (ky)

Komi (kv)

Korean (ko)

Kurdish (ku)

Mazanderani (mzn)

Mingrelian (xmf)

Mongolian (mn)

Northern Luri (lrc)

Ossetian (os)

Russian (ru)

Russia Buriat (bxr)

South Azerbaijani (azb)

Tajik (tg)

Tatar (tt)

Tibetan (bo)

Tuvinian (tyv)

Uzbek (uz)

Wu Chinese (wuu)

Yakut (sah)

Afrikaans (af)

Amharic (am)

English (en)

French (fr)

Hausa (ha)

Igbo (ig)

Kinyarwanda (rw)

Malagasy (mg)

Nyanja (Chichewa) (ny)

Sesotho (st)

Shona (sn)

Somali (so)

Swahili (sw)

Xhosa (xh)

Yoruba (yo)

Zulu (zu)

Albanian (sq)

Aragonese (an)

Asturian (ast)

Avaric (av)

Bashkir (ba)

Basque (eu)

Bavarian (bar)

Belarusian (be)

Bosnian (bs)

Breton (br)

Bulgarian (bg)

Catalan (ca)

Chechen (ce)

Chuvash (cv)

Cornish (kw)

Corsican (co)

Croatian (hr)

Czech (cs)

Danish (da)

Dutch (nl)

Eastern Mari (mhr)

Emiliano Romagnolo (eml)

Erzya (myv)

Esperanto (eo)

Estonian (et)

Finnish (fi)

Frisian (fy)

Galician (gl)

German (de)

Greek (el)

Hungarian (hu)

Icelandic (is)

Ido (io)

Irish (ga)

Italian (it)

Latin (la)

Latvian (lv)

Lezghian (lez)

Limburgan (li)

Lithuanian (lt)

Lombard (lmo)

Low German (nds)

Lower Sorbian (dsb)

Luxembourgish (lb)

Macedonian (mk)

Maltese (mt)

Manx (gv)

Mirandese (mwl)

Neapolitan (nap)

Northern Frisian (frr)

Norwegian (no)

Norwegian Bokmål (nb)

Norwegian Nynorsk (nn)

Occitan (oc)

Pfaelzisch (pfl)

Piemontese (pms)

Polish (pl)

Portuguese (pt)

Romanian (ro)

Romansh (rm)

Rusyn (rue)

Sardinian (sc)

Scots (sco)

Gaelic (gd)

Serbian (sr)

Serbo-Croatian (sh)

Sicilian (scn)

Slovak (sk)

Slovenian (sl)

Spanish (es)

Swedish (sv)

Tosk Albanian (als)

Turkish (tr)

Ukrainian (uk)

Upper Sorbian (hsb)

Venetian (vec)

Veps (vep)

Vlaams (vls)

Volapük (vo)

Walloon (wa)

Welsh (cy)

Western Frisian (fy)

Western Mari (mrj)

Yiddish (yi)

Guarani (gn)

Haitian (ht)

Hawaiian (haw)

Nahuatl (nah)

Quechua (qu)

Samoan (sm)

Maori (mi)

Check out our Getting Started guide to learn how to use NeuralSpace’s Language Detection service.

The NeuralSpace Platform is live, so test it and try it out for yourself! Early sign-ups get $500 worth of credits. What are you waiting for?

Join the NeuralSpace Slack Community to connect with us, ask questions and collaborate on exciting projects with other community members. Also, receive updates and discuss topics in NLP for low-resource languages with fellow developers and researchers.

Check out our Documentation to read more about the NeuralSpace Platform and its different Apps.

Happy NLP!
