Learning The World’s Ten Most-Spoken Languages

Kevin Sun
Sun Language Theories
Jul 25, 2017
(Source: Wikimedia Commons. Numbers are percentages of world population.)

Welcome to the first article of my language blog! Or maybe, welcome to some random article on the internet with no blog or follow-up of any kind. Time will tell, I suppose.

For the past few years one of my long-term language-learning goals has been to learn The 10 Most Spoken Languages in the World (by # of native speakers*), and I think I’m finally getting close to (at least) a conversational or intermediate reading level in all ten at this point. Several of these are widely-studied languages, as you might expect, but a few of them aren’t, and I have some thoughts on all of them.

* Language population numbers can vary significantly from source to source, for a variety of reasons. For the purpose of this exercise I’m using numbers from the Wikipedia page “List of languages by number of native speakers,” which cites a Swedish encyclopedia from 2010, and whose top-10 ranking matches the Babbel article linked above.

1. Mandarin (~955 million speakers in 2010)

This is one of my two native* languages, English being the other. I’ve heard that it’s a difficult language to learn, so I guess I’m pretty glad I don’t need to spend much time studying Mandarin these days — otherwise, I doubt that I’d have had time to dabble in as many other languages as I have.

(*Slight caveat: I didn’t actually start speaking Mandarin fluently until I was ten years old, when my family moved back to China from the States. Although I had grown up hearing it a lot, my spoken Mandarin was pretty bad when I first went back, and it took several months for me to catch up with my classmates.)

That said, I feel like I’ve neglected Mandarin a bit too much lately, so I’ve started making an effort to read and listen to it more on a regular basis. There are a few places where I could definitely use improvement:

  1. I’m seriously out of touch with the latest Chinese internet slang and pop culture. Like, the most recent Chinese internet meme I am aware of is eight years old, i.e. from the last time I was back in China.
  2. At the other end of the language history spectrum, I’m also forgetting lots of the classical Chinese stuff I learned in school. (We used to have to memorize tons of classical poetry and prose for exams.) I actually quite enjoyed doing classical Chinese in school, since it was like learning a new language.
  3. My last (and probably most practical) problem is with technical jargon in general, whether it’s politics or economics or linguistics or computer science. When I speak Chinese with people in the U.S. I can generally get away with switching into English when technical terms come up, but it’d be nice to not have to do that as much. (And yet I somehow still remember all the Chinese names for organic chemical compounds which I learned in high school….)
Chinese chemistry terminology involves, uh, Celestial Stems (source: here)

How I’m studying Mandarin right now: I recently stumbled across a promising Chinese podcasting platform which I plan to browse around more in the coming weeks. So far I’ve mostly been using it for various regional dialect podcasts, but I think I’ll give some of the history podcasts a listen soon.

I also found this great online library of classical Chinese texts with translations and commentary, which I’m going to dig into soon.

2. Spanish (405 million)

Ever since high school, I’ve had people asking me what foreign language they should learn next (after English, usually). Although I really think the answer should just be “learn whatever you want,” the uncontroversial, utilitarian, and nearly indisputable answer I tend to give is “learn Spanish!”

In terms of both the number of people you could potentially speak it with (I mean, it’s number two on this list) and the amount of effort needed to learn to speak it (I don’t want to say any language is “easy” but the other languages on this list — except possibly Portuguese? — are all harder), Spanish is the clear winner in the return-on-investment department.

And Spanish isn’t just relatively easy to learn because of grammar and vocabulary, either. Spanish is everywhere. A Spanish-language song is at the top of the Billboard charts in the U.S. right now. The two most popular sporting personalities in the world ply their trade in Spain. Our president wants to build a wall because of Spanish (I mean, basically, right?).

Honestly, if you’re taking the time to read this list/article about languages, you probably already know you should learn Spanish. There isn’t much more for me to add here.

Spanish. (source: Know Your Meme)

How I’m studying Spanish right now:

I went through two periods of focused Spanish-studying recently, with grammar books, usage books, vocabulary books… the whole shebang. The first was when I travelled to the Dominican Republic last year (it was my first time in Latin America, and also Caribbean Spanish accents are the best), and the second was when I decided to go nuts and apply for a journalism job at Univision this spring (I did not get the job).

I’m not really focusing on Spanish at the moment, and yet it’s still one of the languages I get the most practice in. Whenever I go to a language exchange event in the city, I might not get to practice all the languages I want, but I’m guaranteed to have a few conversations in Spanish. When I want to watch a soccer game on TV, half the time it’ll only be available on a Spanish broadcast. It’s kind of unavoidable, and I don’t mind it one bit.

3. English (360 million)

If you’re reading this, you clearly know English already. Congratulations! It’s a good language to know!

I’ll keep this section short, but I just wanted to recommend that you check out this interesting article about what a shit-show the English language is. And I don’t mean “shit-show” in a bad way at all — some of my favorite languages are complete historical disasters like English, haphazard mashups of many different historical influences, remolded into a new, unexpectedly functional whole. In fact, the next language on this list is also an amazing mess…

4. Hindi* (310 million)

*plus Urdu (another 66 million) which would actually put combined Hindi/Urdu at #3 above English 🤔

I only started seriously studying Hindi (and Urdu, which is practically the same language in day-to-day speech) after I finished college. In retrospect, I have no idea why it took so long for me to get started — I had been interested in the language for a while, but I guess I just never found the time and the right resources to study it with.

Anyway, the thing that finally pushed me to learn Hindi/Urdu was getting my first job out of college as a software developer. About half of my coworkers speak Hindi, and during the new hire training program people started asking me: “Kevin, how can you speak so many languages, but not Hindi? It’s one of the biggest languages in the world!” Well, they had a point.

Fortunately, during all that time I had spent not learning Hindi, I had studied a decent amount of Arabic and Persian, and I’d also taken a whole year of Uzbek in college. Uzbek is full of Arabic and Persian words, for historical reasons, and Hindi (and Urdu even more so) is also full of Arabic and Persian words, for similar and related reasons. This made picking up vocabulary significantly easier — though I don’t want to downplay the amount of native Indian vocabulary in the language, as well as the significant Sanskrit component.

Speaking of foreign vocabulary in Hindi, the fact that most Hindi speakers habitually code-switch with English in conversation made it easier to start talking to people and getting conversation practice — whenever I didn’t know how to say something, I could just say it in English.

Another thing that made learning Hindi much easier (and more fun) was the ready availability of lots and lots of media to consume in the language. Once I got to the point where I could parse the dialogue with the help of subtitles, I must have watched at least a dozen Bollywood movies in a few months. I would pause the video every five or ten minutes to look up new words.

The Ghalib website also comes with these prints of each poem, which are great.

How I’m studying Hindi/Urdu right now: I recently found some great classical Urdu poetry collections on the internet — for the poets Ghalib and Mir Taqi Mir, respectively — so I’ve been browsing through those a bit. Urdu poetry is something I’ve tried to get into in the past, but most other resources I found didn’t have as much in-depth commentary, explanations and references as these, which are curated by a professor at Columbia.

On a more practical note, I’m also halfway through G.C. Narang’s Readings in Literary Urdu Prose (free download here) which has got some interesting historical texts in it. The book is also quite helpfully printed entirely in Nastaʿlīq script, which I find a bit harder to read than the more “square” Arabic styles. I could use the practice, especially since so much Urdu stuff (and Persian, for that matter) is still written in this calligraphic style.

5. Arabic (295 million)

In seventh grade I tried learning Quenya, one of the elf languages from the Lord of the Rings. I used to scribble all sorts of Tengwar (elf writing) stuff in my notebooks at school. Sometimes my classmates or teachers would see this and ask: “Is that Arabic?” To which my response would always be: “What the fuck? This looks nothing like Arabic!” And I knew this for a fact because, well, I had also started learning Arabic at about the same time.

Not Arabic. Come on. (source: Wikimedia Commons)

I’ve long since sworn off artificial languages like Quenya. Who needs fake fantasy languages when real human languages like Arabic are so much richer and more complex?

My approach to Arabic has been extremely disjointed over the years, as I’ve jumped from Modern Standard Arabic to dialect to other dialect to standard and so on many times over, but I feel like I’ve still managed to build up a decent grasp of the language after ten-plus years of somewhat inefficient study.

So far, my Arabic studies have actually paid off a lot more in terms of making it easier to pick up vocabulary in other languages. Knowing Arabic helped quite a bit with Persian and Uzbek, which as I mentioned before made Hindi and Urdu a lot easier. Several times I’ve even learned Arabic words via Persian or Urdu rather than the other way around.

Other languages I’ve looked at, as far apart as Turkish and Swahili and Indonesian, all have a good amount of Arabic vocabulary in them too, and knowing the grammar of a Semitic language also helped me pick up Hebrew (and a little Amharic) more easily.

How I’m studying Arabic right now: I had been letting my Arabic skills waste away for quite some time (while focusing on other Middle Eastern languages), but then Saudi Arabia and friends decided to blockade Qatar last month. I decided to start listening to Al Jazeera’s Arabic live stream to a) see how much Arabic I could still understand, and b) keep up with the situation, sort of. I was surprised to find that I could still understand about 50 percent of what was being said.

I’ve also decided to take a serious look at the Egyptian dialect (I’d previously spent much more time on Levantine dialect, and a little on Maghrebi), so we’ll see how that goes in a few months.

6. Portuguese (215 million)

If you know Spanish already, learning Portuguese is easy. That’s what they say, and it’s totally true. One day I just decided to learn Portuguese for no good reason, and two months, some Brazilian films and a Duolingo course later, I was having conversations with people in it.

I even ended up going through a phase where I was more comfortable speaking Portuguese than Spanish — because Spanish was too “mainstream” and therefore less cool or something, I guess. These days I’d say my Spanish has a slight upper hand due to more frequent exposure, if nothing else, but it’s very close.

Macau is the red dot under Canton, Goa is the second red dot under Surat (source: here)

I haven’t gotten to visit Brazil yet, but I’ve been to two other former Portuguese colonies — Macau and Goa — where, uhh, I didn’t manage to find any actual Portuguese speakers (although I got to see lots of old Portuguese inscriptions). Macau and Goa are part of a whole string of Portuguese outposts in Asia and Africa that left behind some pretty amazing linguistic artifacts:

  • the Indonesian words for “shoe”, “flag”, “party”, and “window” — sepatu, bendera, pesta and jendela — are from Portuguese sapato, bandeira, festa and janela
  • the Hindi/Urdu words for “room”, “key”, and “bucket” — kamra, chabi, and balti — are from Portuguese câmara, chave, and balde
  • the Swahili word for “prison” — gereza — comes from Portuguese igreja which means “church”! And finally… the Swahili word for Portugal itself is Ureno, which comes from Portuguese “o reino” i.e. “the kingdom”!!!

Portuguese also played a big role in the early development of Atlantic creole languages, from Cape Verdean creole and Papiamentu all the way to English creoles like Nigerian Pidgin, where sabi means “to know” (from saber) and pikin means “child” (from pequeno). (Yes I have in fact attempted to study all of these. I know.)

How I’m studying Portuguese right now: Portuguese is on my “maintain” list right now — I’m not actively trying to improve my Portuguese but I consume media in it pretty regularly (mainly music and news radio) just so I don’t lose it. Portuguese is also a good language for following soccer news in, since both Brazil and Portugal are central to the transfer market.

7. Bengali (205 million)

After learning Hindi, I tried dabbling a bit in some other Indo-Aryan languages like Punjabi, Gujarati and Marathi. But what I really wanted to learn next was Bengali, and I spent years looking for a good textbook. I eventually did find a collection of good materials online and spent several months on it last spring/summer.

But wait, why did I want to learn Bengali so much?

Bengali on top, Devanagari below. Doesn’t the Bengali look so much cooler?

Well, first of all, the Bengali script looks so much more badass than other Indian scripts. That’s a bit of a subjective call, obviously, but the sweeping curves and sharp corners of Bengali script make it look a lot more intense than Devanagari (the main script used for Hindi), for example. Devanagari just seems so stiff by comparison.

Another interesting thing about Bengali is that its grammar is much simpler than Hindi and most other Indian languages — for example there’s no grammatical gender, no split ergativity, and plural markers are frequently optional. I have a hunch that this has something to do with Bengali’s geographic location at the frontier between South and Southeast Asia, though I’d need to learn a bit more about Southeast Asian languages to flesh out that idea some more.

I also think learning Bengali might be the closest I’ll ever get to actually learning Sanskrit, or at least Sanskrit vocabulary (Sanskrit grammar is a bit much). Even though some styles of Hindi do use a lot of Sanskrit words, most people I’ve spoken to have told me “no one actually speaks like that.”

Bengali on the other hand seems to have incorporated Sanskrit vocabulary in a more organic way — one sign of this is the way that most Sanskrit words in Bengali are phonetically eroded as if they’ve been around for a long time, though the original spelling is preserved.

How I’m studying Bengali right now: I’m still working my way slowly through the materials on the University of Chicago site I linked above. I’m about halfway through Clinton B. Seely’s Intermediate Bangla, and after that, I’m planning to continue with the Bengali Prose Reader for Second-Year Students. That might take a while.

8. Russian (155 million)

I signed up for Russian class in my freshman year of college, shortly after Russia had invaded Georgia in 2008. I figured Russian was on its way back to being a language worth knowing again, after decades of decline in international prestige.

Well, that certainly turned out to be the case.

I studied Russian for two years in college — one hour a day, five days a week — and honestly I don’t think I could have learned such a grammatically complex language without the kind of enforced repetition and focus that comes with actually taking a class. (The only other foreign language I’ve formally taken a class for is Uzbek.)

Afterwards I also spent a summer in Russia (Bashkortostan, specifically) and that immersion experience made a big difference — everything just clicked while I was living there, and I went from “low intermediate” to “high intermediate” in two months.

My experience learning Russian really shaped my entire language-learning process from then onwards. First of all, the complexity of Russian grammar has made all other languages that I’ve tried to learn seem simple by comparison.

Secondly, Russian was the first foreign language I ever got really fluent in (I hadn’t had as much success in the languages I’d tried to learn before that), so it was a great confidence boost and also provided a road map for me to follow when trying to reach fluency in other languages.

Finally, Russia is such a big, sprawling country that a huge range of other languages all seemed like logical next steps after Russian. I tend to skip from language to language based on geographical proximity, so on the one hand I could be going from Russian to Polish to Hungarian to Serbian to Albanian and Turkish, or Russian to Swedish to German to Dutch to French to Spanish and Portuguese, while on the other hand I would be going from Russian to Uzbek to Persian to Hindi to Bengali to Indonesian… and so on.

My language-learning process, basically. (source: Strange Maps)

How I’m studying Russian right now: I listen to nefarious Russian state-run propaganda outlet Sputnik on a regular basis, just to see what’s going on in that media ecosystem (I think I enjoy Sputnik’s Serbian broadcast more than the Russian one though). I’ve also been reading some of Peter Kropotkin’s anarcho-communist political theory in the original Russian* so uh… I guess you can say my Russian consumption has been all over the place lately.

(*It’s to learn more about Russian history, okay? I’m not trying to abolish the state lol. Hi NSA.)

9. Japanese (125 million)

Japanese was one of the most popular foreign languages among my peers in China growing up, and there was plenty of Chinese-language material around with which to learn it. It was also the third foreign language I tried to teach myself, after French and Spanish.

The fact that Japanese makes heavy use of Chinese characters certainly made things a bit easier. The relationship between Chinese and Japanese eerily mirrors the relationship between Persian and Hindi, in terms of the huge amount of loan words and the way in which they are integrated into the borrowing language.

The general sentence structure of Japanese is also oddly similar to Hindi: the verb goes at the end of the sentence, prepositions go after the word they modify (so they’re really postpositions), and foreign verbs are borrowed using a verbal noun plus the verb “to do”.

Familiar vocabulary aside, one thing about Japanese that continues to trip me up constantly is the way that social relations and politeness are built into the structure of the language itself — so you can easily say something that’s logically and grammatically correct, but completely inappropriate. This was a big issue for me the last time I visited Japan — at one point I just decided to make it extremely obvious that I was a foreigner by simply saying stuff like “Buy tickets? Where?”

Pictured: Me attempting to speak Japanese.

My trip to Japan was such a disaster language-wise that I spent several months focusing on Japanese after I returned from my trip. I had somehow hardly watched any anime before, but I watched a whole bunch in the space of three or four months, pausing to look up words in the same way I had done when studying Hindi.

I eventually did manage to have a number of decent Japanese conversations, but I’m pretty sure I still messed up my politeness frequently — it’s just that people (and I myself) cared less about it because we were not in the social context of being in Japan.

How I’m studying Japanese right now: Well, um, my laptop’s system language is currently set to Japanese, so I guess that counts for something? I also recently came across a nice website run by Japan’s national broadcaster NHK called NEWS WEB EASY which has NEWS articles on the WEB in EASY Japanese, complete with audio and explanations for difficult words (and links to the full articles at standard difficulty), which I’ve started reading regularly.

10. Punjabi (100 million)

The first time I dabbled in Punjabi, shortly after learning Hindi, I got bored and dropped it pretty quickly. It basically seemed like a minor variation on Hindi and not worth putting the time in to learn separately.

When I set the goal of learning the most spoken languages in the world, I realized I’d eventually have to give Punjabi another shot, and I wasn’t that hopeful that it would be more fun the second time around. But it actually was! All it took was a shift in perspective.

Punjabi dialect map (source: Wikimedia Commons)

First of all, I should point out that the “100 million” figure cited above for Punjabi includes not just standard (a.k.a. “Majhi” or central) Punjabi, but also a broad spectrum of dialects, some of which are occasionally considered separate languages, like Siraiki (a.k.a. “Multani”) in the southwest and Hindko in the north.

Dialect continuums like this are one of my favorite language things — it’s why I had a great time doing South Slavic, for example (mostly Serbo-Croatian, plus some Macedonian, less Bulgarian, and just a tiny bit of Slovenian — fun times all around).

Some of the Western Punjabi dialects in particular have unique grammatical features (an elaborate pronominal suffix system, different conjugations for transitive and intransitive verbs, a special way of forming the future tense, internal vowel changes for some noun declensions etc.) that show influence from neighboring languages like Sindhi and Kashmiri, which is pretty cool.

So once I realized that Punjabi wasn’t just some peripheral language to Hindi, but an independent dialect cluster of its own (albeit with a lot of overlap and mutual influence with Hindi, and lots of the same Persian and Sanskrit influences, of course), studying it became a lot more enjoyable.

How I’m studying Punjabi right now: I’m currently working my way through Colloquial Panjabi 2, which is a free-to-download addendum to the main Colloquial Panjabi book from Routledge, which I do not actually own.

Another great thing about learning Punjabi is that it has a lot of good music in both modern and traditional styles, so I’ve been using that heavily for listening practice.

As mentioned above, I’ve also been reading up on some of the other dialects of Punjabi like Multani/Siraiki and Hindko. I’m not trying to become conversational in these dialects or anything, but finding out how they work helps situate standard Punjabi in its wider context. Also, sometimes the only English books I can find on these dialects are super old and colonial and contain passages like this:

(source: Glossary of the Multani Language (1903), page viii)

I feel like that’s being a bit harsh on the languages of eastern India (such as Bengali), but the “wild flowers in a hedge” part is really on point.

Well, that’s it for now — and I didn’t even get to talk about German, Javanese, Malay, and Shanghainese yet! (Those would be the next four on the most-spoken list.)

If I keep this blogging thing going, most of my future posts will probably be a bit shorter than this. I have a few ideas lined up for the next three or four articles, at least.

Anyway, hopefully this was a decent read and gave you a sense of my language-learning background and the general scope of my language interests. Till next time!
