(Kevin) Sun Language Theory, or How Turkic Languages Tie Everything Together (Part 1)

Kevin Sun
Sun Language Theories
Feb 10, 2019


Geographic distribution of Turkic languages (just happens to be in German because it was the best one I could find)

(I was originally going to do one big post on this topic, but it looks like that would be far too long. So here’s Part 1 of 3 or 4.)

Here are two sentences I’ve become very good at saying in almost every language I know:

“I did Russian in college, and after that everything else was easy.”

“Russian was the only foreign language I ever studied formally in a classroom.”

The context, as you might imagine, is me having to explain how I came to speak/know/learn eight/twelve/thirty languages to someone I’ve just met, in another language, which is a scenario I run into pretty often. (Other greatest hits include “I had free time to learn other languages while all my classmates in China were still learning English” and “Living in New York makes it easy to practice most languages.”)

But the thing is, neither of those statements is entirely true.

Sure, grammatically speaking, Russian is likely the most annoying major modern language to learn. But knowing Russian doesn’t exactly make learning a language like Arabic or Hindi/Urdu easy. There’s clearly a bit more to it than that, which ties into the second point:

Russian isn’t really “the only foreign language I ever studied formally in a classroom.” I also somehow took an Uzbek class for a year in college. It was a bit less formal and structured than my Russian classes, but looking back on it, I think Uzbek (along with Uyghur, Tatar and various other Turkic languages) helped lay the groundwork for me to break into the languages of the Middle East (and the broader Islamic world) later on.

In November I wrote about my plan to “focus” on “Asian” languages for the next year or so, in the run-up to this year’s Polyglot Conference in Fukuoka. The languages involved cover a pretty broad range, but the task was made a bit more manageable by splitting them up into five regional groupings.

After a few months sticking with this plan and shuffling things around a little, this is what my language studies look like now:

The main differences since two months ago are: 1) moving Persian from the Indian to the Central Asian group; 2) more emphasis on Arabic and Hebrew; 3) less emphasis on Georgian; 4) maybe thinking about doing some Korean soon; 5) Indonesian is a bit left out and might unfortunately be deemphasized (or I might try to tie it with some Chinese dialects like Hokkien🤔).

You’ll also notice that Turkic languages have been moved to a central position here, which reflects the role they’ve had in my studies recently. On the one hand, I’ve been learning Turkic languages with the help of lots of Russian and Chinese books, and on the other hand, Turkic languages (thanks to a long-standing symbiosis with Persian culture) have been helping me a lot with Arabic and Indian languages, and vice versa.

And within the family itself, Turkic languages in general are so surprisingly close to each other (with notable exceptions I’ll touch on later) that it hasn’t been too hard to rotate from Uyghur to Kazakh to Kyrgyz to Turkish and back without losing a step.

Now let’s take a look at some of the main sub-groupings of Turkic languages, how I learned them, and how you can too. This will be an ongoing series.

(By the way, in case you missed it, the pun in the title is from Sun Language Theory — “a Turkish nationalist pseudoscientific linguistic hypothesis developed in Turkey in the 1930s that proposed that all human languages are descendants of one proto-Turkic primal language.”)

1) On the Silk Road — Uyghur and Uzbek

Above: Samarkand, Uzbekistan. Below: Kashgar, Xinjiang, China

(While going through my last draft, I just remembered that a lot of people might not know what Uyghur and Uzbek are. So here: Uyghur is spoken primarily in Xinjiang, an autonomous region in western China, and Uzbek is spoken primarily in the former Soviet republic of Uzbekistan. Both regions are home to many historic cities that were part of the Silk Road trading routes.)

The first Turkic language I ever wanted to study was Uyghur, which I guess makes sense since I grew up in China.

There used to be a Uyghur restaurant next to our building in Shanghai, and I would make us go there whenever we had guests. One of the middle schools I went to had a large group of Uyghur students, though they were in a separate class and ate in a separate halal cafeteria. And more generally, I feel like Chinese culture has had a romanticized view of the “Western Regions”, the desert, and the Silk Road for a long time, sometimes benign and sometimes more colonial.

But wanting to learn the language didn’t necessarily mean I had the means to learn it. I ended up getting started with Middle Eastern languages like Turkish and Arabic long before I got my hands on my first Uyghur textbook (available for free, with audio, from the University of Kansas) in college. That was the summer before I started taking my Uzbek class, and picking up the basics of Uyghur beforehand really helped me jump right in.

Uyghur and Uzbek, the two main members of the “Karluk” or Southeastern Common Turkic sub-branch, are closely related to each other. Phonetically, the biggest differences are that Uzbek (or at least the written standard) has lost vowel harmony (along with umlauted sounds like ö and ü), while Uyghur still has vowel harmony but also reduces vowels before many endings (e.g. when the plural marker is followed by the possessive suffix, both “-lar” + “-i” and “-ler” + “-i” become “-liri”).

Besides that, there are also some differences in vocabulary (though Uyghur interestingly has lots of Russian loan words too) and grammar, but for the most part you can tell that the two languages could have been one language four or five hundred years ago. A common ancestor, “Chagatai”, was the language of the early Mughals, Muslim rulers of India who originally came from Central Asia. And yes, I have tried to learn Chagatai too — I’ll leave more on that for a later post.

The strong similarities between Uyghur and Uzbek made it pretty easy to make the switch (I joined the class in week 5 and had no problem catching up), and I enjoyed the class a lot. My two years of Russian and self-taught Arabic (still pretty basic at the time) also came in handy. Because Uzbek is probably one of the most heavily Persianized Turkic languages (along with Azeri, maybe), I also picked up a large amount of Persian vocabulary in Uzbek class.

Later on, even though I almost completely stopped learning Uzbek for years, the Persian and Arabic vocabulary I picked up while learning Uzbek became super helpful not only for learning more Persian and Arabic (obviously), but also for learning Hindi and Urdu, and to a lesser extent Swahili, Indonesian and Hebrew.

To me, Uyghur and Uzbek seem like two of the most conservative modern Turkic languages, in the sense that they’re closest to what Turkic languages were like in the Middle Ages or earlier. They’ve preserved sounds like “q” and “ng” that have been lost in Turkish, avoided the shift from “t” to “d” that affected the western Turkic languages (Turkish/Azeri dağ vs. Uyghur tagh, meaning “mountain”), and also avoided a lot of the wild phonetic shifts that more northern Turkic languages like Kazakh, Kyrgyz and Tatar have gone through (to use “mountain” again, Uyghur tagh corresponds to Kazakh/Tatar tau and Kyrgyz “too”).

I haven’t been studying Uzbek recently (not since I wrote a story about Uzbeks in Afghanistan for journalism school) but I listen to it occasionally, and Uyghur is currently near the top of the list of languages I’m actively working on.

I recently finished a Chinese-language Uyghur textbook from Minzu University, and a Uyghur Reader with English commentary. Right now, I’ve just started on a cool book with a lot of Uyghur written material from modern-day Xinjiang — “Uyghur Texts in Context: Life in Shinjang Documented from Public Spaces” by Frederick de Jong. I’ve also been reading/listening to Radio Free Asia’s Uyghur news on a regular basis and can understand a good amount.

Uyghur has been a really interesting language to learn, with its particular mix of Turkic, Iranian, Arabic, Russian and Chinese influences — to the point that it sometimes seems like all my prior language learning (except Western European languages, I guess) has been leading me to the study of Uyghur. Xinjiang might not be the easiest place for foreigners to visit these days, but hopefully I’ll get to explore the region some day under better circumstances.

Instead of writing up a monster blog post all at once, I’ve decided to pace myself and break this up into a few parts. Next up will be Turkey-Turkish and Azeri, followed by the northwestern Turkic languages (Kazakh, Kyrgyz, Tatar and Bashkir). After that, I’ll take a look at some of the ancient/medieval Turkic languages, and my latest discovery: the amazing Siberian Turkic languages, such as Tuvan and Sakha. Till next time!