Sun Language Theory, Part 2: The Steppes of Tartary (Tatar, Bashkir, Kazakh, Kyrgyz)

Kevin Sun
Sun Language Theories
11 min read · Apr 7, 2019
Extent of the medieval Cuman-Kipchak Federation, whose primary language is considered the ancestor of modern Northwest Turkic languages.

This is Part 2 of a series on the Turkic languages that I’ve studied (i.e., most of the major ones). Part 1 (Uyghur and Uzbek) is here.

So, it looks like I’ve once again allowed way too much time (over a month!) to pass between my posts on here. Let me make up for that with a major, Turkic-language-related announcement — I’m going to Istanbul!

I’ll also be going to Bratislava as part of the same trip, for the third and final edition of the Polyglot Gathering that’ll be held there, which I strongly recommend if you’re a language person (I mean, you’re reading this blog) and if you expect to be in or around Europe at the end of May.

Since I’m not usually located in or around Europe myself, I figured the only way I’d make the trip to the Gathering again this year was if I took some extra time to hang around in the area. Berlin and Sarajevo also made my top three, but Istanbul ended up being the obvious choice. I’ll be there before and after the Gathering, and I’ve already got my tickets.

This means that Turkish is at the top of my current language study rotation right now, and I’ll have a lot more to write about the language (for the next part of this series) than I thought I would when I started.

But for now, as per my original plan, Part Two of my Turkic language series is going to be about…

The “Kipchak” (or Northwestern Turkic) Languages

In the intro to Part One, I mentioned how the Turkic languages, thanks to their central geographic location, tie together far-flung parts of Asia and Europe in interesting ways that I’ve been able to structure my language studies around.

A lot of these connections are historical, since the modern range of Turkic languages isn’t quite as spread out as it used to be, back when Turkic-speaking nomads made it all the way to Delhi and Cairo and Kiev and beyond. The Kipchak group in particular used to have a much wider range, being spoken in parts of Eastern Europe and the Arab world.

On the northern side, speakers of Kipchak were also known as “Cumans” (or Polovtsy to Slavic-speakers). In the Middle Ages, Cumans roamed across Eastern Europe, from Ukraine and Poland down through Hungary and the Balkans. There’s even a Kipchak/Latin/Italian/German phrasebook, the Codex Cumanicus, that was compiled sometime around the 12th or 13th century to help Western Europeans communicate with Cumans for trade and politics.

(Even more remarkably, there used to be a dialect of Kipchak spoken in Poland that was written in Armenian script! For some reason there’s no English Wikipedia page for this “Armeno-Kipchak” language, but here are links to the French and Russian articles.)

On the southern side, in the Middle East, there were of course a whole bunch of different Turkic-speaking groups, including “Oghuz”-speakers who were the predecessors of modern Turks, Azeris and Turkmens. Kipchaks didn’t directly establish any states in the Arab world as far as I know, but in a weirdly backward way Kipchak-speakers did end up as rulers of the Mamluk Sultanate in Cairo for over a hundred years.

The word “Mamluk” literally means “owned” in Arabic, or “slave” or “property,” and that’s what the Mamluks were, essentially: slaves brought in from the edges of the Islamic world and trained as soldiers, many of whom were Kipchaks from Crimea while others came from the Caucasus and the Balkans. And these slave-soldiers also ended up being the ruling class, somehow. (I feel like I need to read more about this at some point, to really understand the dynamic here.)

From 1250 to 1389, the ruling Bahri dynasty was made up mostly of Kipchaks, and a lot of them had Turkic-sounding names like Aybak (“moon commander”?) and Baibars (“rich panther”?). The dynasty that came after them, though, was mostly Circassians (or Adyghe) from the North Caucasus.

Anyway, this history lesson has gone on long enough. Now let’s take a look at the major modern Kipchak languages, which are all spoken in the former Soviet Union.

Tatar & Bashkir

My trip to Turkey in a few months will actually be my first time visiting a Turkic-speaking country ever. On the other hand, I have already been to two non-independent Turkic-speaking republics: Bashkortostan and Tatarstan, subjects of the Russian Federation.

In fact, the United States government sent me to Bashkortostan. Ufa, the capital of Bashkortostan and the eleventh-biggest city in Russia with just over 1 million people, is where I spent a summer learning Russian with the Department of State’s Critical Language Scholarship. (The prevailing theory is that we were all sent to smaller cities like Ufa so that we wouldn’t speak too much English in our free time.)

Bashkortostan (a.k.a. Bashkiria) is officially the national republic of the Bashkirs, but only about 30 percent of the population actually belongs to the Bashkir ethnic group, while 25 percent are Tatars and 36 percent are ethnic Russians. When the republic was first founded by the Soviets in 1917, there were actually more Tatars than Bashkirs in it, but that’s changed since the 1990s, partially because of migration and partially because of people just switching their ethnic identification.

The republic next door, Tatarstan, is a bit more well known. Its capital, Kazan, is somewhat famous for being where Lenin went to (and was expelled from) university, and it was also the seat of an independent kingdom before the Russian conquest. (Bashkiria during that same time period was more of a collection of nomadic tribes.) Kazan recently celebrated the 1,000th anniversary of its founding, and I managed to make a weekend trip there while I was in Ufa.

So in short, Tatars and Bashkirs each have their own republic, but there’s also a bunch of Tatars in Bashkiria and Russians in both, so the politics gets a bit complicated (also there’s oil money and religion involved). I actually ended up writing my undergraduate linguistics thesis about language and ethnic politics in Bashkortostan.

As for the languages themselves, Tatar and Bashkir are pretty similar. More similar than Spanish and Portuguese, more similar than Czech and Slovak, I’d say. If you learn Tatar, and then learn a few basic sound changes (“s” becomes “h” at the beginning of a word, and sometimes “th” in the middle), you can basically start speaking mostly correct Bashkir. That’s what I tried to do myself: even though I was in Bashkir country, there are far more Tatar learning resources online, so I learned Tatar first and just tried to figure out Bashkir from there.
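To make the first of those sound changes concrete, here is a minimal Python sketch. It only handles the word-initial “s” to “h” rule; the “th” change is left out because it only applies sometimes. The function name and the rough Latin transcriptions (sin “you”, sary “yellow”, bash “head”) are my own illustrative assumptions, not a standard romanization.

```python
def tatar_to_bashkir_initial(word: str) -> str:
    """Apply only the word-initial s -> h change (a rough sketch).

    Assumes a lowercase Latin transcription in which "sh" is a
    digraph for a different sound, so words starting with "sh"
    are left alone.
    """
    if word.startswith("s") and not word.startswith("sh"):
        return "h" + word[1:]
    return word


for tatar in ["sin", "sary", "bash"]:
    print(tatar, "->", tatar_to_bashkir_initial(tatar))
```

Real Bashkir differs from Tatar in more ways than this, so treat the output as a first guess rather than a guaranteed Bashkir word.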

Linguistically, the thing that stands out most about both Tatar and Bashkir for me is that their vowels are “backwards” compared to almost every other Turkic language. Specifically, “e” and “i”, “o” and “u”, and “ö” and “ü” are swapped in most words, so where in Turkish the numbers from 1 to 5 are bir, iki, üç, dört, beş, those numbers in Tatar are ber, ike, öç, dürt, biş. This can occasionally be confusing (e.g. in Turkish et means “meat” and it means “dog”, but the meanings are reversed in Tatar).
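The vowel swap can be sketched in a few lines of Python. This is a naive illustration that swaps every e/i, o/u and ö/ü in a word; the function name is made up for this example. It reproduces most of the examples above, but not all of them: a blanket swap turns iki into eke rather than the actual Tatar ike, which is a good reminder that the swap holds for “most words”, not all of them.

```python
# Swap each vowel with its "backwards" partner: e<->i, o<->u, ö<->ü.
# Other letters (including a and ı) pass through unchanged.
VOWEL_SWAP = str.maketrans("eiouöü", "ieuoüö")


def swap_vowels(word: str) -> str:
    """Naively apply the Turkish <-> Tatar vowel swap to every vowel."""
    return word.translate(VOWEL_SWAP)


for turkish in ["bir", "üç", "dört", "beş", "et", "it"]:
    print(turkish, "->", swap_vowels(turkish))
```

Because the mapping is symmetric, running the function twice gives back the original word.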

A note on the term “Tatar,” by the way. It started out as a term for a tribal grouping in the Mongolian-Chinese border area in the Middle Ages, but the name migrated westward along with the Mongols during their invasions of Russia and the Middle East. Back in the day, Russians would refer to all Muslims as Tatars, but nowadays the name is mostly used by just two groups: the Volga Tatars in Tatarstan, and the Crimean Tatars, who also speak a (heavily Turkish-influenced) Kipchak language.

If you want to learn some Tatar yourself, I’d say this is where to start: Самоучитель татарского языка. Yes, it’s in Russian. But really, if you don’t already know Russian, I’m not sure why you’re trying to learn Tatar first.

Kazakh

“For a Kazakh to speak to a Kazakh in a different language is a mark of dishonor!” (source: a Kazakh-language promotion poster I found through Google Images)

Because I speak Russian (and some Turkish), I’ve been asked if I’m from Kazakhstan more times than I can remember. Kazakhstan is definitely one of the more well-known “Stans” nowadays, and not only because of Borat: I feel like I run into far more people from Kazakhstan than from any other Central Asian republic here in New York.

Outside of Kazakhstan, there’s also a significant Kazakh population in China, in the northern part of Xinjiang, though there’s been a lot of migration to Kazakhstan from China over the past few decades. And a lot of the ones who have stayed behind (or gone back to visit) have found themselves in trouble in recent years.

I’ve attempted to learn Kazakh on and off a couple times over the years (back in my high school in Shanghai we had a large contingent of Kazakh exchange students), but my only really in-depth attempt came last fall, when I read through a full textbook on the language while I was on vacation in China. (The book was in Russian, unsurprisingly — Казахский язык. Самоучитель by К. Шахатова).

The biggest thing about Kazakh that sets it apart from other Turkic languages is its phonetics. It shares some peculiarities with Kyrgyz, but there are a few Kazakh-specific oddities as well: general Turkic sh becomes s in Kazakh, and general Turkic ch becomes sh. There’s also one feature that all Kipchak languages share: non-Kipchak y becomes a j (as in bonjour) in Tatar, Bashkir, Kazakh and Kyrgyz.
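Here is a small Python sketch of the two Kazakh-specific shifts (sh to s, ch to sh). The input words are rough “general Turkic” shapes in a digraph transcription that I am assuming for illustration (tash “stone”, bash “head”, üch “three”), and the function name is hypothetical. Both replacements have to happen in a single pass: if you replaced ch with sh first and then sh with s, every ch would wrongly end up as s.

```python
import re

# Both shifts are applied in a single pass over the word.
SIBILANT_SHIFT = {"sh": "s", "ch": "sh"}


def to_kazakh_sibilants(word: str) -> str:
    """Apply sh -> s and ch -> sh simultaneously (illustrative sketch)."""
    return re.sub(r"sh|ch", lambda m: SIBILANT_SHIFT[m.group(0)], word)


for general in ["tash", "bash", "üch"]:
    print(general, "->", to_kazakh_sibilants(general))
```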

Kazakh vowel pronunciation also seems to be most heavily influenced by Russian out of all the former-Soviet Turkic languages, which isn’t too surprising since Kazakhstan also has the highest proportion of Russian speakers.

Within Kipchak, Kazakh and Kyrgyz are a bit closer to each other than they are to Tatar/Bashkir, and the most notable thing they have in common is something that I’ve heard referred to as “consonant harmony” — where the initial consonant on a suffix changes based on the ending of the word it’s attached to.

Take the plural suffix -lar for example (and let’s ignore vowel harmony for a second): “mountains” is tau-lar, but “friends” is dos-tar and “girls” is qyz-dar, because of the consonants that the preceding words end on.

Meanwhile the suffix ma, which is either a negation or question marker depending on where you put it, can come out as pa or ba depending on similar factors.
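As a rough sketch of how this “consonant harmony” works for the plural suffix, here is some Python. The grouping of final sounds below is my own simplified assumption of the rule (vowels and r take -lar; l, m, n, ñ, z, j take -dar; everything else takes -tar), it ignores vowel harmony just like the examples above, and it only looks at the last letter of the transcription.

```python
# Final letters grouped by which plural variant they trigger.
# Simplified assumption for illustration; back-vowel words only.
VOWEL_LIKE = set("aeiouyöüıä") | {"r"}   # -> -lar
VOICED_SONORANT = set("lmnñzj")          # -> -dar


def kazakh_plural_back(word: str) -> str:
    """Attach -lar/-dar/-tar based on the word's final letter."""
    last = word[-1]
    if last in VOWEL_LIKE:
        suffix = "lar"
    elif last in VOICED_SONORANT:
        suffix = "dar"
    else:
        suffix = "tar"
    return word + suffix


for noun in ["tau", "dos", "qyz"]:
    print(noun, "->", kazakh_plural_back(noun))
```

The same last-letter check could drive the choice between ma, pa and ba, with a different table of endings.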

Kyrgyz

“September 23 — National Language Day! The Kyrgyz language is centuries old, no one can erase it from history!” (source: Kyrgyz government website)

Out of all the Kipchak languages (and maybe out of all the Turkic languages in general) Kyrgyz might be my favorite in terms of the way it sounds.

Because of the way Kyrgyz 1) has a streamlined vowel system, 2) has super-aggressive vowel harmony, 3) really messes with the pronunciation of loan words, and 4) has a writing system that makes sense, I’ve always found Kyrgyz to be a particularly interesting language to read and listen to.

Here are some examples of what I’m talking about:

1) Simpler vowels

“Mountain” is dağ in Turkish, tağ in Uyghur, and tau in Tatar and Kazakh. In Kyrgyz, the au gets turned into a long oo, so “mountain” is too.

And unlike Kazakh, Kyrgyz doesn’t have messy vowels like ә or ұ which break the symmetry of the system, just a clean, Turkish-style a/e, ı/i, o/ö, u/ü.

2) Extreme vowel harmony

The plural ending in most Turkic languages only comes in two forms (ignoring consonant harmony now): -lar and -ler. Kyrgyz has -lar, -ler, -lor and -lör! And the same is true for -da/-de/-do/-dö (“in”) etc.

So while in Kazakh “in the mountains” is taularda, it’s toolordo in Kyrgyz. And “in the lakes” in Kyrgyz is köldördö, unlike Kazakh’s kölderde.
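Here is a minimal Python sketch of that four-way harmony, combined with a simplified version of the consonant alternation. The tables are my own assumptions for illustration: the suffix vowel copies the rounding and frontness of the word’s last vowel, except that I am assuming u is followed by plain a; and the suffix-initial consonant is chosen from the last letter only. The function names are made up for this example.

```python
# Which low vowel a suffix takes after each stem vowel (assumed table).
SUFFIX_VOWEL = {
    "a": "a", "ı": "a", "u": "a",
    "e": "e", "i": "e",
    "o": "o",
    "ö": "ö", "ü": "ö",
}
VOICELESS = set("ptkqsşçfh")


def last_vowel(word: str) -> str:
    """Return the final vowel of the word, which drives the harmony."""
    for ch in reversed(word):
        if ch in SUFFIX_VOWEL:
            return ch
    raise ValueError(f"no vowel found in {word!r}")


def kyrgyz_plural(word: str) -> str:
    """Attach -lar/-dar/-tar with the harmonized vowel (sketch)."""
    last = word[-1]
    if last in SUFFIX_VOWEL or last in "ry":
        consonant = "l"
    elif last in VOICELESS:
        consonant = "t"
    else:
        consonant = "d"
    return word + consonant + SUFFIX_VOWEL[last_vowel(word)] + "r"


def kyrgyz_locative(word: str) -> str:
    """Attach -da/-ta with the harmonized vowel (sketch)."""
    consonant = "t" if word[-1] in VOICELESS else "d"
    return word + consonant + SUFFIX_VOWEL[last_vowel(word)]


print(kyrgyz_locative(kyrgyz_plural("too")))  # toolordo
print(kyrgyz_locative(kyrgyz_plural("köl")))  # köldördö
```

Since each suffix looks at the vowel right before it, the rounding gets passed down the whole chain of suffixes, which is how you end up with words like köldördö.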

3) Messing with loan words

Perso-Arabic loan words in Uzbek and Uyghur are pronounced pretty similarly to the original Persian, reflecting the long history of religious, political and cultural exchanges these languages have had with the Middle East. Kyrgyz on the other hand (and Kazakh to some extent) was more peripheral to the Persianate-Islamic world, and that seems to be reflected in the way they’ve butchered loan words. Some examples:

  • my favorite: Persian meywe (“fruit”) becomes mömö in Kyrgyz. Both the vowels and consonants get assimilated here.
  • Arabic waqt (“time”) becomes ubaqıt
  • Arabic aḥwal (“situation(s)”) becomes abal
  • Arabic niẓām (“order, system”) becomes mıyzam (“law”) [sic]
  • Arabic dunyā (“world”) becomes düynö

…and so on.

4) Simpler writing

So far I’ve been writing all Tatar/Kazakh/Kyrgyz words in Latin transcription to make it easier on folks who might not know Cyrillic well, but this last point relates specifically to how Cyrillic is used to write Kyrgyz.

The Kazakh Cyrillic alphabet has two additional letters, қ and ғ, which represent guttural q and ğ sounds. The thing is, these letters are usually redundant because outside of loan words, q only appears before a/ı/o/u and k only appears before e/i/ö/ü. ğ and g have a similar distribution.

Taking advantage of this, Kyrgyz simplifies the alphabet by just using the normal Cyrillic к and г all the time, since the reader can easily figure out the proper pronunciation based on context. I think that makes the spelling a lot cleaner, especially since a lot of fonts still have issues with қ and ғ. (Interestingly, Bashkir uses the Kazakh approach to spelling with extra letters for those sounds, while Tatar uses a Kyrgyz-like contextual style.)
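To show how a reader (or a program) can recover the pronunciation from context, here is a rough Python sketch. For each к or г in a Kyrgyz word it looks at the next vowel, falling back to the previous one, and picks the guttural reading next to back vowels. The function names and the fallback order are my own assumptions, and the sketch ignores loan words and the Russian letters я, ю and ё.

```python
BACK_VOWELS = set("аыоу")
FRONT_VOWELS = set("еэиөү")
# (reading next to back vowels, reading next to front vowels)
READINGS = {"к": ("q", "k"), "г": ("ğ", "g")}


def nearest_vowel(word: str, index: int):
    """Return the vowel after position index, else the one before it."""
    for ch in word[index + 1:]:
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    for ch in reversed(word[:index]):
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            return ch
    return None


def guttural_readings(word: str) -> list:
    """List the contextual reading of every к/г in the word."""
    word = word.lower()
    readings = []
    for i, ch in enumerate(word):
        if ch in READINGS:
            back, front = READINGS[ch]
            # Defaults to the front reading if no vowel is found at all.
            vowel = nearest_vowel(word, i)
            readings.append(back if vowel in BACK_VOWELS else front)
    return readings


for kyrgyz in ["кол", "көл", "кыз", "китеп"]:
    print(kyrgyz, guttural_readings(kyrgyz))
```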

That’s not to say that I have an issue with extra, Turkic-specific Cyrillic letters in general! I actually think the letter ө (for ö) looks great, for example.

All in all, these different features of Kyrgyz all come together to give it a bit of an amusing (but endearing!) look and feel in my mind, and it’s definitely the Kipchak language I’m most interested in studying more of.

Speaking of which, I also only really studied Kyrgyz in depth for the first time last year, soon after I wrapped up my first Kazakh book. I’ve actually read textbooks for Kyrgyz in not only English and Russian, but also Chinese! (There’s also a notable Kyrgyz population in Xinjiang, and yes, they’ve unfortunately been having a bad time recently as well.)

Kyrgyz also seems (a bit surprisingly) easier to find materials for than Kazakh — maybe because Russian isn’t quite as widely known in Kyrgyzstan as it is in Kazakhstan. For example, the BBC World Service offers Kyrgyz (and Uzbek and Azeri and obviously Turkish) but not Kazakh. Radio Free Europe also has more robust programming for Kyrgyz (hours of audio every day) than it does for Kazakh (just a 15-minute segment a day, as far as I can tell).

Now that I’ve covered the Turkic languages of a) the old Silk Road and b) the steppe areas just north of them, in the next part of this series I’ll head west for a look at Anatolian Turkish and its close relative, Azerbaijani. After that, I think I’ll do a writeup for some medieval/ancient Turkic languages, and finally the amazing Northeastern Turkic languages, like Tuvan and Sakha. Stay tuned!
