Reetam B
6 min read · Aug 14, 2023


Indian Languages and their Ancestors: It’s not just Sanskrit (part 1)

Languages can share their ancestors, making them related to each other. The easiest example would be Spanish, French, Italian, and Portuguese. All of them descended from Latin, and as a result share notable similarities in their grammar, but especially their vocabulary. This idea, though, can actually be applied way beyond just Western Europe. Take northern India, where languages like Hindi and Bengali are related to each other in almost the same way as are French and Spanish — in this case, they both come from Sanskrit.

The parallels between Latin and Sanskrit are surprisingly strong: both were maintained by an upper class as literary languages of scholarship, while the spoken forms used by most people changed over time. Classical Latin gave way to Vulgar Latin, and Sanskrit to the Prakrits. From there, different regions took these spoken languages and changed them further, keeping the original upper-class language as one of religious and scientific prestige.

The really exciting part here is that this situation, of an older language evolving into several newer ones, isn’t unique to Latin and Sanskrit. In fact, it’s quite the opposite. The majority of languages in the world are descended from an older language, an ancestor which they share with at least several other languages. When two or more languages share a common ancestor (i.e. they used to be the same language, often thousands of years ago), those languages are said to be “related to each other”, in a “genealogical relationship”, and part of the same “language family”.

Kinship terms can be helpful in explaining how language families work. Bengali, Assamese, and Odia are closely related languages; we could call them ‘siblings’, whose parent is Magadhi Prakrit. Similarly, Hindi and Urdu would be siblings, and Nepali and Gujarati would be their cousins. Those four have a grandparent in Shauraseni Prakrit, which is in turn a sibling to the aforementioned Magadhi Prakrit. Finally, the two Prakrits’ grandparent is Vedic Sanskrit. This tree diagram, by @india.in.pixels on Instagram and Facebook, captures some of the details.

Tree diagram of Indo-Aryan languages by @india.in.pixels
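The kinship idea above can be sketched as a small data structure. This is only an illustration under loud assumptions: the `PARENT` table and the function names are made up for this example, and every intermediate stage is collapsed, so "parent" here just means the nearest ancestor named in the paragraph above rather than a direct historical parent.

```python
# Simplified ancestor links taken from the paragraph above.
# Assumption: intermediate stages are collapsed into a single step.
PARENT = {
    "Bengali": "Magadhi Prakrit",
    "Assamese": "Magadhi Prakrit",
    "Odia": "Magadhi Prakrit",
    "Hindi": "Shauraseni Prakrit",
    "Urdu": "Shauraseni Prakrit",
    "Nepali": "Shauraseni Prakrit",
    "Gujarati": "Shauraseni Prakrit",
    "Magadhi Prakrit": "Vedic Sanskrit",
    "Shauraseni Prakrit": "Vedic Sanskrit",
}


def ancestors(language):
    """Return the chain of ancestors, nearest first."""
    chain = []
    while language in PARENT:
        language = PARENT[language]
        chain.append(language)
    return chain


def common_ancestor(a, b):
    """Return the nearest ancestor shared by both languages, or None."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):  # nearest ancestors are checked first
        if candidate in seen:
            return candidate
    return None


print(common_ancestor("Bengali", "Odia"))   # Magadhi Prakrit
print(common_ancestor("Bengali", "Hindi"))  # Vedic Sanskrit
```

Two languages count as "related" in this toy model whenever `common_ancestor` returns something other than `None`, and the closer that shared ancestor sits to both of them, the closer the relationship.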

What’s even more interesting is that this “Indo-Aryan” language family, with Sanskrit at its head, is not an isolated family. In fact, Sanskrit and Avestan, the old liturgical language of Persian Zoroastrianism, are cousins in much the same way Hindi and Bengali are now; they’re even more similar to each other than Sanskrit is to many modern Indian languages. And, of course, modern Persian, along with the other Iranian languages, is descended from Old Iranian, which was the Prakrit equivalent to Avestan’s Sanskrit: Avestan itself is attested primarily in the eponymous Avesta (the Zoroastrian religious text).

The name for the family of Persian and North Indian languages is Indo-Iranian. Its members share marked similarities, and one of the easiest places to see this is in the numbers (https://sites.la.utexas.edu/persian_online_resources/vocabulary-lists/numbers-1-100/). If you’re familiar with Hindi, that’ll make this easier, but the similarities are summarized in the table below.

Table comparing Hindi and Persian numbers

The numbers have quite a lot of points of comparison; it’s actually quite remarkable how similar they have stayed. These words are similar because they were once the same word a long time ago, in a language we call Proto-Indo-Iranian (the common ancestor of Sanskrit and Avestan, and by extension of the North Indian and Iranian languages). Take the word for “five”: in Proto-Indo-Iranian, that word is reconstructed as pancha (https://en.wiktionary.org/wiki/Reconstruction:Proto-Indo-Iranian/p%C3%A1n%C4%8Da). Over time, in both the Iranian and Indian branches, it changed to eventually become panj in Persian and panch in Hindi.

In linguistics, we call words like this “cognates”: words that are similar, both in sound and meaning, in two languages because in the distant past, they were once the same word. One of the main things to note about cognates is the reason they aren’t identical. When a language evolves over time, some of its sounds change; think of how Sanskrit’s letters स (sa) and ष (ṣa) both came to be pronounced like শ (shô) in Bengali.

Dozens to hundreds of these sound changes accumulate, depending on how old the common ancestor is, which explains why the words aren’t exactly the same when we look at, say, Persian and Hindi (this is why panj ends in -j in Persian, or why ek has no y- in Hindi). A good, specific example is that Sanskrit’s s-type consonants (s/ś/ṣ) often correspond to h in the Iranian languages, under certain conditions. Hence Avestan ahura is cognate to Sanskrit asura, and Persian dah to Hindi das, meaning they were once the same word, a long time ago (interestingly, this s→h sound change is akin to one happening in Eastern Bengali dialects).
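To make the "under certain conditions" part concrete, here is a toy sketch of how a conditioned sound change can be written as a rule. It is heavily simplified and rests on assumptions: the function name `toy_iranian_shift` is invented for this example, the Sanskrit-like spellings stand in for the older shared forms (Sanskrit did not itself turn into Avestan or Persian), and the two rules (s becomes h before a vowel, and aspirated stops lose their aspiration) are rough approximations, not a full account of the Iranian sound changes.

```python
import re


def toy_iranian_shift(word: str) -> str:
    """Apply two simplified, illustrative sound changes to a romanized word."""
    # Toy rule 1: aspirated stops lose their aspiration (bh, dh, gh -> b, d, g).
    # Applied first so that an h created by rule 2 is never mistaken for aspiration.
    word = re.sub(r"([bdg])h", r"\1", word)
    # Toy rule 2: s becomes h, but only when a vowel follows.
    # This condition is why the s in "asti" (before t) is left alone.
    word = re.sub(r"s(?=[aeiou])", "h", word)
    return word


for form in ["asura", "sapta", "sindhu", "asti"]:
    print(form, "->", toy_iranian_shift(form))
# asura -> ahura
# sapta -> hapta
# sindhu -> hindu
# asti -> asti
```

The outputs line up with well-known pairs such as Sanskrit asura and Avestan ahura, or sapta and hapta ("seven"), while asti ("is") shows the rule not firing because its condition is not met.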

The real mindblower here, though, is that Proto-Indo-Iranian is far from the oldest known ancestor that the north Indian languages share with other languages. There’s an entire, huge family of languages spread across Europe and Asia called the Indo-European languages, and its branches follow much the same genealogical structure as the Indo-Aryan and Iranian groups I’ve already discussed. I made the chart below to summarize the complexity and vastness of the family, putting particular emphasis on the Indian side.

Original tree diagram of the Indo-European language family. Made in Obsidian.

One thing of note in this family tree is the distinction between “centum” and “satem” languages. These are the Latin and Avestan words for “one hundred”, respectively, and they represent a sound change from Proto-Indo-European that turned some k sounds into s in the satem languages while leaving them as k in the centum languages (though ironically, some of Latin’s descendants later turned that k into an s sound anyway, so French’s cent, from Latin centum, starts with an s sound).

We can actually see some cognates if we use a larger version of the number comparison chart from earlier, focusing on just the words for “death”, “two”, and “you”.

Table comparing the words for ‘death’, ‘two’, and ‘you’ in various Indo-European languages

In this chart, there are a lot of similarities you can catch just by looking: most of the words for ‘you’ (except, ironically, ‘you’ itself) begin with a /t/ or /d/ sound, two consonants which sound quite alike. They also usually use the vowel /o/ or /u/, again pretty similar sounds.

The words for ‘two’ are in much the same situation, down to the same typical consonant and vowel. German’s kind of an exception here, because its initial consonant shifted to /ts/ (spelled z), but the word is still related.

Finally, the words for ‘death’ feature an initial /m/ (except in Russian, where an old prefix added an initial /s/); they also feature a medial /r/ and usually a final /t/ or /d/ (like we saw before, /t/ and /d/ are very close). We have two exceptions here, both Germanic languages, German and English, which use, again, a /t/-/d/ consonant both beginning and ending the word. Luckily, English and German have other words, ‘murder’ and ‘Mord’, that are related in meaning to death and still follow the /mrt/ pattern.

It’s these lexical similarities (plus grammatical ones, which I won’t get into yet), with cognates that share close sounds and close meanings, that allowed linguists to recognize that languages had shared common ancestors in the first place. Using this methodology, they’ve identified a number of language families in the world, plus many languages which don’t seem to have families at all.

So join me next time, when I’ll be taking a dive into the various language families in India (to my South Indian readers, I’ll actually talk about Dravidian languages this time) and where their relatives can be found elsewhere in the world, alongside a discussion of false positives. See you there!
