A Few Things I’ve Learned About Learning New Writing Systems

Kevin Sun
Sun Language Theories
11 min read · Sep 20, 2017
A map of the world’s writing systems (source: Wikimedia Commons)

It seems like everyone has opinions about language (and languages), even though not that many people are linguists or language enthusiasts. I guess this is normal, since people probably should have opinions about something they use every day of their lives to perform most basic tasks. On the other hand, this results in a lot of strange, funny, or downright wrong statements about languages.

But some of these strange statements can be pretty thought-provoking.

A few months ago I was grabbing coffee with a new colleague at work, and my language-learning thing inevitably came up when we started talking about hobbies. I mentioned that I had studied Russian in college, which produced the usual response: “Wow! Russian is really hard to learn!” as well as the common non-language-person follow-up: “…because the alphabet is so different!”

Mind. Blown.

As usual, I tried to explain: “Well, that’s far from the hardest aspect of learning Russian.” As I was considering how best to introduce the concept of grammatical cases, my coworker had a sudden realization: “Oh yeah! Of course! You already know Chinese characters, which are way more complicated! I bet Russian is a piece of cake compared to that!”

🤔🤔🤔🤔🤔🤔

At the time, I nodded in agreement (“Yeah, I guess that might have helped! I never thought of that!”) while internally shaking my head. What could knowing Chinese characters possibly have to do with learning Cyrillic? Why would knowing a logographic writing system make learning an alphabetic one any simpler?

But the observation stuck with me. What if it isn’t completely ridiculous? What if having to learn thousands of characters as a kid did somehow make me better at processing visual features in other writing systems? I don’t know. It’s not as if most people in China know how to read a dozen different scripts or anything… but, well, I actually do, at least.

Up until last spring, it had been a long time since I’d last tried to learn a new writing system. I had learned several as a kid — Chinese simplified and Latin script, obviously, but also Cyrillic, Greek (half of it in physics and math class, I think), Chinese traditional, Hiragana/Katakana, Hangul, Tengwar (lol), Arabic, Hebrew, Devanagari, and Tibetan. Learning those — especially the super-widespread Latin, Cyrillic, and Arabic scripts — gave me access to most of the languages I was interested in learning early on, so it makes sense that I didn’t really need to learn to read another script for almost eight years.

But eventually, it was time for me to tackle some more writing systems, especially (but not only) once I started shifting my attention to the messier almost-one-writing-system-per-language zone of South and Southeast Asia. In the past two years, I’ve learned to read (in order of decreasing proficiency) Bengali, Gurmukhi, Tamil, Thai, Telugu, Armenian, Ethiopic and Syriac, and it’s gotten me thinking a lot more about how different writing systems work and the best ways to learn them.

So, here are some general rules of thumb I’ve come up with as a result.

1a. Identify and focus on easily-confused letters

Groups of similarly-shaped letters that have caused headaches for me recently.

A good thing to do when you first encounter a new alphabet is to look at all the letters and group them by common shapes, so you’ll know which ones are most likely to confuse you later on.

This exercise is also helpful since it’ll require you to figure out which graphical features are meaningful (i.e. they distinguish one letter from another) and which are just stylistic. For example, if you look at the picture above, you’ll see that the little extra stroke on top of Ethiopic ሰ/sä is all that’s separating it from በ/bä, and that in Telugu it really matters whether the checkmark at the top is touching the tail at the bottom (వ=v) or not (ప=p). In Thai, whether the little circle curls to the left (ด=d) or the right (ค=k) makes a big difference. And in Armenian, a little horizontal extension in the middle (գ=g) is completely different from an extension at the bottom (զ=z). This brings me to a related but slightly different point:

1b. Be aware of possible variations of the *same* letter (e.g. fonts)

All the same thing. 🤔

If you’re reading this, you probably realize that all of the symbols in the picture above are just varieties of the same letter. Your brain can pick out the relevant features and ignore the stylistic quirks, as far as reading comprehension is concerned. If you’re going to learn a new writing system, you’ll want to learn to do this trick with a new set of letters as well.

Sometimes, I guess the line between a “stylistic variant” and a “completely different script” isn’t actually that clear-cut: is cursive Latin “the same writing system” as printed Latin? Cursive Cyrillic has a number of major differences from printed Cyrillic, and even though I can easily identify letters in printed Hebrew, I can hardly understand cursive Hebrew at all.

Similarly, the Nastaliq style of Arabic (especially common in Persian and Urdu) is different enough from usual printed styles that it took me a few extra months to learn how to read it. On the other hand, I kind of think that Kannada and Telugu might as well be considered two stylistic varieties of the same script, for example.

Anyway, I guess what I’m trying to say is that depending on what writing system you’re learning, you should be aware that font- and style-related issues might cause you some trouble. And unless you happen to be learning Thai, at least you can be glad that you don’t have to deal with Thai fonts:

The same text in several different fonts of Thai. 😱 (Source: thai-language.com)

2. Relate the new script to things you already know

See? You already know plenty of Greek and Cyrillic.

With the exception of East Asian scripts and a few others that were invented in the modern era, basically all current writing systems, from Latin to Mongolian to Ethiopic to Balinese, can arguably be traced back to Phoenician. (Although the link between Indian scripts and Middle Eastern ones is a bit controversial.) Because of this, there are a lot of opportunities to make use of stuff you already know when learning a new writing system. Here are two examples from my recent experience:

When I first started dabbling in Amharic, one thing that was tricky for me was the fact that there were sometimes multiple letters for the same sounds, such as “h,” “ts,” and the glottal stop. But after a bit of research, I figured out how to map these letters to the Arabic (and Hebrew) scripts, and that made it a lot easier to remember words and identify cognates (since Amharic is a Semitic language like Arabic and Hebrew):

  • for the “h”-like sounds, Amharic ሐ = Arabic ح, Amharic ኀ = Arabic خ, and Amharic ሀ = Arabic هـ
  • for the “ts”-like sounds, Amharic ጸ=Arabic ص and Amharic ፀ=Arabic ض
  • for the glottal stops, Amharic አ = Arabic ء and Amharic ዐ = Arabic ع

I did something similar for Thai, which is a pretty complicated script where consonants are divided into three “classes” (high, mid, low), so three letters can have the same pronunciation but influence the tone of the syllable in different ways. It seems a bit arbitrary at first, until you realize that as an Indian-derived script, there’s a nearly one-to-one mapping from Thai to Sanskrit:

(Source: Wikipedia)

(You might recognize the “voiced” and “aspirated” stuff from my article on Wu Chinese.)

When you keep in mind the voicing and aspiration of the original letters (which was partially lost in Thai pronunciation), the grouping of Thai letters into different classes makes a lot more sense.

In both the Amharic and the Thai case, what mattered wasn’t so much that the shapes of the letters I was learning actually be similar to ones I already knew (although there were occasionally a few similarities), but simply that I had an existing framework to help organize things in my mind.

“Using stuff you already know” doesn’t have to mean knowing a bunch of other alphabets beforehand and doing tons of etymological research, though. Other sorts of mnemonic techniques can be helpful too, like:

Oh.

It’s the same thing, really.

3. Learn groups of letters in context

Non-standard consonant-vowel combinations in Tamil (source: Omniglot)

A big reason I found the statement that “Russian is hard because of its alphabet” to be so bizarre is that, compared to most other writing systems around the world, Cyrillic is pretty straightforward. The basic idea of Cyrillic is the same as that of the Latin script — consonants are letters, and vowels are also letters. “A” and “B” are the same kind of thing. Most other scripts get a bit fancier than that:

  • in Arabic, Hebrew and Syriac, consonants are letters, and vowels are usually not written at all (or in some cases — e.g. when they are long — they’ll be partially represented with the help of a consonant)
  • in most Indian and Southeast Asian scripts, consonants are letters, and vowels are diacritic marks added after/above/below/before/around those letters:
Devanagari: ka, kaa, ki, kii, ku, kuu, kr, krr, ke, kai, ko, and kau (source: Wikimedia Commons)
Tibetan vowel marks. (source: Ancient Scripts)
  • in Canadian Aboriginal syllabics, consonants are letters, and vowels are rotation operations on those letters
  • in Japanese kana, consonants and vowels are inseparable — there is no “k”, just unrelated symbols for “ka”, “ki”, “ku”, “ke” and “ko”
  • and of course, last but not least, Chinese characters generally don’t contain any straightforward phonetic information at all
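
If you like seeing this sort of thing spelled out, here is a rough sketch of how the Devanagari "consonant plus vowel mark" idea looks in Unicode. Each syllable from "kaa" to "kau" is stored as the consonant letter followed by a separate combining vowel sign, even in a case like "ki" where the mark is drawn to the left of the consonant. The names `build_syllables` and `VOWEL_SIGNS` are just ones I made up for illustration; only the code points and the standard-library `unicodedata` module are real.

```python
import unicodedata

KA = "\u0915"  # DEVANAGARI LETTER KA; on its own it already reads "ka"

# Dependent vowel signs (illustrative table, romanization as in the caption above).
VOWEL_SIGNS = {
    "kaa": "\u093E",
    "ki": "\u093F",
    "kii": "\u0940",
    "ku": "\u0941",
    "kuu": "\u0942",
    "kr": "\u0943",
    "krr": "\u0944",
    "ke": "\u0947",
    "kai": "\u0948",
    "ko": "\u094B",
    "kau": "\u094C",
}


def build_syllables(consonant=KA):
    """Return a dict mapping romanized syllable names to Devanagari strings."""
    # The bare consonant carries the inherent vowel "a", so it needs no mark.
    syllables = {"ka": consonant}
    for name, sign in VOWEL_SIGNS.items():
        # In memory the vowel sign always follows the consonant,
        # wherever the font ends up drawing it.
        syllables[name] = consonant + sign
    return syllables


if __name__ == "__main__":
    for name, text in build_syllables().items():
        # Show each syllable together with the Unicode names of its parts.
        parts = " + ".join(unicodedata.name(ch) for ch in text)
        print(f"{name:>4}  {text}  {parts}")
```

Compare that with Japanese kana, where "ka" is one indivisible character with no separate piece standing for the vowel.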

There are a few borderline cases, too — the image at the top of this section shows a few examples from Tamil where the consonant letter and the vowel marking have blended together in slightly irregular ways. Similarly, if you look at the partial Amharic syllable-chart below, you’ll see that certain features (horizontal strokes, loops, lengthening of a leg, etc.) correspond roughly to particular vowels, but not always in a straightforward way (syllables starting with “r” and “y” are particularly tricky, for example).

A portion of the Ethiopic/Ge’ez syllabary (source: Omniglot)

Because of these irregularities, it feels like these writing systems are somewhere between full-on Japanese-style syllabaries, where there’s no separate element that represents the vowel, and scripts like Devanagari/Tibetan/Thai, where the vowel portion can be cleanly isolated from the consonant. So even though you can sort of analyze each symbol into its component parts, it still makes sense to learn the whole thing as a single unit.

Several of the alphabets I’ve learned recently have been like this — Telugu has several cases of irregular consonant-vowel mergers too — so I’ve gotten used to the idea of learning to read in larger “chunks” instead of always breaking everything down to the smallest identifiable unit. In fact, I guess that’s probably a reasonable approach to learning any new writing system. Letters behave differently in different contexts all the time:

  • Spanish “c” is pronounced differently in “cero” and “caro”
  • Russian has letters for “ya”, “ye”, “yu” and “yo” which “soften” the preceding consonant
  • “enough”, “through”, and “although” do not rhyme — the words have to be learned in their entirety and are more than the sums of their parts
  • etc. etc. etc.

After all, when you read your native language, you aren’t reading letter by letter either, but in chunks (so, like, Chinese characters and English words are basically the same thing to me, on some level). And if you want to really get the hang of learning a new writing system, it makes sense that you’ll also want to learn to recognize these sorts of chunks: letter combinations with special pronunciations, common words, suffixes, prefixes, grammatical particles… and so on.

Hmm, it almost sounds like I’m saying you should basically…

4. Just learn the language, at least a bit (if you somehow weren’t already planning to do that)

Even if you just want to be able to read signs when you travel somewhere… (Sign in Devanagari, Malayalam, Tamil, Kannada, Telugu and English from Wikimedia Commons)

I mean, I get it if you’re just trying to learn a new writing system on its own, and not learn an entire new language. Writing systems are fun and cool-looking and learning a whole language is a lot of work. Maybe you just want to read signs when you travel to another country or an ethnic neighborhood. Maybe you just want to write your name or keep a diary in a script no one can read. Whatever. I’ve done that sort of thing too.

For example, basically the only reason I wanted to learn the Armenian alphabet was so I could read random signs in Glendale during a trip to Los Angeles earlier this year. But Armenian is kind of a tricky alphabet with lots of similar-looking letters, uppercase letters that often look very different from their lowercase versions, and much less overlap with Latin than, say, Greek or Cyrillic, so I figured I’d need a lot of exposure to really get the hang of it. In the end, I decided to just get an Armenian textbook and skim through it, which gave me what I was looking for. I skimmed it a bit too quickly to retain much vocabulary and grammar in my long-term memory, but I wasn’t really trying to learn all that much vocabulary and grammar anyway. Now, if you showed me an Armenian letter in isolation without context, I might have trouble recognizing it, but if I look at a piece of text in Armenian I feel like my letter-recognition rate is much higher (although my comprehension rate is non-existent).

I did a similar thing with Amharic, which I also mostly wanted to learn just because of the script. And, as I’ve mentioned before, one of my initial reasons for studying Bengali was also just that the letters looked cool, although I’ve continued studying it since then.

And of course, if you were already planning to learn the language anyway, great! In that case, you don’t really have to get hung up on learning the whole writing system before learning the language either — for the most part you can just learn the relevant letters and letter combinations as you go along, probably with the help of a transliterated version of the text at first. (But make sure to keep Point #1 in mind, so you don’t mis-learn letters that you don’t realize are actually different.)

Come to think of it, all of these “principles” that I’ve just made up here (identifying relevant and stylistic features, relating new information to what you already know, learning larger units in context…) might as well be general language-learning tips themselves, just applied to the specific medium of visual symbols. I guess it’s all just the same thing.

Well, that’s it for this week (or for the past two weeks, I guess, since it took a while for me to get this article posted). I’ve been falling behind my one-post-per-week schedule a bit, partly because I’ve been taking a bit of extra time to research my next article, which should be pretty interesting. So… stay tuned!
