Learning a New Alphabet

Pardon me: this post is not about #peakVC. It is about human language.

I learn a new script only about once per decade. Which one am I learning now? See the photo at the end, and guess.

Learning a new script is humbling, like being a schoolchild, and inspires thoughts and questions (somewhat in the spirit of Chris Dixon’s Looking Forward Looking Back) about technology and about how humans learn:

New alphabets are far easier to master than new languages, yet most of us learn as many new languages as new alphabets, or more, with some notable exceptions.*

The better we write a script, the worse (more illegibly) we actually write it.

The more we read a script, the less we actually read individual characters; Microsoft Research explained this in depth some years ago. Reading (or OCRing) cursive handwriting requires a language model as much as image recognition. It is not easy for me to identify the Latin characters in handwritten cursive Finnish, a language I do not know.

Print, handwriting, uppercase and lowercase are effectively four distinct alphabets (at least for the alphabets I know; most major scripts have no case distinction anyway). For example, I learnt printed Cyrillic a decade ago, but Cyrillic handwriting is still cryptic to me.

Most scripts share influences with many other scripts. For example, Latin, Cyrillic, Greek and Armenian can be picked up passively if you already know one of the others, and even Egyptian hieroglyphs are a distant influence on the Latin alphabet.

Accordingly, I have immense respect for people who have learnt to write Chinese, a script that shares none of the influences behind the alphabets above. For a foreigner, learning to speak and write Chinese is about as hard as learning ten of the languages or scripts I have learnt.

We always curse les faux amis, and I am the first to say that it’s not what you don’t know, it’s what you do know that ain’t so. But in this case I believe false friends still help: Cyrillic р is r not p, н is n not h, в is v not b, and у is u not y. (I wonder how long it would take to learn to read a Caesar cipher.)
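To make the Caesar-cipher musing concrete: a Caesar cipher shifts every letter a fixed number of places, which makes it a special case of a one-to-one letter substitution, and reading Cyrillic through its false friends is, loosely, reading a partial substitution. The Python sketch below is only an illustration under stated assumptions: the names `substitute`, `caesar` and `FALSE_FRIENDS` are hypothetical, the cipher handles only unaccented ASCII letters, and the false-friend table covers just the four letters mentioned above.

```python
import string

def substitute(text: str, mapping: dict) -> str:
    """Replace each character found in `mapping`; all other characters pass through."""
    return text.translate(str.maketrans(mapping))

def caesar(text: str, shift: int) -> str:
    """Shift each ASCII letter `shift` places along the alphabet, wrapping past z."""
    k = shift % 26  # also turns a negative shift (decryption) into a forward one
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    shifted = lower[k:] + lower[:k] + upper[k:] + upper[:k]
    return substitute(text, dict(zip(lower + upper, shifted)))

# Hypothetical, deliberately partial table: the four Cyrillic false friends
# from the text, mapped to the Latin letters they actually stand for.
FALSE_FRIENDS = {"р": "r", "н": "n", "в": "v", "у": "u"}

if __name__ == "__main__":
    print(caesar("Hello, World", 3))              # Khoor, Zruog
    print(caesar(caesar("Hello, World", 3), -3))  # Hello, World
    print(substitute("ну", FALSE_FRIENDS))        # nu
```

A fixed shift is about the easiest substitution there is to learn, since one number replaces a whole lookup table.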

Diacritics and other little strokes that are challenging in software are perfectly natural in handwriting. Even casing, spacing and hyphens, all of them legitimate ASCII, are challenges in software.
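A small Python illustration of why diacritics and casing trip up software: the same visible word can be stored as different sequences of Unicode code points, and case conversion is not always one letter to one letter. This is a sketch assuming Python 3 and its standard `unicodedata` module; `same_text` is a hypothetical helper name, and NFC is just one reasonable choice of normalization form.

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization, so that a precomposed
    accented letter and a base letter plus combining accent count as equal."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "caf\u00e9"      # "café" with é as one code point, U+00E9
decomposed = "cafe\u0301"   # "café" as e followed by U+0301 COMBINING ACUTE ACCENT

if __name__ == "__main__":
    print(composed == decomposed)             # False: different code points
    print(len(composed), len(decomposed))     # 4 5
    print(same_text(composed, decomposed))    # True after normalization
    # Casing is not one-to-one either: German ß uppercases to two letters.
    print("straße".upper())                   # STRASSE
    print("Straße".lower() == "strasse")      # False: lower() keeps ß
    print("Straße".casefold() == "strasse")   # True: casefold() maps ß to ss
```

A pen never has to decide whether é is one mark or two; software does.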

An alphabet is not equally suited for the phonology of every language.

No script is perfectly phonetic, and we wouldn’t want one to be anyway.

Latin characters have well-known names in many languages (e.g. “igrek” for y), but in the Latin-script world really only a few Greek characters approach this status.

How is gift or skill for learning language correlated with gift or skill for learning a script?

Learning a new script is satisfying, much like learning a new language.

Children are better at learning a language, but not any better at learning a script.

Likewise, humans are better in some ways than machines at learning language, but not really in learning scripts.

When will the next significant, genuinely new alphabet be invented? Or is it too late ever to overcome the network effect?

Absent better understanding, I default to the Lindy effect, or, as I think of it, Taleb’s theory of half-life: the longer a script has already survived, the longer it is likely to keep surviving.

Scripts, even more so than languages, are endorsed, rejected and invented by government regimes. In the 1990s several countries officially changed their script; Azerbaijan, Turkmenistan and Uzbekistan, for example, moved from Cyrillic to Latin. Yet scripts, including this one, outlive both languages and regimes.

So, which script am I learning?

The most likely response to this is “Which script is that?” or “Which language is that?”, because popular tools that answer this question from a photo do not exist. You can’t use Google Translate’s Detect Language without text, but you can’t get the text via OCR without knowing the language.

That said, language identification is a solved problem, and it is not especially hard, whether the input is written, spoken or handwritten, compared with full OCR, translation and so on.
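Once the text exists as Unicode rather than as a photo, identifying at least the script really is easy. Python’s standard library does not expose the Unicode Script property directly, so the sketch below uses a crude stand-in, which is an assumption and not a proper script or language detector: it takes the first word of each letter’s Unicode character name (for example `CYRILLIC` in `CYRILLIC SMALL LETTER PE`) and returns the most common one. `dominant_script` is a hypothetical name.

```python
import unicodedata
from collections import Counter
from typing import Optional

def dominant_script(text: str) -> Optional[str]:
    """Guess the main script of `text` from Unicode character names.

    Heuristic only: the first word of a letter's name usually names its
    script (LATIN, CYRILLIC, GREEK, ARMENIAN, GEORGIAN, HANGUL, CJK, ...),
    but not always (e.g. OLD ITALIC letters would be reported as OLD).
    """
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():  # skip digits, spaces, punctuation and combining marks
            name = unicodedata.name(ch, "")
            counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
    if not counts:
        return None  # no letters at all
    return counts.most_common(1)[0][0]

if __name__ == "__main__":
    print(dominant_script("Привет, мир"))  # CYRILLIC
    print(dominant_script("Γειά σου"))     # GREEK
    print(dominant_script("12345"))        # None
```

Note that this identifies a script, not a language: Russian, Ukrainian and Serbian text would all come back as `CYRILLIC`.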

Can scripts, like languages, offer a form of privacy via weak encryption? More on that later.

* Many people who speak only a language written in a non-Latin script also know the Latin alphabet. And there are regions with a high density of distinct scripts, such as South Asia and the Caucasus. Conversely, many people never learn the proper script of a language they learnt as children.