Character study: An excursion into the East Asian linguistic zone

Kevin Sun
Sun Language Theories
8 min read · Jul 12, 2020
Two examples from my CJKV mismatch collection in which Chinese, Japanese, Korean and Vietnamese all use different characters for a concept.

Since I got back from the Polyglot Conference in Fukuoka last fall and re-shuffled my language study priorities again, the main languages I’ve ended up focusing on have been Korean and Vietnamese, two languages which I’d never really studied in depth before. (Part of the reason I haven’t written a new post on this blog in months is that I wasn’t really sure what I had to say about these languages yet.)

A few weeks ago, I also decided to brush up on my Hindi/Urdu, which has been on the back-burner for a couple of years as I’ve been focusing on various other languages (Georgian, Japanese, Hebrew, Arabic, Turkish, Uyghur…). I found a new Russian-language Urdu textbook that I hadn’t used before, some books on Urdu poetry, and started watching Hindi shows on Netflix.

And then a funny thing happened: while trying to speak Hindustani (either just to myself, or with a conversation partner on iTalki), I’ve caught myself several times accidentally slipping in Korean words. Part of this is just a typical case of crossed wires in my brain, though I think the vaguely similar basic word order of the two languages (verbs at the end, postpositions…) is also to blame.

Korean and Hindi/Urdu are similar in another interesting way, too. In the same way that Hindustani (especially the Urdu version of it) shares in the broader Perso-Arabic/Turco-Persian literary culture that also encompasses Persian (duh), Punjabi, Pashto, Uzbek, Uyghur and various other Turkic and Indo-Aryan languages, Korean—along with Chinese (duh), Japanese and Vietnamese—is still heavily influenced by the traditional literary culture of China.

As a native speaker of Chinese myself, this means that my Korean studies have interestingly gotten somewhat easier as I’ve moved on to textbooks with “Advanced” or even “High Advanced” in their titles, because the share of hanja-eo/漢字語 (Chinese character-based words) in the new vocabulary I’m learning has increased.

The same thing happened while I was first studying Hindi and Urdu years ago (or Uyghur and Turkish more recently), as my prior study of Arabic and Persian made it easier for me to pick up the large amount of Perso-Arabic vocabulary in those languages. (And of course, a similar situation exists with the shared Greco-Latin vocabulary in European languages too, but I figure if you’re reading this blog you know all about that already.)

How I started learning Korean and Vietnamese

I started studying Korean last summer, shortly after getting back from my trip to Istanbul (and the Polyglot Gathering in Bratislava in the middle of that). The Fukuoka Polyglot Conference was just a few months away, and in addition to working on my Japanese I figured that would be a good opportunity to finally seriously work on Korean too.

It’s not as if I had zero prior experience with Korean, though. I had taught myself hangul in middle school and had made two or three abortive attempts to study the language since then, though I would always get distracted by some other language within a month or so. I even finished the Korean Duolingo tree in 2017, but didn’t retain much.

One thing that did stick with me from these previous attempts was the Korean pronunciations of dozens of, or maybe even a few hundred, Chinese characters, which was useful for understanding Chinese historical phonology and also other dialects. Meanwhile, my grasp of native Korean vocabulary, grammar, and the somewhat peculiar sound system was very weak.

Switching to Korean immediately after having spent several months focusing on Turkish seemed to make the transition smoother, since the languages have similar grammars (on a deeper level than the similarities with Hindustani I mentioned above), and working on it in parallel with Japanese also had some benefits. I also got to take a short two-day trip to Seoul right after the Polyglot Conference, which was great for exposure to the language in real-life situations, though I didn’t speak much yet at the time.

Vietnamese on the other hand came in a bit later. After getting back from Fukuoka, I shuffled my language plans a few times, including dabbling in Mongolian and Kyrgyz. But eventually I decided I wanted to do another Southeast Asian language (besides Indonesian), and the choice came down to Tagalog vs. Vietnamese. And I ended up picking Vietnamese because there was a course for it on Duolingo already.

Unlike with Korean, in the case of Vietnamese I really was coming in with a near-zero level of prior knowledge — I didn’t know the tones, I hardly knew a dozen words, and I had absolutely no idea how Chinese loanwords were pronounced. In fact I ended up being pleasantly surprised at how much Chinese-character based vocabulary is used in Vietnamese.

And that’s how I finally found myself in a position to take a closer look at connections between the four languages of the Chinese character cultural sphere.

Given that I’m a native speaker of Chinese and a prolific language-learner, I guess it does seem a bit odd that it took me so long to get to this point. The only explanation I can think of is that for a long time the other East Asian languages just seemed less “exotic” and less interesting to me precisely because of their connection to Chinese — which is how I ended up spending over a decade digging into the languages of Eastern Europe, Central Asia, the Middle East, South Asia, Africa and elsewhere before finally coming back around to East Asia.

In any case, now that I’ve finally started digging into this linguistic zone, I’ve been noticing plenty of interesting things.

Mismatches

Even though all these languages derived a substantial portion of their vocabularies from medieval Chinese, there’s been a good amount of divergence in the characters and character combinations that are used for the same concepts.

I decided to map* some of these differences out using Observable, a visualization tool I’ve been learning to use recently: you can find the live code here.

(At one point I literally tried putting this on a map, but China is so large and Vietnam is so far south that I couldn’t really get the layout to work well. So this is a more abstract, symmetrical view.)

As you can tell from looking at these examples, one of the most common patterns is for Japanese and Korean to use one term and for Chinese and Vietnamese to use another. It’s also common for either Chinese or Vietnamese to be the odd one out.

These patterns reflect one of the major lexical developments in East Asia in the modern era: the invention of hundreds of new terms, based on Chinese characters, in Japan during the Meiji Restoration for the purposes of translating concepts imported from the West (democracy, communism, capital, bank…). Most of these were also passed on to Korean, many to Chinese, and a good amount to Vietnamese as well.

(One interesting example of transmission of a modern concept going the other way is the word for “geometry”, 幾何學, which was actually coined by a translator from the Shanghai area and then transmitted from China to Japan and Korea.)

In other cases, a different term would be coined in Chinese and passed on to Vietnam (e.g. “company”, which is 公司 in Chinese and Vietnamese instead of the Japanese/Korean 會社). New words were also coined using Chinese parts independently in Korea (e.g. 酒煎子 for “kettle”) and Vietnam (e.g. 媒場 for “environment”). And there are interesting cases where the order of characters for a term is reversed in Chinese (and sometimes Vietnamese) compared to Japanese and Korean, like in “to introduce” (介紹/紹介), “language” (語言/言語) or “detective” (偵探/探偵).
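For anyone who wants to poke at these patterns programmatically, here is a minimal Python sketch. It is not the code behind my Observable notebook: the function names `grouping` and `is_reversed` and the `zh`/`ja`/`ko`/`vi` language codes are just illustrative assumptions, and the only data is the handful of examples from the paragraph above (written in traditional characters, as in this post).

```python
from collections import defaultdict


def grouping(terms):
    """Group language codes by the term they share.

    `terms` maps a language code to the characters used for one concept.
    Returns a sorted list of sorted language groups.
    """
    groups = defaultdict(list)
    for lang, term in terms.items():
        groups[term].append(lang)
    return sorted(sorted(langs) for langs in groups.values())


def is_reversed(a, b):
    """True if `b` is `a` with its characters in reverse order (and they differ)."""
    return len(a) > 1 and a != b and a == b[::-1]


# "company": Chinese and Vietnamese share one term, Japanese and Korean another.
company = {"zh": "公司", "vi": "公司", "ja": "會社", "ko": "會社"}
print(grouping(company))

# Chinese form first, Japanese/Korean form second (examples from the text).
reversed_pairs = {
    "to introduce": ("介紹", "紹介"),
    "language": ("語言", "言語"),
    "detective": ("偵探", "探偵"),
}
for concept, (zh_term, jk_term) in reversed_pairs.items():
    print(concept, is_reversed(zh_term, jk_term))
```

Grouping by shared term is what makes the “Japanese and Korean vs. Chinese and Vietnamese” split fall out directly, and the reversal check only compares the Chinese and Japanese/Korean forms, since I haven't listed the Vietnamese forms for those three words here.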

Of course, this collection of mismatches leaves out a lot of other interesting aspects of so-called Sino-Xenic vocabulary in Japanese, Korean and Vietnamese. By focusing on examples where all four languages use Chinese character-derived terms, I’ve left out lots of cases where Japanese, Korean or Vietnamese use a native word instead, although I’ve still squeezed in a few examples using Japanese kun’yomi or Vietnamese chữ Nôm.

There are other cases where there’s more than one Chinese-derived term in a language for a concept, some of which are unique and others of which are not (e.g. 葛藤/갈등 and 軋轢/알력 for “conflict” in Korean, which co-exist with terms like 紛爭/분쟁 and 衝突/충돌), so I couldn’t cleanly fit them into the format of the visualization.

And while the patterns in the mismatch collection might give the impression that most Sino-Korean vocabulary is related to Sino-Japanese, that’s not entirely accurate because there are so many cases where Korean uses a unique Chinese-derived term while Japanese uses a native word. Just to name a few: 只今/지금 (now), 親舊/친구 (friend), 亦是/역시 (indeed), 都大體/도대체 (after all (emphatic)), 始作/시작 (to start), 工夫/공부 (to study).

The other big omission in these diagrams is the internal differences within each language, both in terms of dialects (or sub-languages, really, in the case of Chinese) and in terms of different standards in the case of China/Taiwan and North/South Korea.

Vocabulary mismatches like the ones I’ve collected above aren’t uncommon in other linguistic zones, either. For example, while Arabic uses the word thaura for “revolution”, Persian and Urdu use the Arabic-origin word inqilaab instead, which mainly means “coup” in Arabic. And while ghareeb means “strange” in Arabic, it ended up meaning “poor” in Hindi/Urdu. (For an example from Europe, consider how déception means “disappointment” in French but something quite different in English.)

In the case of the East Asian languages, though, the historical use of Chinese characters adds another interesting angle to the analysis of these differences.

At the rate I’ve been updating this blog lately, I guess it might be several more months before I write another post, though that depends on what happens with my language studies in the next few months. For now, I expect I’ll still be focusing on Korean for a bit longer (until I finish going through my advanced textbooks, at least) while also looking for new Vietnamese learning material to get me out of the plateau I’m currently on.

After that, who knows. Now that I’m brushing up on Hindi/Urdu, I might end up making another push into the languages of India, possibly reviewing Bengali or trying to get more into Tamil. I’ve also started doing Finnish on Duolingo just for the hell of it, though I doubt I’ll continue studying it in depth after I finish the course. Other languages that I might try to look into next include Thai, Tagalog, Mongolian, Amharic, Hausa… or I might finally just take some time to “focus” on the languages I’ve already studied instead of constantly branching out further.

Anyway, thanks for reading, and stay tuned for the next post, whenever that is!
