Why Don’t People Code in Chinese?

Tom Z. Jiahao
12 min read · Oct 22, 2021


Biang, allegedly the Chinese character with the most strokes. It is used exclusively in the name of a kind of noodle from Shaanxi Province, China.

Chinese characters are perhaps among the oldest written forms still in use today. When the characters were first invented, they were pictograms resembling the shapes and forms of the things they described. Over time, the characters evolved to be written more easily, and most of the original pictographic features have become unrecognizable in modern written Chinese. I am a computer science major, but as someone who grew up with Chinese as my first language, I have always wondered why Chinese has not been used as the basis for any mainstream programming language, especially given its long history. Just a disclaimer: I am no linguist or programming language expert, so I can only draw on my own experience with the Chinese language and my daily programming tasks to find some clues as to why people don’t code in Chinese.

The evolution of the Chinese character for water, written by author

The Chinese language is difficult. While there are over 50,000 Chinese characters in total, knowing around 3000–5000 of them is enough to survive in Chinese-speaking countries. Although this might not sound like a lot compared to the number of English words one needs to know to pass the SAT, the sheer complexity of the strokes and structures of this written form may deter many from even starting to learn Chinese. But let’s set aside this difficulty and assume that everybody miraculously knows Chinese in addition to their mother tongue. That lets me focus on the features of the language itself.

Understandably, there is also the debate on the role of history in the adoption of languages for programming. After all, many of the pioneers of computing, modern computers, and programming languages spoke English. Alan Turing (father of theoretical computer science and AI) and Charles Babbage (father of the computer) were Englishmen; the C programming language was created at Bell Labs by Dennis Ritchie and others, mostly Americans who spoke English. Nevertheless, history aside, it’s interesting to ask whether there is something inherent in the Chinese language itself that prevents it from being used for programming. So let’s just assume that history plays only a minimal role in dictating which language is widely used for coding. I will use the terms “programming language” (e.g. C, Python) and “natural language” (e.g. Chinese, English) for disambiguation, and frequently compare Chinese to English to illustrate my points.

What language features are good for programming?

Even though I have been programming for years, it’s a fun exercise to ponder this question from a pragmatic point of view (try it if you are a programmer or computer scientist). I came up with the following 5 features that a natural language needs in order to be considered suitable for programming. You may add more or dispute some of them if you wish.

  1. Ability in expressing logic: Programming languages implement computer algorithms and/or instructions, which are basically logic and rules. So for a natural language to be used for coding, it should at least be able to express logic in a succinct and rigorous manner.
  2. Ease of input: This is the most pragmatic concern. Speed is everything nowadays, but we programmers are limited by the devices we use to input code into a computer. By devices, I mean both our bodily devices, such as our hands, and external devices such as the keyboard and mouse. While I do believe our input efficiency will be greatly enhanced by emerging brain-computer-interface technologies in the near future, I will focus on the technologies we have right now.
  3. Modularity: This might not be a legit notion in linguistics, so I will define it myself. By modularity, I mean the ease of composing new words or phrases that can exist inside our code. As a program grows, we usually end up with a sprawling set of variables, pointers, and keywords. So the natural language should let us form new words flexibly, to meet our need for naming new variables.
  4. Communicatability: I simply made up a big word here, but communicatability essentially means how efficiently something can be communicated. Most of us programmers collaborate in teams, so it’s important that the code we write can be communicated efficiently across the team. All natural languages are good for communication because they emerged from humans’ need to band together for survival, but when they are used to write code, it’s unclear whether their communicatability is preserved.
  5. Culture: This feature might be a little contentious since the word culture can be vague and is not easily dissociable from natural languages. Furthermore, it can get tangled up with all the historical complications. Nevertheless, as a native speaker, I think I should talk about it, especially in comparison with English.

Ability in Expressing Logic

Many, like this post on Quora, argue that English is more logical and precise than Chinese in general. I agree to some extent. In my daily use, Chinese is a very contextual language, and the ordering of words in a sentence is usually not as strict as in English. I will quote the famous poem “床前明月光,疑是地上霜。” which should translate to “The moon shines on my bed brightly, So that I mistook it for frost on the ground.” But if we analyze the words in the poem, they literally mean “Before bed bright moon light, doubt is floor top frost.” The Chinese language focuses more on the impression that each word leaves on the sentence, and is therefore more prone to ambiguity. However, this is not the point here. Being used as a programming language is different from being used as a natural language, and we should focus on Chinese’s capability to express logic. In other words, the real question is: can we use Chinese in a way that expresses logic in a succinct and rigorous manner? I think the answer is yes.

If we assume the English language is sufficient for expressing all the logic we need for coding, then so is the Chinese language. In terms of logical expression, I am convinced that there is effectively a one-to-one mapping between the two languages. Think about it: every control flow keyword in English has a corresponding word in Chinese, and every algorithm written in English can be equivalently described in Chinese with the same rigor. There is even an esoteric programming language that uses classical Chinese as its basis. Fundamentally, I believe that the human capacity for logical thinking is roughly the same in every culture, and our languages have evolved to let us express logic.
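As a small illustration of this mapping, Python 3 already accepts Chinese characters in identifiers, although its keywords (`def`, `if`, `return`) stay in English. The sketch below is only an illustration under that assumption; the names 阶乘 (factorial) and 数 (number) are hypothetical and chosen just for this example.

```python
# Python 3 allows non-ASCII identifiers, so the logic reads in Chinese
# while the keywords (def, if, return) remain English.
def 阶乘(数):
    """Return the factorial of 数 (a non-negative integer)."""
    if 数 <= 1:           # 如果 (if)
        return 1          # 返回 (return)
    return 数 * 阶乘(数 - 1)  # 递归 (recursion)


print(阶乘(5))  # 120
```

The control flow is exactly what it would be with English names; only the labels change.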

Ease of Input

Finding the character for “Oyster” using the built-in Windows Chinese input method. I had to scroll down two pages using the next page button. There is also a good chance I would miss it if I were not careful enough.

It is harder to input Chinese into a computer than English, period. My friends who use traditional Chinese have an entirely different input method that I know nothing about, so I will focus on the most popular input method for simplified Chinese, which is based on Pinyin. Pinyin is the official romanization system for Standard Mandarin Chinese: it spells the sound of each Chinese character using Latin letters. This input method is indirect and has many problems. The biggest problem is that each character is a single syllable, and each syllable is shared by many different characters. So even after you type the Pinyin of a Chinese word, you need to scroll through a long list of candidates and choose the one you wish to type. This sounds extremely painstaking, but modern software such as Sogou Pinyin (which has an astonishing 83.6% adoption rate in China) learns the user’s habits and the contexts of words over time. The most likely candidates show up first when you type your Pinyin, significantly reducing the time spent scrolling through the list. Sogou also allows you to type the Pinyin of multiple characters at once to form a phrase, which further narrows down the possibilities. But in coding, habits are more about form and style, and much less about which character one wishes to use. Picking a variable name has little to do with habit and much more to do with the needs of that specific coding context. It’s unclear how input software could optimize coding speed when using Pinyin.
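To make the candidate-list idea concrete, here is a toy sketch of a habit-based ranker. It is not how Sogou or any real input method works internally; the class `ToyPinyinIME`, its methods, and the tiny homophone table are all hypothetical and exist only for illustration.

```python
from collections import Counter

# A tiny, hand-picked homophone table: the toneless Pinyin "shi" maps to
# many characters. A real input method has a far larger dictionary.
CANDIDATES = {
    "shi": ["是", "时", "事", "十", "市", "师", "诗", "石"],
}


class ToyPinyinIME:
    """Hypothetical input method that ranks candidates by past picks."""

    def __init__(self, table):
        self.table = table
        self.picks = Counter()  # how often each character was chosen

    def suggest(self, pinyin):
        chars = self.table.get(pinyin, [])
        # sorted() is stable, so ties keep the dictionary order.
        return sorted(chars, key=lambda c: -self.picks[c])

    def choose(self, char):
        self.picks[char] += 1


ime = ToyPinyinIME(CANDIDATES)
print(ime.suggest("shi"))  # dictionary order: 是 first, 诗 seventh
ime.choose("诗")            # the user scrolls to 诗 (poem) once
print(ime.suggest("shi"))  # 诗 now comes first
```

A ranking like this pays off when the same characters recur, as they do in everyday prose. A freshly invented variable name has no history to learn from, so the first lookup still costs a scroll.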

The character Zhao (left) and its radicals circled (center). If we randomly rearrange the radicals (right), the word looks legit but it does not exist and cannot be written as code. Written by author.

You may argue that our current keyboard is optimized for English, and that there may be a better design for inputting Chinese. This brings us to another Chinese input method, Wubi (which has lost its shine since Sogou took off), which uses radicals to index Chinese characters. Radicals are the components of Chinese characters, and there are only a finite number of them. Using radicals to type Chinese seems like a more direct way to input because letters are to English what radicals are to Chinese. However, while there are only 26 English letters, there are a total of 214 radicals in the Chinese language. How can a keyboard have 214 keys and still allow us to type fast? This keyboard design challenge may just be impossible because, after all, we only have ten fingers. What Wubi did was cram multiple radicals onto a single key, but this made it extremely hard to master. Its complex system, combined with the need for much memorization and practice, resulted in a steep learning curve, which might just explain its unpopularity. So what if we reduce the number of radicals on a keyboard? Then we are back to the scrolling business, because each radical is associated with multiple characters which we need to wade through.

Modularity

I can type the word atbk in English and it will show up in this post as it is (btw this word is entirely made up and doesn’t exist in the dictionary). However, there is no way an incorrect Chinese character can even exist in the computer. This example illustrates what I mean by modularity: the ease of composing new words that can exist inside a computer. While it’s unlikely that we use many made-up words in our code (that would be bad practice), modularity shows the flexibility of the natural language when used for coding. I will give two examples where this can be really helpful. First, it allows for abbreviations. I am sure many of us have used variable names like “temp” in our code. It is not an actual word, but being able to write it down saves us time and offers a great deal of convenience. The second example is when we want to augment the name of a variable to indicate some caveat. For example, if I have a variable named “apple,” I could also write “apple_a” or “apple_temp.” Here the suffixes “a” and “temp” are not actual words, but they conveniently augment the variable name.

While you can’t make up Chinese characters on a computer, Chinese does allow for composing new phrases consisting of multiple characters. For example, I can type “凉像,” which doesn’t mean anything in Chinese, but it can still exist here in this post. However, this composition requires more overhead, as I need to scroll twice to find the right characters to compose this phrase. If I am to augment a variable name using Chinese characters, the only thing I can do is append an entire character or phrase to it, since there is no shorthand. Again, this means more overhead. Overall, Chinese only allows composition at a high level: the characters are what they are and you can’t change them. This lack of modularity makes Chinese less flexible than English. Furthermore, if typing the characters is already slow, any composition will only make Chinese even harder to use for coding.
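Python’s own naming rules show both halves of this point. The sketch below assumes Python 3, where `str.isidentifier()` reports whether a string is a valid name; 苹果_临时 is a hypothetical rendering of apple_temp, used only for illustration.

```python
# str.isidentifier() reports whether a string is a valid Python name.
names = ["atbk", "apple_temp", "凉像", "苹果_临时", "苹果_a"]
for name in names:
    print(name, name.isidentifier())  # every one prints True

# Composition happens between whole characters: the shortest suffix
# available in Chinese is still one complete character.
print(len("temp"), len("临时"))  # 4 letters versus 2 full characters
```

Made-up phrases such as 凉像 are accepted just like atbk, but every piece of them has to be an existing character that was looked up and selected in full.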

Communicatability

Communicatability is not about how to transfer code from one person to another, because that’s already taken care of by GitHub. This post explains why English is the lingua franca of programming. It emphasizes how easily programs can be communicated using English. If we ignore the historical factors that made English the lingua franca in most countries, and assume everybody knows Chinese alongside their own mother tongue, would they be able to communicate their code efficiently were it written in Chinese? I am definitely making some big assumptions here, so my conclusions might well be wild guesses.

When it comes to reading, it’s a statistical tie between Chinese and English. According to this article, on average, English is read at 382 words per minute and Chinese at 386 words per minute. So if someone shares their code with a colleague, it shouldn’t be read any slower just because it is in Chinese.

What if someone wants to talk about a piece of code verbally? I actually think it’s easier in Chinese. According to this linguist’s answer, “Chinese has a fewer number of the minimum sound units which carry meaning, or morphemes. This means that for each of what would correspond to a syllable in English, there is a smaller number.” I always like to challenge my English-speaking friends to count to twenty as fast as they can, and then I show them that I can count about twice as fast in Chinese (I often get very surprised faces and they think I am cheating). In Chinese, the numbers from zero to ten are all single syllables, which already saves so much time. Furthermore, the numbers above 10 and below 100 are often counted as a concatenation of two single-digit numbers, i.e. twenty-one in Chinese can simply be counted as two-one. This means that any number below 100 needs at most two syllables to pronounce. Of course, code is not all about numbers, but even for phrases and sentences, Chinese is much more condensed and can express the same meaning in fewer words. Just imagine two colleagues talking about recursion in Chinese: it’s simply “递归” (two syllables against the three in “recursion”), which already gives a 33% reduction in the number of syllables when spoken.
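The counting shorthand can be sketched in a few lines. This follows the digit-by-digit style described above rather than the full formal reading of each number, and the English syllable counts are hand-counted assumptions, not figures from any cited source; the function names are hypothetical.

```python
DIGITS = "零一二三四五六七八九"  # each character is one spoken syllable


def shorthand(n):
    """Read a number from 10 to 99 digit by digit, e.g. 21 -> 二一."""
    tens, ones = divmod(n, 10)
    return DIGITS[tens] + DIGITS[ones]


# Hand-counted English syllables (assumed values for illustration).
ONES = [0, 1, 1, 1, 1, 1, 1, 2, 1, 1]  # "", one ... nine
TENS = {2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 3, 8: 2, 9: 2}  # twenty ... ninety


def english_syllables(n):
    """Syllables in the English name of a number from 20 to 99."""
    tens, ones = divmod(n, 10)
    return TENS[tens] + ONES[ones]


print(shorthand(21), len(shorthand(21)), english_syllables(21))  # 二一 2 3
print(shorthand(77), len(shorthand(77)), english_syllables(77))  # 七七 2 5
```

In this shorthand every two-digit number is exactly two syllables, while the English names in the same range run from two syllables (twenty) up to five (seventy-seven).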

Culture

Photos by Amanda Lim and Bruna Branco on Unsplash

I like to use an analogy between natural languages and food. The Chinese language is like Chinese food while the English language is like western food. If you visit China, you will find that most dishes are just what they are, with little or no room for customization. Dishes like “麻婆豆腐(Mapo Tofu),” and “宫保鸡丁(Kung Pao Chicken)” may have many variations, but for each variation the recipe is fixed. If you ask for a dish to be changed somehow, the chef might even get offended. In fact, lots of Asian food works the same way. When Gordon Ramsay made his own version of pad thai in front of a Thai chef, he thought his version was delicious, but he got blasted by the Thai chef. The Thai chef told Ramsay that “This is not pad thai at all. Pad thai has to be sweet, sour, and salty.” Indeed, each dish is a signature, just like how you can’t arbitrarily add a stroke or radical to a Chinese character and let it exist.

Western food, however, affords a great deal of customizability. When I order a burger, I can ask for more onion, or for the cheddar cheese to be removed; when I order some pasta, I can choose whether it’s linguine or spaghetti. It seems that the menu only serves as a baseline, and that there is in fact an infinite number of combinations using whatever ingredients are available. This is so much like the English language, in which I wrote down my imagined word atbk and it exists just fine.

If you ask me which kind of food I prefer, I would say both. I would enjoy my daily afternoon cup of latte, with triple shot, oat milk, no sugar, and no cream, but I would also not let anyone tamper with my “肉夹馍.” However, when it comes to coding, I would say having more flexibility is definitely more desirable.

Recently I’ve been talking to a friend who is fluently bilingual and just happens to be a Chinese philosophy aficionado. He said something that really struck me. “To understand Chinese philosophy, you can’t just rely on the bare words. You need to 悟.” The word 悟 in Buddhism means awakening, which is a deeper experience than simply understanding or comprehending. That’s how I feel about the Chinese language too: it has been used to summarize, distill, and elevate the human experience for thousands of years. The result is a language system that is extremely top-down and dense, and that requires 悟 to bring its hidden complexities to light. On the other hand, the downside of such a language system is well summarized by the first prime minister of Singapore, Mr. Lee Kuan Yew, a multilingual himself. He said in an interview, “…(Chinese is) a language that shapes thinking through epigrams and 4000 years of texts that suggest everything worth saying has already been said, and said better by earlier writers…”

So is Chinese good for coding? Considering all the aforementioned features, I don’t think so. Computer programs are fundamentally bottom-up constructions. To be bottom-up, you need a natural language which is amenable to tweaks and tricks to satisfy the most nuanced needs of the programmer. English happens to be such a language, and maybe it’s not that surprising it has emerged as the programming lingua franca.
