Building an efficient Shinjitai (新字体) to Kyūjitai (舊字體) converter by taking a deeper look at the Kakikae Japanese script reform

Emmanuel Ternon
Sep 4, 2019

In 1946, the Japanese government released the Tōyō Kanji list (当用漢字表, lit. “list of kanji for general use”), an official list of 1,850 Chinese characters (called Kanji in Japanese). Its objective was to limit the total number of Kanji used in the official writing system of the Japanese language, in the hope of making the language easier to teach in the public education system and thereby raising the literacy rate in the country.

The promulgation of the Tōyō Kanji list had a major impact on the way Japanese is written since it introduced simplified forms of Kanji, called Shinjitai (新字体, lit. “new character forms”), replacing their pre-WW2 traditional equivalents, now referred to as Kyūjitai (舊字體, lit. “old character forms”).

Many Japanese people (and advanced learners of Japanese) are aware of this fact, and some have built simple Shinjitai to Kyūjitai converters (such as this one), which revert the simplified forms of the Tōyō Kanji list to their traditional counterparts. Such converters are relatively easy to build, because the transition from Kyūjitai to Shinjitai within the scope of the Tōyō Kanji list only affected individual characters, with a nearly one-to-one mapping between each Kyūjitai-Shinjitai pair. The only exception is the Shinjitai character 弁, which is the simplified form of four distinct Kyūjitai characters (弁, 辯, 辨 and 瓣).
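As a rough illustration of such a character-level converter, here is a minimal Python sketch. The table is a tiny hand-picked subset chosen for this example, not the full Tōyō Kanji mapping, and the names `SHIN_TO_KYU`, `AMBIGUOUS`, `to_kyujitai` and `to_shinjitai` are hypothetical. It leaves 弁 untouched in the Shinjitai → Kyūjitai direction, since a character table alone cannot tell which of the four Kyūjitai forms is meant.

```python
# Illustrative subset only: a real table would cover every simplified
# character of the Tōyō Kanji list.
SHIN_TO_KYU = {
    "旧": "舊",
    "体": "體",
    "国": "國",
    "学": "學",
    "台": "臺",
}

# 弁 is the one-to-many exception: four Kyūjitai characters share it.
AMBIGUOUS = {"弁": ("弁", "辯", "辨", "瓣")}

# The reverse direction is unambiguous, so every Kyūjitai form can be mapped.
KYU_TO_SHIN = {kyu: shin for shin, kyu in SHIN_TO_KYU.items()}
for shin, candidates in AMBIGUOUS.items():
    for kyu in candidates:
        KYU_TO_SHIN[kyu] = shin

_TO_KYU_TABLE = str.maketrans(SHIN_TO_KYU)
_TO_SHIN_TABLE = str.maketrans(KYU_TO_SHIN)


def to_kyujitai(text: str) -> str:
    """Replace each Shinjitai character by its Kyūjitai form (弁 is left as-is)."""
    return text.translate(_TO_KYU_TABLE)


def to_shinjitai(text: str) -> str:
    """Replace each Kyūjitai character by its Shinjitai form."""
    return text.translate(_TO_SHIN_TABLE)


if __name__ == "__main__":
    print(to_kyujitai("旧字体"))
    print(to_shinjitai("舊字體"))
```

Resolving 弁 correctly would need word-level context, which this sketch does not attempt.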

One major issue with the Tōyō Kanji list is that a large number of words in the Japanese lexicon were written using Kanji which were not included in this list. To solve this problem, the Japanese government carried out another reform in 1956, called 同音の漢字による書きかえ (Dōon no kanji ni yoru kakikae, lit. “rewriting according to Kanji having the same sound”, called Kakikae for short in the rest of this post). This reform made it possible to write Japanese using only Kanji from the Tōyō Kanji list, by applying the following rules:

Rule 1: words which had multiple spellings before the reform were required to be written with the spelling variant using only characters from the Tōyō Kanji list.

Examples:

  • The word chūmon (“to order, e.g. a drink or food”) could either be written 注文 or 註文, but the reform standardized its spelling to 注文
  • The word iseki (“historical ruins, remains”) could either be written 遺跡 or 遺蹟, but the reform standardized its spelling to 遺跡
  • The word kiga (“starvation, famine”) could either be written 飢餓 or 饑餓, but the reform standardized its spelling to 飢餓
  • The word seigyo (“to manage, to govern”) could either be written 制御, 制禦 or 制馭, but the reform standardized its spelling to 制御

Rule 2: the spelling of some words was changed so that Kanji in these words that were not in the Tōyō Kanji list were replaced by phonetic substitutes from the Tōyō Kanji list.

Examples:

  • The spelling of the word bōgyo (“defence”) was changed from 防禦 to 防御
  • The spelling of the word taifū (“typhoon”) was changed from 颱風 to 台風

While the changes resulting from rule 1 did not critically affect the way Japanese was written before the reform (they simply standardized the spelling of the affected words), the changes resulting from rule 2 are problematic, because the Kanji used as phonetic substitutes are semantically unrelated to the original characters. As a result, these changes can be considered Kanji simplifications, and should ideally be reverted by Shinjitai to Kyūjitai converters. The problem is that most of these converters do not take the Kakikae reform into consideration, and are therefore unable to convert words affected by this reform back to their original forms. For instance, they would not be able to convert the word 防御 to 防禦, and would convert the word 台風 to 臺風 instead of 颱風, because they convert each character individually and 臺 is the Kyūjitai form of 台.
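The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the tables contain just the examples from this post, the function names are made up, and words are found by plain longest-match substring search rather than by proper word segmentation, so a real converter would need more care to avoid false matches.

```python
# Character-level table (illustrative subset).
CHAR_SHIN_TO_KYU = {"台": "臺", "国": "國"}

# Word-level table for rule 2 of the Kakikae reform (examples from this post).
KAKIKAE_RULE2 = {"台風": "颱風", "防御": "防禦"}


def convert_chars(text: str) -> str:
    """Naive conversion: every character is converted on its own."""
    return "".join(CHAR_SHIN_TO_KYU.get(ch, ch) for ch in text)


def convert_with_words(text: str) -> str:
    """Try the word table first (longest match), then fall back to characters."""
    words = sorted(KAKIKAE_RULE2, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for word in words:
            if text.startswith(word, i):
                out.append(KAKIKAE_RULE2[word])
                i += len(word)
                break
        else:
            # No word matched at this position: convert a single character.
            ch = text[i]
            out.append(CHAR_SHIN_TO_KYU.get(ch, ch))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for sample in ("台風", "防御", "台風と国"):
        print(sample, convert_chars(sample), convert_with_words(sample))
```

Checking the word table before the character table is what prevents 台風 from being turned into 臺風.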

One notable exception is the kyujitai.js converter, which uses a quite complete list of words affected by the Kakikae reform, and is hence able to convert all words affected by this reform back to their original form (e.g. 防御 back to 防禦, and 台風 back to 颱風).

However, kyujitai.js has one major problem: it does not differentiate between words whose spelling was changed according to rule 1 (keeping only one of the multiple possible spellings of the word) and those changed according to rule 2 (replacing some Kanji by phonetic equivalents). As a result, it systematically converts all words affected by rule 1 to their alternative pre-reform forms: for example, 遺跡 and 飢餓, which are perfectly acceptable Kyūjitai spellings, become 遺蹟 and 饑餓, respectively. It even goes as far as converting some common words to rather uncommon alternative forms, one example being 以上, which is converted to 已上 (!)

An ideal Shinjitai to Kyūjitai converter would then be able to accomplish the following tasks:

  • For Shinjitai → Kyūjitai conversion, convert all Shinjitai characters from the Tōyō Kanji list back to their original Kyūjitai forms, and convert all words affected by rule 2 of the Kakikae reform back to their pre-reform forms
  • For Kyūjitai → Shinjitai conversion, convert all Kyūjitai characters from the Tōyō Kanji list to their Shinjitai forms, and convert all words affected by the Kakikae reform to their post-reform forms (rules 1 and 2).
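The asymmetry between the two directions can be expressed as two word tables derived from one annotated list. The sketch below assumes each Kakikae entry is already labelled with the rule it falls under; the tuple layout and the name `build_word_tables` are invented for this example, and the four entries are the ones discussed earlier in this post.

```python
# (pre-reform spelling, post-reform spelling, rule) for a few words
# mentioned in this post. The layout is an assumption of this sketch.
KAKIKAE = [
    ("註文", "注文", 1),
    ("遺蹟", "遺跡", 1),
    ("防禦", "防御", 2),
    ("颱風", "台風", 2),
]


def build_word_tables(entries):
    """Return (shin_to_kyu, kyu_to_shin) word tables.

    Shinjitai -> Kyūjitai only reverts rule 2 (phonetic substitutions),
    while Kyūjitai -> Shinjitai applies both rules.
    """
    shin_to_kyu = {post: pre for pre, post, rule in entries if rule == 2}
    kyu_to_shin = {pre: post for pre, post, _rule in entries}
    return shin_to_kyu, kyu_to_shin


if __name__ == "__main__":
    shin_to_kyu, kyu_to_shin = build_word_tables(KAKIKAE)
    print(shin_to_kyu)
    print(kyu_to_shin)
```

With these tables, 注文 and 遺跡 are left alone when converting to Kyūjitai, while 註文 and 遺蹟 are still normalized when converting to Shinjitai.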

To build such a converter, one would need to know which Kanji changes from the Kakikae reform are the result of rule 1, and which are the result of rule 2. Unfortunately, I have not been able to find this information anywhere. Even the official document from the Japanese government explaining the scope of the Kakikae reform only gives a list of words affected by this reform, without stating whether the spelling of each word was changed by keeping only one of its multiple possible spellings (rule 1) or by replacing some Kanji by phonetic equivalents (rule 2).

There is, however, a way to obtain an approximation of the list of words belonging to category 1 (words affected by rule 1) and category 2 (words affected by rule 2) by looking at Chinese and Korean: if the post-reform spelling of a Japanese word is also used in Chinese (in traditional characters, of course) and/or in Korean, then it is most likely a variant spelling that existed before the reform, i.e. this word should belong to category 1. Otherwise, the post-reform spelling is most likely the fruit of a phonetic substitution, i.e. this word should belong to category 2.

To perform this approximation, I used the dictionary database of the Chinese/Japanese/Korean/Vietnamese dictionary I built, CJKV Dict, which basically uses:

I then applied the following simple algorithm to the list of words affected by the Kakikae reform from the kyujitai.js project:

  • If the post-reform spelling of a word is used in Chinese and in Korean, then this word belongs to category 1.
  • Else, if the pre-reform spelling of a word is used in Chinese and in Korean, then this word belongs to category 2.
  • Else, if the post-reform spelling of a word is used either in Chinese or in Korean, then this word belongs to category 1.
  • Else, if the pre-reform spelling of a word is used either in Chinese or in Korean, then this word belongs to category 2.
  • Otherwise, the post-reform spelling is most probably a variant, so the word is deemed to belong to category 1.
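The decision steps above can be written as a short Python function. This is a sketch rather than the code actually used for the converter: it assumes the Chinese and Korean vocabularies are available as plain sets of spellings in a form comparable to the Japanese pre-reform and post-reform spellings, and the function name and the toy sets in the usage part are made up for illustration.

```python
def classify(pre: str, post: str, chinese: set, korean: set) -> int:
    """Return 1 (variant spelling kept) or 2 (phonetic substitution).

    `pre` and `post` are the pre-reform and post-reform spellings of one
    word; `chinese` and `korean` are sets of known spellings.
    """
    post_zh, post_ko = post in chinese, post in korean
    pre_zh, pre_ko = pre in chinese, pre in korean

    if post_zh and post_ko:
        return 1
    if pre_zh and pre_ko:
        return 2
    if post_zh or post_ko:
        return 1
    if pre_zh or pre_ko:
        return 2
    # No evidence either way: assume the post-reform spelling is a variant.
    return 1


if __name__ == "__main__":
    # Toy stand-ins for the dictionary lookups, not real dictionary data.
    toy_chinese = {"颱風"}
    toy_korean = {"颱風"}
    print(classify("颱風", "台風", toy_chinese, toy_korean))
```

The order of the checks matters: agreement between both languages is treated as stronger evidence than a match in only one of them, and the final fallback favors category 1, so a word is only reverted when there is some evidence of a phonetic substitution.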

Based on this approximation, I created a Shinjitai to Kyūjitai converter that converts words affected by the Kakikae reform back to their original forms only when the changes introduced by the reform are actual simplifications. This tool is by no means perfect, but its results are quite satisfying.

I also made this converter available as a Python library for those of you who wish to use it as part of your Python projects. Enjoy!
