So Where is Kln-sna?
Like it or not, our Internet was created to be English-language focused. As the Internet and Web were being developed there was only one core character set: ASCII. Most of the RFCs defined a character set that was based around ASCII [here], and which only supports a limited number of characters. With Unicode [here] we are no longer limited to 7 bits: code points run from U+0000 to U+10FFFF, and so we can represent almost every character we need.
Representing Unicode in URLs
Domain names within our URL infrastructure are still largely restricted to the ASCII character set. To overcome this, Punycode is used to encode Unicode into ASCII characters. It uses only the Letter-Digit-Hyphen (LDH) subset: the ASCII characters of the name are copied first, and the non-ASCII characters are then encoded after a hyphen. So let’s try some German city names [Try]:
const punycode = require('punycode');

// Encode each city name and print its Punycode form
let rtn = punycode.encode('München');
console.log(rtn);
rtn = punycode.encode('Köln');
console.log(rtn);
rtn = punycode.encode('Düsseldorf');
console.log(rtn);
The results are:
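Mnchen-3ya
Kln-sna
Dsseldorf-q9a
If we look at “München”, then we get “Mnchen-3ya”. The hyphen separates the ASCII characters, which are copied as they are, from the encoded part, which gives the position and code point of each non-ASCII character using generalized variable-length integers. If we now try “點看” [here], which contains no ASCII characters and so has no hyphen in its output, we get: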
Message: Dian Kan
Encode: c1yn36f
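To see how the part after the hyphen is built, here is a minimal sketch of a Punycode encoder that follows the pseudocode in RFC 3492. It is not the code of the punycode library used above: the names punycodeEncode, adapt and digitToChar are our own, and the overflow checks that the RFC asks for are left out to keep it short.

```javascript
// Punycode parameters from RFC 3492, section 5.
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Map a digit value 0..35 to 'a'..'z' then '0'..'9'.
function digitToChar(d) {
  return String.fromCharCode(d < 26 ? 97 + d : 22 + d);
}

// Bias adaptation (RFC 3492, section 6.1).
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : delta >> 1;
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > ((BASE - T_MIN) * T_MAX) >> 1) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// Illustrative encoder (hypothetical name); no overflow checks.
function punycodeEncode(input) {
  const codePoints = Array.from(input, (ch) => ch.codePointAt(0));
  const basic = codePoints.filter((cp) => cp < 0x80);

  // Copy the ASCII characters first, then the delimiter if there were any.
  let output = String.fromCodePoint(...basic);
  let handled = basic.length;
  if (handled > 0) output += '-';

  let n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  while (handled < codePoints.length) {
    // Smallest code point that has not been handled yet.
    const m = Math.min(...codePoints.filter((cp) => cp >= n));
    delta += (m - n) * (handled + 1);
    n = m;
    for (const cp of codePoints) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a generalized variable-length integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
          if (q < t) break;
          output += digitToChar(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        output += digitToChar(q);
        bias = adapt(delta, handled + 1, handled === basic.length);
        delta = 0;
        handled++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

console.log(punycodeEncode('München')); // Mnchen-3ya
console.log(punycodeEncode('Köln'));    // Kln-sna
console.log(punycodeEncode('點看'));     // c1yn36f
```

Each delta combines how far the next code point is from the previous one with the position at which it must be inserted, which is why a single short string such as “sna” is enough to place the “ö” back into “Kln”.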