Unicode — from MOJIBAKE to EMOJI 💩

Pasan Nissanka
4 min read · Jan 9, 2018

MOJIBAKE (Japanese) means “scrambled text”

New emojis in iOS 11.1

Computers deal with numbers. They store letters by assigning a number to each character or letter.
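As a small illustration, here is a sketch in Python, whose built-in ord() and chr() convert between a character and its number. The helper name code_point is made up for this example.

```python
def code_point(ch):
    """Return the number assigned to a single character, in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

# Print each character next to its number (decimal and U+ notation).
for ch in "A\u00e9\U0001F4A9":  # A, é, 💩
    print(ch, ord(ch), code_point(ch))

# The mapping also works in the other direction.
letter = chr(65)  # "A"
```

The letter A is stored as 65, and chr(65) gives the letter back.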

Before the Unicode Consortium was formed, most character encoding systems were limited to 8 bits, and most countries used their own encoding systems for their languages. The problem was that if you opened a text file written in Latvian on a Turkish computer, the result was completely incomprehensible.

The UTF-8-encoded Japanese Wikipedia article for mojibake, as displayed if interpreted as Windows-1252 encoding.
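The effect is easy to reproduce. Below is a minimal sketch in Python, assuming its built-in utf-8 and cp1252 (Windows-1252) codecs. The helper name misread is hypothetical, and errors="replace" is used because Windows-1252 leaves a few byte values undefined.

```python
def misread(text, written_as, read_as):
    """Encode text with one encoding, then decode the bytes with another."""
    raw = text.encode(written_as)
    # Bytes that mean nothing in read_as become the replacement character.
    return raw.decode(read_as, errors="replace")

# "é" is two bytes in UTF-8 (C3 A9); Windows-1252 reads them as two letters.
print(misread("caf\u00e9", "utf-8", "cp1252"))  # cafÃ©

# The Japanese word mojibake itself turns into accented Latin noise.
print(misread("文字化け", "utf-8", "cp1252"))

# Reading with the encoding the text was written in gives it back intact.
print(misread("文字化け", "utf-8", "utf-8"))
```

No bytes are lost in the first two cases; they are only interpreted with the wrong table, which is why re-reading the file with the right encoding usually fixes it.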

This problem got worse with the rise of computing in Asia. Languages like Chinese have thousands of characters, far more than 8 bits can hold, so in response countries began developing multi-byte encoding systems. These carried a risk of data corruption and incompatibility between platforms. Japanese has a special name for this issue: MOJIBAKE, meaning “scrambled text”.

Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.

Unicode Logo

In 1991, the Unicode Consortium was formed. Its job was to work out how every country and every language encoded text into binary, and to turn these hundreds of approaches into one worldwide standard. It provides a unique number for every character, across every platform, device, application and language. The Unicode code space has room for over a million (1,114,112) code points, which encodings such as UTF-8 and UTF-16 store as sequences of 8-bit or 16-bit units. That is enough for every character from every language ever used and, most importantly, EMOJI! 💩
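The 1,114,112 figure can be checked with a little arithmetic: Unicode code points run from U+0000 to U+10FFFF, which is 17 blocks ("planes") of 65,536 numbers each. The sketch below, in Python, also shows how one emoji is stored by two common encodings. The variable names are only illustrative.

```python
# 17 planes of 2**16 code points each, U+0000 through U+10FFFF.
CODE_SPACE = 17 * 2**16
print(CODE_SPACE)  # 1114112

poo = "\U0001F4A9"  # 💩, a single code point above U+FFFF

utf8 = poo.encode("utf-8")       # sequence of 8-bit units
utf16 = poo.encode("utf-16-be")  # sequence of 16-bit units, big-endian

print(utf8.hex())        # f09f92a9 (4 bytes)
print(utf16.hex())       # d83ddca9 (2 units, a "surrogate pair")
print(len(utf16) // 2)   # 2
```

A single 16-bit unit can only hold 65,536 values, so characters beyond that range, including most emoji, take two units in UTF-16 and four bytes in UTF-8.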

Where did the emojis come from?
As the name suggests, you guessed it: Japan!

According to The Verge, the first emoji were created in 1999 by Shigetaka Kurita for NTT Docomo’s i-mode internet platform. Kurita took inspiration from the symbols used in weather forecasts, Chinese characters, and facial expressions. Emoji were initially used by NTT Docomo, au and SoftBank Mobile.

Shigetaka Kurita (image source: Hindustan Times)

There were some competing versions of emoji, and in the coming years Japan became more attached to the emoji text features offered by Japanese-made phones. Yet emoji never left Japan until the Unicode Consortium came along.

In 2008 Apple began selling the iPhone 3G in Japan, but it wasn’t anywhere near what Japanese users wanted. The SMS chat interface made no sense to people who were used to push mobile email, the camera couldn’t focus on QR codes, there was no NFC for mobile payments and, most critically, there were no emoji.

iPhone 3G

So Apple launched the iOS 2.2 update that November, with a character set for Japanese users that included emoji. From 2008 to 2017 Apple took 60% of the smartphone market share in Japan.

The Unicode Consortium had been codifying emoji since 2007. Google and Apple pushed hard for the encoding of emoji in Unicode in order to solve incompatibility issues between vendors. Finally, in 2010, the Unicode Consortium introduced emoji in the Unicode 6.0.0 release, standardizing them across platforms.

“Apple was the first major international phone maker to add compatible characters to its own software, and it’s impossible to imagine the subsequent global phenomenon taking place without the iPhone.” - The Verge

Emoji in iOS 5

In iOS 5, which came out in October 2011, Apple officially enabled emoji characters on iPhones around the world.

Emoji in iOS 11.1

In the Unicode 6 release there were only 75 emojis,
but today (January 2018) there are approximately 2,666 emojis in total, including flags of countries, gender variations, skin tones, animals, food, sports, cars, professions… you name it! 😉
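Many of those variations are not separate numbers but short sequences of code points. As a rough sketch in Python: a flag is a pair of "regional indicator" letters (starting at U+1F1E6 for A), and a skin tone is a modifier (U+1F3FB to U+1F3FF) placed after a base emoji. The helper names flag and with_skin_tone are made up for this example.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # regional indicator symbol letter A

def flag(country_code):
    """Build a flag from a two-letter country code such as "JP"."""
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(c) - ord("A"))
        for c in country_code.upper()
    )

def with_skin_tone(base, modifier):
    """Append a skin tone modifier (U+1F3FB..U+1F3FF) to a base emoji."""
    return base + modifier

japan = flag("JP")
thumbs_up = with_skin_tone("\U0001F44D", "\U0001F3FD")  # 👍 + medium tone

for s in (japan, thumbs_up):
    # Each looks like one symbol on screen but is two code points long.
    print(s, len(s), [f"U+{ord(c):04X}" for c in s])
```

Whether a sequence is drawn as a single picture depends on the font and platform; on older systems the two parts may show up side by side.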

Sources

Unicode®
Unicode - Wikipedia
How emoji conquered the world
Japan Teens Flip for Private Pagers - NYTimes
Emoji - Wikipedia
Unicode® 6.0.0
How the iPhone won over Japan and gave the world emoji
Emojipedia
The Art of the Bodge: How I Made The Emoji Keyboard
What’s new in Unicode 6.0
