Emojis and Unicode 💩
How and why you can include emojis in just about anything.
When I was working on my personal website, I learnt that I could use emojis in my text like so
<p>💻 Projects</p>. I was floored — I thought emojis were images or icons. I’m a nerd 🤓 — I couldn’t just accept this fact and move on with my life. I wanted to know why emojis are special and I decided to do some research 👩💻.
So why does this work? 🤔
Since 2010, emojis have been part of Unicode. With every new version, the Unicode Consortium (fancy word for committee) continues to add more emojis 🙀 😱 😍.
But what is Unicode? 😕
Computers fundamentally deal with binary numbers, a series of 1️⃣s and 0️⃣s: combinations of bits. They don’t deal with text or characters the same way humans do.
To use bits to represent anything more than just bits, we need rules. These rules are known as encoding schemes, or encodings for short. We use them to convert a sequence of bits into something like letters, numbers and pictures, and vice versa. There are many different ways to encode a character, and they differ in efficiency and compatibility ☠.
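To see why the rules matter, here is a minimal sketch in Python (my pick for illustration; the variable names are made up). It reads the same two bytes with two real encodings, UTF-8 and Latin-1, and gets different text out of each:

```python
# The same raw bytes, read with two different sets of rules.
raw = bytes([0xC3, 0xA9])

as_utf8 = raw.decode("utf-8")      # UTF-8 reads the two bytes as one character
as_latin1 = raw.decode("latin-1")  # Latin-1 reads each byte as its own character

print(as_utf8)    # é
print(as_latin1)  # Ã©
```

If you have ever seen "Ã©" where an "é" should be, this is most likely what happened: the bytes were fine, but they were read with the wrong rules.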
That’s where Unicode comes in. Before it, there were many competing encoding systems that didn’t agree with each other. Unicode is a standardized character set: one universal catalogue of characters that every computer can agree on. This set includes the vast majority of characters from the world’s written languages (Mandarin, English, Russian, etc.). Since emojis are included in Unicode, computers can interpret them the same way they do regular characters 🎉🎉🎉.
But Unicode is not an encoding. It is simply a table that maps numbers (called code points) to characters. It’s a fancy way of saying: “65 is A, 66 is B, 67 is C, and 9,729 stands for ☁”.
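You can poke at that table yourself. A small sketch, assuming Python 3, where the built-in `ord()` and `chr()` look up the number for a character and the character for a number (the variable names are just for illustration):

```python
# ord(): character -> number. chr(): number -> character.
a_number = ord("A")
cloud = chr(9729)
poo_number = ord("💩")

print(a_number)         # 65
print(cloud)            # ☁
print(poo_number)       # 128169
print(hex(poo_number))  # 0x1f4a9, usually written U+1F4A9
```

So as far as the table is concerned, 💩 is just entry number 128,169, no more special than A.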
So, Unicode needs an encoding to translate characters to bits. The encodings it uses are the UTFs, or Unicode Transformation Formats. There are three encoding forms: UTF-8, UTF-16, and UTF-32, and the most common one on the web is UTF-8. For details on the differences check out this post.
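One visible difference between the three forms is how many bytes a character takes up. Here is a rough sketch in Python; `byte_counts` is a hypothetical helper I made up for this post, not part of any library:

```python
def byte_counts(char):
    """Hypothetical helper: bytes needed for one character in each UTF form."""
    # The "-le" variants pin the byte order so Python doesn't prepend
    # a byte-order mark, which would inflate the counts.
    return {
        "utf-8": len(char.encode("utf-8")),
        "utf-16": len(char.encode("utf-16-le")),
        "utf-32": len(char.encode("utf-32-le")),
    }

for char in ["A", "☁", "💩"]:
    print(char, byte_counts(char))
```

UTF-8 spends 1 byte on A, 3 on ☁ and 4 on 💩, while UTF-32 spends 4 bytes on everything. That is the efficiency trade-off in a nutshell: for mostly-English text like a typical web page, UTF-8 is the most compact of the three.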
In short: Human Characters ➡️ Unicode Character Sets ➡️ Encoded Character Sets using UTF ➡️ Bits that computers can understand (and vice versa 🔄).
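The whole round trip fits in a few lines. This is only a sketch, assuming UTF-8 as the encoding; `to_bits` and `from_bits` are names I invented for illustration:

```python
def to_bits(text):
    """Hypothetical helper: text -> UTF-8 bytes -> a string of 1s and 0s."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))

def from_bits(bits):
    """Hypothetical helper: a string of 1s and 0s -> bytes -> text."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")

bits = to_bits("☁")
print(ord("☁"))         # 9729
print(bits)             # 11100010 10011000 10000001
print(from_bits(bits))  # ☁
```

Character, number, bytes, bits, and back again 🔄.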