Emoji: Setting the Tables

Making Faces (and Other Emoji)

Hey all! I’m Colin (@colinmford, here and on Twitter), a typeface designer. I write short posts about the history, design, and technical details of emoji from a type designer’s perspective. You can find the other posts here. I also occasionally run workshops that help you design and build your own emoji from scratch.

The last article I posted, a few weeks ago, discussed some design decisions about emoji. This post lays a foundation in the technical history of emoji fonts. The next post will discuss specific OpenType tables in depth, along with a way to get them on the web while support for emoji coalesces, and emoji ligatures! Let’s get started with the first part.


Emoji, since their earliest days, have been fonts. They’re images, situated just so that they can work easily alongside type. This is what got me into emoji in the first place — it’s sorta strange that they’re fonts, philosophically but also technically. I wanted to know what these colorful things were doing in my black-and-white font files.

I’m a type designer, as I’ve said. Before I worked at Hoefler & Co, I went to grad school for type design, the Type and Media program at KABK. In addition to teaching me how to design type, they (Erik, Just, Paul, Peter and Petr) taught me how to make fonts (by which I mean the digital manifestation of a typeface). The technical aspects of type are interesting to me. They’re weird little pieces of software. In addition to designing type at Hoefler & Co, I do a lot of “mastering” or “manufacturing” — making sure all the bits of the font files are up to our standards so they work correctly on your computer.

What I’m trying to get at is that even for someone like me with a little experience making font files, making emoji is surprisingly tricky. I’m going to try to explain all the basic parts first so you can understand what is going on behind the scenes when you finally make an emoji font.

Unicode

I suppose the best place to start is the bedrock that underlies most communication in the modern age: Unicode. As I’ve mentioned before in broad terms, the Unicode Consortium is a group of companies and individuals that decides upon names and numbers for virtually every character necessary to communicate in any written language. This body produces the Unicode Standard. The vast majority of the Unicode Standard is made up of characters from scripts that you’re used to: the Latin script (the one that you’re seeing right now!), Cyrillic, Arabic, Chinese. But Unicode also contains historic scripts, rare scripts, and scripts that you might not have heard of but that have millions of writers, like Glagolitic, Cherokee, and Kannada. Unicode tries to support them all.

The Unicode Standard does this by assigning a spot (a “code point”) and a name to each character. For instance, “A” has the hexadecimal number “0041” and the name “LATIN CAPITAL LETTER A”, which seems like an over-specific name until you see some of the others. As we’ve discussed before, though, they don’t make any stipulations beyond that: you just have to put a letter that you intend to be read as “A” in that spot, no matter what that “A” looks like. You just can’t put something you intend to be read as a “B” in the “0041” spot.
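If you want to poke at code points and names yourself, here is a minimal sketch using Python’s built-in unicodedata module. The describe function is just a helper name I made up for illustration; the names it prints come from whatever version of the Unicode data ships with your Python.

```python
import unicodedata


def describe(char: str) -> str:
    """Return a 'U+XXXX NAME' label for a single character."""
    code_point = ord(char)  # the character's number in Unicode
    # Some characters (control characters, for instance) have no name,
    # so fall back to a placeholder instead of raising an error.
    name = unicodedata.name(char, "<unnamed>")
    return f"U+{code_point:04X} {name}"


print(describe("A"))  # U+0041 LATIN CAPITAL LETTER A
print(describe("B"))  # U+0042 LATIN CAPITAL LETTER B
print(describe("!"))  # U+0021 EXCLAMATION MARK
```

Notice that nothing here says anything about what the “A” looks like. The standard only pins down the number and the name; the drawing is up to the font.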

There are more than 120,000 characters in Unicode all told. A small portion of the supported characters are symbols, arrows, and other paraphernalia, like game symbols. An even smaller part of that is considered to be emoji.

Emoji vs Non-Emoji. By Mark Davis on Unicode’s Blog

With the attention that emoji bring to the Unicode Consortium, you would think the only thing they do is dream up emoji. I’m sure there are even a few people out there who think Unicode makes all the emoji. The truth is that discussing which characters of the northern Chinese Tangut script are going to make it into Unicode 9.0 isn’t sexy. But still, there are rumblings from within and without the Consortium that too much time is being spent on emoji.

(If you want a more in-depth article on Unicode, check out this one on the Alphabettes blog!)

Encodings

The Unicode Standard isn’t perfect or complete, but it is a whole lot better than what type designers (and emoji designers, for that matter) had to deal with before it. The most famous of the pre-Unicode encodings is ASCII, the American Standard Code for Information Interchange, probably best known from the term “ASCII art”. ASCII (pronounced ASK-ee) is a standard of originally just 128 characters, produced by Americans in the 1960s for the American telegraph industry. It was added on to and “forked” for many years. It became the most widely used standard for encoding text in early computers, and remained the most common encoding on the web until around 2007, when UTF-8 (an encoding of Unicode) overtook it. Needless to say, since it was conceived at a time when computers were still using punch cards, and developed first and foremost for American English, it was not well-suited for the modern, multilingual internet.

An original code chart for ASCII. Note all the non-letter “control characters” in the first two columns. Teletype machines used these characters to perform special functions. These are all still in Unicode! Wikimedia.

Its cousin across the Pacific is called Shift JIS (JIS stands for Japanese Industrial Standards). ASCII Corporation, a Japanese company that shares a name with the ASCII standard but does not oversee it, developed it jointly with Microsoft in the early 1980s, and it’s essentially an extension of ASCII. The original DoCoMo emoji existed in an unused area of Shift JIS, which limited their number to under 200. SoftBank’s were transmitted with “Shift Out” and “Shift In” control characters, which were originally included in ASCII so one could remotely change the color of the typewriter ribbon when transmitting messages to a teleprinter. Whereas ASCII had ASCII art, Shift JIS had Shift JIS art. Whereas ASCII had emoticons, Shift JIS had more complex kaomoji.
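To make the difference between these encodings a bit more concrete, here is a small sketch using the codecs that ship with Python. The helper names encode_hex and fits are my own, made up for this example. It shows that a plain “A” is the same single byte in ASCII and Shift JIS, while a Japanese character takes two bytes in Shift JIS and doesn’t fit in ASCII at all.

```python
def encode_hex(text: str, encoding: str) -> str:
    """Encode text and return the resulting bytes as a hex string."""
    return text.encode(encoding).hex()


def fits(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(encode_hex("A", "ascii"))       # 41
print(encode_hex("A", "shift_jis"))   # 41, the ASCII range carries over
print(encode_hex("あ", "shift_jis"))  # 82a0, two bytes for one kana
print(encode_hex("あ", "utf-8"))      # e38182, the same character in a Unicode encoding
print(fits("あ", "ascii"))            # False
```

The same character ends up as completely different bytes depending on the encoding, which is exactly why mismatched encodings turn text into garbage.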

Some Emoticons (Left) and Kaomoji (Right).

But I digress. My point is that the world of type is filled with this sort of thing: competing standards, extant since the early days of desktop computing, updated and expanded as needed. Take, for instance, the font formats themselves. The history of font formats warrants its own post, as it’s filled with shifting alliances, hacking and reverse engineering, and historical dead ends that could have radically changed the way we use fonts today. The major players in the ’90s were PostScript and TrueType. PostScript was loved by type designers because it was easy to design in. It might be said that TrueType was loved by programmers because it was an easily extensible file format. Things calmed down a little bit with the invention of OpenType (which essentially allows PostScript outlines inside of the TrueType file format).

OpenType Tables

OpenType fonts are “binary” files, made up of long series of 1’s and 0’s, efficiently read by computers but not readable by humans. That is, unless you have the key to the code. That key is called the OpenType specification. In a broad sense, a specification is the order of the 1’s and 0’s, agreed upon amongst developers. To make it easier to parse, the OpenType file is split up into tables, which are sections in that order. For instance, there’s a name table that contains strings of text used to describe how the font’s name appears in your operating system. There are tables that map characters to the glyphs in the font (“cmap”), contain kerning data (“kern”), hold hinting information (“prep”), and many other things. All tables have four-character names, which can sometimes be quite cryptic. Apart from a handful of required tables, you can include or leave out tables at will, or even create your own table. That’s the beauty of OpenType.
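To show what “having the key to the code” means in practice, here is a rough sketch that reads the list of tables from the very start of a font file using nothing but Python’s struct module. It assumes a single plain OpenType or TrueType font (not a collection or a WOFF), where a 12-byte header is followed by one 16-byte record per table. The names read_table_directory and fake_font are mine, and the demo bytes are made up so the example runs without a real font on hand.

```python
import struct


def read_table_directory(data: bytes) -> dict:
    """Return {tag: (offset, length)} for each table listed in the header."""
    # Header: 4-byte version, then the number of tables (big-endian).
    _version, num_tables = struct.unpack(">4sH", data[:6])
    tables = {}
    for i in range(num_tables):
        start = 12 + i * 16  # records begin right after the 12-byte header
        tag, _checksum, offset, length = struct.unpack(
            ">4sIII", data[start:start + 16]
        )
        tables[tag.decode("latin-1")] = (offset, length)
    return tables


def fake_font(tags: list) -> bytes:
    """Build a made-up header with empty records, for demonstration only."""
    header = struct.pack(">4sHHHH", b"\x00\x01\x00\x00", len(tags), 0, 0, 0)
    records = b"".join(
        struct.pack(">4sIII", tag.encode("latin-1"), 0, 0, 0) for tag in tags
    )
    return header + records


demo = fake_font(["cmap", "kern", "name", "prep"])
print(list(read_table_directory(demo)))  # ['cmap', 'kern', 'name', 'prep']
```

With a real font you would pass in the bytes of the file instead of the fake ones. Each offset and length then tells you where in the file that table’s 1’s and 0’s live.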

To have a look around the tables of compiled font files, type designers use a Python library called FontTools, started by Just van Rossum in 1999, now maintained by Behdad Esfahbod of Google.

Custom tables are what make emoji in font files possible. The trouble is, not everyone can agree on one table for emoji, so there are a few different tables that do similar things. There are the Google/FreeType CBDT and CBLC tables, which insert a colorful PNG image in place of a scalable outline when a user types a glyph. There is Apple’s sbix table, which is similar in most respects to Google/FreeType’s tables, with a few differences. There are the Microsoft COLR and CPAL tables, which colorize the traditional scalable vector outlines with color palettes. Finally, there is the Adobe/Firefox SVG table, which allows the vendor to include an entire SVG in each glyph.
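Since the four approaches are told apart purely by which tables are present, a few lines of Python can guess which kind of emoji font you are looking at. This is only a sketch: the color_formats function and its labels are my own invention, and it assumes you already have the list of table tags from somewhere (FontTools can give you one, for example via the keys of a TTFont object). Note that the SVG table’s tag is padded with a trailing space to fill all four characters.

```python
# Which tables each color format needs. Labels are informal, not official names.
COLOR_TABLES = {
    "CBDT/CBLC (Google/FreeType, bitmap)": {"CBDT", "CBLC"},
    "sbix (Apple, bitmap)": {"sbix"},
    "COLR/CPAL (Microsoft, layered outlines)": {"COLR", "CPAL"},
    "SVG (Adobe/Firefox, vector)": {"SVG "},
}


def color_formats(table_tags) -> list:
    """Return the color formats whose required tables are all present."""
    present = set(table_tags)
    return [
        label for label, needed in COLOR_TABLES.items() if needed <= present
    ]


# Hypothetical tag lists, standing in for real fonts.
print(color_formats(["cmap", "glyf", "sbix"]))
print(color_formats(["cmap", "glyf", "COLR", "CPAL", "SVG "]))
print(color_formats(["cmap", "glyf", "kern"]))  # an ordinary font: []
```

A font is free to carry more than one of these at once, which is why the function returns a list rather than a single answer.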

Each one has its own advantages and disadvantages, with no clear winner. In the next part we will talk about each table and its exciting possibilities (responsive emoji! Animation!) and get into emoji ligatures.


Thanks for reading! If you liked it, check out my other articles on emoji and click the “recommend” button 👇