The Anatomy of a Thousand Typefaces

An attempt to build a font database with opentype.js

Even years after Avatar’s release, there’s one thing Ryan Gosling just can’t get over: the choice of “Papyrus” as the movie’s logo font. In the parody produced by Saturday Night Live, the designer of the logo opens the font menu, browses the fonts one by one, and randomly decides to go with “Papyrus”.

“Papyrus” by Saturday Night Live

Dinner for none: The font menu’s bitter taste

The average font menu presents a list of available fonts, sorted by name but otherwise unrelated: a typeface designed for bold headlines is followed by one designed for small user interfaces, and then a fancy script typeface made for wedding invitations shows up. You either get trapped in a time-consuming process of scrolling through the whole list from start to end, or you simply pick the first good-enough match near the top of the list and call it a day.

The font menu in “Papyrus” by Saturday Night Live. Limited choice, various styles, but not necessarily the best possible typefaces.

A systematic approach to finding typefaces

There are various ways to limit the options. Before I dive into parsing font files, glyphs and metadata tables, let’s first have a look at classification, curated lists and then anatomy.

1. Classification

Early in design school I learned about the history of writing and practiced calligraphy to understand how writing evolved and how tools had an immediate impact on the design of typefaces.

Classification filter interfaces. Top left: Fontshop. Right: MyFonts. Bottom left: Google Fonts. Bottom: Typekit.

2. Curated Lists

Another way to bring order to the chaos is to rely on the knowledge of others: human-curated font lists. Fontshop, for example, offers collections based on a decade in history such as “1930”, on similarity such as “Helvetica Alternatives”, or on application, like “Branding” or “Newspapers”.

3. Anatomy

The most complex way to look at typefaces is to focus on their design details and to try to understand what makes a typeface good or special. Fortunately, there are books on type design, typefaces and typography. They can teach us how to make typefaces, how to choose them and how to use them.

“The Anatomy of Type: A Graphic Guide to 100 Typefaces” by Stephen Coles. A great book if you want to learn about the history and design details of popular typefaces.

Inside a font file: Lack of metadata

Before I started coding, I was hoping to find an easy way to look up the properties of a font. In theory, every font file comes with a variety of metadata tables that contain information about the name, author, language and visual characteristics of the typeface. Width, weight and font family class are the obvious ones, but x-height, cap height, average character width, ascenders and descenders can also be stored. Another set of metadata called Panose describes even more properties, such as serif style, proportion and contrast. Using font design apps such as Glyphs, anyone can inspect fonts to view this information:

A screenshot of the “Font Info Panel” of the font design app “Glyphs”. It shows basic information about family name, designer, URL, version and date. User-defined settings show the Unicode range and Panose information. The 10-digit code describes many characteristics, but this information is not always available, as it has to be defined and measured by the designer or producer of the font file. The right screenshot shows metrics such as ascender, descender, x-height and italic angle.
Comparison of Panose information available for Roboto and Fira Sans, both available on Google Fonts. While Fira Sans provides a lot, Roboto doesn’t. Because it is filled in so inconsistently, this metadata can’t be used to compare fonts reliably.
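To see how widespread missing metadata is across a whole library, one could check each font programmatically. opentype.js exposes the parsed OS/2 table as `font.tables.os2`, including `usWeightClass`, `usWidthClass`, `sFamilyClass` and, as far as I know, the ten Panose digits as an array named `panose`. The following is a minimal sketch, not part of the original project: every Panose digit uses 0 for “Any”, so an all-zero code says nothing about the design. The `Os2Like` interface and `summarizeOs2` function are hypothetical and mirror only the fields used, so the sketch runs without the library.

```typescript
// Hypothetical mirror of the fields of opentype.js's `font.tables.os2`
// used here; the real table object has many more fields.
interface Os2Like {
  usWeightClass: number; // usually 100 (Thin) to 900 (Black)
  usWidthClass: number; // 1 (Ultra-condensed) to 9 (Ultra-expanded)
  sFamilyClass: number; // IBM font family class and subclass
  panose: number[]; // ten Panose digits
}

interface MetadataSummary {
  weightClass: number;
  widthClass: number;
  familyClass: number;
  hasPanose: boolean;
}

function summarizeOs2(os2: Os2Like): MetadataSummary {
  // Every Panose digit uses 0 for "Any", so an all-zero code
  // carries no information about the design.
  const hasPanose =
    os2.panose.length === 10 && os2.panose.some((digit) => digit !== 0);
  return {
    weightClass: os2.usWeightClass,
    widthClass: os2.usWidthClass,
    familyClass: os2.sFamilyClass,
    hasPanose,
  };
}

// With opentype.js loaded, this would be called as summarizeOs2(font.tables.os2).
```

Counting the fonts for which `hasPanose` is false would show how much of a library can be compared through Panose at all.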

DIY: Parsing fonts with opentype.js

So in order to classify and compare typefaces myself, I had to take a close look at font files and find automatic ways to extract information. Fonts come in a variety of file formats, but in the end almost all of them are available as TTF (TrueType Font) files.
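Before parsing, it can help to check which format a file actually is. The sketch below is an illustration rather than part of the original parser: it reads the four-byte signature at the start of the file. TrueType-flavored files start with the bytes 0x00010000 (or `true` in older Apple fonts), CFF-based OpenType files with `OTTO`, WOFF with `wOFF`, WOFF2 with `wOF2` and font collections with `ttcf`. The function name and the returned labels are hypothetical.

```typescript
type FontFormat = "truetype" | "opentype-cff" | "woff" | "woff2" | "collection" | "unknown";

// Classifies a font file by the 4-byte signature at its start.
function detectFontFormat(bytes: Uint8Array): FontFormat {
  if (bytes.length < 4) return "unknown";
  // Classic TrueType outlines: version 1.0 as a 32-bit big-endian number.
  if (bytes[0] === 0x00 && bytes[1] === 0x01 && bytes[2] === 0x00 && bytes[3] === 0x00) {
    return "truetype";
  }
  const tag = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  switch (tag) {
    case "true":
      return "truetype"; // older Apple TrueType fonts
    case "OTTO":
      return "opentype-cff"; // OpenType with PostScript (CFF) outlines
    case "wOFF":
      return "woff";
    case "wOF2":
      return "woff2";
    case "ttcf":
      return "collection"; // several fonts in one file
    default:
      return "unknown";
  }
}
```

In Node.js, `detectFontFormat(new Uint8Array(fs.readFileSync(path)))` would classify a file before it is handed to opentype.js; WOFF2 files and collections may need extra handling before they can be parsed.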

A database of characteristics

In the following section I will describe how I measured contrast, x-height, width and weight of all fonts provided in the Google Fonts Library. The same methods could be applied to other font libraries such as Typekit or fonts from your computer.

Stroke contrast

The contrast describes the ratio of thin to thick strokes. Some typefaces have little stroke contrast, such as slab serifs or many sans serif typefaces designed for user interfaces, e.g. Roboto or San Francisco. Others, such as Bodoni or Didot, have a lot of contrast. To measure the contrast, we can trace the outlines of an “o” and look for the smallest and largest distances between the inner and outer shape.

The contrast of a typeface may be measured at the thickest and thinnest part of an “o”.
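One way to implement this measurement is sketched below, assuming the outline comes from opentype.js as `glyph.path.commands` (objects with a `type` and the coordinates `x`, `y`, `x1`, `y1`, `x2`, `y2` in font units). The helper names are hypothetical. Curves are flattened into polygons, the contour with the larger area is treated as the outer shape, and for every outer point the distance to the nearest point of the counter approximates the stroke thickness. The ratio of the thinnest to the thickest value is 1 for a perfectly monoline “o” and gets smaller as the contrast increases.

```typescript
type Point = { x: number; y: number };

// Mirrors the command objects in opentype.js's `glyph.path.commands`.
interface PathCommand {
  type: "M" | "L" | "Q" | "C" | "Z";
  x?: number;
  y?: number;
  x1?: number;
  y1?: number;
  x2?: number;
  y2?: number;
}

interface Contrast {
  thinnest: number;
  thickest: number;
  ratio: number; // thinnest / thickest: 1 = monoline, near 0 = high contrast
}

// Evaluates a Bezier curve of any degree with de Casteljau's algorithm.
function bezierPoint(controls: Point[], t: number): Point {
  let level = controls;
  while (level.length > 1) {
    const next: Point[] = [];
    for (let i = 0; i < level.length - 1; i++) {
      next.push({
        x: level[i].x + (level[i + 1].x - level[i].x) * t,
        y: level[i].y + (level[i + 1].y - level[i].y) * t,
      });
    }
    level = next;
  }
  return level[0];
}

// Converts path commands into closed polygons; curves are sampled at `steps` points.
function flattenContours(commands: PathCommand[], steps = 16): Point[][] {
  const contours: Point[][] = [];
  let current: Point[] = [];
  let pen: Point = { x: 0, y: 0 };
  for (const cmd of commands) {
    if (cmd.type === "Z") {
      if (current.length > 0) contours.push(current);
      current = [];
      continue;
    }
    const end: Point = { x: cmd.x ?? 0, y: cmd.y ?? 0 };
    if (cmd.type === "M") {
      if (current.length > 0) contours.push(current);
      current = [end];
    } else if (cmd.type === "L") {
      current.push(end);
    } else {
      const c1: Point = { x: cmd.x1 ?? 0, y: cmd.y1 ?? 0 };
      const controls: Point[] =
        cmd.type === "Q" ? [pen, c1, end] : [pen, c1, { x: cmd.x2 ?? 0, y: cmd.y2 ?? 0 }, end];
      for (let i = 1; i <= steps; i++) current.push(bezierPoint(controls, i / steps));
    }
    pen = end;
  }
  if (current.length > 0) contours.push(current);
  return contours;
}

// Enclosed area of a polygon (shoelace formula), independent of winding direction.
function polygonArea(points: Point[]): number {
  let sum = 0;
  for (let i = 0; i < points.length; i++) {
    const a = points[i];
    const b = points[(i + 1) % points.length];
    sum += a.x * b.y - b.x * a.y;
  }
  return Math.abs(sum) / 2;
}

// For each outer point, the distance to the nearest inner point approximates the stroke width.
function strokeContrast(outer: Point[], inner: Point[]): Contrast {
  let thinnest = Infinity;
  let thickest = 0;
  for (const p of outer) {
    let nearest = Infinity;
    for (const q of inner) nearest = Math.min(nearest, Math.hypot(p.x - q.x, p.y - q.y));
    thinnest = Math.min(thinnest, nearest);
    thickest = Math.max(thickest, nearest);
  }
  return { thinnest, thickest, ratio: thinnest / thickest };
}

function contrastOfO(commands: PathCommand[]): Contrast | null {
  const contours = flattenContours(commands);
  if (contours.length < 2) return null; // no counter found
  // The contour with the largest area is treated as the outer shape.
  const byArea = [...contours].sort((a, b) => polygonArea(b) - polygonArea(a));
  return strokeContrast(byArea[0], byArea[1]);
}
```

With opentype.js, this would be called as `contrastOfO(font.charToGlyph("o").path.commands)`. Nearest-point distances are a simplification: on a strongly stressed or slanted “o” they measure the gap between the contours rather than the width perpendicular to the stroke.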

x-height

The x-height is an important characteristic that can indicate the legibility and perceived size of a font. It is usually measured as the height of a lowercase x above the baseline.

The x-height can be measured from the glyph information that opentype.js provides.
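A sketch of this measurement, assuming an opentype.js-style font object: `font.charToGlyph("x")` returns the glyph, whose `path.commands` hold coordinates in font units with the baseline at y = 0, and `font.unitsPerEm` gives the size of the em. The highest on-curve point of the “x” is taken as the x-height, with the OS/2 value `font.tables.os2.sxHeight` as a fallback. The interfaces mirror only the fields used, so the sketch runs without the library; the function names are hypothetical.

```typescript
// Hypothetical minimal shapes mirroring the parts of an opentype.js Font used here.
interface CommandLike {
  type: string;
  x?: number;
  y?: number;
}
interface GlyphLike {
  path: { commands: CommandLike[] };
}
interface FontLike {
  unitsPerEm: number;
  charToGlyph(char: string): GlyphLike;
  tables: { os2?: { sxHeight?: number } };
}

// Highest on-curve point of a glyph in font units (the baseline is y = 0).
function glyphTop(glyph: GlyphLike): number {
  let top = -Infinity;
  for (const cmd of glyph.path.commands) {
    if (cmd.y !== undefined) top = Math.max(top, cmd.y);
  }
  return top; // -Infinity for an empty glyph
}

// x-height as a fraction of the em, so fonts with 1000 or 2048 units per em compare directly.
function relativeXHeight(font: FontLike): number | null {
  let xHeight = glyphTop(font.charToGlyph("x"));
  if (xHeight <= 0) {
    // Fall back to the value declared in the OS/2 table, if there is one.
    xHeight = font.tables.os2?.sxHeight ?? 0;
  }
  return xHeight > 0 ? xHeight / font.unitsPerEm : null;
}
```

Dividing by `unitsPerEm` matters because fonts are drawn on differently sized grids; without it, raw x-heights from different files are not comparable.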

Width / Proportion

With this value I try to capture how narrow or wide a font is: is it rather condensed or extended? One idea I had was to measure the width of an “M”. But to make these measurements comparable, one would need to relate them to the overall size or the x-height. Some typefaces might also have very special “M” glyphs that don’t represent the rest of the typeface.
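A possible refinement of the “M” idea, sketched under assumptions: instead of a single letter, it averages the `advanceWidth` (as exposed on opentype.js glyphs) of a few sample characters and divides by the x-height in font units. The sample string, the interface and the function name are hypothetical choices. Relating the width to the x-height puts it in context of the lowercase size, and averaging dampens the influence of an unusual “M”; a condensed face should end up with a lower ratio than an extended one.

```typescript
// Hypothetical minimal shape of an opentype.js Font; `advanceWidth` is in font units.
interface WidthFontLike {
  charToGlyph(char: string): { advanceWidth?: number };
}

// Mean advance width of the sample characters divided by the x-height (both in font units).
// Averaging over several letters dampens the effect of an unusual "M".
function widthRatio(font: WidthFontLike, xHeightUnits: number, sample = "Mnoh"): number | null {
  let total = 0;
  let count = 0;
  for (const char of sample) {
    const width = font.charToGlyph(char).advanceWidth;
    if (width !== undefined && width > 0) {
      total += width;
      count++;
    }
  }
  if (count === 0 || xHeightUnits <= 0) return null;
  return total / count / xHeightUnits;
}
```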

Weight

To measure the weight, I render the lowercase “o” to an HTML canvas element, fill it black and paint the background white. I then measure the ratio between black and white pixels. A script or hairline font will show very low values, while a very heavy, blocky font will show high values. This gave me okay-ish results, but I want to improve it by measuring the actual stems of glyphs in the future.
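The pixel counting can be separated from the rendering. The function below is a sketch that counts dark and light pixels in RGBA data, such as the array returned by `ctx.getImageData(0, 0, w, h).data`. Anti-aliased edge pixels are split by a brightness threshold of 128, which is an assumed value, and the function name is hypothetical. It returns both the black-to-white ratio described above and the share of dark pixels. Because both numbers depend on the canvas size, the canvas and font size should stay the same for every font.

```typescript
interface InkCoverage {
  dark: number;
  light: number;
  darkShare: number; // dark / all pixels
  darkToLight: number; // dark / light, the black-to-white ratio
}

// In a browser, the input could come from opentype.js like this:
//   ctx.fillStyle = "white"; ctx.fillRect(0, 0, 200, 200);
//   const path = font.getPath("o", 40, 150, 150); path.fill = "black"; path.draw(ctx);
//   inkCoverage(ctx.getImageData(0, 0, 200, 200).data);
function inkCoverage(rgba: Uint8ClampedArray, threshold = 128): InkCoverage {
  let dark = 0;
  let light = 0;
  for (let i = 0; i < rgba.length; i += 4) {
    // Perceived brightness using the Rec. 601 luma weights; alpha is ignored.
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    if (luma < threshold) dark++;
    else light++;
  }
  const total = dark + light;
  return {
    dark,
    light,
    darkShare: total > 0 ? dark / total : 0,
    darkToLight: light > 0 ? dark / light : Infinity,
  };
}
```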

Spacing

When all glyphs of a typeface have the same width, the typeface is called monospaced. Importantly, we can’t simply look at the glyphs themselves to determine the width: even in a monospaced font, a dot takes up less visible space than an “m”. Instead, we need to take into account the advanceWidth property, which also includes the invisible space (the side bearings) around a glyph. This reveals that Google Fonts uses “monospace” as a style classification rather than to indicate the technical property: fonts such as Lekton or Libre Barcode are not listed as monospaced, but technically they are.
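A sketch of the advance-width check, assuming glyph objects with `unicode` and `advanceWidth` as opentype.js provides them. Glyphs without a character mapping (such as `.notdef`) and zero-width glyphs (such as combining accents) are skipped, since they would otherwise break the comparison; that filtering rule and the names below are assumptions.

```typescript
// Hypothetical minimal shape of an opentype.js Glyph.
interface SpacingGlyphLike {
  unicode?: number;
  advanceWidth?: number;
}

// True when every mapped, non-zero-width glyph advances the pen by the same amount.
function isMonospaced(glyphs: SpacingGlyphLike[]): boolean {
  const widths = new Set<number>();
  for (const glyph of glyphs) {
    // Skip glyphs without a character mapping (such as .notdef) and
    // zero-width glyphs such as combining accents.
    if (glyph.unicode === undefined || !glyph.advanceWidth) continue;
    widths.add(glyph.advanceWidth);
  }
  return widths.size === 1;
}

// With opentype.js, the glyph list could be collected like this:
// const glyphs = Array.from({ length: font.glyphs.length }, (_, i) => font.glyphs.get(i));
```

A strict check like this treats a single odd-width glyph as proof of proportional spacing; a tolerance, for example “95% of glyphs share one width”, might be more forgiving in practice.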

Similarity

Once we have a table of values, we can normalize them and then compute distances to see how similar fonts are. I implemented a very basic version that isn’t terrible, but could be better with more accurate data. Also, we might perceive similarity differently from an algorithm that treats every characteristic equally. In that case, we might need to weigh some properties more than others.
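A basic similarity search along these lines could look like the sketch below; it is an illustration, not necessarily the implementation behind the project. Each font is a hypothetical record with a vector of measured values, for example contrast, x-height, width and weight. Every property is min-max normalized to the range 0 to 1 so that no property dominates because of its scale, and fonts are ranked by weighted Euclidean distance. The optional `weights` array is where some properties could count more than others.

```typescript
// Hypothetical record: `values` could hold [contrast, xHeight, width, weight].
type FontRecord = { name: string; values: number[] };

// Scales every property to 0..1 across all fonts.
function normalize(records: FontRecord[]): FontRecord[] {
  if (records.length === 0) return [];
  const dims = records[0].values.length;
  const mins = Array.from({ length: dims }, (_, d) => Math.min(...records.map((r) => r.values[d])));
  const maxs = Array.from({ length: dims }, (_, d) => Math.max(...records.map((r) => r.values[d])));
  return records.map((r) => ({
    name: r.name,
    values: r.values.map((v, d) => (maxs[d] > mins[d] ? (v - mins[d]) / (maxs[d] - mins[d]) : 0)),
  }));
}

// Weighted Euclidean distance; all weights default to 1.
function distance(a: number[], b: number[], weights: number[] = a.map(() => 1)): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) {
    const diff = a[d] - b[d];
    sum += weights[d] * diff * diff;
  }
  return Math.sqrt(sum);
}

// Names of the `count` fonts closest to `target`.
function mostSimilar(target: string, records: FontRecord[], count = 5, weights?: number[]): string[] {
  const normalized = normalize(records);
  const ref = normalized.find((r) => r.name === target);
  if (!ref) return [];
  return normalized
    .filter((r) => r.name !== target)
    .map((r) => ({ name: r.name, d: distance(ref.values, r.values, weights) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, count)
    .map((r) => r.name);
}
```

Min-max normalization is sensitive to outliers: a single extreme display font stretches the range and compresses everyone else, so a rank- or z-score-based scaling might be worth trying.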

The font parser analyses each font, draws hidden SVGs and canvas elements for measurements and saves the data to a JSON file.

Demo

I’ve built an interface to make the database accessible. Fonts can be viewed in a grid of varying size, either to get an overview of all fonts or to look at the details of a few.

Screenshot of the project’s website

Findings

The dataset invites exploration of similarities and irregularities. Filtering for low contrast and serif returns the slab serif fonts. A low x-height gives us mostly handwritten or script fonts, while very high x-height values often indicate all-caps typefaces.
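Such filters are simple queries over the measured records. The sketch assumes hypothetical record fields (a Google Fonts category plus the contrast and relative x-height values described above) and made-up cut-off values that would need tuning against the real distribution of the data.

```typescript
interface FontEntry {
  family: string;
  category: string; // e.g. "serif", "sans-serif", "handwriting"
  contrastRatio: number; // thinnest / thickest stroke; 1 = no contrast
  xHeight: number; // x-height / unitsPerEm
}

// Hypothetical cut-off values; they would need tuning against the real data.
const LOW_CONTRAST_MIN_RATIO = 0.75;
const LOW_X_HEIGHT_MAX = 0.4;

// "Low contrast + serif": mostly slab serifs.
function lowContrastSerifs(fonts: FontEntry[]): FontEntry[] {
  return fonts.filter(
    (font) => font.category === "serif" && font.contrastRatio >= LOW_CONTRAST_MIN_RATIO,
  );
}

// Lowest x-heights first: mostly handwritten or script fonts.
function lowXHeightFonts(fonts: FontEntry[]): FontEntry[] {
  return fonts
    .filter((font) => font.xHeight <= LOW_X_HEIGHT_MAX)
    .sort((a, b) => a.xHeight - b.xHeight);
}
```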

Summary

This is a complex way of approaching font exploration. Ultimately, the quality of the results depends on the quality of the fonts and the data around them. Browsing only Google Fonts is very limiting, as the library isn’t known for best-in-class quality. I’ve already started running the analysis on the Typekit library and ran into new challenges with user interface performance when previewing so many fonts. Such an undertaking requires proper caching and preloading strategies. But I don’t have to go that far just yet.

Possibilities

With such a dataset, one could do more things:

  • automatically adjust font sizes and line heights based on x-height
  • find font combinations based on similarity or difference
  • build a custom font menu for Avatar’s poster designer
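The first idea boils down to simple arithmetic: if the x-height is stored as a fraction of the em, the font size needed for a target x-height in pixels is `targetXHeightPx / relativeXHeight`. The sketch below also derives a line height from the target x-height; the function name and the factor of 3 are assumed defaults, not an established typographic rule.

```typescript
interface SizedText {
  fontSizePx: number;
  lineHeightPx: number;
  cssLineHeight: number; // unitless CSS line-height = lineHeightPx / fontSizePx
}

// Chooses a font size so that the rendered x-height matches `targetXHeightPx`.
// `lineToXHeight` is an assumed ratio between line height and x-height.
function sizeForXHeight(relativeXHeight: number, targetXHeightPx: number, lineToXHeight = 3): SizedText {
  const fontSizePx = targetXHeightPx / relativeXHeight;
  const lineHeightPx = targetXHeightPx * lineToXHeight;
  return { fontSizePx, lineHeightPx, cssLineHeight: lineHeightPx / fontSizePx };
}
```

For a target x-height of 9px, a font with a relative x-height of 0.5 gets an 18px font size (27px line height, CSS `line-height: 1.5`), while a font at 0.45 needs about 20px to look the same size.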

Further reading

Panose Classification Metrics Guide
The guide from 1991 describes in detail how to measure individual glyphs to derive comparable metrics. Unfortunately, those measurements need to be taken by hand and can be quite time-consuming.

Appendix

Q: Why didn’t you use data from web font services?
A: Services that provide fonts such as Typekit, Google Fonts, Fontstand, Fontshop, MyFonts, etc. all provide their own set of filters with more or less fine control. The APIs of those services also differ in the amount of information that is available for each font. Usually the category is provided, but other information is either left out or not compatible between the services.

Pushing boundaries, not pixels. I make design tools, apps & bagels. Previously @ginettateam. Studied @idpotsdam. Former Intern @Behance, NYC.