Handling Emojis (or other Unicode characters)? Beware of Docker images!

We are developing world-class messaging support for customer service. It follows that we MUST support emojis, since consumers who are interacting with our clients will often express their emotions using these handy icons. As we embarked on this endeavor, we were excited to see that although emojis don’t have an “official standard”, they do have an official block in the Unicode code space and many “de facto standards”. Here is a little info about them:

Emojis 101

  1. The classic smiley faces start at U+1F600, in the Unicode Emoticons block (U+1F600 to U+1F64F). Other emojis are spread across several other blocks.
  2. They can be rendered by all browsers, although the computer needs a font that supports them.
  3. You MUST support UTF-8 in your microservice, your pipes (message queues), your data stores (MySQL, NoSQL, etc.), and really anywhere else the message might flow through. This article by Twilio does a nice job explaining what to look for!
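To make point 3 concrete, here is a small Python sketch (Python is my choice for illustration, not necessarily what our service is written in) showing what 😀 looks like once it is encoded as UTF-8. The helper names code_point_label and utf8_hex are hypothetical and exist only for this example.

```python
def code_point_label(char):
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(char):04X}"


def utf8_hex(char):
    """Return the UTF-8 bytes of a character as space-separated hex."""
    return " ".join(f"{b:02x}" for b in char.encode("utf-8"))


grinning_face = "\U0001F600"  # the first code point in the Emoticons block

print(code_point_label(grinning_face))  # U+1F600
print(utf8_hex(grinning_face))          # f0 9f 98 80
print(utf8_hex("A"))                    # 41
```

A plain letter fits in one byte, while the emoji needs four. Every hop the message passes through has to carry all four bytes intact, which is why a single component that is not UTF-8 aware is enough to break things.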

Great, so we followed all of this and things are working great locally and in our VM test environment. Next we check in the code and it works its way through our continuous deployment pipeline. The service is out in our internal “production” environment and I go to use it… guess what? Instead of seeing 😀 I see: ?

… ugh

The Problem!

When running locally and in our VM test environment, we are on macOS or Linux. However, once our microservice gets deployed, it is hosted in a Docker image whose base image is Ubuntu Trusty. This sent me on a wild goose chase, checking all the different things that could possibly be mishandling UTF-8. In the end, a colleague of mine pointed me to the LANG environment variable, which nudged me towards the bigger problem…
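If you are wondering where a literal ? comes from, here is a minimal Python sketch of the mechanism. It assumes a runtime that picks its default character set from the locale and quietly substitutes a replacement character for anything it cannot encode. I am not claiming this is exactly what happened inside our service; the function name encode_for_charset is hypothetical.

```python
def encode_for_charset(text, charset):
    """Encode text leniently: characters the charset lacks become '?'."""
    return text.encode(charset, errors="replace")


message = "Thanks \U0001F600"

# With UTF-8 the emoji survives as four bytes.
print(encode_for_charset(message, "utf-8"))  # b'Thanks \xf0\x9f\x98\x80'

# With plain ASCII, as under a bare C/POSIX locale, it collapses to '?'.
print(encode_for_charset(message, "ascii"))  # b'Thanks ?'
```

Nothing raises an error here, which is what makes this kind of bug hard to spot: the data is silently damaged at whichever hop falls back to the narrower character set.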

Ubuntu Trusty Base Image and Locale

It appears that by default the Ubuntu Trusty base Docker image has no locales generated. Because of this, it really doesn’t matter what you set the environment variable to: a setting that names a locale which has not been generated cannot take effect. If you’re not familiar with this stuff, the Locale page has more details. To solve this in your Dockerfile, you can do something as simple as this:

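# Generate the UTF-8 locale, since the base image does not ship with it
RUN locale-gen en_US.UTF-8
# Point the environment at the locale that was just generated
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8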
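One way to check that the fix took is to run locale -a inside the container and look for the new locale. A small gotcha is that glibc lists it as en_US.utf8 rather than en_US.UTF-8, so a naive string match can miss it. The Python sketch below normalizes both spellings before comparing. The function names and the two sample listings are illustrative assumptions, not output captured from our image.

```python
def normalize(name):
    """Lower-case the codeset and drop hyphens: en_US.UTF-8 -> en_US.utf8."""
    lang, _, codeset = name.strip().partition(".")
    if not codeset:
        return lang
    return f"{lang}.{codeset.lower().replace('-', '')}"


def has_locale(listing, wanted):
    """Return True if `wanted` appears in text shaped like `locale -a` output."""
    available = {normalize(line) for line in listing.splitlines() if line.strip()}
    return normalize(wanted) in available


# Illustrative listings, standing in for `locale -a` before and after the fix.
before_fix = "C\nC.UTF-8\nPOSIX\n"
after_fix = "C\nC.UTF-8\nPOSIX\nen_US.utf8\n"

print(has_locale(before_fix, "en_US.UTF-8"))  # False
print(has_locale(after_fix, "en_US.UTF-8"))   # True
```

To use it for real, feed has_locale the text printed by locale -a in the running container instead of the sample strings.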

This is one of those times where I love working alongside other great software engineers. It truly makes all the difference when trying to solve confusing and/or complex problems. I hope others find this helpful if they run into similar situations. Enjoy!