Going global the right way

Developing WeGift’s App for an international market

Will Pimblett
Engineering at Runa
9 min read · Jul 5, 2021


If you’re an engineer looking to join a mission-driven company with an amazing engineering culture — check out our careers page for openings.

Our product delivers value through a range of brands in multiple regions worldwide, and to do this we have to adapt to work in each of those regions. When it came to adding our first foreign products, I was tasked with finding a scalable way to take the app we had already built and add support for translation.

The Tower of Babel (Vienna), Pieter Brueghel the Elder, Public domain, via Wikimedia Commons

This journey started in mid-2017, and we now offer more than three times as many non-UK brands as UK-based ones. It’s not always been smooth sailing, and we’ve learnt a lot along the way.

I started writing this article thinking it would make a good topic for a single blog post. It turns out there is a lot more to say than is practical for me to write, or for you to read, in a single sitting, so instead this post is our first step into the process. While our backend is Python, specifically Flask, the approach should apply to any language or stack. I mention cultural issues, such as colours or symbols having different meanings, but while my design experience is enough to know these are potential pitfalls, it doesn’t qualify me to advise you on dealing with them at scale.

What we tried to solve for

You can cut our application into three main areas: what our buyers use, what our brands use and what our consumers / end-users use.

The first, and luckily the simplest, is the consumer-facing part. This includes redemption pages for gift cards and other views dealing with actually using the value we deliver. We planned to integrate and distribute gift cards across Europe, North America and Australasia, so we needed to translate and adapt these views to provide the same great experience that our customers in the UK were used to.

Beyond simply translating the words, we also needed to ensure dates, numbers and currencies all follow local standards. The redemption pages show expiry dates, for example: 12/01/2021 means 12 January in the UK but December 1 in the US, which could cause some serious confusion for the user!
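To make the ambiguity concrete, here is a small sketch using only Python’s standard library. The two date patterns are hard-coded assumptions for illustration; in practice a locale library such as Babel supplies the correct pattern for each locale from CLDR data.

```python
from datetime import date, datetime

# Hypothetical hard-coded patterns, standing in for per-locale data
# that a library like Babel would normally provide.
PATTERNS = {"en_GB": "%d/%m/%Y", "en_US": "%m/%d/%Y"}

expiry = date(2021, 1, 12)  # 12 January 2021
for locale_name, pattern in PATTERNS.items():
    print(locale_name, expiry.strftime(pattern))
# en_GB 12/01/2021
# en_US 01/12/2021

# A US reader interpreting the UK-formatted string gets a different day entirely.
misread = datetime.strptime("12/01/2021", PATTERNS["en_US"]).date()
print(misread)  # 2021-12-01
```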

A little primer

Now that we know roughly what we’re up against, how do we go about solving it? At a high level this is a two-step process: internationalisation, then localisation. Before we get too deep into this, let’s define these terms, courtesy of the W3C.

  • Internationalisation [or i18n] is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language.
  • Localisation [or l10n] refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale).

Big words, which is why we like to shorten them! Internationalisation, or i18n, is what we want to do first: it’s the system that allows us to do localisation, or l10n. When you’re working on a small project with only a handful of pages you might get away with swapping your English HTML template for a French one, but once you’re dealing with much more than that you need a system to handle it. So our first step is i18n: developing the framework the product will use to let us localise.

Naïve approaches

Perhaps a slightly loaded title, but we discounted, either in theory or in practice, methods that might get us to a working solution quicker but won’t be maintainable in the long term:

  • Creating duplicate templates. If you have a dashboard.html, why not just split it into dashboard.en.html and dashboard.fr.html? While this is simple and gives you a lot of flexibility to adapt the UI for each market, you’re splitting the app along language lines. Adding a new feature, or even just updating some text, becomes multiplied by each language you want to support. No thank you! Note: this can be a valid solution if splitting along language lines does make sense, such as a French view so specialised that a more generic solution would not make sense.
  • Rolling our own string replacement. Take a sentence from the dashboard “Welcome to the Website” and put that in a constant WELCOME_TEXT, then just have a set of constants for each language. This very quickly becomes complicated as soon as you need to deal with inserting variables into the text, dealing with plural forms, and managing translations with non-developers.
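A minimal sketch of the constants approach shows where it starts to strain. The table, the helper names (t, apples) and the French strings are hypothetical and only for illustration: plain text is a simple lookup, but as soon as a count is involved each language needs its own branching code, which translators cannot maintain on their own.

```python
# Hypothetical per-language constant tables: the "roll our own" approach.
STRINGS = {
    "en": {"WELCOME_TEXT": "Welcome to the Website"},
    "fr": {"WELCOME_TEXT": "Bienvenue sur le site"},
}


def t(lang: str, key: str) -> str:
    """Plain text is easy: look the constant up for the requested language."""
    return STRINGS[lang][key]


def apples(lang: str, n: int) -> str:
    """Variables and plurals are not: each language needs its own code path."""
    if lang == "en":
        return f"I have {n} apple" + ("" if n == 1 else "s")
    if lang == "fr":
        return f"J'ai {n} pomme" + ("s" if n > 1 else "")
    # Every new language means writing (and reviewing) new code here.
    raise KeyError(f"no apple sentence for {lang!r}")


print(t("fr", "WELCOME_TEXT"))  # Bienvenue sur le site
print(apples("en", 0), "|", apples("fr", 0))  # I have 0 apples | J'ai 0 pomme
```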

I won’t attempt to list every other way you could half-do i18n; we haven’t even gotten into date and number formatting. Suffice it to say this is a hard, but not insurmountable, problem. We’ll start with how we decided to manage text and then get to the actual problems we encountered along the way.

Bringing in some tooling

There is no silver bullet that’ll do i18n for you, but there is tooling to make the job a lot more manageable. As mentioned, for most of our services we use Python and the Flask ‘micro’ framework along with front-end JS components. I’ll be talking about how we did this within our stack; helpfully, however, our leading requirement was to keep the solution portable. Whatever path we went down needed to adapt to any platform changes we made going forward, so hopefully the journey presented here will be of use no matter what stack you’re using.

Unlike a more “batteries included” framework, Flask doesn’t have a built-in solution for i18n. It does, however, have an ecosystem of third-party extensions. The most popular is Flask-Babel, which, unsurprisingly, is a thin wrapper around an underlying library, Babel. Side note: the name Babel will only cause confusion later, as it is shared with a popular JavaScript transpiler.

Babel (the Python one) itself provides gettext functionality, and message catalogues (we’ll get to those) are saved in the gettext format. Gettext is old, first released over thirty years ago, but still very widely used. This is a winning combination: we have a stack of tools based on a very mature format, so if we need to change or replace a large part of this stack there will be options. Our translation data won’t be stuck in a format only used by one tool.

In summary we have tooling that will help us with: locale data, dates and times, numbers, and translating text. Great!

Message strings

A message string is a block of text you want to localise. It is marked up in some way, a translation is sourced, and then wherever the string is present in the app it gets replaced with the translated version. Let’s say we’ve got an email:

Hello,
Thank you for your purchase, here is your reward!
Thanks,
WeGift

First we need to decide how to cut this up. You could have this whole email as a single string, but that means the greeting and signature would have to be translated again for the next email, possibly differently. You could also split the body sentence into two parts, but the peculiarities of language will cause you issues (more on that later).

We split into three ‘message strings’: the greeting, the one sentence in the body and the signature. These are nice clean blocks that promote re-use (header and signature) and encapsulate enough meaning to be reliably translated.

An i18n tool behaves much like a hash map or a dictionary. It needs a key to store translated strings against. The two major options are:

1. Create an unfriendly key for each string we want to replace.

{{ EMAIL_GREETING }}
{{ EMAIL_BODY_PURCHASE_THANKS_REWARD }}
{{ EMAIL_SIGNATURE }}

2. Use the English-language text as the key, passed to a ‘gettext’ function.

{{ _("Hello,") }}
{{ _("Thank you for your purchase, here is your reward!") }}
{{ _("Thanks,\nWeGift") }}

We went with the latter option as:

  • The source text stays in the source file, making searching the code easier.
  • The English is pre-defined, so the template works without the translation system fully set up.
  • If a translation is missing we can fall back to English: not a good user experience, but still better than nothing.
  • If the source changes, the key changes. This is mostly a good thing, as it makes it obvious when we need to revalidate translations.
  • (As long as context stays the same) re-use of strings is transparent. Anywhere else that has “Hello,” will re-use the same translation. There are ways to mark a string as having a particular context but we found this was not very common.

Plurals, can’t be that hard!

From an English perspective the thought of plurals causing problems seems a little silly. When you have more than one you just put an s on the end of a word like: “I have 1 apple” and “I have 2 apples”. This can be written as:

f"I have {n} apple" + ("s" if n != 1 else "")

And indeed for English it usually is that simple, but as with nearly everything with i18n this falls apart very quickly.

I started this project with an existing codebase and, while not counting apples, we had several lines using a similar method to the above. From the English mindset it’s very easy to write the above with little thought, and it’ll serve you well until you need to do i18n!

In English you write:

  • I have 0 apples
  • I have 1 apple
  • I have 2 apples

We can summarise this as a singular case, “1 apple”, and a plural case, “2 apples”. The rule for which to use is: plural when n ≠ 1. For French, however, I’d write:

  • J’ai 0 pomme
  • J’ai 1 pomme
  • J’ai 2 pommes

French uses the singular case for both 1 and 0: they say “I have 0 apple”. So for French our rule is plural when n > 1. Ok, not too bad then? Well, depending on how much you know about language, either hold on to your hat or roll your eyes at this rather simplistic explanation: there are languages with more than a singular and a plural case. We number these forms from zero, so for English the singular is form 0 and the plural is form 1.

  • Polish has three cases, the rule is n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
  • Slovenian has four: n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3
  • Welsh has five: n==1 ? 1 : n==2 ? 2 : n==3 ? 3 : n==6 ? 4 : 0
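To make these rules concrete, here they are as plain Python functions. The function names are ours, for illustration only; each mirrors the C-style expression quoted above and returns the index of the form to use (msgstr[0], msgstr[1], and so on). The Polish strings are our own rendering of the apple sentence, used to show the three Polish forms in action.

```python
def plural_en(n: int) -> int:
    return int(n != 1)


def plural_fr(n: int) -> int:
    return int(n > 1)


def plural_pl(n: int) -> int:
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def plural_sl(n: int) -> int:
    if n % 100 == 1:
        return 0
    if n % 100 == 2:
        return 1
    if n % 100 in (3, 4):
        return 2
    return 3


def plural_cy(n: int) -> int:
    # n==1 ? 1 : n==2 ? 2 : n==3 ? 3 : n==6 ? 4 : 0
    return {1: 1, 2: 2, 3: 3, 6: 4}.get(n, 0)


# Polish forms of "I have N apple(s)", indexed by plural_pl.
POLISH = ["Mam %(num)d jabłko", "Mam %(num)d jabłka", "Mam %(num)d jabłek"]

for n in (0, 1, 2, 5, 12, 22):
    print(POLISH[plural_pl(n)] % {"num": n})
# Mam 0 jabłek / Mam 1 jabłko / Mam 2 jabłka / Mam 5 jabłek / Mam 12 jabłek / Mam 22 jabłka
```

Note that 12 and 22 end in the same digit yet take different forms, which is exactly the kind of detail a hand-written `+ "s"` can never capture.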

Managing these rules is known as pluralisation, or p11n if you’d like to keep going with the numeronyms. Thankfully our tooling deals with this: Babel ships the plural rules from Unicode’s CLDR (Common Locale Data Repository), a database of these rules in programmatic form, and each gettext catalogue records the rule for its language in a Plural-Forms header. To use them, we write any string that requires pluralisation using a variant of the gettext function:

ngettext(
    "I have %(num)d apple",
    "I have %(num)d apples",
    num=n
)

We provide a singular and a plural form for our base language, along with the number of items, and gettext selects which form to use. When translating, you’d provide two variants for French, three for Polish, four for Slovenian, and five for Welsh. Depending on the language and the sentence, some of these cases may actually be identical, but since they can differ, the option is available.

Note: we’re doing string interpolation using the percent character. In Python that does look a bit old hat next to newer methods like .format() and f-strings. While we have switched to f-strings across our codebase, they don’t work here: an f-string is interpolated before gettext ever sees it, so the lookup key would contain the value (“I have 3 apples”) rather than the placeholder, and the extractor has no fixed string to collect. With %-style placeholders the template is looked up first and the variables are substituted afterwards.

Photo by Jakob Braun on Unsplash

So how do we actually get these translated?

We’ve got some strings marked up and are ready for a translator to translate them for us. The detail of this is a little specific to the tools we’re using, so I’m going to keep this somewhat high level. Babel has an extractor that looks through our Python and HTML files for calls to the gettext functions (such as _() and ngettext(), which we saw above) and creates a template file (.pot) listing all of these strings. It then uses this template to generate a message catalogue (a .po file) per language for the translations to go into. The empty generated file for French will look something like this:

msgid "Hello,"
msgstr ""

msgid "I have %(num)d apple"
msgid_plural "I have %(num)d apples"
msgstr[0] ""
msgstr[1] ""
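Babel’s extractor does considerably more than this (configurable keywords, template languages, translator comments, merging duplicates), but the core idea can be sketched with Python’s own ast module. The function names, marker list and output filename below are hypothetical: the sketch finds literal _(), gettext() and ngettext() calls and writes .pot-style entries like the ones above. Note that the f-string call is skipped, since there is no fixed msgid to extract.

```python
import ast

# Hypothetical marker names; real extractors are configured with a keyword list.
SINGULAR_FUNCS = {"_", "gettext"}
PLURAL_FUNCS = {"ngettext"}


def extract(source: str) -> list:
    """Return (lineno, msgid, msgid_plural or None) for literal gettext calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id in SINGULAR_FUNCS:
            wanted = 1
        elif node.func.id in PLURAL_FUNCS:
            wanted = 2
        else:
            continue
        literals = [
            arg.value for arg in node.args[:wanted]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str)
        ]
        # Only plain string literals are extractable; f-strings and variables are not.
        if len(literals) == wanted:
            plural = literals[1] if wanted == 2 else None
            found.append((node.lineno, literals[0], plural))
    return sorted(found, key=lambda entry: entry[0])  # ast.walk is not line-ordered


def po_quote(text: str) -> str:
    """Quote a string for a .po/.pot file, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_pot(messages: list, filename: str = "example.py") -> str:
    """Render extracted messages as empty .pot entries."""
    lines = []
    for lineno, msgid, plural in messages:
        lines.append(f"#: {filename}:{lineno}")
        lines.append(f"msgid {po_quote(msgid)}")
        if plural is None:
            lines.append('msgstr ""')
        else:
            lines.append(f"msgid_plural {po_quote(plural)}")
            lines.extend(['msgstr[0] ""', 'msgstr[1] ""'])
        lines.append("")
    return "\n".join(lines)


SAMPLE = '''
greeting = _("Hello,")
apples = ngettext("I have %(num)d apple", "I have %(num)d apples", num=n)
skipped = _(f"I have {n} apples")
'''

print(to_pot(extract(SAMPLE)))
```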

We send this off to our translator and they get back to us with this:

msgid "Hello,"
msgstr "Bonjour,"

msgid "I have %(num)d apple"
msgid_plural "I have %(num)d apples"
msgstr[0] "J'ai %(num)d pomme"
msgstr[1] "J'ai %(num)d pommes"

We plug that in, run a compile command (pybabel compile turns each .po file into a binary .mo file, which is faster to load and look up) and just like that our email and our message about apples are in French. Phew!

What’s next?

That was, hopefully, fairly straightforward. You can imagine, however, that dealing with thousands of strings across tens of languages, all in plain text and with multiple translators, may get a bit overwhelming. We hope to follow this post up by discussing how to manage your translations and the challenges of applying the ‘message string’ methodology to a real product. Perhaps we’ll also have some time to talk about dates, times, and numbers, which thankfully are a little easier than words!
