Prepare to Go Global: a 7-Part Engineering Handbook

An engineer’s guide to tackling localization, internationalization and globalization early on, during the design phase

David Perlman
Wix Engineering
Mar 23, 2022

(Image: preparing to go global. Source: Alconost, translation and localization provider)

Any project that has succeeded in one place in the world has great potential to replicate that success in other locations. When the time comes to go global, you start by analyzing the target market and researching the culture. But in the end it will fall to the engineers to deliver. I am here to argue that the amount of time and effort required to make that leap can be greatly reduced by making some good decisions when first designing a system.

As developers we always try to “keep it simple, stupid”, get to market fast, and can’t be bothered writing code that “might be useful someday”. Preparing for globalization is a category of its own: it is not so simple, it takes time and it requires thinking in advance. Hopefully, you will see that it still pays off big time.

At Wix, a company that caters to users the world over, engineers deal with many of the ins and outs of the globalization effort. Over the past 6.5 years I have taken part in this effort while working on several related projects. What I will cover here are some of our lessons learned. They do not cover every possible thing needed to be fully globalized, but I do hope they can help ease the pain that is usually associated with endeavors of this kind.

1. Translations

(Image: people speaking the word “hello” in many languages)

Think Language Variant, not Language

Texts used by an application are usually stored as a set of files, one file for each language, for instance en, fr and es for English, French and Spanish. Many languages, however, have multiple variants (en-GB and en-US, for example), and products are usually created targeting a specific variant, even if this is not stated explicitly.

Prepare for supporting multiple variants by creating a fallback main variant for each language. When requesting translations always pass the full variant name and let the translation service handle falling back to the main variant. That way new variants can be seamlessly added to the system.
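Here is a minimal sketch of that fallback, assuming translations live in an in-memory catalog keyed by variant. The `catalog` shape, the keys and the `translate` helper are illustrative names, not part of any specific translation service:

```typescript
// Hypothetical catalog: one set of entries per language or language variant.
type Catalog = Record<string, Record<string, string>>;

const catalog: Catalog = {
  // "en" acts as the main (fallback) variant for every English variant.
  en: { "cart.title": "Shopping cart", "color.label": "Color" },
  // "en-GB" only stores the entries that differ from the main variant.
  "en-GB": { "color.label": "Colour" },
};

// Callers always pass the full variant name; the fallback is resolved here.
function translate(locale: string, key: string): string | undefined {
  const language = locale.split("-")[0];
  for (const candidate of [locale, language]) {
    const text = catalog[candidate]?.[key];
    if (text !== undefined) return text;
  }
  return undefined;
}
```

With this in place, adding a variant such as en-AU means adding one more catalog entry; no calling code has to change.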

Interpolation

A very common pattern found in building user messages is the need to concatenate several pieces of text into one sentence, allowing for some parts to be dynamic. The temptation is there to simply create a translation key for each part and join them to create the sentence. Alas, using that technique will undoubtedly cause issues when translating the text to languages with different grammar rules. The safer approach is interpolation: keep the whole sentence in a single translation entry and mark the dynamic parts with placeholders, so translators can reorder them as their grammar requires.

Related to this guideline is one other concern: creating a unique entry for every text in the system and not reusing entries just for the sake of it. With translations, context is everything, so that a word used in several contexts in one language could be different words in another.

Follow these two guidelines to keep all translations easily manageable and speed up the process of adding a new language to the system.
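To illustrate why whole-sentence entries matter, here is a small sketch. The `{name}` placeholder syntax, the sample entries and the `interpolate` helper are assumptions made for this example:

```typescript
// Hypothetical entries: the whole sentence lives in one entry per language,
// so the placeholders can sit wherever the grammar of that language needs them.
const messages = {
  en: "{sender} sent you {fileName}",
  de: "{sender} hat Ihnen {fileName} gesendet", // the verb moves to the end
};

function interpolate(template: string, params: Record<string, string>): string {
  // Replace each {name} placeholder; unknown names are left untouched.
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    name in params ? params[name] : match
  );
}

const params = { sender: "Anna", fileName: "report.pdf" };
const english = interpolate(messages.en, params);
const german = interpolate(messages.de, params);
```

Joining separate "sender", "sent you" and "file name" entries would have fixed the English word order for every language.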

Pluralization

Texts that contain a mix of words and numbers often have multiple forms for different number values. For example what happens when we change the value in this sentence:

“There are 10 people ahead of you on line”

If the number goes down to one:

“There is 1 person ahead of you on line”

And when it's your turn:

“There are 0 people ahead of you on line”

Or even:

“There are no people ahead of you on line”

Changing the number can change the sentence in several places. It might be tempting to try to solve this in an imperative manner, say using a switch/case statement. The issue is that there are languages with more than 3 plural forms; some have up to 6.

Prepare for supporting locale based pluralization rules by integrating a library for this into your system. I suggest looking at the ICU libraries out there and selecting the one that best fits your platform. The translation entry could look something like this:
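A minimal sketch of a plural-aware entry, using the built-in `Intl.PluralRules` in place of a full ICU MessageFormat library. The entry shape and the `formatPlural` helper are assumptions for illustration:

```typescript
type PluralCategory = "zero" | "one" | "two" | "few" | "many" | "other";

// Hypothetical entry shape: one text per plural category, "other" is mandatory.
type PluralEntry = { other: string } & Partial<Record<PluralCategory, string>>;

const peopleAhead: PluralEntry = {
  one: "There is {count} person ahead of you on line",
  other: "There are {count} people ahead of you on line",
};

function formatPlural(locale: string, entry: PluralEntry, count: number): string {
  // The locale decides which category a number falls into, not our own code.
  const category = new Intl.PluralRules(locale).select(count) as PluralCategory;
  const template = entry[category] ?? entry.other;
  return template.replace("{count}", String(count));
}

const oneAhead = formatPlural("en", peopleAhead, 1);
const tenAhead = formatPlural("en", peopleAhead, 10);
```

A language with more forms only needs more keys in its entry ("few", "many" and so on); the code stays the same. A special case such as "no people" for zero would need an exact-match rule (ICU MessageFormat has `=0` for this), which the sketch leaves out.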

Punctuation

All forms of punctuation differ between languages and variants. This includes quotation marks, commas, periods, parentheses and more. Keeping all punctuation within the translated text is good preparation for adding support for more languages.

Capitalization

The same applies to text in all caps. What works in one language or variant might not work in another, so best to refrain from transforming capitalization using code.

Direction & Length

User interfaces should always take into consideration that the text used within elements will most probably be of different lengths in different languages and would change direction in RTL languages.

This affects not only the visual design of the interface but also the technical design. User interface components should be able to deal with texts of different lengths and be aware of the direction of their rendered content.

There are algorithms for more advanced rendering of vertical CJK texts and mixed bidirectional texts. Keep this in mind when choosing components for text input and display.

Unicode

Storing all content as Unicode (UTF-8, for example) will ensure nothing is lost, whatever alphabet the content is written in, emojis included 😃.

2. Datetimes

(Image: people communicating dates across time zones)

Formatting

All major frameworks, and now all web browsers (Intl), come with locale based datetime formatters built in. With these formatters, and sticking with the predefined formats (‘full’, ‘long’, ‘short’, etc.) you can be assured of correct rendering, no matter the language or locale.

On the other hand, creating a custom format, as is sometimes deemed necessary, can become very tricky, especially when handling the order of the day and the month, or when rendering the time in a 24-hour vs. an am/pm system. Leaving custom date formatters out of your codebase is good preparation for switching locales.
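As a sketch of sticking with the predefined formats, here is the browser `Intl.DateTimeFormat` with `dateStyle` and `timeStyle`. The `formatDateTime` helper and the sample date are illustrative, and the time zone is pinned to UTC only to keep the example deterministic:

```typescript
const launch = new Date(Date.UTC(2022, 2, 23, 14, 30)); // 23 March 2022, 14:30 UTC

function formatDateTime(date: Date, locale: string): string {
  return new Intl.DateTimeFormat(locale, {
    dateStyle: "long", // predefined style: field order and separators come from the locale
    timeStyle: "short",
    timeZone: "UTC",
  }).format(date);
}

const american = formatDateTime(launch, "en-US");
const german = formatDateTime(launch, "de-DE");
```

The same call yields a month-first, am/pm rendering for en-US and a day-first, 24-hour rendering for de-DE, without a single custom pattern in the code.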

Time Zones

The question of how to store datetimes comes up very often. The simplest form, and often the choice, is as a Unix epoch timestamp. This form is compact, easily identifiable and easily parsable.

The problem is that it lacks information about the time zone in which the value was created. This will become an issue when users from different time zones are communicating things like due dates and expiration dates. This will also become an issue in countries where it is customary to display the time of an event with the time zone of the location of that event.

A good way to prepare for cross time zone user communication would be to store the relevant time zone next to or together with the date value itself.
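One possible shape for this, as a sketch: keep the Unix timestamp and add the IANA time zone identifier beside it. The `ZonedTimestamp` type and the `formatInZone` helper are hypothetical names:

```typescript
// Hypothetical storage shape: the instant plus the zone it was created in.
interface ZonedTimestamp {
  epochMillis: number;
  timeZone: string; // IANA identifier, e.g. "Asia/Tokyo"
}

const eventStart: ZonedTimestamp = {
  epochMillis: Date.UTC(2022, 2, 23, 12, 0), // 12:00 UTC
  timeZone: "Asia/Tokyo",
};

// Renders in the stored zone by default, or in any other zone on request.
function formatInZone(ts: ZonedTimestamp, locale: string, timeZone: string = ts.timeZone): string {
  return new Intl.DateTimeFormat(locale, {
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23",
    timeZone,
    timeZoneName: "short",
  }).format(new Date(ts.epochMillis));
}

const atEventLocation = formatInZone(eventStart, "en-US");
const inUtc = formatInZone(eventStart, "en-US", "UTC");
```

The timestamp still identifies the instant unambiguously, while the stored zone lets the interface show the time as it reads at the location of the event.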

3. Numbers

(Image: people converting numbers between locales)

Formatting

Frameworks and web browsers come with locale based number formatters right out of the box, just as we saw with the datetimes. Setting the locale for these formatters and always using a formatter when displaying numbers is good preparation for when locales need to be switched.
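A short sketch with the browser `Intl.NumberFormat`; the `formatNumber` helper is an illustrative name for the single place where numbers get formatted:

```typescript
function formatNumber(value: number, locale: string): string {
  // Grouping and decimal separators come from the locale.
  return new Intl.NumberFormat(locale).format(value);
}

const american = formatNumber(1234567.89, "en-US"); // "1,234,567.89"
const german = formatNumber(1234567.89, "de-DE"); // "1.234.567,89"
```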

Currency

Issues do come up, however, when formatting numbers in currency format. We have found that the built-in formatters are not always aligned with real world usage. When dealing with money this can become a real issue: the user interface can feel unprofessional, and with some payment providers transactions could fail.

To prepare for setting overrides based on real world user feedback, use a wrapper around the built-in formatter. All logic that makes any adjustments to the formatted value should be placed in that shared utility.

In this example we wrap the web browser `Intl` currency formatter to make some adjustments, specifically to remove the minor unit from two currencies. Let’s say that, contrary to the built-in formatters, the New Taiwan Dollar (TWD) and the Hungarian Forint (HUF) don’t use a minor unit in the real world (this is actually what we have found). This would be the fix:
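A minimal sketch of such a wrapper, assuming the override is expressed as a number of fraction digits per currency. The names `fractionDigitOverrides` and `formatCurrency` are illustrative:

```typescript
// Overrides collected from real world feedback: currency code -> fraction digits.
const fractionDigitOverrides: Record<string, number> = { TWD: 0, HUF: 0 };

function formatCurrency(amount: number, currency: string, locale: string): string {
  const options: Intl.NumberFormatOptions = { style: "currency", currency };
  const digits = fractionDigitOverrides[currency];
  if (digits !== undefined) {
    // Both bounds are set together; lowering only the maximum would throw a
    // RangeError for currencies whose built-in minimum is higher.
    options.minimumFractionDigits = digits;
    options.maximumFractionDigits = digits;
  }
  return new Intl.NumberFormat(locale, options).format(amount);
}

const forints = formatCurrency(1500, "HUF", "en-US"); // rendered without a minor unit
const dollars = formatCurrency(1500, "USD", "en-US"); // built-in behavior, untouched
```

Because every caller goes through `formatCurrency`, a new finding from the field becomes a one-line change to the overrides table.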

4. Postal Addresses

(Image: a person trying to understand an address on an envelope)

Capture Form

The task of creating an address capture form seems simple at first. The reason being that we think of addresses in the form to which we have become accustomed. However, when looking at how addresses are modeled in different countries we see subtle differences that translate into a not-so-small engineering challenge.

These are four different ways address capture forms can differ from country to country:

  1. The list of fields
  2. Field requirement
  3. Field display names
  4. Field position

The Fields

Officially the full list of possible address fields is quite extensive. In practice a list of about 10 fields should be enough to capture a usable address in most countries. These are the fields:

(Image: the list of fields commonly found on address forms)

Required Fields

It is not well known which fields are required in which country to ensure address validity. While some fields are unquestionably mandatory (city, country), others are questionably so (state, postal code). To require a field in a country where that field isn’t used causes user frustration. Making all fields optional will capture a high percentage of invalid addresses.

To better understand how this can affect user experience let’s consider the postal code input field on the address section of the checkout form on an ecommerce website. Setting that field to be mandatory in countries where postal codes are optional or non-existent would most likely lower the success rate of that checkout flow and hurt the business of that website.

Field Display Names

Address fields with the same meaning can have different names in the same language but in a different country. For instance the field name for subdivision in the US is simply “State”, whereas in Australia it should be “State/Territory”. This is similar to the idea of language variants discussed above but can be tackled separately before full support for language variants is introduced to a system.

Changing the name of a field can change the grammar and punctuation of a user message (e.g. error messages, placeholder texts). When thinking about localizing address field display names, all related texts need to be taken into consideration.

Field Positions

The best way to present an address form is with the fields in the order that they would appear on an envelope. One place to find the address layout for each country is on the Address Solutions page of the UPU (the Universal Postal Union of the UN).

To prepare for adding more variations of the address capture form, the attributes of each field should be stored separately for each country. A set of default field attributes could be used to reduce the number of records needed while allowing overrides for each country.
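The defaults-plus-overrides idea could be sketched like this. The field names, the default values and the two country overrides are illustrative data for the example, not a vetted data set:

```typescript
type FieldName = "addressLine1" | "city" | "subdivision" | "postalCode" | "country";

interface FieldAttributes {
  required: boolean;
  label: string;
  position: number; // order of appearance in the form
}

type FieldSet = Record<FieldName, FieldAttributes>;

// Hypothetical defaults, used whenever a country has no override.
const defaults: FieldSet = {
  addressLine1: { required: true, label: "Address", position: 1 },
  city: { required: true, label: "City", position: 2 },
  subdivision: { required: false, label: "State", position: 3 },
  postalCode: { required: true, label: "Postal code", position: 4 },
  country: { required: true, label: "Country", position: 5 },
};

// Only the attributes that differ from the defaults are stored per country.
const overrides: Record<string, Partial<Record<FieldName, Partial<FieldAttributes>>>> = {
  AU: { subdivision: { label: "State/Territory" } },
  HK: { postalCode: { required: false } }, // Hong Kong does not use postal codes
};

function fieldsFor(country: string): FieldSet {
  const result = {} as FieldSet;
  for (const name of Object.keys(defaults) as FieldName[]) {
    result[name] = { ...defaults[name], ...overrides[country]?.[name] };
  }
  return result;
}
```

Supporting a new country then means adding one override record, and countries without a record keep working on the defaults.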

Address Formatting for Display

As mentioned above, the Address Solutions page of the UPU provides the formatting system of every UN member state. Implementing a shared utility to format addresses based on those rules will keep displayed addresses consistent even as support for more countries and locales is added to the system.
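A sketch of such a shared utility, assuming one line template per country. The two templates below are simplified illustrations and are not copied from the UPU rules:

```typescript
type AddressField = "recipient" | "street" | "city" | "subdivision" | "postalCode" | "country";
type Address = Partial<Record<AddressField, string>>;

// Illustrative templates: one array element per rendered line.
const templates: Record<string, string[]> = {
  US: ["{recipient}", "{street}", "{city}, {subdivision} {postalCode}", "{country}"],
  DE: ["{recipient}", "{street}", "{postalCode} {city}", "{country}"],
};
const defaultTemplate = ["{recipient}", "{street}", "{city} {postalCode}", "{country}"];

function formatAddress(address: Address, countryCode: string): string {
  const template = templates[countryCode] ?? defaultTemplate;
  return template
    .map((line) =>
      line.replace(/\{(\w+)\}/g, (_match: string, field: string) => address[field as AddressField] ?? "")
    )
    .map((line) => line.replace(/\s+/g, " ").trim()) // tidy gaps left by missing fields
    .filter((line) => line.length > 0) // drop lines that ended up empty
    .join("\n");
}

const berlin = formatAddress(
  { recipient: "Erika Mustermann", street: "Musterstraße 1", city: "Berlin", postalCode: "10115", country: "Germany" },
  "DE"
);
```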

5. World Data is Dynamic

I wrote above about storing information on each country while ignoring the fact that this information is not static and is always changing. Applying these changes can be painful when references to that data are stored throughout the system.

To prepare for such change, every field that is referenceable should contain metadata about its state. Adding an attribute that marks an item as deprecated and points to its replacement will allow versioning and support backward compatibility.

Here is a real world example. In 2014, France reduced the number of its metropolitan regions from 22 to 13 (effective in January of 2016). If those changes by the government were simply applied to the data stored in the system it would corrupt any data referencing the regions that were removed. With some region metadata the transition is smooth and accomplishable in the afforded timeframe (2 years).
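A sketch of what that region metadata might look like. The ids and the `deprecated` / `replacedBy` attribute names are hypothetical; the merge itself (Alsace and Lorraine, together with Champagne-Ardenne, becoming Grand Est) is the real one:

```typescript
interface Region {
  id: string;
  name: string;
  deprecated?: boolean;
  replacedBy?: string; // id of the region that supersedes this one
}

const regions: Record<string, Region> = {
  "fr-alsace": { id: "fr-alsace", name: "Alsace", deprecated: true, replacedBy: "fr-grand-est" },
  "fr-lorraine": { id: "fr-lorraine", name: "Lorraine", deprecated: true, replacedBy: "fr-grand-est" },
  "fr-grand-est": { id: "fr-grand-est", name: "Grand Est" },
};

// Follows replacedBy links to the current region; old ids keep resolving.
function resolveRegion(id: string): Region | undefined {
  let region: Region | undefined = regions[id];
  const seen = new Set<string>();
  while (region && region.deprecated && region.replacedBy && !seen.has(region.id)) {
    seen.add(region.id); // guards against accidental cycles in the data
    region = regions[region.replacedBy];
  }
  return region;
}

// New data entry only offers regions that are still current.
function selectableRegions(): Region[] {
  return Object.values(regions).filter((region) => !region.deprecated);
}
```

Records that reference the old id stay valid and can be migrated gradually, while forms stop offering the retired regions right away.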

6. Cultural Subtleties

Personal names

Many systems store personal names in parts (first, middle, last, prefix, suffix) and not as full names. The former is good for sorting and searching, while the latter allows users to enter their names in the way that is culturally appropriate to them.

Creating a central place where personal names are formatted for display given a locale is good preparation for entering new cultures without offending users by displaying their name in an inappropriate fashion.
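As a deliberately simple sketch of such a central place, the helper below only handles name order. The `formatName` helper and the short list of family-name-first languages are assumptions for illustration; real rules also cover spacing, titles and more:

```typescript
interface PersonName {
  given: string;
  family: string;
}

// Illustrative: languages that conventionally place the family name first.
const familyNameFirst = new Set(["ja", "zh", "ko", "hu"]);

function formatName(name: PersonName, locale: string): string {
  const language = locale.split("-")[0];
  return familyNameFirst.has(language)
    ? `${name.family} ${name.given}`
    : `${name.given} ${name.family}`;
}

const person: PersonName = { given: "Anna", family: "Kovács" };
const hungarian = formatName(person, "hu-HU");
const american = formatName(person, "en-US");
```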

Visuals

Visuals, i.e., icons, images and videos, are by far the most powerful way to convey a message to users. At the same time a message that is strong in one culture could be less powerful in another culture, or much worse it could be very offensive.

Enabling the system to load visuals by locale (with a fallback system similar to the one described in the section about translations) would be a good preparation for the system as it enters a new culture.

7. Legal Concerns

(Image: legal in texts, in court and online)

How to make a system compliant with the laws in different countries and jurisdictions is a broad subject. What I’m describing here are steps to take to be better prepared in two areas, namely Privacy & Consent and Personally Identifiable Information (PII).

Privacy & Consent

A consent policy object could look something like this:

Storing a consent for each user with an expiration date is a good start for preparing for privacy regulations.
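A sketch of a stored consent with an expiration date. Every field name here is hypothetical, and the shape is an engineering starting point rather than a legal template:

```typescript
interface ConsentRecord {
  userId: string;
  policyVersion: string; // which version of the policy the user agreed to
  purposes: string[]; // e.g. "analytics", "marketing"
  grantedAt: Date;
  expiresAt: Date;
}

function hasValidConsent(
  record: ConsentRecord | undefined,
  purpose: string,
  now: Date = new Date()
): boolean {
  // No record, a missing purpose or an expired consent all count as "no consent".
  return (
    record !== undefined &&
    record.purposes.includes(purpose) &&
    now.getTime() < record.expiresAt.getTime()
  );
}

const consent: ConsentRecord = {
  userId: "user-1",
  policyVersion: "2022-03",
  purposes: ["analytics"],
  grantedAt: new Date("2022-03-23T00:00:00Z"),
  expiresAt: new Date("2023-03-23T00:00:00Z"),
};
```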

Personally Identifiable Information (PII)

Every field of data being stored for the long term and containing PII should be annotated with a flag. Different regulations will require the system to encrypt this information and retrieve or delete it per user request.

Keeping tabs on where this data is stored in the system from the get-go will help meet regulations when the legal requirements reach the engineers’ desk.
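One way to keep tabs on it, as a sketch: declare the flag in the schema and derive everything else from there. The `FieldSpec` shape, the sample schema and both helpers are hypothetical:

```typescript
interface FieldSpec {
  type: "string" | "number" | "date";
  pii: boolean; // true when the field holds personally identifiable information
}

// Hypothetical schema for a stored user record.
const userSchema: Record<string, FieldSpec> = {
  id: { type: "string", pii: false },
  email: { type: "string", pii: true },
  birthDate: { type: "date", pii: true },
  loginCount: { type: "number", pii: false },
};

// Lists the fields an export or delete request has to cover.
function piiFields(schema: Record<string, FieldSpec>): string[] {
  return Object.keys(schema).filter((field) => schema[field].pii);
}

// Returns a copy of the record that is safe to log or share.
function withoutPii(
  record: Record<string, unknown>,
  schema: Record<string, FieldSpec>
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [field, value] of Object.entries(record)) {
    // Fields missing from the schema are treated as PII, to stay on the safe side.
    if (schema[field] && !schema[field].pii) result[field] = value;
  }
  return result;
}
```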

“Plan for what is difficult while it is easy, do what is great while it is small.”
Sun Tzu, The Art of War

Here’s wishing you nothing but success and smooth sailing going global with your projects! Prepared or not :)
