Localization sounds simple, but it is not: Part 1 — Intro & Translation
Full guide for localizing your application covering Translation, Currency, Global vs. Local content, UI adjustments and R&D culture.
Every business wishes to have a worldwide appearance and influence. The process of supporting different languages, currencies and serving the right content is called Localization.
In theory, this process may sound simple. How hard can it be to translate some text and replace the $ sign with something else? In reality, it is hard. It is a very complex process with many difficulties, lots of alternatives with different pros and cons, and plenty of unknown-unknown aspects.
In most cases, applications are built to support only a single locale, language, currency, content and appearance. That makes the localization process much harder, since major refactoring efforts are almost inevitable. It becomes even more complex when the original product is still being developed while you try to localize it. It feels like changing wheels on a moving car.
Recently I had the opportunity to localize an American web application. I survived to tell the story, and I want to share it with you in a series of five posts:
Part 1: Translation (This is where we are)
Part 2: Currency (Not published yet)
Part 3: Serving Global vs. Local content (Not published yet)
Part 4: Handling UI adjustments (Not published yet)
Part 5: Creating localization aware R&D culture (Not published yet)
What does localization actually mean?
So what does localization mean? It means that your application can be used in different countries, languages and cultures around the world. It doesn’t have to support ALL the languages and ALL the cultures, but it should support more than just English and USD.
So what do we need to do to localize our American application for, let’s say, Israel?
- It should be translated to the native language (Hebrew).
- It should display the local currency sign (NIS — ₪) instead of USD ($).
Is that all?
The answer is ‘not exactly’. There are many more aspects that must be taken into account like separating the content of one locale from the others, changing UI according to different locales, monitoring, testing and logging activity per locale, and much more.
In this series we’ll cover all the aspects, starting with translation which by itself is a very complex process that has different ways of implementation.
At first glance, translation is just about replacing strings in one language with strings in another. In practice it is not that simple, and there are different ways and approaches to achieve proper translation.
Before we jump into the details, it is important to define right from the start which content you wish to translate and which you don’t. For example, you probably don’t need to translate user or item names, descriptions and UGC (user generated content like reviews or statuses). You do want to translate the application layout, which includes buttons, menus, titles, etc.
Machine translation vs. Human translation
There are two ways to translate your content. One way is to use machine translation, i.e. a translation service like Google Translate. To achieve this, you simply send it your texts and get back their translated versions. It might sound perfect, but if you’ve ever used Google Translate you probably know that its translation is far from perfect. It doesn’t handle sentences well, and it doesn’t have context for different words (right vs. right). Despite the quality of the translation, there are some huge and famous applications which use this approach. AliExpress, for instance, is one of them. It supports all its languages by using machine translation. See some “qualitative” examples below:
Funnily enough, this application is considered to be the largest marketplace in the world. So in some cases, this approach might work.
Human translation, on the other hand, is done by human beings. It can be in-house content managers or a 3rd party agency that specializes in translation. The disadvantage of a 3rd party agency is that they, too, don’t fully understand the context of the words or sentences in the app. I’ve found that in-house translation is the best solution in terms of money, time and quality of work.
Once you know what content you wish to translate and you’ve defined the way you are going to translate it, it is time to think about the actual mechanism that will perform the translation. How will the translated content be delivered to your users?
Server side translation
This is the most common way. Many frameworks have server-side translation built into their API. This method uses resource files, which are key-value dictionaries that map the key of the text to translate to its translation. Usually, there will be a resource file per language.
When rendering a page, the system finds resource keys in the code and replaces them with the matching resource values; hence the page is rendered with the translated content. This is a big advantage since there are no latency issues while presenting the page.
On the other hand, this approach forces you to create keys for every piece of text that you write. Sometimes the keys are readable, and sometimes they look something like btn_enter_store_closed_error, which makes it difficult to read through the code and understand it.
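The key based mechanism described above can be sketched in a few lines. This is a minimal illustration, not any specific framework’s API: the resource keys and dictionaries are made up, and in a real system they would be loaded from per-language resource files.

```python
# Resource "files" per language: key-value dictionaries mapping resource
# keys to translated strings (in practice these would live in JSON /
# .properties / .resx files, one per language).
RESOURCES = {
    "en": {"btn_enter_store": "Enter store",
           "btn_enter_store_closed_error": "The store is closed"},
    "he": {"btn_enter_store": "כניסה לחנות",
           "btn_enter_store_closed_error": "החנות סגורה"},
}

def translate(key: str, locale: str, default_locale: str = "en") -> str:
    """Look up a resource key, falling back to the default locale, and
    finally to the raw key itself (the failure mode described above)."""
    for loc in (locale, default_locale):
        value = RESOURCES.get(loc, {}).get(key)
        if value is not None:
            return value
    return key  # worst case: the user sees the bare key

print(translate("btn_enter_store", "he"))   # → כניסה לחנות
print(translate("btn_missing_key", "he"))   # → btn_missing_key
```

Note the last line: when a key has no translation anywhere, the bare key leaks to the page, which is exactly the failure mode shown in the screenshot below.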
Another disadvantage of this mechanism shows up when the translation mechanism fails for some reason and you are left with only your keys as a fallback. Do you recognize the site?
Another option for server-side translation is Gettext, an internationalization and localization (i18n) mechanism that was originally created for supporting multi-language programs on Linux. Unlike the key based option, Gettext allows you to write your text in plain English, and the phrase itself serves as the key to the alternative translations.
printf(gettext("My name is %s.\n"), my_name);
The translations are kept in special files called PO files, which you need to maintain and load with your system in order for the translation to work. The advantage is that you no longer need to create weird keys inside your code. The disadvantage, however, is that you need to remember to add every new phrase to these files and make sure that it has a translation in all your supported languages. Nowadays, PO files have become a standard, so there are applications that help you manage and operate these files.
Proxy translation
The second approach is to render the page in English and let the client change the content to the correct language. The big difference here is that you no longer need key-value dictionaries. You write plain English, and in case of error your fallback is English. The translation itself happens using a 3rd party tool/layer that captures parts of the content and replaces them with the translated version.
There are two ways to achieve this: either you mark the content that you wish to translate so the tool can capture and replace it, or you mark what you don’t want to be translated. The latter is preferable, since it automatically covers every phrase in your app besides the ones you’ve explicitly decided not to translate.
The translated content is loaded to proxy servers (a CDN), and the server response is processed by the proxy before it gets back to the client.
The disadvantage shows when you wish to test the translated version while developing it. Proxy translations are usually integrated only in production due to integration difficulties. That makes it hard to catch UI glitches or even perform acceptance tests in a specific language.
Pure client side translation
The last approach is to store the translations on the client itself and switch the content during page load. This approach usually persists the translations as a key-value dictionary where the key is an English phrase (a word or a sentence) and the value is the translated content in the desired language. The user fetches the entire translation (or parts of it) upon the first page load, and the switches are then performed in the client. The data is usually persisted in local storage.
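A sketch of the phrase keyed dictionary such a client holds follows. The phrases are made up for illustration; a real framework would also fetch this dictionary from a server and cache it in local storage.

```python
# Phrase-keyed dictionary: the English phrase itself is the key, so any
# missing entry naturally falls back to English.
HE_PHRASES = {
    "Enter store": "כניסה לחנות",
    "The store is closed": "החנות סגורה",
}

def localize(phrase: str, phrases: dict) -> str:
    # English is the fallback: an untranslated phrase is shown as-is.
    return phrases.get(phrase, phrase)

# The page arrives rendered in English; the client walks its text nodes
# and swaps each phrase during page load.
page = ["Enter store", "Checkout", "The store is closed"]
localized = [localize(p, HE_PHRASES) for p in page]
print(localized)   # → ['כניסה לחנות', 'Checkout', 'החנות סגורה']
```

Contrast this with the key based server side approach: here “Checkout” simply stays in English until someone adds a translation, instead of leaking a raw resource key.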
We’ve found that this approach is the best for us for several reasons:
- Simple management: a translation framework like localize.js automatically detects all the existing strings in the website, publishes them to a dedicated dashboard and lets you decide whether or not to translate each phrase (e.g. in the case of UGC).
- Immediate effect: after you save a translation, all the clients that view this phrase are updated with the new content. This way you can fix issues immediately, without contacting the support of some 3rd party proxy app or redeploying your code.
- Works on local machines: since the translation is persisted on the clients, it lets you translate text during development as well.
There are two places where translating text becomes extremely hard: CSS and JS. Sometimes we add text using before/after CSS selectors, or simply in JS code (e.g. error messages). No framework is smart enough to detect such texts, so we usually end up refactoring the code so that the text is delivered from the HTML. I strongly recommend avoiding strings inside client side code from the beginning.
As you can see, there is more than one way to translate your site. It is important to know them and to take their pros and cons into account. In my personal opinion, the process must be simple both for the one who performs the translation and for the developers, who need to implement and test the translation on their local machines.
Now let’s move on to the next chapter (Currency). As you can guess, it is not just about replacing the $ with a different sign.
I want to thank Omri Fima for reviewing this post.