Application localisation: helping translation and development get along
Hello! I’m Alexey Timin, a Senior Software Engineer at Bumble, in charge of localisation system development. Let me tell you more about our localisation system.
We work on two products, Badoo and Bumble, and manage more than 150,000 phrases and texts, translated into more than 50 languages. Each project has its own users, its own market, its own style of communicating with users, and different versions for web and mobile platforms.
In this article I am going to describe our localisation, quality control and translations release processes, and, most importantly, how we achieved positive feedback on our translation system from our developers.
We have 300 developers working on projects. We have successfully managed to segregate responsibilities of translators and developers such that both groups can work independently and in parallel.
To start with, let’s have a look at what the localisation process looks like in our company.
This diagram leaves out minor details; you don’t need them to get an overall understanding.
We start with a Product Requirements Document (PRD). Then the client-side and server-side development teams start working on the feature while, in parallel, the translation process takes place.
The PRD and the final release stage are highlighted in the same colour; this means the result needs to match the PRD. If the PRD lacks some details, it won’t be clear to developers who is responsible for what. Is it the mobile developers who have to integrate a text into the client part? Or the server developers who need to return it from the server in response to a query?
Let’s get to the bottom of all this. Before we go any further, I would like to introduce and explain one particular term to you, a ‘lexeme’.
A lexeme is any unit of text which needs to be translated. This might be a text on a button, a header or an entire paragraph.
Now, we can move on to the main part.
The first stage in our process is to come up with the right PRD. Within it, the element most relevant to localisation is the lexeme table. Essentially, this is a list of texts to be used in an application or on a website.
The table of lexemes specifies whether a text is returned by a server or integrated into the application. It is imperative that a key is given: if a text has been used before, then the key will be contained in the table; if, however, the text has not been used anywhere before, then a sequential number for the text will be given, and the developer will be able to set a suitable key.
Reusing text is a very dodgy issue. On the one hand, it speeds up the localisation process, but on the other, you might find yourself in an odd situation.
Using an example will help me explain why. On one occasion, we had a question in our app (in English), “Do you smoke?”, to which the possible answers were “Yes” and “No”. Here we can see three lexemes: two for the answers and one for the question. The question was translated into Russian as “Do you smoke?” and the possible answers as “I smoke” and “I don’t smoke”. Then we decided to carry out another survey and to reuse the possible answers from the previous question. In English it all looked right: “Fancy going to a party?” with the answers “Yes”/“No”. In Russian, because we were reusing lexemes, the result was the following ‘exchange’: “Fancy going to a party?” with the answers “I smoke”/“I don’t smoke”!
So, now, when we put together a PRD and are deciding on whether to reuse the text, we take account of the context in which it has been used before. We also specify whether a lexeme is returned by the server or integrated into the client and delivered to the clients via the App Store or Google Play.
These techniques save time because they obviate the need for discussion at later stages.
The next stage is translation. The main thing here is not to lose the meaning that was originally intended. And this can often happen because all languages are different — with myriad shades of meaning and their culturally-dependent expressions. And sometimes the most accurate translation is simply too long to fit on the screen and so translators need to find a compromise.
Let me detail for you what we start with, how we communicate context to our translators, maintain a common style and then check the final result.
Firstly, we select a language which is going to be understood by most translators. This is the language we will use for preparing the source texts so that they can then be translated easily into the other languages. All the languages we translate into (and we have 52 of them), are divided into main (Parent) languages and ‘dialects’. The language we use to prepare texts is English (we call it the Master language). Then from English, we translate into the other languages: Spanish, French, Russian and others. Sometimes a translation needs to be corrected for one of the dialects — such a case might include translating into ‘Mexican Spanish’ or ‘Australian English’. However, if this is not necessary, we will use the translation into the parent language: main Spanish or main English.
An example. Let’s say we need to make a greeting less formal. Initially, in English we had “Hey”, in Spanish “Hola”, in French “Salut”, in Russian “Privet”, in Australian English “G’day mate” and in Mexican Spanish “Que onda” (“What’s the wave?”; Mexicans are cool!). Making the greeting less formal involves changing the English source text, at which point the translations into the other languages no longer match it and have to be checked and tweaked. We always alert our translators to this issue.
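The parent/dialect fallback described above can be sketched as a simple lookup. This is only an illustration: the language codes, the `PARENT` map, the `TRANSLATIONS` store and the `resolve` function are assumptions made for this example, not the actual data model of our system.

```python
from typing import Dict, Optional

# Hypothetical map from each dialect to its parent language.
PARENT: Dict[str, str] = {
    "es-MX": "es",  # Mexican Spanish -> main Spanish
    "en-AU": "en",  # Australian English -> main English
}

# Hypothetical translation store: {lexeme key: {language: text}}.
TRANSLATIONS: Dict[str, Dict[str, str]] = {
    "greeting": {
        "en": "Hey", "es": "Hola", "fr": "Salut",
        "en-AU": "G'day mate", "es-MX": "Que onda",
    },
    # No dialect-specific corrections were needed for this lexeme.
    "farewell": {"en": "Bye", "es": "Adiós"},
}


def resolve(key: str, lang: str) -> Optional[str]:
    """Return the dialect's own translation, else walk up to the parent language."""
    texts = TRANSLATIONS.get(key, {})
    current: Optional[str] = lang
    while current is not None:
        if current in texts:
            return texts[current]
        current = PARENT.get(current)  # None once we run out of parents
    return None
```

With this data, `resolve("greeting", "es-MX")` gives the dialect's own text, while `resolve("farewell", "es-MX")` falls back to main Spanish.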
The impact of context
An important point is the context of a translation.
Let me explain using some examples.
Just so you know, I would like to point out that some of the examples are screenshots of popular websites and applications, but we don’t need to know their names; we are simply considering the most common types of mistakes that occur when it comes to localisation.
This is a sign at a petrol station. The English translation says “gun” (the correct translation is “nozzle”). But to a British or American reader, “gun” means a weapon. In this context, the phrase “Remove the gun from your fuel tank filler neck” sounds a bit strange.
In the next example, the creators of an application decided to create a universal version of a text both for men and women — apparently, there was some advantage in doing so.
“Хотел(а)” is not actually a real word; it is a combination of the male and female forms of the verb “to want”. Users have to pick the correct variant themselves. Compare it with the non-existent word “(wo)men”: it looks weird. We try to avoid translations like this.
The next example shows how the original sense of the text can get lost in translation. Look at the Russian on the right: we are being offered the opportunity of chatting with ourselves. But actually, the original meaning was an offer to link to our Instagram account.
These sorts of mistakes occur when translations are carried out with no reference to context. That’s why we specify the following for each lexeme:
- description (what is the lexeme about, where it’s going to be used, etc.)
- an image which shows you elements that will appear next to the text on the screen
- a note as to whether the text will be shown to male or female users — so that translators can work out whether they require different translations, or not
- types of variable (this is a very important point — I will cover this in more detail when we come to look into the development process)
- the maximum length of the text: this is very important for push-notifications because of the limited screen width on mobile devices.
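As a rough sketch, the context attached to each lexeme could be modelled as a small record like the one below. The class name, the field names and the example values are assumptions made for illustration; they only mirror the items in the list above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LexemeContext:
    """Hypothetical record of what a translator sees next to a lexeme."""
    key: str
    source_text: str
    description: str                      # what the lexeme is about, where it is used
    screenshot_url: Optional[str] = None  # image of the surrounding screen elements
    gender_specific: bool = False         # may need separate male/female translations
    variables: Dict[str, str] = field(default_factory=dict)  # variable name -> type
    max_length: Optional[int] = None      # e.g. for push notifications

    def fits(self, translation: str) -> bool:
        """True if the translation respects the maximum length, when one is set."""
        return self.max_length is None or len(translation) <= self.max_length


push = LexemeContext(
    key="push.credits_received",          # illustrative key
    source_text="You have {credits_amount} credits",
    description="Push notification sent after a top-up",
    variables={"credits_amount": "int"},
    max_length=40,
)
```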
Also, we always need to divide large texts into parts. This is handy if you need to do a search or make changes later on.
Let’s look into this point in more detail. When we divide up a text, we lose the connection between particular phrases and sentences. That’s why it is imperative that we show translators what comes before and after each part. This is relevant, for example, in the case of legal documents, so that they are translated correctly.
Also, translators need to be alerted to any regional terms or jargon words present in the lexemes. For example, take the sentence, “Unlock your Likes List to see everyone who’s interested at once”. The translator needs to know that `Likes`, here, refers to a special directory in the application which contains user contacts who have liked a profile. Another similar example would be the term “Stories”. Ten years ago, when someone heard the word “Stories” they wouldn’t have thought of Instagram. Nowadays, Instagram is the first thing people associate with that word.
So, a translation depends on context, and namely on the following elements:
- user gender
- singular and plural in the text: “You have only one friend” and “You have ten friends”
- platforms: Web, Android, iOS
- the project for which the translation is being done.
Let’s stick with the final point for a moment. How lexemes are translated often depends on the project to which they belong. This is important because each project has its own distinct style.
For example, here are headers for letters sent to a user when their account has been blocked.
For Badoo: “Your account has been blocked.”
For Bumble: “You have been blocked.”
In order to retain a common style as part of each project, you need to give translators access to the translation history. We have a tool called Translation Memory (TM). The translator always has access to information on matching translations and the percentage of similarity: they can either use the old translation or enter a new one. We don’t only show translators texts which are 100% identical, but also ones that are less similar, and we always highlight the differences.
Besides allowing style to be maintained within a given project, Translation Memory also helps speed up the whole translation process because translators don’t need to enter the same text a second time.
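To give a feel for how similarity-based suggestions can work, here is a minimal sketch using `difflib.SequenceMatcher` from the Python standard library. It is not how our Translation Memory is implemented; the `MEMORY` contents, the Spanish texts, the `suggest` function and the 70% threshold are all assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical memory of earlier work: source text -> stored translation.
MEMORY: Dict[str, str] = {
    "Your account has been blocked.": "Tu cuenta ha sido bloqueada.",
    "You have been blocked.": "Has sido bloqueado.",
}


def suggest(source: str, memory: Dict[str, str],
            threshold: float = 0.7) -> List[Tuple[int, str, str]]:
    """Return (similarity %, old source, old translation), best match first."""
    hits = []
    for old_source, old_translation in memory.items():
        ratio = SequenceMatcher(None, source, old_source).ratio()
        if ratio >= threshold:
            hits.append((round(ratio * 100), old_source, old_translation))
    return sorted(hits, reverse=True)
```

An identical source text scores 100%, and a near match such as “Your profile has been blocked.” still surfaces the earlier translation with a lower percentage, so the translator can reuse or adapt it.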
Grammatical cases and numerals
We have another tool, called “Grammatical case matrix”. It is like the mathematical times table, only with case endings and plurals.
Translators populate this matrix for various words in each language. Whenever a new word is required, it gets added to the table.
The matrix helps avoid incorrect plural form use:
The Russian word подписчика means subscribers, but there are way more plural forms in Russian than in English, and the one used here is incorrect.
The advantage of this tool is that the required form isn’t chosen until immediately before it is shown to the user, at runtime. This is how it works:
For example, let’s say we have a translation into Russian.
“Credits” in the middle is an identifier, a link to the case matrix.
“Credits amount” on the right is a number which comes from the developer.
@3 designates a grammatical case (here it’s the accusative case), which has been specified by the translator.
So the entire phrase will be shown in Russian using the relevant grammatical case automatically. Awesome!
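Here is a minimal sketch of choosing the word form at runtime. Only the idea of a case matrix and the meaning of `@3` (accusative) come from the description above; the placeholder syntax `{credits@3:credits_amount}`, the matrix layout and the function names are invented for this example. The plural rule is the standard one for Russian.

```python
import re
from typing import Dict


def plural_category(n: int) -> str:
    """Standard Russian plural categories for whole numbers."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


# Hypothetical matrix: word id -> case number -> plural category -> form.
CASE_MATRIX: Dict[str, Dict[int, Dict[str, str]]] = {
    "credits": {
        3: {"one": "кредит", "few": "кредита", "many": "кредитов"},  # accusative
    },
}

# Invented placeholder syntax: {word@case:variable}
PLACEHOLDER = re.compile(r"\{(\w+)@(\d+):(\w+)\}")


def render(template: str, values: Dict[str, int]) -> str:
    """Replace each placeholder with the number and the matching word form."""
    def substitute(match: "re.Match[str]") -> str:
        word, case, variable = match.group(1), int(match.group(2)), match.group(3)
        n = values[variable]
        return f"{n} {CASE_MATRIX[word][case][plural_category(n)]}"
    return PLACEHOLDER.sub(substitute, template)


TEMPLATE = "Вы получили {credits@3:credits_amount}"  # "You received N credits"
```

The translator writes the template once; the form is picked only when the developer supplies the number, for example `render(TEMPLATE, {"credits_amount": 5})`.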
Multiply 150,000 phrases and texts by 52 languages and you get 7.8 million units of text. Of course, checking all the translations manually would be impractical, so we set up automatic checking to take place at the moment translations are saved.
We automatically check for omitted emojis or variables. If a translator has accidentally removed a variable, the phrase in question loses its structure and sense. Compare: “You have 10 credits” and “You have credits” — in the second case the phrase has been corrupted and the sense has been lost.
We also check for missing HTML, otherwise the layout will go awry.
And we also warn a translator if their translation is longer than the original text. At that point, the translator needs to check it for accuracy and whether it fits the screen width.
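The checks described above can be sketched roughly as follows. This is a simplified illustration, not our production validator: the `{name}` variable syntax, the regular expressions (the emoji ranges in particular are approximate) and the `check_translation` function are assumptions.

```python
import re
from typing import List

VARIABLE = re.compile(r"\{\w+\}")            # assumed variable syntax: {name}
HTML_TAG = re.compile(r"</?[a-zA-Z][^>]*>")
# Approximate emoji ranges, for illustration only; not a complete list.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def check_translation(source: str, translation: str) -> List[str]:
    """Return a list of problems found when a translation is saved."""
    problems = []
    checks = (("variable", VARIABLE), ("HTML tag", HTML_TAG), ("emoji", EMOJI))
    for label, pattern in checks:
        # Anything present in the source but absent from the translation.
        missing = set(pattern.findall(source)) - set(pattern.findall(translation))
        for item in sorted(missing):
            problems.append(f"missing {label}: {item}")
    if len(translation) > len(source):
        problems.append("warning: translation is longer than the source")
    return problems
```

For example, `check_translation("You have {amount} credits", "У вас кредитов")` reports the removed variable, which is exactly the “You have credits” situation.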
Let’s highlight the main points of the translation process:
- translators need to understand the context
- the translation system needs to be sufficiently flexible so that a suitable translation can be made in every language and that a translator isn’t compelled to choose some universal wording. There has to be support around inflections and grammatical cases
- there has to be automatic checking.
Help from users
As well as the services of professional translators, we also get help from our users themselves. There are two methods in use here: A/B testing and shared translation.
Right, so you need a translation, for example, into Russian. The translator has translated a phrase in two different ways and you don’t know which one to go for. In this case, you can do an A/B test: show users various options and choose one based on their responses.
We had two options to choose from: “Are you ready to meet new people? Join us!” and, “Just a few more steps… and you will be part of Badoo.” As a result of testing, we established that more users completed registration when they saw the second version of the push notification, so that’s the one we kept.
Below you will find a full list of the elements a translation depends on. As you can see, the fifth element is the A/B test: a user ending up in any given group means they were shown the relevant version of the text.
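One common way to keep a user in the same test group is to derive the group from a hash of the user ID. The sketch below shows that idea; it is an assumption about how such a split could be done, not a description of our A/B framework, and the test name and function names are made up.

```python
import hashlib
from typing import Sequence

# The two candidate texts from the push notification test.
VARIANTS = {
    "A": "Are you ready to meet new people? Join us!",
    "B": "Just a few more steps… and you will be part of Badoo.",
}


def ab_group(user_id: int, test_name: str, groups: Sequence[str] = ("A", "B")) -> str:
    """Deterministically map a user to a group, so they always see the same text."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    return groups[int(digest, 16) % len(groups)]


def text_for(user_id: int, test_name: str = "registration_push") -> str:
    """Pick the text version for this user's group (test name is hypothetical)."""
    return VARIANTS[ab_group(user_id, test_name)]
```

Because the group is computed rather than stored, the same user gets the same version on every request, and the group simply becomes one more element that the translation lookup depends on.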
Collaborative translation platform (CTP)
Once we sent out a notification to users in Mexico asking them to translate some texts into their language in return for a small reward in the form of credits, the application’s internal currency. And they agreed: in just two days they translated around 5,000 lexemes for us. This was a huge help, and we’re most grateful for the Mexicans’ input.
What is the benefit of this approach and why is it important? When you don’t have a translator into a local dialect, let users do the work. As it turned out, they were really pleased to take part in the development of a project they liked.
We have a collaborative translation platform (CTP). You can access it using your Badoo account and vote for the best translation.
This is a screenshot of a window inviting translation into German. Each user can add their version. As soon as one of the options reaches a threshold of votes, we show it to our in-house translator and they can use it as the main translation (on the condition that it complies with the style and rules for the project in question, isn’t offensive etc.).
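The voting rule can be sketched in a few lines. The threshold value, the class and the method names are assumptions made for this example; the real threshold and review workflow are not shown here.

```python
from collections import Counter
from typing import List


class Suggestions:
    """Hypothetical vote counter for user-submitted translations of one lexeme."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.votes: Counter = Counter()

    def vote(self, text: str) -> bool:
        """Register a vote; return True at the moment the threshold is first reached."""
        self.votes[text] += 1
        return self.votes[text] == self.threshold

    def for_review(self) -> List[str]:
        """Options to show to the in-house translator, most popular first."""
        return [text for text, count in self.votes.most_common()
                if count >= self.threshold]
```

Reaching the threshold only puts an option in front of the in-house translator; the decision to use it stays with them.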
Don’t be afraid to ask users for help. They will put you right, and their assistance is well worth having.
Let’s move now to the next interesting part: the development process. I deliberately opted to talk about the translation process first, and the problems it commonly throws up, in order to then show how developers tackle them.
There are two main challenges here: how to organise development such that it runs in parallel; and how to keep track of errors when using lexemes and so ensure that the correct translations are displayed at the right time.
Development in parallel
Let me begin with a story from when development was arranged differently at Badoo. Source texts used to be saved in a file in a Git-repository. Two developers were able to change things in parallel, but then these changes needed to be merged. It wasn’t a major problem, but it was inconvenient.
The old arrangement whereby we had to merge changes (different lexeme keys provided by the two developers)
Nowadays, we create and change lexemes centrally in the localisation system. Developers simply download a set of lexemes before they start working on a task. They write code, refer to the lexemes by their keys, and that’s it! They don’t need to think about anything else; translation-related questions are left to the translators.
Mistakes made in lexeme use
There can be multiple variables in translations.
For example, if you are in a hurry it is easy to confuse “credit_amount” and “credit”. In order to prevent such things from happening, we introduced a control mechanism, a so-called ‘text container’, to oversee the translation and identify the type of variables used in a particular translation. It performs substitutions and checks that values of expected types are sent for substitution. If all the substitutions are done, then the container returns a string only, which can be displayed to the user. If not, then the same sort of container is returned. If we attempt to display a translation before all the substitutions are done, then we get a warning in the logs and know where the problem is.
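A ‘text container’ along these lines could look like the sketch below. It follows the behaviour described above (typed substitutions, a plain string once everything is filled in, a logged warning if an incomplete text is displayed), but the class, the `{name}` syntax and the logger name are assumptions, not our actual code.

```python
import logging
from typing import Dict, Union

logger = logging.getLogger("localisation")  # hypothetical logger name


class TextContainer:
    """A translation that still has variables waiting to be substituted."""

    def __init__(self, template: str, variables: Dict[str, type]) -> None:
        self.template = template
        self.pending = dict(variables)  # variable name -> expected type

    def substitute(self, name: str, value: object) -> Union[str, "TextContainer"]:
        if name not in self.pending:
            raise KeyError(f"unknown or already substituted variable: {name}")
        expected = self.pending[name]
        if not isinstance(value, expected):
            raise TypeError(f"{name} expects {expected.__name__}, "
                            f"got {type(value).__name__}")
        text = self.template.replace("{" + name + "}", str(value))
        rest = {k: v for k, v in self.pending.items() if k != name}
        # All substitutions done: return a plain string, otherwise a new container.
        return text if not rest else TextContainer(text, rest)

    def __str__(self) -> str:
        # Displaying an incomplete text is a bug, so leave a trace in the logs.
        logger.warning("lexeme displayed with unsubstituted variables: %s",
                       sorted(self.pending))
        return self.template
```

Confusing `credit` with `credit_amount` then fails immediately with a `KeyError` instead of silently producing a broken phrase.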
Main points regarding development:
- developers shouldn’t have to think about localisation, changing text and such things.
- you need to check what developers do, and it is also better if this checking is automated — it spares the nerves of all involved in the process.
Right, so imagine we already have a release candidate and it has been translated. It’s time to test it!
Let’s start with some examples. How many mistakes can you find on this screenshot?
I can find two:
- the long translation clearly doesn’t match the screen size. In this case, almost everything is just truncated and the caption doesn’t fit into the button.
- not all the lexemes are translated into Russian
In the following example, besides the text being displayed in various languages, we are also being offered the option to “experience up difficulties”.
Because “Узнать больше” (“More details”) has been truncated to “Узнать боль…”, it now means “Experience up difficulties” instead 🤣
Remember: quality control is essential 👆
Quality Control options
Let’s look into what quality control options are available.
The first thing which comes to mind is to check the translation on a test version of a website or application. That is to say, simply run it and see whether what comes out corresponds to the design, plan, technical brief and so on. Using this method, we caught this mistake in a push notification. A message intended for a male user was sent to a female test user:
Another method of quality control is checking application screenshots.
We developed a special tool which takes screenshots in the test environment of all the mobile application screens in all languages. Anyone in the company can see what the screens look like, via the browser. This also has a special mode, showing identifiers for all the displayed texts. This is very helpful when it comes to debugging; you can see quickly what lexeme it is and why it ended up where it did (e.g. perhaps we have reused the program code which uses the lexeme).
If you have a web version and just need to collect screenshots of it featuring lexemes, you could integrate lexeme markers into the source code and write a plug-in for Google Chrome. Installed in QA engineers’ browsers, the plug-in could send screenshots of pages where it finds lexemes to the localisation system.
We used this method for quite some time. Within the first few weeks, it had already allowed us to collect a huge number of pictures. But we discontinued it because it only let us obtain images of a version which had already been developed, and in the meantime we had learnt how to produce pictures before development was finished.
Quality control during the translation process
As I said above, taking screenshots of an application that was already finished was not enough for us. We wanted screenshots for the period when the application wasn’t ready yet: there was nothing to run, but we still needed some kind of quality control to know whether things were progressing as they should.
So, this is how the tool for carrying out quality control during the course of translation came into being.
Let me explain the principle behind it. Our designers use Sketch, an application for creating interfaces, including ones for mobile applications. We have learnt to replace texts in Sketch files and, using the Sketch program interface, to generate screenshots of what we need on-screen. Now, as the translator is working on the text, we are able to show them screenshots in their language immediately. And we can do so even before developers start creating the first version of new functionality.
If there is no possibility of checking a translation in a given language, for example, in Japanese, we suggest ordering a selective audit, i.e. showing an external company the translation of every hundredth lexeme, along with a picture, for them to check the accuracy.
Main points regarding quality control
Visual assessment of translation quality is essential; during the testing process, it is important to know which devices your audience uses and to test the application on all such devices.
So, having tested functionality, now all that’s left to do is deliver it to users.
In our Badoo application, we had a service called “Superpower”. There came a time when we needed to change its name to “Badoo Premium”, and to do so immediately on all versions in one go so that the user would never see “Superpower” on one screen and “Badoo Premium” on another.
How? There is a branch of lexemes assigned to each branch of the task in Jira. When we incorporate any changes from any branch into a new version of the project, a new version of lexemes becomes available immediately. If we need to undo something, we simply remove a branch of the task from the new version and, along with it, a version of the lexemes with translations into all languages.
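The branch-per-task idea can be sketched as layering each included branch on top of a base set of lexemes. The keys, the ticket ID `LOC-1` and the `build_release` function are hypothetical; only the principle (include a branch to get its lexemes, drop it to undo them) comes from the description above.

```python
from typing import Dict, Iterable

Lexemes = Dict[str, str]  # lexeme key -> text (one language shown for brevity)

# Lexemes already in production (illustrative key and value).
BASE: Lexemes = {"premium.title": "Superpower"}

# Hypothetical lexeme branches, one per Jira ticket.
BRANCHES: Dict[str, Lexemes] = {
    "LOC-1": {
        "premium.title": "Badoo Premium",
        "premium.cta": "Get Badoo Premium",
    },
}


def build_release(base: Lexemes, branches: Dict[str, Lexemes],
                  included: Iterable[str]) -> Lexemes:
    """Apply the lexeme changes of every included task branch on top of the base."""
    lexemes = dict(base)
    for ticket in included:
        lexemes.update(branches[ticket])
    return lexemes
```

Building with `["LOC-1"]` switches every screen to the new name at once; building without it leaves the old lexemes untouched, which is what makes undoing a change cheap.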
When a lexeme undergoes testing or when users can already see it, you need to be very careful; it is better to avoid making any changes to it and create a new version instead, assigning it to a ticket and, along with a new release, deploying a new version of the lexeme.
Nonetheless, translation mistakes inevitably still slip through. For example, here are two such mistakes below.
Wrong: “It’s a remath”.
Right: “It’s a rematch”.
In English, you shouldn’t use the straight apostrophe. Also, the letter “c” has been missed out.
Versioning lexemes and versioning translations are two different things. A translation can be corrected at any time: when a task is in development, when it is at the testing stage, or even when the functionality has already been delivered to users (there will be no harm done if the users see a correct translation in a new version of the application).
Deployment to different platforms
Updates are delivered to different platforms in different ways. When you develop a mobile application you probably have server and client parts.
What you show a user comes either from the server or from what they already have on their smartphone (for example, an integrated translation).
A translation passes from the server to the user via our production server, to which we can easily deliver updated versions of files with translations.
And an integrated translation has a long pathway; it passes via the App Store or Google Play. The user downloads an update and only having done so do they see the corrections. This process seemed too slow for us so we came up with our own updating mechanism, “Hot Update”. At the click of a button it allows us to generate a new version of translations and to let all the users around the world know there is something new to download and use.
When an application is run on a mobile device, it sends a notification to the server that it has just launched and communicates the current translation version. If the localisation system has an update ready, it responds by sending an appropriate notification. The smartphone downloads the update and applies it.
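The exchange described above can be sketched with two small classes. This is an in-memory illustration under assumed names (`TranslationBundle`, `LocalisationServer`, `App`); the real mechanism involves network calls and file downloads that are not shown here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TranslationBundle:
    version: int
    texts: Dict[str, str]


class LocalisationServer:
    """Hypothetical server side: knows the latest translation version."""

    def __init__(self, initial: TranslationBundle) -> None:
        self.latest = initial

    def publish(self, texts: Dict[str, str]) -> None:
        # "At the click of a button": generate a new version of the translations.
        self.latest = TranslationBundle(self.latest.version + 1, texts)

    def on_app_start(self, client_version: int) -> Optional[TranslationBundle]:
        """Answer with an update only if the client's version is out of date."""
        return self.latest if client_version < self.latest.version else None


class App:
    """Hypothetical client side: ships with a bundle and updates it on launch."""

    def __init__(self, shipped: TranslationBundle) -> None:
        self.bundle = shipped

    def start(self, server: LocalisationServer) -> None:
        update = server.on_app_start(self.bundle.version)
        if update is not None:
            self.bundle = update  # download and apply the update
```

A fix such as “It’s a remath” to “It’s a rematch” then reaches users on the next launch, without waiting for an App Store or Google Play release.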
Release: main points
During the release process, it is imperative to take the application’s pathway from you to your users into consideration. Different parts of your application probably update differently.
Let’s return to the diagram I featured at the start of the article.
What you need to bear in mind when looking at implementing a translation system:
- write a detailed PRD
- take context into account and give translators access to it
- keep a history of translations in order to be able to maintain a common style within a given project
- automate quality control (otherwise, that translator, who might be several time zones away, might do everything their own way)
- free developers from having to make decisions about tasks outside their area of expertise. They are the ones who create new versions of your product, bringing joy to your users and giving you a feeling of satisfaction about the project you are creating.