Fuzzies, matches, new words — how to set up your translation memory correctly
Designs are done, content is finalised, engineers have worked on the internationalisation process and you’re ready to start localising into one or several languages. What’s next?
What’s translation memory?
Maybe you outsource localisation, maybe you do it internally — whatever you do, linguists are going to ask about reference material and translation memory (TM).
Translation memory is a database of previously translated content. It contains segments (units of text such as sentences or paragraphs) in your source (original) and target (translation) languages.
Translation memory divides segments into different match levels:
- 101% match
- 100% match
- 95–99% match
- 85–94% match
- 75–84% match
- 50–74% match
- 0–49% match
- Repetitions
These matches are grouped into four categories:
- Exact matches: 101% and 100% matches are called exact matches. The difference between a 101% and a 100% match is the context. A 101% match is also called a context match. If you have a 101% match, it means that your source segment is exactly the same as a previously translated segment and the segments before and after are also identical. A 100% match is an exact match but the segments before and after aren’t identical.
- Fuzzies: 75–99% matches are also called fuzzies. They’re similar to the source but not identical. The degree of similarity is automatically calculated by our localisation software.
- New words: Anything below a 75% match is a ‘new word’. The similarity between the new segment and any previously translated segment is so low (e.g. a 30% match) that linguists won’t be able to reuse the previous translation directly. At most, they will be inspired or influenced by it.
- Repeated content: If your segment is repeated several times within a file/several files, the translator will only need to translate it once.
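To make the bands concrete, here is a minimal Python sketch that maps a match percentage to one of the four categories, taking fuzzies as 75–99% as in the list above. The function name `classify_match` and the category labels are made up for illustration; your localisation software will have its own names and may draw the boundaries differently.

```python
def classify_match(score: int, is_repetition: bool = False) -> str:
    """Map a TM match percentage (0-101) to one of the four categories.

    Hypothetical helper: the labels and the handling of repetitions
    are assumptions, not the behaviour of any particular tool.
    """
    if not 0 <= score <= 101:
        raise ValueError(f"match score out of range: {score}")
    if is_repetition:
        return "repetition"
    if score >= 100:  # 100% (exact) and 101% (context) matches
        return "exact"
    if score >= 75:   # 75-99% matches
        return "fuzzy"
    return "new"      # everything below 75%


for score in (101, 100, 92, 75, 74, 30):
    print(score, classify_match(score))
```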
If you outsource your localisation to an agency, they should calculate the costs based on translation memory and repeated content. Exact matches and fuzzies should be slightly discounted. If you’re translating your product for the first time, you won’t have translation memory. If you outsource your translation, make sure to ask for the TM as a deliverable, too. It could come in handy if you ever change vendors.
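As a rough illustration of how a quote can be built from match categories, the sketch below computes a weighted word count. The weights in `PAY_WEIGHT_PERCENT` are invented for the example and are not industry rates or figures from any agency; the real discounts are whatever you agree with your vendor.

```python
# Hypothetical weights: the share of the full per-word rate paid per category.
# These numbers are illustrative assumptions only.
PAY_WEIGHT_PERCENT = {"new": 100, "fuzzy": 60, "exact": 30, "repetition": 30}


def weighted_word_count(word_counts: dict) -> float:
    """Return the number of 'full-rate' words a job is equivalent to."""
    total = sum(
        words * PAY_WEIGHT_PERCENT[category]
        for category, words in word_counts.items()
    )
    return total / 100


job = {"new": 1000, "fuzzy": 500, "exact": 200, "repetition": 100}
print(weighted_word_count(job))  # 1390.0 with the weights above
```

Multiplying the result by the per-word rate gives the estimate, which makes it easy to compare a quote with and without TM leverage.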
How does a translation memory help?
Our content needs to be available on iOS, Android and web. Ideally, content won’t be different between platforms, but you might have to create different projects in your localisation software to get all the strings translated. iOS and Android strings tend to live in different projects within any localisation tool, so you might end up translating the same content three times. With translation memory, you don’t have to. Save time and money by translating your web strings first and then auto-populating translations on Android and iOS strings (or the other way around). The only things you might have to change are placeholder values.
With the right tools you can even automate the population of strings. Before sending new content strings to your translators, you can add a pre-processing step where you automatically populate segments based on your translation memory. Translators won’t have to look for similar content in the TM because segments will already be populated with the best available match.
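The pre-processing step could look something like the Python sketch below. It is only an approximation: the TM is modelled as a plain dictionary of source to target text, and similarity is scored with the standard library’s `difflib.SequenceMatcher`, which is a stand-in for whatever algorithm your localisation software actually uses. The names `best_match` and `pre_populate` and the 75% threshold are assumptions for the example.

```python
from difflib import SequenceMatcher


def best_match(source: str, tm: dict) -> tuple:
    """Return (score, translation) for the closest TM entry, or (0, None)."""
    best_score, best_translation = 0, None
    for tm_source, tm_target in tm.items():
        # ratio() is 1.0 for identical strings; scale it to a percentage.
        score = int(SequenceMatcher(None, source, tm_source).ratio() * 100)
        if score > best_score:
            best_score, best_translation = score, tm_target
    return best_score, best_translation


def pre_populate(strings: dict, tm: dict, threshold: int = 75) -> dict:
    """Fill in a target for every string whose best TM match meets the threshold."""
    populated = {}
    for key, source in strings.items():
        score, translation = best_match(source, tm)
        target = translation if score >= threshold else None
        populated[key] = {"source": source, "target": target, "match": score}
    return populated


tm = {"Cancel": "Annuler", "Save changes": "Enregistrer les modifications"}
new_strings = {"btn_cancel": "Cancel", "btn_help": "Help"}
for key, entry in pre_populate(new_strings, tm).items():
    print(key, entry)
```

Strings left with an empty target are the ones that still need a translator; populated fuzzies still need editing before they go live.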
We tend not to send 101% matches to translation. As they are context matches, it’s very unlikely the translations will be different. You can lock these segments so no changes can be made to the translation. Be mindful when locking segments though. If you have any doubts about the quality of your previously translated content, don’t lock any segments or you’ll be using a poor translation again and again.
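Here is a small Python sketch of how context matches could be detected and flagged for locking. It assumes you have the previous and new source files as ordered lists of segments and treats the start and end of a file as part of the context; both choices, along with the function names and the `tm_is_trusted` flag, are illustrative assumptions rather than how any specific tool works.

```python
def context_triples(segments: list) -> list:
    """Return (previous, current, next) for each segment; None marks a file boundary."""
    padded = [None, *segments, None]
    return [tuple(padded[i:i + 3]) for i in range(len(segments))]


def lockable_indices(new_segments: list, previous_segments: list,
                     tm_is_trusted: bool) -> list:
    """Indices of new segments that are context matches and safe to lock."""
    if not tm_is_trusted:
        # Doubts about TM quality: lock nothing, send everything to review.
        return []
    known = set(context_triples(previous_segments))
    return [
        index
        for index, triple in enumerate(context_triples(new_segments))
        if triple in known
    ]


previous = ["Welcome", "Sign in", "Forgot password?", "Help"]
new = ["Welcome", "Sign in", "Forgot password?", "Contact us"]
print(lockable_indices(new, previous, tm_is_trusted=True))  # [0, 1]
```

"Forgot password?" is identical in both files but its following segment changed, so it is treated as a 100% match rather than a context match and stays unlocked.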
Translation memory won’t just help automate your workflow, it will improve the quality of your localised content. Linguists are translating hundreds of words every day. If you’re working with freelance linguists, they are bound to be juggling different content, style and file formats. It’s an almost impossible task for linguists to remember every string they’ve ever translated. Having an up-to-date translation memory is an essential tool for them. They will be able to leverage the TM and be more consistent with terminology and sentence structure. For example, if a string is a 50% match, they can use it as inspiration to make sure we’re using the same terms throughout.
How did we set it up?
When I started streamlining the localisation process, I wondered why matches wouldn’t come up even when the English content was the same. I didn’t want to send the same content to my linguists several times, so I started copying and pasting translated strings from one project to another. It was annoying, frustrating and not scalable. I looked for a solution that wouldn’t involve engineers. When I looked at the locale codes and names, they were all inconsistent!
Our localisation software allows us to use different types of locales for one language mainly based on regional differences.
Different types of French
Different types of traditional Chinese
For some of our products, our content is divided into four projects on our localisation software (backend, frontend, iOS and Android) and the locales were set up differently for each project. I wanted to make it consistent so our TM would work.
Locale name inconsistencies within different projects
Before making any changes, I checked with our engineers to understand if they needed the locale code or the locale name in their code and found out they use the locale name to pull translated strings. I contacted our localisation software partner to understand if the TM was based on the locale code or the locale name, hoping it would be based on the locale code.
If it had been based on the locale name, we would have needed to update our code base. Luckily, the TM turned out to be based on the locale code, so I only needed to change the locale codes within our localisation tool. It wouldn’t impact our engineers at all. You can imagine how much harder it would have been to update every code base.
Consistency is the key to a working TM. The first thing is to set up your locales and be consistent. If the locale on your iOS project is set as Fr-Fr but the locale for Android is set as Fr, your TM won’t work. The software sees them as two different locales. Your target locales need to be the same across all projects. This is also valid for your source language. Check with your localisation software partner before changing anything.
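A quick audit like the Python sketch below can surface these mismatches before they break your TM. It assumes you can export the list of locale codes per project (the project names and the `find_locale_mismatches` helper are hypothetical) and compares codes as exact strings, since that is how the software treats them.

```python
def find_locale_mismatches(projects: dict) -> dict:
    """For each project, list locale codes used in other projects but missing here.

    Codes are compared as exact strings, so "Fr-Fr" and "Fr" count as different.
    """
    all_codes = set().union(*projects.values())
    return {
        name: all_codes - codes
        for name, codes in projects.items()
        if all_codes - codes
    }


projects = {
    "ios": {"Fr-Fr", "de-DE"},
    "android": {"Fr", "de-DE"},
}
for name, missing in find_locale_mismatches(projects).items():
    print(f"{name} is missing: {sorted(missing)}")
```

An empty result means every project uses the same set of codes. A non-empty one tells you where to look, although some differences may be intentional if a project genuinely supports fewer languages.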
We’ve streamlined our locale codes across all of our translation projects. Now we can use the auto-population feature to automate the localisation process. It’s a time saver and it gives more flexibility in the build.
One thing to keep in mind with this workflow is the quality of your TM. If there is an error in your translation memory and you auto-populate your strings without sending them to review, you’ll replicate the error in every other string.
If you’re unsure about the quality of your TM, you should always send it for review. The reviewer can improve or correct the translation. If your reviewer updates an iOS string and you know the same content is also used on Android and web, you’ll have to remember to update the other strings. That should be fairly easy and quick to do by searching in your localisation software. Your TM should be updated automatically, but double check with your localisation software partner.
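If your tool lets you export strings, finding the other copies can be scripted. The Python sketch below assumes a simple in-memory structure (project name, then string key, then a `source`/`target` pair) that is made up for the example; in practice you would use your localisation software’s search or export features.

```python
def find_strings_to_update(projects: dict, source_text: str) -> list:
    """Return (project, key) pairs whose source text equals source_text."""
    return [
        (project, key)
        for project, strings in projects.items()
        for key, entry in strings.items()
        if entry["source"] == source_text
    ]


def apply_correction(projects: dict, source_text: str, new_target: str) -> int:
    """Set new_target on every matching string and return how many were changed."""
    matches = find_strings_to_update(projects, source_text)
    for project, key in matches:
        projects[project][key]["target"] = new_target
    return len(matches)


projects = {
    "ios": {"cancel_button": {"source": "Cancel", "target": "Anuler"}},
    "android": {"btn_cancel": {"source": "Cancel", "target": "Anuler"}},
    "web": {"nav.back": {"source": "Back", "target": "Retour"}},
}
updated = apply_correction(projects, "Cancel", "Annuler")
print(updated, "strings updated")
```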
Taking it a step further
I’m sure most of us have translated the words “ok”, “cancel”, “back” and “enter” quite a few times (and will again). How can we avoid that?
Depending on the software you’re using, you might have the option to extract the most repeated content in an automated way. Once you’re happy with the list of terms/sentences, get it translated and reviewed, create a new TM from it and use it to auto-populate new translation strings. You might then decide to also lock these segments.
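If your software doesn’t offer this, a rough version is easy to script from an export of your source strings. The Python sketch below counts identical strings with `collections.Counter`; the `most_repeated` name and the `min_count` cut-off are assumptions for the example, and matching is exact (apart from surrounding whitespace), so "OK" and "Ok" are counted separately.

```python
from collections import Counter


def most_repeated(sources: list, min_count: int = 2) -> list:
    """Return (text, count) pairs for strings seen at least min_count times,
    most frequent first."""
    counts = Counter(text.strip() for text in sources)
    return [(text, n) for text, n in counts.most_common() if n >= min_count]


all_sources = ["OK", "Cancel", "OK", "Back", "Cancel", "OK"]
for text, count in most_repeated(all_sources):
    print(f"{text}: {count}")
```

The output is only a candidate list. It still needs the human check on context before anything goes into a TM or gets locked.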
When creating your TM of repeated content, talk to your content team to ensure the content is only used in one particular context. One word can have several meanings depending on the context (e.g. “enter” can mean to insert, to enter a room or the key on your keyboard) and can then be translated differently. If your repeated content isn’t carefully reviewed, it can easily lead to mistakes… and don’t get me started on adjectives 😉.
That’s all on TM for now!