Localization Technologies @ Crunchyroll

Published in Crunchyroll · Sep 2, 2021

Crunchyroll is always trying to get closer to its customers, and one of the steps in breaking down the barriers is localization. With that in mind, we started setting up a localization process for our client apps almost a year ago. Now that things have become a bit more stable, we’ve decided to share our experience with you.

From the very beginning, we knew that we needed a localization process that would be as flexible as possible and would not block or even delay our continuous feature development and delivery. We wanted to set up a localization system that would allow us to:

share translations between different client apps (Web/Android/iOS/Living room devices);
have a separate translation team that would use an easy user interface during the translation process;
have the ability to bundle translations into the app before releases;
have the ability to update/fix translations remotely without doing new releases.

As an Android engineer, I am going to guide you through the whole translation system from the Android perspective; however, the process is almost the same for all other clients.

The copies and their translations take a long journey through the whole system, moving from one place to another to be processed or consumed. To make it clearer, let’s start from the beginning.

Starting point

So, before the development team starts working on a new feature, it receives a set of design prototypes and copies from the product team. Usually, the copies are provided within the design prototypes or even as requirements inside the Jira ticket description.

During feature development, all new copies are added to the source language file; for Android, it's the strings.xml file located in the resources package. Of course, all this happens on a separate feature branch and is followed by a GitHub pull request.

As soon as the pull request gets merged to the master branch, a CI job is triggered. Besides all the checks and artifact generation, it also takes the whole strings.xml file and uploads it to the Transifex (TX) service; this way, the new copies reach their next destination. Transifex is the platform where the actual translation process happens. We’ll talk about it in detail later, but for now, let’s dive a bit deeper into the upload process.

Client app → Transifex

The upload process is quite simple. In case you have a monolith-based project architecture with all copies located in one single strings.xml file, you just need to clone the project sources, init the Transifex client, and run the upload method. What could be easier? 🤷‍♂

However, things were a bit more complicated in our case, considering that we are continuously migrating our monolith-based architecture to a feature-by-module one. Here, we have at least one strings.xml file per module. We also usually group copies by context into separate files; for example, we have individual files for content description copies. Thus, when we were integrating the Transifex service, we already had at least 5 different files with copies, and we knew that we would have even more.

The very first idea we had was to merge all these files into one strings.xml file and upload all copies together. But this didn’t work well for us because, besides the ability to consume translations remotely, we also wanted to be able to bundle them into the app before releases. If we uploaded them merged into one single file, we would not know which translations belong to which module during the bundling process.

After a short investigation, we found that when we upload a strings.xml file to TX, it creates a so-called resource for it, and all translations are grouped under this resource.

Therefore, the decision was to configure the Transifex client so that it would create a separate resource for every module we had. This intention, and the fact that we wanted to make all TX-related stuff client agnostic, resulted in a JSON config that contains the list of resources (modules, in our case) and some other metadata the TX client needs to work properly, such as the client type and the path to the source language copies file.

{
    "deveS3BucketDest": "s3://dev/crunchyroll/translations",
    "stagingS3BucketDest": "s3://staging/crunchyroll/translations",
    "prodS3BucketDest": "s3://production/crunchyroll/translations",
    "txModeStaging": "sourceastranslation",
    "txModeProduction": "onlyreviewed",
    "projectName": "crunchyroll-android",
    "resources": [
        {
            "customResourceName": "crunchyroll",
            "filePath": "crunchyroll/src/main/res/values",
            "fileName": "strings",
            "fileExt": "xml",
            "i18nType": "ANDROID"
        },
        {
            "customResourceName": "ratings",
            "filePath": "features/ratings/src/main/res/values",
            "fileName": "strings",
            "fileExt": "xml",
            "i18nType": "ANDROID"
        }   
    ]
}

We placed this config in the root of our project and consumed it from the Jenkins pipeline step that uploads the strings to the service. Configuring the TX client this way makes it pick the correct strings.xml files to upload, and tells it which translation files should be downloaded and where each of them should be placed, which was perfect for our bundling step.
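To make the role of the config more concrete, here is a minimal Python sketch of how a pipeline step might read it and work out which source file belongs to which TX resource. The helper name `source_files` is hypothetical, and the config is trimmed to the fields used here; the real Jenkins step may look quite different.

```python
import json
from pathlib import PurePosixPath

# Trimmed copy of the config shown above (only the fields this sketch uses).
CONFIG_JSON = """
{
    "projectName": "crunchyroll-android",
    "resources": [
        {"customResourceName": "crunchyroll",
         "filePath": "crunchyroll/src/main/res/values",
         "fileName": "strings", "fileExt": "xml", "i18nType": "ANDROID"},
        {"customResourceName": "ratings",
         "filePath": "features/ratings/src/main/res/values",
         "fileName": "strings", "fileExt": "xml", "i18nType": "ANDROID"}
    ]
}
"""


def source_files(config):
    """Map each resource name to the path of its source-language file.

    Hypothetical helper: it only joins filePath, fileName and fileExt.
    """
    result = {}
    for resource in config["resources"]:
        file_name = f'{resource["fileName"]}.{resource["fileExt"]}'
        path = PurePosixPath(resource["filePath"]) / file_name
        result[resource["customResourceName"]] = str(path)
    return result


config = json.loads(CONFIG_JSON)
for name, path in source_files(config).items():
    print(f"{name}: {path}")
```

Because every resource carries its own path, the same lookup can be reused in the opposite direction when translations are downloaded and placed back into their modules.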

Transifex

As I said initially, we wanted to have a separate, isolated environment for our translation teams, and after some investigation, we decided to use the Transifex service for this. Transifex is a cloud-based localization platform that, roughly speaking, allows you to store your copies in a source language of your choice and provides a neat user interface for the translators to translate them.

As soon as new copies are uploaded to the Transifex service, our translation team is notified that new copies are available for translation.

A copy in Transifex can be in one of 3 states: Untranslated, Translated, or Reviewed. What’s very useful is that Transifex has a nice feature called Translation Memory. You can set multiple projects to share the same memory, and newly uploaded copies will be translated automatically if they were already translated in any project sharing that memory. Of course, there might be cases when the same copy was translated differently in various projects, and in this case, TX will not be able to auto-translate. Still, the translator is shown the variants and can choose the most suitable one or add a new one.

Our translation team doesn’t have much access to the client features that are under construction, and this was one of the first problems we faced at this phase. It's pretty hard to provide a suitable translation when you have no context about where the copies will be used.

The solution was straightforward: the Transifex service supports different ways to bring more context to the translators. For instance, the comments from the strings.xml are parsed by the service and shown to translators during the translation.

Also, it is possible to take screenshots of different parts of the app/feature, upload them to TX, and mark the location of the copies on them so that the translators can see where each translation is used inside the app.

As I said before, we wanted to be able to update the translations remotely, which means that we needed storage to keep them in so that the app would be able to download and consume them. At this point, we decided to keep things simple and used plain AWS S3 buckets for this; we set up 3 buckets, one per environment: development, staging, and production. We’ll talk in more detail about uploading and consuming the translations from those buckets down the road.

Moving to the next phase

Thus, after all the new copies are translated into any language, their status becomes Translated. Then, a webhook triggers a CI job, which collects all translations with Translated status and uploads them to the development and staging S3 buckets, making them available for development and staging app builds. At this point, the QA team can start testing the new feature and the translations together.

All translated copies are also reviewed by the Translation Leads, who go over the translations and mark them as Reviewed if they fit the context well.

As soon as all translations are marked as Reviewed in any language, another webhook triggers another CI job that collects all Reviewed translations and uploads them to the production S3 bucket, which means that the translated copies become available for all production builds.
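The routing described above can be sketched in a few lines of Python. This is only an illustration: the bucket keys come from the config shown earlier, while the status strings and the `destinations_for` helper are assumptions, not the actual webhook payload or CI job.

```python
# Bucket destinations taken from the JSON config shown earlier.
CONFIG = {
    "deveS3BucketDest": "s3://dev/crunchyroll/translations",
    "stagingS3BucketDest": "s3://staging/crunchyroll/translations",
    "prodS3BucketDest": "s3://production/crunchyroll/translations",
}

# Assumed mapping from translation status to the config keys of the buckets
# that should receive the translations.
DESTINATION_KEYS = {
    "translated": ("deveS3BucketDest", "stagingS3BucketDest"),
    "reviewed": ("prodS3BucketDest",),
}


def destinations_for(status, config):
    """Return the bucket destinations for a given translation status."""
    keys = DESTINATION_KEYS.get(status.lower())
    if keys is None:
        raise ValueError(f"no upload is defined for status '{status}'")
    return [config[key] for key in keys]


print(destinations_for("Translated", CONFIG))
print(destinations_for("Reviewed", CONFIG))
```

Keeping the mapping in one table makes it explicit that unreviewed translations never reach the production bucket.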

Translations storage (S3 Buckets)

As I said above, we decided to use S3 buckets as storage for all our translations; this allows us to adjust them remotely, and it also serves as a backup in case someday we’d like to replace Transifex with something else.

The idea is quite simple: upload the translation files to the S3 bucket, make a public URL, and then make the app download the translations and consume them.

However, there was an issue here as well. 😞

As I said, we have a feature-by-module-based architecture; thus, we have at least one translation file per module, which, multiplied by the number of languages we support, becomes a vast number of files that would have to be downloaded selectively.

Taking into account that we don’t care about modules at runtime, we decided to merge all translation files into one so that we get one file per language. This simplified consumption at runtime a lot, but it slightly increased the deployment complexity and added a constraint on the way we define strings: we now need to make sure that string names are not duplicated across modules, to avoid collisions. 🤷‍♂
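A minimal Python sketch of such a merge step with a duplicate check could look like this. It assumes the per-module files are Android strings.xml documents with a `<resources>` root; the function name and the sample strings are made up for illustration and are not our actual deploy script.

```python
import xml.etree.ElementTree as ET


def merge_strings(documents):
    """Merge per-module strings.xml contents into one <resources> document.

    `documents` maps a module name to the XML text of its file.
    Raises ValueError if two modules define a string with the same name.
    """
    merged = ET.Element("resources")
    seen = {}  # string name -> module that defined it first
    for module, xml_text in documents.items():
        for element in ET.fromstring(xml_text):
            name = element.get("name")
            if name in seen:
                raise ValueError(
                    f"duplicate string '{name}' in '{seen[name]}' and '{module}'"
                )
            seen[name] = module
            merged.append(element)
    return ET.tostring(merged, encoding="unicode")


# Hypothetical sample input for two modules.
modules = {
    "crunchyroll": '<resources><string name="app_name">Crunchyroll</string></resources>',
    "ratings": '<resources><string name="rate_show">Rate this show</string></resources>',
}
print(merge_strings(modules))
```

Failing the job on a collision, rather than silently letting one module overwrite another, keeps the constraint visible at deploy time.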

S3 Bucket → Client app

Congrats 🎉
We’ve reached the last phase of our translation journey. To recap, once the translations are uploaded to the S3 buckets, they become available to the app at runtime.

At every application start, the app takes the system locale, builds the appropriate link to the translations file, and downloads it from the S3 bucket.
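As a language-agnostic illustration (written in Python here, although the app itself is an Android app), building the link from the locale might look like the sketch below. The base URL, the `<language>-<REGION>.json` naming, and the helper name are all assumptions; this article does not specify the actual file layout in the bucket.

```python
def translations_url(base_url, language, region=None):
    """Build the URL of the merged translations file for a locale.

    Assumes one file per language, named like `es.json` or `pt-BR.json`.
    """
    tag = language.lower()
    if region:
        tag = f"{tag}-{region.upper()}"
    return f"{base_url.rstrip('/')}/{tag}.json"


# Hypothetical public base URL of the production bucket.
BASE_URL = "https://example.com/translations"
print(translations_url(BASE_URL, "es"))
print(translations_url(BASE_URL, "pt", "BR"))
```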

Now that we had all translations as an extensive collection of key-value pairs, all that was left was to make the app consume them instead of the ones bundled inside the APK. There were 2 options at that moment: to write a library or to find one. The decision was to use Philology, a library that takes a set of translations as key-value pairs and uses them in place of the strings bundled inside the APK.
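The underlying idea, remote key-value pairs taking precedence over bundled strings, can be sketched in Python as follows. This is not Philology's API, only a conceptual illustration; the `StringCatalog` class and the sample strings are hypothetical.

```python
class StringCatalog:
    """Looks up strings, preferring remote translations over bundled ones."""

    def __init__(self, bundled):
        self._bundled = dict(bundled)
        self._remote = {}

    def apply_remote(self, remote):
        """Replace the current set of remote overrides."""
        self._remote = dict(remote)

    def get(self, key):
        """Return the remote value if present, else the bundled one."""
        if key in self._remote:
            return self._remote[key]
        return self._bundled[key]


catalog = StringCatalog({"rate_show": "Rate this show", "ok": "OK"})
catalog.apply_remote({"rate_show": "Rate this series"})
print(catalog.get("rate_show"))  # remote override wins
print(catalog.get("ok"))         # falls back to the bundled string
```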

Transifex → Client app (Translations Bundling)

Since we didn’t want to block the app launch while downloading translations, and we didn’t want to apply translations on the fly (to avoid confusion and partially translated components), the user would see the newly downloaded translations only at the second app launch, which is not the best UX.

So we decided to also bundle the translations into the app before each release, so that the app is translated right away. Of course, the bundled translations might be stale compared to the remote ones, but still, it is better than nothing.

In a nutshell, before each release, we fetch all translations from the Transifex service and add them to the project sources before assembling the APK. Unfortunately, we haven’t fully automated this process yet due to some issues with the TX client, but this is in progress and should be done soon.
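To illustrate where a bundled translation ends up, here is a small Python sketch that derives the destination path for a resource from the config shown earlier. It relies on Android's standard `values-<language>[-r<REGION>]` resource directory naming; the helper names and the locale tag format are assumptions, not our actual bundling script.

```python
from pathlib import PurePosixPath


def values_dir(locale_tag):
    """Turn a tag like 'es' or 'pt_BR' into an Android resource directory name."""
    language, _, region = locale_tag.replace("_", "-").partition("-")
    qualifier = language.lower()
    if region:
        qualifier += f"-r{region.upper()}"
    return f"values-{qualifier}"


def bundled_path(resource, locale_tag):
    """Return where the translated file for `resource` should be placed."""
    res_dir = PurePosixPath(resource["filePath"]).parent  # .../src/main/res
    file_name = f'{resource["fileName"]}.{resource["fileExt"]}'
    return str(res_dir / values_dir(locale_tag) / file_name)


ratings = {
    "customResourceName": "ratings",
    "filePath": "features/ratings/src/main/res/values",
    "fileName": "strings",
    "fileExt": "xml",
}
print(bundled_path(ratings, "es"))
print(bundled_path(ratings, "pt_BR"))
```

This is why keeping one TX resource per module matters: each downloaded file maps back to exactly one module's res directory.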