Optimising iPlayer Translations
Written by Nick Spragg and Nik Rahmel
Introduction
Over the past year, we have started personalising the iPlayer homepage a lot more than we did previously — with this came a re-architecture of the API.
All the data any of the iPlayer clients require to show the complete homepage, be it in the browser, on the TV or in the mobile app, can be requested with one request to the underlying GraphQL API.
Each response is personalised to the user — it includes their currently watched shows, their ‘added’ list and recommendations tailored to their watching behaviour.
The Problem
All this personalisation makes each response unique and difficult to cache on the edge caching layer.
Where previously we could utilise a simple cache layer built using Varnish for a few hundred different homepage variations, this has become impossible with the new architecture.
The computational overhead for each request has increased massively, which requires us to make our code as efficient as possible and optimise any CPU-intensive tasks. During initial development, our prototype didn’t meet our performance expectation of handling 3,000 requests per second without breaking the bank, so we started to investigate where our bottlenecks lay.
Thanks to the excellent debugging functionality provided in current versions of Node.js (we are running a TypeScript project on Node.js 8) and the JavaScript debugging tools in Chrome, simple profiling identified the task that takes up the most CPU time in our average request: translations.
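The exact profiling workflow isn’t shown here, but one common approach with Node.js 8 is to start the process with the built-in V8 inspector and record a CPU profile from Chrome DevTools (server.js stands in for the real entry point):

node --inspect server.js
# then open chrome://inspect in Chrome, attach to the process and record a CPU profile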
Translations have been a part of iPlayer for a long time — the website is localised to English, Welsh, Scots Gaelic and Irish Gaelic, with hundreds of pre-set translation strings, and all of these get applied depending on the user in the translations module of this API.
Translations module
Within the iPlayer API, we use a translation module to convert text (or parts of the text) into a specified language. This is achieved with templated strings. For example, a templated string like “#{available-for} 26 #{days}”, would become “Available for 26 days” in English or “Ar gael am 26 o ddyddiau” in Welsh.
The module to do this is relatively straightforward. It exports a translate function which accepts a templated string and a locale code. Translate parses the string for translation keys and looks up each key in the corresponding translation lookup. For example, in the Welsh (cy) lookup, `available-for` maps to `Ar gael am`. There are approximately 100 translations per language.
Translation keys in a string are represented using the following syntax: #{example-translation-key}.
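For illustration, the lookup returned by loadTranslations() might look something like this (the structure and any locale codes beyond ‘en’ and ‘cy’ are assumptions; the real data isn’t shown in this post):

// Hypothetical shape of the data returned by loadTranslations();
// the real iPlayer translation data is not included here.
function loadTranslations() {
  return {
    en: {
      'available-for': 'Available for',
      'days': 'days'
      // ...approximately 100 keys per language
    },
    cy: {
      'available-for': 'Ar gael am',
      'days': 'o ddyddiau'
    }
    // ...plus the Scots Gaelic and Irish Gaelic lookups
  };
}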
Here is an example of an English and Welsh translation:
const templatedString = '#{available-for} 26 #{days}';

const toEnglish = translate(templatedString, 'en');
console.log(toEnglish); // "Available for 26 days"

const toWelsh = translate(templatedString, 'cy');
console.log(toWelsh); // "Ar gael am 26 o ddyddiau"
Given the levels of caching within the iPlayer architecture, modules like this have often been written to favour simplicity, testability and accuracy rather than performance.
Here is the original translate function:
const TRANSLATIONS = loadTranslations();
const KEY_REGEX = /#{(.*?)}/g;

function translate(value, language = 'en') {
  let translated = value;

  Object.keys(TRANSLATIONS[language]).forEach((key) => {
    const translation = TRANSLATIONS[language][key];
    const re = new RegExp(`#{${key}}`, 'g');
    translated = translated.replace(re, translation);
  });

  return translated;
}
This function will translate zero or more translation templates in a given string. It works and it’s very simple, but it’s inefficient. Any ideas? Well, for the specified language it attempts to translate every key in the translation lookup. Based on the way this module is currently used within iPlayer, each call will translate between 0 and 3 keys. In the worst-case scenario, it will attempt approximately 100 template replacements even if a string doesn’t contain any keys.
A potential optimisation is to perform only the required number of translations for a given string:
const TRANSLATIONS = loadTranslations();
const KEY_REGEX = /#{(.*?)}/g;

function translate(value, language = 'en') {
  if (!value) {
    return value;
  }

  let translated = value;
  let match;

  while ((match = KEY_REGEX.exec(value))) {
    const translation = TRANSLATIONS[language][match[1]];
    if (translation === undefined) {
      continue;
    }
    const key = "#{" + match[1] + "}";
    translated = translated.replace(key, translation);
  }

  return translated;
}
For strings containing between 0 and 3 translations, benchmarks indicated that selectively replacing translation keys was significantly quicker. Of course, it has a couple of drawbacks. First, although not rocket science, it does add further complexity. Secondly, performance plummets when a string has 4 or more keys, due to the while loop.
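The benchmarking harness isn’t shown in the post; a minimal sketch along these lines, with translateOriginal and translateOptimised as hypothetical names for the two versions above, would reproduce the comparison on Node.js 8:

// Rough micro-benchmark: time many calls and return the total in milliseconds.
function benchmark(fn, input, iterations = 100000) {
  const start = process.hrtime();
  for (let i = 0; i < iterations; i += 1) {
    fn(input, 'cy');
  }
  const [seconds, nanoseconds] = process.hrtime(start);
  return (seconds * 1e3) + (nanoseconds / 1e6);
}

console.log(benchmark(translateOriginal, '#{available-for} 26 #{days}'));
console.log(benchmark(translateOptimised, '#{available-for} 26 #{days}'));
console.log(benchmark(translateOptimised, 'a string with no translation keys'));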
Further inspection of the JavaScript String.replace method signature showed that a replacer function can be supplied as the second argument. This replacement function is invoked for each match that is found.
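For reference, the replacer receives the full match followed by each capture group, so with our key regex the second argument is the translation key itself:

// Minimal illustration of String.replace with a replacer function.
const shouted = '#{days}'.replace(/#{(.*?)}/g, (match, key) => {
  // match === '#{days}', key === 'days'
  return key.toUpperCase();
});
console.log(shouted); // "DAYS"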
Here is a variant of the translation function using “string.replace” with a supplied replacement function:
const TRANSLATIONS = loadTranslations();
const KEY_REGEX = /#{(.*?)}/g;

function translate(value, language = 'en') {
  if (!value) {
    return value;
  }

  return value.replace(KEY_REGEX, (match, key) => {
    const translation = TRANSLATIONS[language][key];
    if (translation === undefined) {
      // leave unknown keys in place
      return match;
    }
    return translation;
  });
}
Arguably a more elegant solution. As a rule of thumb, it often pays dividends to reuse standard libraries or trusted third-party solutions rather than rolling your own. Time has already been invested by talented engineers to optimise the code and make it robust. Furthermore, standard libraries are officially supported and well tested.
This was evident in the benchmarks: this solution proved to be approximately as fast as the first optimisation for 0–3 translations, but dramatically quicker for 4 or more.
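As a small usage note, returning the full match for unknown keys means untranslated placeholders pass straight through unchanged (the key below is made up for illustration):

translate('#{available-for} 26 #{days} #{not-a-real-key}', 'en');
// => "Available for 26 days #{not-a-real-key}"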
Validation
One of the main objectives of optimisations like these is to improve the user experience by serving the majority of requests as quickly as possible.
In order to validate our changes, we run load tests on a controlled test environment. Alongside the profiling detailed above, this informs our prioritisation of potential optimisations.
First Iteration
For our initial validation, we had to find the current breaking point. We use Gatling as our load testing tool, with a script that can make personalised requests for any number of user profiles and client configurations. We can choose to test with a representative split of the audience, heavily favouring the English language, or with an even split across all our supported languages.
Since our objective was to reduce the CPU cost of translations, and not to change the way we store or cache the results, the language split is largely irrelevant, so we tested with an even split. We picked two instances of a relatively small machine type and a relatively low number of users, 30, each making up to one request per second.
Looking at the response times over time, we get the following graph:
Here we see that response times start out stable with low variation, but by the time we reach 20 or so users they climb to over one second. This means we don’t reach our goal of 30 requests per second, as each user has to wait for their request to finish before they can make a new one. However, the API does not fall over completely: there is a much more noticeable slowdown at around 12:00:20, but it recovers by itself.
Improvements
With our first iteration of algorithm improvements, we get a much more stable graph with the same test setup:
There are spikes in the upper-percentile response times every minute, which are caused by cache refreshes and are irrelevant here. We have made improvements! But what is our new limit?
For our second round of testing, we change the test setup: We’re only using one machine to run the API now, and we increase the number of users iteratively until we find the new breaking point. This is the graph for 50 users:
We can ignore the per-minute spikes, as before. When we reach around 40 users, the slowdown becomes very pronounced, and looking at the profiling, it seems like we can make some more improvements: translate still takes up a majority of the CPU time for our responses, and we’d like to squeeze every last bit of performance out of it.
Final Results
After the second round of improvements, with the same test setup, our graph looks like this:
It looks like it starts to slow down again at 20 users, but if we look at the left y-axis, we can see that the range of slowdowns is much lower, and within the acceptable range for this load test.
Profiling now shows that translate is no longer the biggest CPU expense, and in combination with the graph above, it is time to look at where some of the patterns in the spikes come from. But that is for another post!