Internationalisation (i18n) at SwissBorg

Smur89
SwissBorg Engineering
10 min read · May 6, 2022

The beginning of our i18n journey at SwissBorg

“If I’m selling to you, I speak your language. If I’m buying, dann müssen Sie Deutsch sprechen!” (“…then you must speak German!”)

— Willy Brandt

At SwissBorg we are in the unique position of having our home base in a country with four official national languages, in an area roughly 350 km wide by 220 km tall (about 8% the size of France), in the geographic heart of Europe. This affords us a first-hand appreciation of the usefulness of a well-internationalised app.

A map of Europe colour-coded by number of official languages, showing Switzerland with the most official languages in Europe

Source: https://www.instagram.com/languages.5/

We have all used online services to translate directly from one language to another, only to find the result either makes no sense or falls just short, missing the colour of the language, cultural nuance, emoji interpretation, and more.

Thankfully, we can leave this part in the capable hands of our Translation Team and show you the technical challenges translation brings.

Technical Challenges

Pluralisation

Pluralisation (p11n) involves applying a mapping from plural rules to plural forms. In languages like English, this is relatively trivial, since there are only two forms, singular and plural. It becomes more complicated in languages with multiple forms: some Slavic languages have three or more plural forms, Irish has five and Arabic has six.

To illustrate this, consider the following example, taken from, and expanded upon, this Irish language school book.

Table comparing the counting of Hours in English and Irish

In practice, this means our display string depends on two inputs: our translation key and the quantity of the item we are presenting. For English, this might look something like the following:

quantity match {
  case 1 => my_display_string.singular // ("hour")
  case _ => my_display_string.plural   // ("hours")
}
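
To see how this grows for a language like Irish, here is a sketch (not our production code) of CLDR-style plural category selection, whose rules for Irish map 1 to ‘one’, 2 to ‘two’, 3 to 6 to ‘few’, 7 to 10 to ‘many’ and everything else to ‘other’:

sealed trait PluralCategory
object PluralCategory {
  case object One   extends PluralCategory
  case object Two   extends PluralCategory
  case object Few   extends PluralCategory
  case object Many  extends PluralCategory
  case object Other extends PluralCategory
}

// CLDR plural rules for Irish (ga): 1 -> one, 2 -> two, 3..6 -> few,
// 7..10 -> many, everything else -> other.
def irishCategory(quantity: Int): PluralCategory =
  quantity match {
    case 1                      => PluralCategory.One
    case 2                      => PluralCategory.Two
    case n if 3 <= n && n <= 6  => PluralCategory.Few
    case n if 7 <= n && n <= 10 => PluralCategory.Many
    case _                      => PluralCategory.Other
  }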

Number, date and currency formatting

Countries and cultures denote numbers and dates in different ways, an obvious example being the date formats on either side of the Atlantic: mm/dd/yyyy vs. dd/mm/yyyy.

Perhaps less obvious and closer to home is number formatting. Consider the following examples of the same monetary value formatted according to each locale.

Table comparing the formatting of currency values between France, Germany and Ireland

Although it might seem like a minor adjustment, ignoring these differences can easily confuse users accustomed to seeing monetary values in their local format. Locale-aware formatting also applies to other kinds of numbers, such as phone numbers.
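
As a quick illustration (this is not our in-app formatting code), the JVM’s built-in locale-aware formatter renders the same value differently per locale:

import java.text.NumberFormat
import java.util.Locale

val amount = 1234567.89

// Prints the same amount formatted per locale; exact output varies by
// JDK version, e.g. 1 234 567,89 € (France), 1.234.567,89 € (Germany),
// €1,234,567.89 (Ireland).
List(Locale.FRANCE, Locale.GERMANY, new Locale("en", "IE")).foreach { locale =>
  println(NumberFormat.getCurrencyInstance(locale).format(amount))
}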

Currently at SwissBorg, our solution for this is not perfect; for now, we use a custom-defined format for numeric values that does not align with any single locale.

You can explore these differing locales using the ICU Locale Explorer.

Centralising Display Strings

Within the SwissBorg ecosystem, our display strings are spread out across many different areas. From the mobile applications, website and help centre, through to emails, notifications and the App Store descriptions, they get around.

As much as possible, we aim to centralise the source of these strings and provide a simple process for our translators to work with. To this end, we utilise a Translation Management System (TMS) called Lokalise. Using this approach we decrease complexity and improve efficiency, enabling our translators to directly leverage the features provided by Lokalise in their workflows. Features used include Computer-Aided Translation (CAT), QA tooling, branching, project statistics and context (comments, screenshots, etc.).

Backend Integration

Diagram of the Backend integration with Lokalise

We leverage the Lokalise AWS S3 integration to push translation files into our backend systems. Once the files are inside our system, we replicate the data to all environments using AWS S3 Replication. At this point, the localisation files are ready to be consumed by our services.

Using this approach, we decouple ourselves from any third-party dependency, in this case Lokalise, and move our integration dependencies closer to home, inside our Virtual Private Cloud (VPC). In doing so, we mitigate the risks of downtime, high latency and rate limiting on the Lokalise API.

Our S3 buckets are configured to invoke a lambda function for S3 Event Notifications on receiving a PUT. This lambda is a simple function responsible for writing the notification to Kafka which we later use to trigger a refresh of each service’s translation files.
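
In outline, the lambda looks something like the following sketch; the topic name and payload are hypothetical stand-ins for our internal setup:

import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.lambda.runtime.events.S3Event
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import java.util.Properties
import scala.jdk.CollectionConverters._

class TranslationUpdateHandler {

  private val producer: KafkaProducer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", sys.env("KAFKA_BOOTSTRAP_SERVERS"))
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    new KafkaProducer[String, String](props)
  }

  // Invoked by AWS for each S3 event notification; publishes the changed
  // object key so that consuming services know to refresh their caches.
  def handle(event: S3Event, context: Context): Unit =
    event.getRecords.asScala.foreach { record =>
      val objectKey = record.getS3.getObject.getKey
      producer.send(new ProducerRecord("translation-updates", objectKey, objectKey))
    }
}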

In terms of speed, we have seen that the pipeline takes 20 to 30 seconds from start to finish: from triggering a build in Lokalise to the change being visible in the apps. Of course, this may vary per environment and is ultimately bounded by the AWS S3 Replication SLA of 15 minutes. Considering this can be handled without any developer involvement, we’re quite happy with the relatively small lag.

Our agent library retrieves and caches the most recent files from S3 on startup. It relies on notification events from Kafka to trigger a cache update while the service is running. For this, we use Kafka’s ‘auto.offset.reset’ configuration to read from the latest offset, skipping any previous update notification events.
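
A sketch of that consumer configuration (the group id and bootstrap servers are illustrative); when no committed offset exists, ‘latest’ starts the consumer at the end of the topic, so a restarting service skips stale notifications and relies on its startup fetch from S3 instead:

import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import java.util.Properties

val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, sys.env("KAFKA_BOOTSTRAP_SERVERS"))
// A fresh, per-instance group id means no committed offsets exist, so
// 'auto.offset.reset=latest' takes effect on every service start.
props.put(ConsumerConfig.GROUP_ID_CONFIG, s"translation-agent-${java.util.UUID.randomUUID()}")
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)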

Cache Size

Due to the relatively small footprint of the translations required in our backend services, we can keep all translations cached in memory within our services. Currently, the size on disk of all translations is 49 KB.

Should memory become a concern, we can first look at reducing the number of keys that each service keeps in memory. Currently, each service holds all keys from our backend project in Lokalise, even the keys it has no use for. We can easily change this by defining a unique bucket prefix per service to export from Lokalise; it is then a matter of a configuration change to read from the new file location in each service.

The internal cache implementation uses a Deferred and a Ref; Deferred in order to wait for the cache to be initialised and Ref to provide safe concurrent access and modification. For more specific details, our approach is quite similar to that outlined by SoftwareMill in their blog post.
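
A minimal sketch of that shape (our real implementation differs in detail, and the Translations type is a hypothetical stand-in):

import cats.effect.{Deferred, IO, Ref}

final case class Translations(entries: Map[String, String])

final class TranslationCache private (
    ready: Deferred[IO, Unit],   // completed once the first load succeeds
    state: Ref[IO, Translations] // safe concurrent reads and updates
) {
  // Callers block until the cache has been initialised, then read it.
  def get: IO[Translations] = ready.get *> state.get

  // Run once on startup and again for each Kafka update notification.
  def refresh(load: IO[Translations]): IO[Unit] =
    load.flatMap(state.set) *> ready.complete(()).void
}

object TranslationCache {
  def create: IO[TranslationCache] =
    for {
      ready <- Deferred[IO, Unit]
      state <- Ref.of[IO, Translations](Translations(Map.empty))
    } yield new TranslationCache(ready, state)
}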

Monitoring

“Software testing is a sport like hunting, it’s bughunting.”

– Amit Kalantri

To make debugging easier, we introduced the concept of versioning our translations cache, which lets us know exactly which file each service is using internally. The ‘version’ is derived from the last-modified timestamp of the most recently modified file in the bucket.

We expose this version as a Prometheus Info Metric: a Gauge fixed at a single value whose labels carry information that may change over time, as is the case with versions.

Table of Prometheus Info Metrics in Grafana showing current translation version
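
With the plain Prometheus Java client, an info-style metric can be sketched like this (the metric and label names are illustrative):

import io.prometheus.client.Gauge

// A Gauge fixed at 1 whose 'version' label carries the changing value.
val translationsInfo: Gauge = Gauge
  .build()
  .name("translations_info")
  .help("Currently loaded translations bundle version")
  .labelNames("version")
  .register()

def setVersion(version: String): Unit = {
  translationsInfo.clear() // drop the label set for the old version
  translationsInfo.labels(version).set(1)
}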

Translation Agent

The majority of our translation requirements are handled by our mobile apps; the requirements on our backend translations are quite light. Since most backend translations revolve around our notifications, using something like ICU4J, which implements the full ICU specification, was unnecessary and would only have provided a convoluted API to work with. If translations become more complicated in the future, we may revisit this type of library.

Our main requirements on the backend were support for the following:

  • plain text translation
  • interpolation
  • plurals

Our solution is an internal agent library, published to our Artifactory for use by any service that needs to resolve display keys. Internally, this library is responsible for managing the cache of localisation files from S3 and exposing them through a simple API for use in our services.

The library exposes a single method in its API…

def get(key: String, lang: Lang): F[I18nValue]

…which internally utilises a Java properties file. The I18nValue type in turn exposes…

def withArgs[F[_]: Raise[*[_], Throwable]](args: Any*)(quantity: Int): F[String]
def singular: String
def zero: String
def plural(quantity: Int): String

…to allow us to handle different plural categories. Currently we have a static plural rules mapping for all languages, which is taken from the CLDR guidelines. We can later provide a mapping per language as new languages are introduced.
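
A hypothetical usage sketch; the I18n trait name, the Lang constructor and the key are illustrative, and we assume F is IO:

import cats.effect.IO

// Stand-ins for the library's types, for illustration only.
final case class Lang(code: String)
trait I18nValue {
  def singular: String
  def zero: String
  def plural(quantity: Int): String
}
trait I18n[F[_]] { def get(key: String, lang: Lang): F[I18nValue] }

// Resolve a key for Irish and pick the appropriate plural form.
def hoursLabel(i18n: I18n[IO], quantity: Int): IO[String] =
  i18n.get("notification.hours", Lang("ga")).map { value =>
    quantity match {
      case 0 => value.zero
      case 1 => value.singular
      case n => value.plural(n)
    }
  }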

App Integration

Although we focus on the less-discussed backend integration here, it would be prudent to briefly mention how our mobile apps interact with Lokalise, as it is slightly different from our backend integration.

To provision translation keys in Lokalise in our build pipeline we utilise the Gitlab Integration provided by Lokalise. We have found that this fits well with our preexisting workflow between design and development, whereby the base translation, in English, is provided by our designers through Figma. Our developers then take the display text from Figma and add it directly to the English String dictionary in the codebase.

As part of the build pipeline on our master/main branch, the String dictionary is pushed to Lokalise, creating or updating the keys/values for the base language. We never delete keys from Lokalise with this integration. This operation is possible because we maintain two separate Lokalise projects for iOS and Android. In this way, each ecosystem can define and rely on its own set of keys. A drawback is that it results in duplicated work for our translators.

Once the base keys are in Lokalise, we can move forward with translating the strings into each supported language, after which a new bundle is built from Lokalise. On each start of the app, we query Lokalise for the latest values, which allows us to change display strings without issuing a new app release.

The current process is under review, with plans to move the ‘source of truth’ for the base language out of the codebase and into Lokalise. We are also considering creating new keys directly from Figma using Lokalise’s Figma plugin.

Unicode Consortium

The Unicode Consortium provides a wide variety of data, libraries and information regarding software internationalisation through their International Components for Unicode (ICU) project. A full description of what is provided is available on their website.

Unicode Common Locale Data Repository (CLDR)

The Unicode Common Locale Data Repository (CLDR) is maintained by the Unicode Consortium and provides the underlying data and building blocks for ICU. For a given locale, the CLDR can provide the locale’s script, preferred calendar, number system, date formats, pluralisation rules, and more.

Plurals

English draws a simple distinction between singular and plural forms (zero takes the plural form). For more complex languages, defining the plurals is a more involved and subtle task; the CLDR defines six distinct categories for plural forms:

  • zero
  • one (singular)
  • two (dual)
  • few (paucal)
  • many (also used for fractions if they have a separate class)
  • other (required — general plural form — also used if the language only has a singular form)

A more in-depth view of the ruleset per language can be found here, again as defined by the CLDR.

Encoding

The method by which characters get encoded, passed around and displayed is important in the localisation discussion. Garbled text can materialise when the software displaying the information does not recognise the encoding used.

As an example, let’s say we provide a localisation for Arabic, but use the Latin-1 (ISO 8859-1) character set for encoding. If the Arabic localisation is requested, the user will see Mojibake. ‘Mojibake’ describes text that has been improperly decoded, resulting in nonsense or random symbols, e.g. squares and other characters instead of letters. It can also be encountered with certain characters in Western languages.

Example of how Mojibake looks

Source: https://commons.wikimedia.org/wiki/File:Mojibake_iso8859-1_em_utf-8.png

This issue can be resolved by including the character encoding as part of the message sent between subsystems. UTF-8 has become the de facto standard for representing non-Latin characters; as we use it across our application, it can represent virtually any character we might come across. Using it for all languages means we do not need to track and convert between encodings depending on the locale being served, with the trade-off of a larger encoded size compared to encodings that support a smaller character set.

Since Java 9, even properties files, which we use in this integration, are loaded in UTF-8 encoding, replacing the Latin-1 encoding used previously. We recently updated our services from Java 11 to Java 17; you can learn more about that in our earlier blog post.
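
When loading such files by hand, it does no harm to be explicit about the charset. A small sketch:

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import java.util.Properties

// Properties.load(InputStream) still assumes ISO 8859-1, so we read
// through an explicit UTF-8 Reader; ResourceBundle-loaded .properties
// files default to UTF-8 since Java 9.
def loadTranslations(path: String): Properties = {
  val props  = new Properties()
  val reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)
  try props.load(reader)
  finally reader.close()
  props
}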

Conclusion

The process of localising an IT product is no trivial task, and in this post we have only presented thoughts on decoupling our backend services from display strings.

There is much more to consider including the following:

  • RTL (right-to-left) scripts such as Arabic, Hebrew, Persian, Pashto, Urdu and Sindhi.
    Accounting for design, UX and space constraints.
  • Localising notifications per device.
    What if a user has the application on multiple devices, each with a different locale setting?
  • Character limits creating a display issue.
    A constant occurrence usually resolved by collaboration between translators, designers and developers.
  • Non Gregorian calendars.

The process of internationalising our app is an ongoing project at SwissBorg. The foundations have been laid and we are continuing to improve; there are areas we would like to enhance, open questions and issues to resolve. We also aim to iron out the processes between departments and reduce unnecessary overhead on our translators and developers alike.

After all, part of making wealth management fun, fair and community-centric is making it easily accessible and welcoming to everyone in their native language.
