EXPEDIA GROUP TECHNOLOGY — SOFTWARE

Finally Doing Pluralization Right

How the ICU plural syntax works

Alan Allegret
Expedia Group Technology


My team needed our wait time message localized.

wait_time=Estimated wait time is {0} minutes.

The Localization team asked us to declare 6 keys in order to properly localize the seemingly simple message.

wait_time_0=Estimated wait time is 0 minutes.
wait_time_1=Estimated wait time is one minute.
wait_time_2=Estimated wait time is {0} minutes.
wait_time_few=Estimated wait time is {0} minutes.
wait_time_many=Estimated wait time is {0} minutes.
wait_time_other=Estimated wait time is {0} minutes.

What are these 6 variants?
Why are they necessary?
Do I really need to copy/paste all my strings 6 times now?

Then I entered the wonderful world of pluralization!

Plura-what?

Pluralization is the problem we face whenever an i18n message contains a numeric parameter: the words around the number have to agree with it.

Pluralization is complex and surprisingly diverse across languages. I looked throughout the Egencia® (part of Expedia Group™️) apps’ codebase at how plurals were handled. Only the mobile codebases seemed to consistently declare the 6 versions above in their XML loc files. Among our services and webapps, I saw a wide range of “solutions”. Some apps declared 2 variants. Some did 3. Some didn’t even try. Some gave it a good go and used suffixes like “few”, but in the middle of the i18n key name (e.g. advance_purchase_policy_max_days_validation_for_few_days). A suffix buried there suggests that the key is picked by custom code, with if statements.
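To make the problem concrete, here is a minimal sketch of what such hand-rolled key picking tends to look like. The class name, method name and thresholds are hypothetical and are not taken from any real codebase; the point is only that the locale ends up being ignored.

```java
import java.util.Locale;

// Hypothetical helper illustrating the hand-rolled approach described above.
final class NaiveKeyPicker {

    // Picks a key suffix with hard-coded thresholds; the locale is never consulted.
    static String pickKey(String baseKey, long count, Locale locale) {
        if (count == 0) {
            return baseKey + "_0";
        } else if (count == 1) {
            return baseKey + "_1";
        } else if (count < 5) {
            return baseKey + "_few"; // arbitrary threshold, wrong for most locales
        }
        return baseKey + "_other";
    }

    public static void main(String[] args) {
        // Russian uses its "one" form for 21, yet this logic picks the "other" key.
        System.out.println(pickKey("wait_time", 21, Locale.forLanguageTag("ru-RU")));
    }
}
```

Every new language with different rules means another branch, which is exactly the knowledge that belongs in a shared library.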

In short, it was a debacle. And there was a better way out there.

CLDR and ICU to the Rescue

Obviously, we’re not the first to face these problems. Au contraire: the fine people of CLDR (Common Locale Data Repository) have spent years gathering the plural rules of the world’s languages. They describe those rules using 6 standardized tags.

Languages vary in how they handle plurals of nouns or unit expressions (“hour” vs “hours”, and so on). Some languages have two forms, like English; some languages have only a single form; and some languages have multiple forms. CLDR uses short, mnemonic tags for these plural categories:

- zero
- one (singular)
- two (dual)
- few (paucal)
- many (also used for fractions if they have a separate class)
- other (required — general plural form — also used if the language only has a single form)

The result is this fascinating, giant Language Plural Rules table.
And that’s where the Localization team was coming from, with these 6 variants.

Cool, it’s one thing to declare these i18n keys, but how do I make sure to use the correct key at runtime depending on the locale, and the numeric value?
How do I apply the CLDR knowledge in my app?

It depends on the programming language, but in the Java world ICU (International Components for Unicode) provides the reference implementation. Years ago, IBM® and the ICU team started maintaining a library called ICU4J for Java (and ICU4C for C/C++), which contains all these CLDR rules. In other words, it’s the ICU4J library that decides “oh yeah, in Arabic, use the ‘few’ variant of the i18n key if the number is between 3 and 10” (or, more precisely, if its last two digits are between 03 and 10, as in 105).
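To give a feel for the kind of decision ICU4J makes, here is a simplified sketch of plural category selection for three languages. It is an illustration only: it handles non-negative integers, covers just French, Arabic and an English-like default, and the class and method names are made up for this example. ICU4J carries the complete rules for every locale (exposed through its PluralRules class), so real code should rely on the library rather than on anything like this.

```java
import java.util.Locale;

// Simplified, integer-only illustration of CLDR-style plural rules.
// Hypothetical class; ICU4J holds the complete, authoritative rules.
final class PluralCategorySketch {

    // Returns the plural category tag for a non-negative integer.
    static String select(Locale locale, long n) {
        switch (locale.getLanguage()) {
            case "fr":
                // French: 0 and 1 both take the singular form.
                return (n == 0 || n == 1) ? "one" : "other";
            case "ar": {
                long lastTwoDigits = n % 100;
                if (n == 0) return "zero";
                if (n == 1) return "one";
                if (n == 2) return "two";
                if (lastTwoDigits >= 3 && lastTwoDigits <= 10) return "few";
                if (lastTwoDigits >= 11) return "many";
                return "other"; // e.g. 100, 101, 102
            }
            default:
                // English-like behaviour, used here as a stand-in for everything else.
                return n == 1 ? "one" : "other";
        }
    }

    public static void main(String[] args) {
        long[] samples = {0, 1, 2, 5, 11, 100, 103};
        for (long n : samples) {
            System.out.println(n + " -> en:" + select(Locale.ENGLISH, n)
                    + " fr:" + select(Locale.FRENCH, n)
                    + " ar:" + select(Locale.forLanguageTag("ar"), n));
        }
    }
}
```

Even this reduced version shows why a single English-style if statement cannot work across locales: the same number lands in a different category depending on the language.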

Nowadays, Transferwise (someone had to do it and it’s them) maintains spring-icu. It pulls in ICU4J and eases its usage within a Spring® environment mainly by introducing the ICUReloadableResourceBundleMessageSource class.

“I’ll have you know, spring-icu 0.2.3 brings transitively icu4j 66.1 which implements the CLDR 36.1”
– Mr VersionNerd

The ICU Plural Syntax

Back to our example: I was still not too excited about having 6 duplicated i18n keys every time we have to deal with plurals. Luckily, ICU also defines a plural syntax that lets us declare all variants inline in i18n properties files. Much better!

wait_time=Estimated wait time is {0, plural, zero {# minutes} one {one minute} two {# minutes} few {# minutes} many {# minutes} other {# minutes}}.

Note: there are different and perfectly valid variations of the above (e.g. the usage of ‘#’ vs ‘{0}’). This one is the notation agreed upon with the Expedia Group Localization team for them to fully support the plural syntax since summer 2020.

Adoption in Java

Import the icu library

<dependency>
    <groupId>com.github.transferwise</groupId>
    <artifactId>spring-icu</artifactId>
</dependency>

Then, wherever you declared a ResourceBundleMessageSource, replace it with an ICUReloadableResourceBundleMessageSource.

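import java.nio.charset.StandardCharsets;

public ICUReloadableResourceBundleMessageSource icuMessageSource() {
    ICUReloadableResourceBundleMessageSource messageSource = new ICUReloadableResourceBundleMessageSource();
    // Resolves messages.properties, messages_fr_FR.properties, etc. from the classpath
    messageSource.setBasenames("classpath:messages");
    messageSource.setDefaultEncoding(StandardCharsets.UTF_8.name());
    // Fall back to the key itself when no message is found
    messageSource.setUseCodeAsDefaultMessage(true);
    return messageSource;
}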

Voila! You can now use the plural syntax in your i18n files.

Putting it all together

Let’s summarize and see what’s going on at runtime. Starting somewhere in the code with this:

messageSource.getMessage("wait_time", new Object[] {waitInMinutes}, locale);

Let’s assume the locale is fr-FR, waitInMinutes is 0, and messages_fr_FR.properties contains the following key:

wait_time=Temps d’attente estimé : {0, plural, zero {# minute} one {# minute} two {# minutes} few {# minutes} many {# minutes} other {# minutes}}.

At runtime, the ICUReloadableResourceBundleMessageSource identifies which properties file to use, like its regular Spring counterpart. So it grabs from messages_fr_FR.properties the raw message pattern “Temps d’attente […]”.

The actual message formatting logic is then delegated to the ICU library. The ICU library parses the plural section and, based on its knowledge of CLDR, identifies that the variant one shall be used for the value 0 in fr-FR. The variant other would have been used in en-US. It then formats the final message, “Temps d’attente estimé : 0 minute.”, which is returned to the caller.

Note: Even though there is a zero variant in the pattern, it’s never used even for the value 0. CLDR dictates that the variant one shall be used instead, making zero useless in fr-FR.
This is common. In fact most locales only need one and other. Localization team members may therefore omit variants zero, two, few, and many for their specific locale.
It is a good practice though to always list all 6 variants in the default i18n file.
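The runtime steps described above (pick the category, look up the matching variant, fall back to other, substitute the number) can be sketched in plain Java. This is a rough illustration under stated assumptions, not how ICU4J is implemented: the class and method names are hypothetical, only the plural part of the message is rendered, the French rule is hard-coded, and the number is inserted without any locale-aware formatting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of variant lookup and '#' substitution for a plural argument.
final class PluralFormatSketch {

    // Hard-coded stand-in for the CLDR rule ICU4J would apply in fr-FR (integers only).
    static String frenchCategory(long n) {
        return (n == 0 || n == 1) ? "one" : "other";
    }

    // Picks the variant for the category, falling back to the mandatory "other" variant,
    // then replaces '#' with the number.
    static String render(Map<String, String> variants, String category, long n) {
        String template = variants.getOrDefault(category, variants.get("other"));
        if (template == null) {
            throw new IllegalArgumentException("The 'other' variant is required");
        }
        return template.replace("#", Long.toString(n));
    }

    public static void main(String[] args) {
        // Only the variants French needs; zero, two, few and many are omitted.
        Map<String, String> variants = new LinkedHashMap<>();
        variants.put("one", "# minute");
        variants.put("other", "# minutes");

        for (long n : new long[] {0, 1, 2}) {
            System.out.println(render(variants, frenchCategory(n), n));
        }
    }
}
```

The fallback to other is why that variant is the only required one, and why a locale file can safely leave out the categories its language never uses.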
