EXPEDIA GROUP TECHNOLOGY — SOFTWARE
Finally Doing Pluralization Right
How the ICU plural syntax works
My team needed our wait time message localized.
wait_time=Estimated wait time is {0} minutes.
The Localization team asked us to declare 6 keys in order to properly localize the seemingly simple message.
wait_time_0=Estimated wait time is 0 minutes.
wait_time_1=Estimated wait time is one minute.
wait_time_2=Estimated wait time is {0} minutes.
wait_time_few=Estimated wait time is {0} minutes.
wait_time_many=Estimated wait time is {0} minutes.
wait_time_other=Estimated wait time is {0} minutes.
What are these 6 variants?
Why are they necessary?
Do I really need to copy/paste all my strings 6 times now?
Then I entered the wonderful world of pluralization!
Plura-what?
Pluralization is the problem we face with i18n keys that contain a numeric parameter.
Pluralization is complex and surprisingly diverse across languages. I looked throughout the Egencia® (part of Expedia Group™️) apps’ codebase at how plurals were handled. Only the mobile codebases seemed to consistently declare the 6 versions above in their XML loc files. Among our services and webapps, I saw a wide range of “solutions”. Some apps declared 2 variants. Some did 3. Some didn’t even try. Some gave it a good go and used suffixes like “few” but in the middle of the i18n key name (e.g. advance_purchase_policy_max_days_validation_for_few_days). This indicates that picking the key is done in a custom way, with if statements.
In short, it was a debacle. And there was a better way out there.
CLDR and ICU to the Rescue
Obviously, we’re not the first to face these problems. Au contraire, the fine people of CLDR (Common Locale Data Repository) have spent years gathering the plural rules of the world’s languages. They defined 6 standardized tags to describe them.
Languages vary in how they handle plurals of nouns or unit expressions (“hour” vs “hours”, and so on). Some languages have two forms, like English; some languages have only a single form; and some languages have multiple forms. CLDR uses short, mnemonic tags for these plural categories:
- zero
- one (singular)
- two (dual)
- few (paucal)
- many (also used for fractions if they have a separate class)
- other (required — general plural form — also used if the language only has a single form)
The result is this fascinating, giant Language Plural Rules table.
And that’s where the Localization team was coming from, with these 6 variants.
Cool, it’s one thing to declare these i18n keys, but how do I make sure to use the correct key at runtime depending on the locale, and the numeric value?
How do I apply the CLDR knowledge in my app?
It depends on the coding language, but in the Java world, ICU (International Components for Unicode) provides the reference implementation. Years ago, IBM® and the ICU team started maintaining a library called ICU4J for Java (and ICU4C for C/C++) which contains all these CLDR rules. In other words, it’s the ICU4J library which decides “oh yeah, in Arabic, use the ‘few’ variant of the i18n key if the number ends in 03 to 10 (so 3 to 10, but also 103 to 110)”.
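To give a feel for the kind of decision the library makes, here is a minimal sketch of plural category selection for a few languages. It is not the ICU4J API: the class `PluralRulesSketch`, the `Category` enum, and the `select` method are hypothetical names invented for this illustration, and the rules are simplified to whole numbers only. CLDR releases can refine rules over time, so the library remains the source of truth.

```java
import java.util.Locale;

public class PluralRulesSketch {

    /** The six CLDR plural categories. */
    public enum Category { ZERO, ONE, TWO, FEW, MANY, OTHER }

    /**
     * Simplified, integer-only category selection (an assumption of this sketch).
     * Real CLDR rules also cover decimals and hundreds of locales.
     */
    public static Category select(Locale locale, long n) {
        long abs = Math.abs(n);
        switch (locale.getLanguage()) {
            case "en":
                // English: only exactly 1 is singular.
                return abs == 1 ? Category.ONE : Category.OTHER;
            case "fr":
                // French: 0 and 1 both use the singular form.
                return abs <= 1 ? Category.ONE : Category.OTHER;
            case "ar": {
                // Arabic uses all six categories.
                long mod100 = abs % 100;
                if (abs == 0) return Category.ZERO;
                if (abs == 1) return Category.ONE;
                if (abs == 2) return Category.TWO;
                if (mod100 >= 3 && mod100 <= 10) return Category.FEW;
                if (mod100 >= 11) return Category.MANY;
                return Category.OTHER;
            }
            default:
                // Languages with a single form (e.g. Japanese) and anything
                // this sketch does not know about.
                return Category.OTHER;
        }
    }

    public static void main(String[] args) {
        Locale arabic = Locale.forLanguageTag("ar");
        System.out.println(select(Locale.US, 0));      // OTHER
        System.out.println(select(Locale.FRANCE, 0));  // ONE
        System.out.println(select(arabic, 7));         // FEW
        System.out.println(select(arabic, 103));       // FEW
        System.out.println(select(arabic, 11));        // MANY
        System.out.println(select(arabic, 100));       // OTHER
    }
}
```

Hand-maintaining a switch like this for every locale is exactly the work that ICU4J takes off your plate.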
Nowadays, Transferwise (someone had to do it, and it’s them) maintains spring-icu. It pulls in ICU4J and eases its usage within a Spring® environment, mainly by introducing the ICUReloadableResourceBundleMessageSource class.
“I’ll have you know, spring-icu 0.2.3 brings transitively icu4j 66.1 which implements the CLDR 36.1”
– Mr VersionNerd
The ICU Plural Syntax
Back to our example: I was still not too excited about having 6 duplicated i18n keys every time we have to deal with plurals. Luckily, ICU also defines a plural syntax that lets you declare all variants inline in i18n properties files. Much better!
wait_time=Estimated wait time is {0, plural, zero {# minutes} one {one minute} two {# minutes} few {# minutes} many {# minutes} other {# minutes}}.
Note: there are different and perfectly valid variations of the above (e.g. the usage of ‘#’ vs ‘{0}’). This one is the notation agreed upon with the Expedia Group Localization team, who have fully supported the plural syntax since summer 2020.
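To show how the inline syntax is read, here is a rough JDK-only sketch that picks a variant out of a plural pattern and substitutes ‘#’ with the number. It is not how ICU4J is implemented: the class `PluralPatternSketch` and its methods are hypothetical, the plural category is passed in by the caller, and the sketch assumes a single `{0, plural, ...}` argument with keyword selectors only (no nested braces, no `=0` style selectors).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PluralPatternSketch {

    // One "keyword {text}" pair; nested braces are not supported in this sketch.
    private static final Pattern VARIANT =
            Pattern.compile("(\\w+)\\s*\\{([^{}]*)\\}");

    // The whole "{0, plural, ...}" argument, made of one or more variants.
    private static final Pattern PLURAL_ARG =
            Pattern.compile("\\{0,\\s*plural,\\s*((?:\\w+\\s*\\{[^{}]*\\}\\s*)+)\\}");

    /** Splits "one {...} other {...}" into a keyword-to-text map. */
    public static Map<String, String> parseVariants(String body) {
        Map<String, String> variants = new LinkedHashMap<>();
        Matcher m = VARIANT.matcher(body);
        while (m.find()) {
            variants.put(m.group(1), m.group(2));
        }
        return variants;
    }

    /** Renders the pattern for n, using the given category or falling back to "other". */
    public static String format(String pattern, long n, String category) {
        Matcher arg = PLURAL_ARG.matcher(pattern);
        if (!arg.find()) {
            return pattern; // no plural argument: nothing to do
        }
        Map<String, String> variants = parseVariants(arg.group(1));
        String chosen = variants.getOrDefault(category, variants.get("other"));
        if (chosen == null) {
            throw new IllegalArgumentException("Pattern must declare an 'other' variant");
        }
        // '#' stands for the number being formatted.
        String rendered = chosen.replace("#", Long.toString(n));
        return pattern.substring(0, arg.start()) + rendered + pattern.substring(arg.end());
    }

    public static void main(String[] args) {
        String pattern = "Estimated wait time is {0, plural, zero {# minutes} one {one minute} "
                + "two {# minutes} few {# minutes} many {# minutes} other {# minutes}}.";
        System.out.println(format(pattern, 1, "one"));   // Estimated wait time is one minute.
        System.out.println(format(pattern, 5, "other")); // Estimated wait time is 5 minutes.
    }
}
```

The fallback to `other` is why that variant is the only required one: any category a pattern does not declare ends up there.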
Adoption in Java
Import the icu library
<dependency>
    <groupId>com.github.transferwise</groupId>
    <artifactId>spring-icu</artifactId>
</dependency>
Then wherever you declared a ResourceBundleMessageSource, replace it with an ICUReloadableResourceBundleMessageSource instead.
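// Assumes: import static java.nio.charset.StandardCharsets.UTF_8;
public ICUReloadableResourceBundleMessageSource icuMessageSource() {
    ICUReloadableResourceBundleMessageSource messageSource = new ICUReloadableResourceBundleMessageSource();
    // Loads messages.properties, messages_fr_FR.properties, etc. from the classpath
    messageSource.setBasenames("classpath:messages");
    messageSource.setDefaultEncoding(UTF_8.name());
    // Returns the key itself when no message is found for it
    messageSource.setUseCodeAsDefaultMessage(true);
    return messageSource;
}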
Voila! You can now use the plural syntax in your i18n files.
Putting it all together
Let’s summarize and see what’s going on at runtime. Starting somewhere in the code with this:
messageSource.getMessage("wait_time", new Object[] {waitInSeconds}, locale);
Let’s assume the locale is fr-FR, the waitInSeconds is 0, and that messages_fr_FR.properties contains the following key:
wait_time=Temps d’attente estimé : {0, plural, zero {# minute} one {# minute} two {# minutes} few {# minutes} many {# minutes} other {# minutes}}.
At runtime, the ICUReloadableResourceBundleMessageSource identifies which properties file to use, like its regular Spring counterpart. So it grabs from messages_fr_FR.properties the raw message pattern “Temps d’attente […]”.
The actual message formatting logic is then delegated to the ICU library. The ICU library parses the plural section and, based on its knowledge of CLDR, identifies that the variant one shall be used for the value 0 in fr-FR. The variant other would have been used in en-US. It then formats the final message, “Temps d’attente estimé : 0 minute.”, which is eventually returned.
Note: Even though there is a zero variant in the pattern, it’s never used, even for the value 0. CLDR dictates that the variant one shall be used instead, making zero useless in fr-FR.
This is common. In fact, most locales only need one and other. Localization team members may therefore omit variants zero, two, few, and many for their specific locale.
It is good practice, though, to always list all 6 variants in the default i18n file.
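One way to keep that habit honest is a small check that reports which variants a pattern leaves out. The sketch below is only an illustration: `PluralVariantLint` and `missingVariants` are hypothetical names, not part of ICU4J or spring-icu, and the scan assumes a single plural argument without nested braces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PluralVariantLint {

    private static final List<String> CATEGORIES =
            Arrays.asList("zero", "one", "two", "few", "many", "other");

    // A variant keyword followed by its opening brace, e.g. "few {".
    private static final Pattern KEYWORD = Pattern.compile("(\\w+)\\s*\\{");

    /** Returns the CLDR categories that the plural pattern does not declare. */
    public static List<String> missingVariants(String pattern) {
        String marker = "plural,";
        int start = pattern.indexOf(marker);
        if (start < 0) {
            return new ArrayList<>(); // not a plural pattern: nothing to report
        }
        List<String> found = new ArrayList<>();
        Matcher m = KEYWORD.matcher(pattern.substring(start + marker.length()));
        while (m.find()) {
            found.add(m.group(1));
        }
        List<String> missing = new ArrayList<>(CATEGORIES);
        missing.removeAll(found);
        return missing;
    }

    public static void main(String[] args) {
        String localized = "Temps d'attente : {0, plural, one {# minute} other {# minutes}}.";
        // Fine for a French file, but worth flagging in the default file.
        System.out.println(missingVariants(localized)); // [zero, two, few, many]
    }
}
```

Run against the default properties file in a unit test, a check like this could flag keys that would leave translators guessing which variants their locale needs.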