Worldly: A new localization file format
I work on a web application that’s translated into dozens of languages, with hundreds of string changes being translated every week. Parts of the app use a workflow loosely based on Rails’ .yml files, while another part uses gettext .po files.
Our current tools suffer from significant shortcomings around correctly handling plurals and gender. ICU MessageFormat solves these, but we can’t use it because translators find it too complex.
Another requirement is formatting parts of strings, for example as links. None of these formats has a built-in way to represent that.
This is my proposal for a new format for localizable strings.
Plurals and Gender
English has two plural forms — singular (1 user) and plural (0 users, 2 users). This is not true for all languages: for example Chinese has just one form, Russian has 3 and Arabic has 6. ICU defines six standard names for these forms — zero, one, two, few, many and other — but their meanings vary between languages.
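To make the differences concrete, here is a minimal Python sketch of picking a plural form in three languages. The rules are hand-written, cover non-negative integers only, and follow the CLDR cardinal rules as I understand them. The function name `plural_category` is hypothetical; a real application would rely on a CLDR-backed library instead.

```python
def plural_category(lang: str, n: int) -> str:
    """Return the plural category name for a non-negative integer n.

    Hand-written sketch covering only English, Russian and Chinese.
    """
    if lang == "zh":
        # Chinese has a single form.
        return "other"
    if lang == "en":
        return "one" if n == 1 else "other"
    if lang == "ru":
        last_digit, last_two = n % 10, n % 100
        if last_digit == 1 and last_two != 11:
            return "one"    # 1, 21, 31, ...
        if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
            return "few"    # 2-4, 22-24, ...
        return "many"       # 0, 5-20, 25-30, ...
    raise ValueError(f"no rules for language: {lang}")


for n in (0, 1, 2, 5, 11, 21, 22):
    print(n, plural_category("en", n), plural_category("ru", n), plural_category("zh", n))
```

Note that the same name means different things: `one` in Russian also covers 21 and 31, which is why translators cannot simply hard-code "1" in that variant.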
Our existing tools allow translators to provide as many variants of a string as there are plural forms in their language. This works well when there is only one number in the string.
Consider the string “2 users wrote 4 posts yesterday”. Our current tools require this string to be split up:
summary: "%{users_str} wrote %{posts_str} yesterday",
users_str: {
  one: "1 user",
  other: "%{count} users"
},
posts_str: {
  one: "1 post",
  other: "%{count} posts"
}
The parts in bold are what translators need to interact with.
ICU MessageFormat does not require the string to be split up, but the result can be hard for translators to follow:
summary: "{nUsers, plural, one {1 user} other {# users}} wrote {nPosts, plural, one {1 post} other {# posts}} yesterday"
Instead, Worldly could do something like:
summary: {
  "$vary": "nUsers plural, nPosts plural",
  "one,one": "1 user wrote 1 post yesterday",
  "one,other": "1 user wrote {nPosts} posts yesterday",
  "other,one": "{nUsers} users wrote 1 post yesterday",
  "other,other": "{nUsers} users wrote {nPosts} posts yesterday"
}
The $vary key is only used by tooling. Strings that the translator needs to work with (in bold) are very easy to understand.
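Here is a rough sketch of how tooling might resolve such an entry at runtime. Nothing here is a finished design: the names `resolve` and `english_plural` are hypothetical, only the `plural` kind is handled, and the category function is hard-coded to the English rule.

```python
import re


def english_plural(n: int) -> str:
    # English cardinal rule for integers; other languages need their own rules.
    return "one" if n == 1 else "other"


def resolve(entry: dict, values: dict, category=english_plural) -> str:
    """Pick the variant named by $vary and fill in {name} placeholders."""
    keys = []
    for spec in entry["$vary"].split(","):
        name, kind = spec.split()  # e.g. "nUsers plural"
        if kind != "plural":
            raise ValueError(f"unsupported variation kind: {kind}")
        keys.append(category(values[name]))
    template = entry[",".join(keys)]
    # Replace each {name} with the matching value.
    return re.sub(r"\{(\w+)\}", lambda m: str(values[m.group(1)]), template)


summary = {
    "$vary": "nUsers plural, nPosts plural",
    "one,one": "1 user wrote 1 post yesterday",
    "one,other": "1 user wrote {nPosts} posts yesterday",
    "other,one": "{nUsers} users wrote 1 post yesterday",
    "other,other": "{nUsers} users wrote {nPosts} posts yesterday",
}

print(resolve(summary, {"nUsers": 2, "nPosts": 1}))
# prints: 2 users wrote 1 post yesterday
```

The order of variables in `$vary` determines the order of categories in each key, so the translator never has to parse nested syntax.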
Ordinals
Languages may also have different pluralization rules for ordinal numbers (1st, 2nd, etc.). Just like ICU MessageFormat, Worldly could support this by replacing plural with ordinal in $vary.
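As an illustration, an ordinal entry for English might look like the sketch below. The entry, the variable name `rank` and the helper names are all made up for this example; the rule is the English ordinal rule for integers as I understand it from CLDR.

```python
def english_ordinal(n: int) -> str:
    """English ordinal category for a non-negative integer (hand-written)."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"    # 1st, 21st, 101st
    if n % 10 == 2 and n % 100 != 12:
        return "two"    # 2nd, 22nd
    if n % 10 == 3 and n % 100 != 13:
        return "few"    # 3rd, 23rd
    return "other"      # 4th, 11th, 12th, 13th


# Hypothetical Worldly entry using "ordinal" instead of "plural".
place = {
    "$vary": "rank ordinal",
    "one": "You finished {rank}st",
    "two": "You finished {rank}nd",
    "few": "You finished {rank}rd",
    "other": "You finished {rank}th",
}


def render_rank(entry: dict, rank: int) -> str:
    # Single-variable lookup; a full resolver would read $vary instead.
    return entry[english_ordinal(rank)].replace("{rank}", str(rank))


for rank in (1, 2, 3, 4, 11, 21):
    print(render_rank(place, rank))
```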
Extended Plurals
Occasionally, it might be necessary to special-case a particular value. For example, if the translator wishes to use the word “dozen” when the count is 12, she may add a variant whose key uses =12 where it would otherwise use other. This, too, works just like ICU MessageFormat.
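One way the lookup could work, sketched under the assumption that an exact `=N` key takes priority over the category key (which is how ICU MessageFormat treats exact matches). The entry and the name `pick_variant` are hypothetical, and the category rule is English-only.

```python
# Hypothetical entry with an exact-value variant for 12.
eggs = {
    "$vary": "nEggs plural",
    "one": "1 egg",
    "=12": "a dozen eggs",
    "other": "{nEggs} eggs",
}


def pick_variant(entry: dict, n: int) -> str:
    """Prefer an exact '=N' key, then fall back to the plural category."""
    exact = f"={n}"
    if exact in entry:
        key = exact
    else:
        key = "one" if n == 1 else "other"  # English rule only
    # $vary is ignored here because the entry has a single variable.
    return entry[key].replace("{nEggs}", str(n))


print(pick_variant(eggs, 12))  # prints: a dozen eggs
print(pick_variant(eggs, 13))  # prints: 13 eggs
```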
Selects
The string may also vary based on a non-numeric variable, like gender. Like ICU MessageFormat, Worldly could support this by replacing plural with select and one, other with the possible values of the variable.
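A select entry might look like the following sketch. The entry, the variable name `gender` and the function `select_variant` are invented for illustration, and falling back to an `other` key for unknown values is an assumption borrowed from ICU's select.

```python
# Hypothetical entry that varies on a non-numeric variable.
liked = {
    "$vary": "gender select",
    "female": "She liked your post",
    "male": "He liked your post",
    "other": "They liked your post",
}


def select_variant(entry: dict, value: str) -> str:
    """Return the variant for value, falling back to 'other' (assumed)."""
    if value == "$vary":
        # Never treat the tooling key as a selectable value.
        return entry["other"]
    return entry.get(value, entry["other"])


print(select_variant(liked, "female"))   # prints: She liked your post
print(select_variant(liked, "unknown"))  # prints: They liked your post
```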
Formatting
Consider the string “You have 3 posts”, where the phrase “3 posts” should be emphasized. Once again, our current tooling requires this to be split into two strings:
summary: "You have %{posts_str}",
posts_str: "%{count} posts"
ICU MessageFormat would require something similar, as it has no support for formatting parts of a string.
Worldly could support the syntax:
summary: "You have {$em {nPosts} posts}"
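To check that this syntax is easy to process, here is a small sketch of a renderer that turns it into HTML. Several things are assumptions rather than part of the proposal: that `$em` maps to an HTML `<em>` element, that only an allowlisted set of tags is accepted, that spans may nest, and the names `render` and `ALLOWED_TAGS`.

```python
import html

ALLOWED_TAGS = {"em", "strong"}  # assumed allowlist, not part of the proposal


def render(template: str, values: dict) -> str:
    """Render {name} placeholders and {$tag ...} spans to HTML."""
    out, pos = _render(template, 0, values)
    if pos != len(template):
        raise ValueError("unbalanced '}' in template")
    return out


def _render(text: str, pos: int, values: dict):
    """Render from pos until an unmatched '}' or the end of the text."""
    parts = []
    while pos < len(text) and text[pos] != "}":
        if text[pos] != "{":
            parts.append(html.escape(text[pos]))
            pos += 1
            continue
        if text.startswith("{$", pos):
            # Formatting span: "{$tag " followed by nested content and "}".
            space = text.index(" ", pos)
            tag = text[pos + 2:space]
            if tag not in ALLOWED_TAGS:
                raise ValueError(f"unknown format: {tag}")
            inner, pos = _render(text, space + 1, values)
            if pos >= len(text):
                raise ValueError("unclosed formatting span")
            parts.append(f"<{tag}>{inner}</{tag}>")
        else:
            # Plain placeholder such as {nPosts}.
            end = text.index("}", pos)
            parts.append(html.escape(str(values[text[pos + 1:end]])))
            pos = end
        pos += 1  # skip the closing brace
    return "".join(parts), pos


print(render("You have {$em {nPosts} posts}", {"nPosts": 3}))
# prints: You have <em>3 posts</em>
```

Because the translator sees the whole sentence, she can move the emphasized span or reorder the words inside it without touching a second string.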