When writing RESTful APIs it’s best to make use of HTTP headers. When dealing with localisation there are two headers that can be taken advantage of: the request header Accept-Language and the response header Content-Language. Both carry BCP 47 language tags.
The full format of a language tag is language-extlang-script-region-variant-extension-privateuse. In practice, however, most tags take the format language-script-region, with both script and region optional. Subtags are separated by hyphens.
By convention, language subtags are two or three lowercase letters. Script subtags are four letters long with the initial letter capitalised. Region subtags are either two uppercase letters or three digits. Some example Content-Language values:
- “Content-Language: en” for the English language
- “Content-Language: fr-CH” for the French language and the Swiss region
- “Content-Language: zh-Hans” for Simplified Chinese
- “Content-Language: zh-Hant-TW” for Traditional Chinese in Taiwan
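The casing rules above can be sketched in a few lines. This is a language-neutral illustration in Python rather than Symfony code; the function name format_language_tag is hypothetical, and the pattern only covers the common language-script-region shape, not the full BCP 47 grammar.

```python
import re

# Simplified shape: language[-Script][-REGION]. Extlang, variant, extension
# and private-use subtags are deliberately left out of this sketch.
_TAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def format_language_tag(raw: str) -> str:
    """Return the tag with conventional casing, e.g. 'ZH-hant-tw' -> 'zh-Hant-TW'."""
    match = _TAG_RE.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"unsupported language tag: {raw!r}")
    parts = [match["language"].lower()]          # language: lowercase
    if match["script"]:
        parts.append(match["script"].title())    # script: initial capital
    if match["region"]:
        parts.append(match["region"].upper())    # region: uppercase or digits
    return "-".join(parts)
```

Tags are compared case-insensitively, so the casing is a convention rather than a requirement, but emitting the conventional form keeps response headers consistent.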
The Accept-Language header is similar but has a more complicated format: “Accept-Language: language-tag(;quality-value)”. It can take multiple comma-separated language tags, each with an optional quality value (written as, for example, “;q=0.8”) that sets the preference order. The higher the quality value, the more preferred that language is. Values range from 0 to 1, and a tag without a quality value defaults to 1.
- “Accept-Language: en”
- “Accept-Language: en-IE, en-GB;q=0.8, en;q=0.5” prefers Irish English, then British English, and then any other English
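To make the quality values concrete, here is a minimal sketch of parsing an Accept-Language header into an ordered list. It is written in Python purely as an illustration and is not how Symfony does it; parse_accept_language is a hypothetical name, and dropping entries with a weight of 0 is a choice made for this sketch.

```python
from __future__ import annotations


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (tag, quality) pairs, most preferred first."""
    entries = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, *params = (piece.strip() for piece in item.split(";"))
        quality = 1.0  # a tag without a q parameter gets the default weight of 1
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as not acceptable
        quality = min(quality, 1.0)
        # q=0 means "not acceptable", so such entries are dropped here.
        if tag and quality > 0:
            entries.append((tag, quality))
    # sorted() is stable, so tags with equal weight keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)
```

For the second header above this yields en-IE first, then en-GB with 0.8, then en with 0.5.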
Software libraries for localisation generally use something like Unicode CLDR (Common Locale Data Repository), whose locale identifiers differ slightly from BCP 47 language tags. The main difference is that CLDR locales use underscores as separators, for example en_IE rather than en-IE. Most implementations of CLDR support both underscores and hyphens, and the standard recommends that they do.
This leads to some code like this:
Which can be curl’d with:
curl -i http://localhost:8080/ -H "Accept-Language: en-IE"
In the previous code the getPreferredLanguage method is used to parse the header. It takes an ordered array of CLDR locales to match against the request header and returns the matching locale, or the first entry in the array if no match is found.
The chosen locale also contains underscores, so a judicious str_replace is required when setting the response header in order to adhere to the BCP 47 format.
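The behaviour described here (pick a supported locale, fall back to the first one, then swap underscores for hyphens) can be sketched independently of Symfony. The Python below is only an illustration under simple assumptions: it uses exact matching, which is not necessarily what getPreferredLanguage does internally, and the names choose_locale and content_language are hypothetical.

```python
from __future__ import annotations


def choose_locale(accepted_tags: list[str], supported_locales: list[str]) -> str:
    """Pick the first accepted tag that is supported, else the first supported locale.

    accepted_tags: BCP 47 tags from the request, most preferred first.
    supported_locales: CLDR-style locales (underscores), default first.
    """
    for tag in accepted_tags:
        candidate = tag.replace("-", "_")  # BCP 47 -> CLDR separator
        if candidate in supported_locales:
            return candidate
    return supported_locales[0]  # no match: fall back to the first entry


def content_language(locale: str) -> str:
    """Convert a CLDR-style locale to a Content-Language value (the str_replace step)."""
    return locale.replace("_", "-")


locale = choose_locale(["en-IE", "en"], ["en_GB", "en_IE"])
headers = {"Content-Language": content_language(locale)}
```

Keeping the conversion in one small function means the underscore form never leaks into a response header.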
There is a setLocale method on the Request class, but this will set the PHP environment’s locale, which can interfere with other operations such as inserting numbers into a database. Instead the parsed locale is saved in the DI container and then passed back to the consumer.
The Symfony code works well with language codes on their own and with language codes combined with two-letter region codes, but appears not to work correctly for tags that contain scripts, like zh-Hans-CH.
This can be worked around either by limiting the languages used or by using the Symfony AcceptHeader class to parse the request header and then using the Lookup matching scheme from RFC 4647 to pick a locale.
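For reference, lookup matching works by progressively truncating each requested tag from the end until it equals a supported tag. The sketch below shows the idea in Python rather than PHP; the function name lookup and its parameters are hypothetical, and it assumes the supported tags are already in hyphenated BCP 47 form.

```python
from __future__ import annotations


def lookup(priority_list: list[str], supported_tags: list[str], default: str) -> str:
    """Return the best supported tag for the ranges in priority_list, or default."""
    # Compare case-insensitively but return the tag as the caller spelled it.
    supported = {tag.lower(): tag for tag in supported_tags}
    for language_range in priority_list:
        if language_range == "*":
            continue  # the wildcard carries no information for lookup
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in supported:
                return supported[candidate]
            subtags.pop()  # truncate the last subtag and try again
            # A trailing single-character subtag (such as "x") only introduces
            # what followed it, so it is removed at the same time.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default
```

With supported tags en, zh-Hans and fr-CH, a request for zh-Hans-CN is truncated once and resolves to zh-Hans, while a request for de-AT matches nothing and falls back to the default.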
Those are the basics of how to parse the Accept-Language header and set the Content-Language header.