The State of Marshallese and Latvian Text
I was going over my recent language posts and realized that I had missed one on my most niche area of expertise: Marshallese vs. Latvian!
The Marshall Islands are a group of small atolls, spaghetti-like islands in the middle of the Pacific, and the language has only an estimated 55,000 speakers. You can also find speakers in Marshallese communities in Hawaiʻi or, if you can believe it, Arkansas. Language resources are limited; I got an audiobook that was produced with support from the Mormon church.
Marshallese Alphabet v2, established 1970
When the first Christian missionaries arrived on the islands, they published Bibles in Marshallese. To capture the full range of sounds in the language, the missionaries created a 24-letter alphabet with some consonants and vowels not found in English. In the 1970s, linguists revisited the Marshall Islands and changed the accent marks on some letters (e.g. ñ became n̄, because the sound is closer to ‘ng’). The current alphabet, posted in every elementary school classroom, looks something like this:
a ā b d e i j k l ļ m m̧ n ņ n̄ o o̧ ō p r t u ū w
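Three of these letters (m̧, n̄, o̧) have no single Unicode code point, so software that splits or sorts Marshallese words has to treat some two-code-point sequences as one letter. A minimal Python sketch, assuming the order posted above is the sort order and that text is NFC-normalized first; `letters` and `sort_key` are illustrative names, not from any library:

```python
import unicodedata

# The 24 letters in posted order: a ā b d e i j k l ļ m m̧ n ņ n̄ o o̧ ō p r t u ū w
# (escapes keep the combining marks unambiguous).
_POSTED = "a \u0101 b d e i j k l \u013c m m\u0327 n \u0146 n\u0304 o o\u0327 \u014d p r t u \u016b w"
ALPHABET = [unicodedata.normalize("NFC", letter) for letter in _POSTED.split()]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def letters(word: str) -> list[str]:
    """Split a word into Marshallese letters, keeping m̧, n̄ and o̧ whole."""
    text = unicodedata.normalize("NFC", word.lower())
    out, i = [], 0
    while i < len(text):
        # Try a two-code-point letter (base + combining mark) before a single character.
        if text[i:i + 2] in RANK:
            out.append(text[i:i + 2])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return out


def sort_key(word: str) -> list[int]:
    # Anything outside the alphabet sorts after every Marshallese letter.
    return [RANK.get(ch, len(ALPHABET)) for ch in letters(word)]
```

With `sorted(words, key=sort_key)`, ā lands right after a; a plain `sorted(words)` compares code points and would put ā (U+0101) after w.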
An alphabet is not easy to change, so there are holdouts, and it is not uncommon to see someone use ñ. Our OLPC laptops showed ñ on the n key as an alternate character, so many students reached for it. In these cases I would also teach a two-character method (o + accent). This made it necessary to normalize Marshallese data and quiz answers to accept four possible typings (ñ, n~, n-, n̄).
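A minimal sketch of that normalization in Python, assuming answers are compared as plain strings after folding; `normalize_answer` is a hypothetical helper, and handling the uppercase variants is an added assumption. The naive `n-` rule would also rewrite a genuine hyphen after n, so it suits single-word answers better than running text.

```python
import unicodedata

# n̄ has no precomposed code point: it is always n + U+0304 COMBINING MACRON.
N_MACRON, CAP_N_MACRON = "n\u0304", "N\u0304"

# The alternate typings from the list above, plus (assumed) uppercase forms.
ALTERNATES = {
    "\u00f1": N_MACRON, "n~": N_MACRON, "n-": N_MACRON,              # ñ, n~, n-
    "\u00d1": CAP_N_MACRON, "N~": CAP_N_MACRON, "N-": CAP_N_MACRON,  # Ñ, N~, N-
}


def normalize_answer(text: str) -> str:
    """Fold every accepted spelling of n̄ into one canonical form."""
    # NFC first, so n + combining tilde (U+0303) becomes the single character ñ.
    text = unicodedata.normalize("NFC", text)
    for alternate, canonical in ALTERNATES.items():
        text = text.replace(alternate, canonical)
    return text
```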
In e-mails and documents that I received at the Ministry of Education, Marshallese words were written in a special font, TimesGKM. The font drew accented letters from the new alphabet in place of letters that Marshallese does not use (such as x), so they could be typed on a US keyboard. Unfortunately, users without the font (like me) received many messages and letters full of x’s and z’s.
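Because the font only changes how stand-in letters are drawn, text received without it can be converted to real Unicode with a character-for-character table. A minimal sketch; TimesGKM's actual assignments aren't given here, so the two entries in `TIMESGKM_TO_UNICODE` are hypothetical placeholders to be replaced with the real table.

```python
# HYPOTHETICAL placeholder table: not TimesGKM's real assignments.
TIMESGKM_TO_UNICODE = {
    "x": "n\u0304",  # placeholder: x stands in for n̄
    "z": "\u013c",   # placeholder: z stands in for ļ
}


def from_timesgkm(text: str, mapping: dict[str, str] = TIMESGKM_TO_UNICODE) -> str:
    """Rewrite font-dependent stand-in letters as real Unicode characters."""
    # str.maketrans accepts single-character keys mapped to strings of any length.
    return text.translate(str.maketrans(mapping))
```

This is only safe on purely Marshallese text: since x and z are not in the alphabet, any that appear are stand-ins, but English words mixed into the same document would be corrupted.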
Unicode messing it up
Using a special font may be convenient, but for documents on the web and in other contexts, there should be a Unicode character that represents each of these letters. Most of the alphabet is plain ASCII, and the accented letters can be written with existing Unicode characters (some as single precomposed characters, and n̄, m̧ and o̧ only as a base letter plus a combining mark), but surprisingly the ‘comma letters’ (ļ m̧ ņ o̧) are not quite right. (Don’t ask me much about how they are used in practice: an American publisher who has lived in the Marshall Islands since arriving as a Peace Corps volunteer in the Vietnam War era told me that he still cannot pronounce them.)
In 2013, the Unicode Consortium discussed adding four code points, because the existing characters are drawn to match the Latvian alphabet, with a comma underneath (see ņ). The Marshallese preference is to show these with a cedilla instead (as in the French letter ç). As far as I can tell, the proposal has not moved forward, and the Marshallese did not send a local representative to the discussion.
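Python's standard `unicodedata` module shows how the comma letters are encoded today. In this small sketch (the helper `describe` is just for illustration), ļ and ņ compose to precomposed characters that Unicode already names "WITH CEDILLA" and that Latvian uses too, while m̧ and o̧ exist only as a base letter plus U+0327 COMBINING CEDILLA. At the code-point level, then, the comma-versus-cedilla difference lies in how fonts draw these shared characters.

```python
import unicodedata

# The four comma letters in decomposed form: base letter + U+0327 COMBINING CEDILLA.
COMMA_LETTERS = ["l\u0327", "m\u0327", "n\u0327", "o\u0327"]  # ļ m̧ ņ o̧


def describe(seq: str) -> str:
    """Name the code point(s) a letter becomes after NFC normalization."""
    nfc = unicodedata.normalize("NFC", seq)
    if len(nfc) == 1:
        # Composed into a single existing character.
        return f"U+{ord(nfc):04X} {unicodedata.name(nfc)}"
    # No precomposed form: stays as base letter + combining mark.
    return " + ".join(unicodedata.name(c) for c in nfc)


for seq in COMMA_LETTERS:
    print(seq, describe(seq))
# ļ U+013C LATIN SMALL LETTER L WITH CEDILLA
# m̧ LATIN SMALL LETTER M + COMBINING CEDILLA
# ņ U+0146 LATIN SMALL LETTER N WITH CEDILLA
# o̧ LATIN SMALL LETTER O + COMBINING CEDILLA
```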
Contributing to Wikimedia.IME
I wanted to add a Marshallese keyboard (including the preferences built into TimesGKM) to Wikipedia’s special input library, jquery.ime. You can review the key-mapping code here: https://github.com/wikimedia/jquery.ime/pull/154
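An input method like this rewrites text as you type, using pattern/replacement rules. Below is a simplified Python model of that idea, not jquery.ime's own code: it assumes each rule is a regex matched against the end of what has been typed so far and that the first matching rule wins. The trigger keys in `RULES` are hypothetical and are not the mappings in the pull request.

```python
import re

# HYPOTHETICAL rules: each regex is matched against the end of the text typed so far.
RULES = [
    (r"a-$", "\u0101"),   # a then -  ->  ā
    (r"n-$", "n\u0304"),  # n then -  ->  n̄
    (r"l,$", "\u013c"),   # l then ,  ->  ļ
    (r"m,$", "m\u0327"),  # m then ,  ->  m̧
]


def type_keys(keys: str) -> str:
    """Feed keystrokes one at a time, rewriting the buffer whenever a rule matches."""
    buffer = ""
    for key in keys:
        buffer += key
        for pattern, replacement in RULES:
            new_buffer, count = re.subn(pattern, replacement, buffer)
            if count:
                buffer = new_buffer
                break  # first matching rule wins
    return buffer
```

Because the rules run after every key, `type_keys("ma-l,")` collapses each two-key sequence into a single letter as soon as it is complete; the trade-off is that typing a literal hyphen or comma after one of these letters would need some escape mechanism.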
I’ll be happy to update it if Unicode breaks Marshallese off from Latvian!