Conquering digital worlds in Scottish Gaelic
r12n profile: Free software localization with GunChleoc, Chieftain of Widelands
In the strategy game Widelands, the player begins as the chieftain of a small settlement. The player guides the development of the settlement from a single outpost building into a sprawling commercial empire. In many ways, Widelands resembles the game series The Settlers, which first appeared in 1993. While the latest version of The Settlers can only be played in English, Widelands can also be played in dozens of other languages: global languages such as French and Spanish, national languages such as Bulgarian and Finnish, and even several languages with fewer speakers such as Catalan, Galician, and Esperanto. The game’s free and open source software model welcomes volunteer collaborators, including translators, to expand its language offerings.
Managing the Widelands Development Team since 2016 is GunChleoc, who joined the project as a Scottish Gaelic translator in 2013. GunChleoc (a pseudonym) has long been a respected member of the free software development community, having contributed translations to dozens of projects and managed broader translation initiatives.
Scottish Gaelic (Gàidhlig, not to be confused with Irish, Gaeilge) is a Celtic language indigenous to Scotland. Gaelic is spoken by around 58,000 people in Scotland, and over 7,000 people in Canada. Under pressure from English, Gaelic has been in decline as a community language for centuries, but a vigorous revitalization effort has been underway. A 2016 census found that only 522 public school students had Gaelic as the primary language at home, but there has been a recent increase in Gaelic-medium education.
While there are few examples of commercial software available in Gaelic, an impressive collection of free software — games, office software, mobile apps, etc. — is now available. Even more impressive is that this wealth of Gaelic software is primarily the result of the volunteer efforts of just two people: GunChleoc and Michael Bauer, a Glasgow-based language consultant and publisher. I recently spoke with GunChleoc about her experience helping to enable and promote Gaelic-language computing.
Gaming in a minority language
A version of the game Widelands was first posted online in 2001, and the game has been in continuous development ever since. Perhaps the primary factor in its longevity (aside from the fact that it is a lot of fun) is that it is a free and open source software project, entirely supported by a volunteer community of developers, artists, and writers. In contrast to proprietary software, which is typically developed commercially, free software is open to user modifications and contributions. For a video game, these modifications might include new game mechanics or levels, new or refined artwork, and versions of the game that work on different platforms (Windows, OSX, Android, etc.). In the case of Widelands, this includes a global community that has completely or partially translated the game into 60 languages.
There is a solid case (moral, social, and cultural, if not initially commercial) for empowering language communities with fewer speakers to use their languages in digital contexts. GunChleoc has long believed that Gaelic deserves a digital presence, and has devoted an extraordinary amount of mostly volunteer effort, literally years of work, to localizing software into Gaelic. Impressively, she is not a native speaker but a learner, which provides her with an additional context for understanding the language. “My first contact with the language was through the rock band Runrig. I liked the sound of the language, so I first started learning the pronunciation,” she explains. “Then a bit of grammar and vocabulary, then I went to visit Scotland… let’s say it became somewhat addictive.”
A linguist and computer scientist by education, and a translator by profession, she also maintains the Fòram na Gàidhlig website and forum for Gaelic language learners. “When I started running the Fòram na Gàidhlig site, I decided that having a Gaelic-language user interface for the message board software would be a good learning tool for myself,” she explains. “Since there was nobody else to do it, with help from the community, I went ahead and translated it to (probably pretty awful) Gaelic.”
Shortly thereafter, she began collaborating on localization projects with Bauer. “Akerbeltz [Bauer] had just finished translating Mozilla Firefox. So, we teamed up and got on like a house on fire. We have been cooperating on Gaelic localization ever since my language skills became good enough.” GunChleoc has focused her localization efforts on games, so I assumed she was a big gamer. Surprisingly, this is not the case. “I don’t play the games I consider [for localization], I just test play them to see if they are good.”
There are now word games like Scrabble (translated by Bauer), real-time strategy games like 0 A.D. and Megaglest, a Lemmings-like game called Pingus, and even a Mario Kart clone called SuperTuxKart. GunChleoc has intentionally translated games from multiple genres. “What I’m trying to do is ensure Gaelic will have everything, at least one example of everything, that an English speaker would have. That’s my long term goal.”
Localization in the minority language context
Translating a software interface, the parts of the program that a user interacts with, relies on two interrelated technical processes. The first process, internationalization (often abbreviated i18n), ensures that software is designed to be efficiently and effectively adapted for use in different languages and regions. The second process, localization (l10n), is the actual translation of that interface into a new language. A software interface is typically developed initially in a global language such as English, and then localized into additional languages.
As with any translation process, software localization is typically a highly-skilled endeavor, with the localizer having fluency in both the source and target languages, as well as a level of familiarity with code and software project management. Localization in proprietary projects is typically paid at skilled labor rates. Free software projects might rely on crowdsourcing and volunteer managers.
A language like Gaelic does not represent an interesting market for most commercial products: a software publisher can safely assume that most potential customers who speak Gaelic are bilingual in English. Resources invested in Gaelic localization, therefore, are viewed as unlikely to provide a financial return. While a localized version of Microsoft Office is available, such commercial Gaelic-language software products are rare.
The majority of software, free and otherwise, is developed in English, the lingua franca of the tech world. The more different a given language is from English, the more types of issues may arise when attempting to translate software into that language. One challenge arises from the fact that many developers have a limited understanding of internationalization and languages, and a coding strategy that works for some major languages may not work for others.
The bits of text that a user sees — menu items, buttons, error text, and so on — are represented in source code as “strings,” a variable datatype that holds alphanumeric characters. String variables are often used to store things like one’s username on a social media platform like Facebook, and can be used to create on-screen text to personalize interactions.
For example, a person’s name (Jane) could be stored as a string ($username) and paired with another string (“likes your post”, stored as $message) to create a complete message within a template. The pseudocode may look something like this:
display $username + $message
This would render as Jane likes your post. This string could then be localized — translated — to provide alternative messages to users in additional languages. In Portuguese, the user might see Jane gosta de sua postagem while a German user might see Jane mag deinen Beitrag.
The problem is that this “hard coded” structure is not compatible with many other languages. GunChleoc explains that for this example in Gaelic, “the correct translation is ’S toigh le Jane am post agad. There is no way that this sentence could start with ‘Jane,’ ever.”
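One common way around this is to translate a whole sentence containing a named placeholder, rather than gluing fragments together in a fixed order. The following is a minimal sketch of that idea, not code from Widelands or any particular framework: the `TEMPLATES` dictionary and the `render_like_message` function are hypothetical names invented for illustration, and the strings are the ones quoted above.

```python
# Hypothetical message catalog: one full-sentence template per language.
# The translator, not the programmer, decides where the name goes.
TEMPLATES = {
    "en": "{username} likes your post",
    "pt": "{username} gosta de sua postagem",
    "de": "{username} mag deinen Beitrag",
    "gd": "’S toigh le {username} am post agad",
}


def render_like_message(lang: str, username: str) -> str:
    """Fill the placeholder in the template for the given language."""
    template = TEMPLATES.get(lang, TEMPLATES["en"])  # fall back to English
    return template.format(username=username)


print(render_like_message("en", "Jane"))
print(render_like_message("gd", "Jane"))
```

Because the placeholder can sit anywhere in the sentence, the Gaelic template can put the name in the middle while the English one keeps it at the start.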
Given the prevalence of English-speaking software developers, interface translation is often a secondary consideration, due in part to a handful of challenges. First, as noted above, many developers have a limited understanding of internationalization and languages, and a coding strategy that works for some major languages may not work for others.
Second, and perhaps more challenging, internationalization requires consideration of “plural rules” and interface design. In English, most nouns only change once with number, by adding an “s”: 1 post, and 2, 23, or 4,000 posts. The rules in Gaelic, and in many other languages, are more complicated. “Let’s count cats,” GunChleoc begins: “1 chat, 2 chat, 3 cait, … 10 cait, 11 chat, 12 chat, 13 cait, … 20 cat, 21 cat…” And so on. “Our Slavic colleagues are also regularly pulling their hair over this particular issue.” If a user interface needs to provide a count of “friends,” or “points” or “unread emails,” the plural rules of different languages can prove daunting.
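The cat-counting pattern can be written down as a small selection rule. The sketch below is an illustration under stated assumptions: the four-way split mirrors the plural expression commonly used for Scottish Gaelic in gettext catalogs, and the function and variable names (`gaelic_plural_index`, `CAT_FORMS`, `count_cats`) are hypothetical. The word forms are taken from GunChleoc's counting example.

```python
def gaelic_plural_index(n: int) -> int:
    """Return which of four plural forms the count n selects.

    Assumption: this mirrors the gettext-style expression
    (n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : (n>2 && n<20) ? 2 : 3
    """
    if n in (1, 11):
        return 0
    if n in (2, 12):
        return 1
    if 2 < n < 20:
        return 2
    return 3


# Forms of "cat" in index order, following the counting example in the text.
CAT_FORMS = ["chat", "chat", "cait", "cat"]


def count_cats(n: int) -> str:
    """Build the counted phrase with the form selected for n."""
    return f"{n} {CAT_FORMS[gaelic_plural_index(n)]}"


for n in (1, 2, 3, 10, 11, 12, 13, 20, 21):
    print(count_cats(n))
```

An English-only rule of "singular if n is 1, otherwise plural" has only two slots, which is why a translator cannot express this pattern unless the software asks the catalog which form to use.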
Likewise, while the English term “OK” fits well on the ubiquitous buttons on pop-up screens, the longer Gaelic equivalent “Ceart ma-thà” may not. The screenshots below demonstrate some of the challenges GunChleoc faces in localizing Widelands. On the left is the previous version of an in-game screen, while the right represents a newer version of the interface with most of the problems fixed.
The localization challenges in this example included truncation of the button texts (along the top), some text not marked for translation (the items in the main window), and an incorrect plural form: “‘22 puingean’ on the bottom should be ‘22 puing’.” A remaining glitch, soon to be fixed, is the presence of a � in a word where the font renderer is misbehaving, likely a familiar experience to anyone who has worked in a language with accented characters.
Such problems are largely “solved” if software developers are familiar with (and use) a standard internationalization framework like gettext. Such frameworks provide a technical structure to more easily account for this linguistic variation. However, GunChleoc suggests that internationalization, or language technology more broadly, is seldom addressed in computer science education programs.
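As a rough illustration of what using such a framework looks like, here is a minimal sketch with Python's standard-library `gettext` module. It is not taken from Widelands. To stay self-contained it uses `gettext.NullTranslations`, the fallback class that returns the source strings unchanged; a real program would load a compiled catalog instead. The function names `like_message` and `post_count` are hypothetical.

```python
import gettext

# NullTranslations returns source strings unchanged. A real program would
# load a translated catalog here, for example via gettext.translation().
translations = gettext.NullTranslations()
_ = translations.gettext
ngettext = translations.ngettext


def like_message(username: str) -> str:
    # The whole sentence is one translatable unit with a named placeholder,
    # so a translator is free to move the name within the sentence.
    return _("%(username)s likes your post") % {"username": username}


def post_count(n: int) -> str:
    # ngettext asks the catalog to pick the plural form that matches n.
    return ngettext("%(n)d post", "%(n)d posts", n) % {"n": n}


print(like_message("Jane"))
print(post_count(1))
print(post_count(22))
```

The point of the design is that the source code only marks strings and passes counts; word order and plural selection are left to each language's catalog.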
“When you see somebody who has programmed their own translation system, you just want to scream, but they just don’t know any better. Fortunately, many of the more long-standing FLOSS [free, libre and open source] projects have already switched to using gettext or Qt, so all that remains for me to do is to file the occasional bug for strings that need adapting.”
“While the majority of projects are very welcoming, you will get the very, very occasional odd reaction from people like ‘Why are you wasting your time translating into a dead language, and why should we be bothered to support it? You will never finish that translation anyway and maintain it through future updates.’” The proper response to this, GunChleoc advises, is patience and perseverance. “More often than not, they are simply uninformed and don’t mean anything by it.” Once a level of trust and understanding is developed, such contributions will be welcomed.
“And then you can go on and impress them with your localization fu.”
At the other end of the localization process lies another challenge: how to encourage user adoption of this software that has been localized. While many localization efforts focus on overcoming digital divides and allowing new populations to engage with technology, the audience for Gaelic software is entirely fluent in English. These users will only start using a Gaelic technology if they are made aware of it, and if the perceived benefits outweigh the perceived costs. “Many people find installing things on their computers scary, so you will have to be prepared for that,” GunChleoc notes. “Some will snap up your translations with joy, others will refuse to have anything to do with it.”
“What we really need is to get a native speaker going around the country,” she suggests. “This person has one computer of every operating system in his or her pocket, and [provides demonstrations] to people who do not know about Gaelic software.”
Thanks to William J. Moner for useful comments on a draft of this article.
r12n is an irregular publication that features interesting people, projects, and ideas that connect with language revitalization and technology.