Language Documentation in 2021

Documenting languages is a tricky business at the best of times, but 2021 offered some unique challenges and solutions.

Elliot Holmes
Wikitongues
Aug 2, 2021

Photo by Sigmund on Unsplash

In my final article in this series on Judeo-Malayalam, I want to get a little “meta”. (If you haven’t read the previous posts, you can find part one here and part two here.) I want to reflect on what it was like to undertake a language documentation project in 2021. I think it’s important that we have reference points like this going forward: in the future, people will be able to look back at how things were done now and improve upon them or, given the rather large elephant in the room, see how we managed to keep efforts going in the face of a global pandemic (which, hopefully, future readers won’t be dealing with).

The Elephant in the Room

On the surface, it might seem like COVID-19 made things far more difficult than they should have been. To some degree, that’s true: I couldn’t go out to Kerala and spend time with Thapan, I couldn’t immerse myself in his culture and his surroundings, and we couldn’t chat away about the language and its history long into the night. When it comes to learning about languages and cultures, nothing is more important than experience and immersion: you find out how the language works directly from its speakers, you watch them interact, and sometimes the most important findings can just materialise out of random, chance discussion.

The most pessimistic way of viewing language documentation in 2021 is that COVID-19 took these opportunities away from us; however, as is the way with most things in life, when one door closes another one opens. There are accessibility issues facing us at the best of times (flights, travel, accommodation, limited time, and all the costs associated with these), and sometimes it can be hard to juggle everything at once. Viewed optimistically, then, COVID-19 actually helped me see how far we’ve come as a society and how well the resources readily available to us can accommodate language documentation. From the same chair and computer, I was able to reach out to Thapan effortlessly, talk to him about Judeo-Malayalam, and ask him to produce each word on the wordlist I had made. Then, from that same chair and computer, I was able to analyse his speech, segment it, and identify every speech sound in every word before finally collating all of my findings into these blogposts. Of course, experience and immersion will always take precedence, and there is no accurate way to recreate them just yet. But today’s technologies really can enable us to achieve powerful things and still contribute to language documentation efforts in meaningful ways, as these blogposts have shown.

Popular Methods are Outdated

Going back to that wordlist I mentioned: this was something that had to change, and changing it extended the project slightly. When I was looking at which approaches to take, the Swadesh List came up in my reading. This is a list of 207 words that was designed in the 1950s for historical comparative linguistic studies: it covered many grammatical concepts, such as pronouns and verbs, as well as lexis relating to the environment, such as the sky. The idea was to get a flavour of the language and to collect its sounds in doing so. However, upon reviewing the Swadesh List, I noticed that it had significant gaps. Firstly, it was missing many pronouns, most notably the third person singular pronoun, any object pronouns, any possessive pronouns, and, most concerningly, any feminine pronouns. Languages can distinguish pronouns by case (subject and object pronouns differ, as in “I” and “me”) and by gender (“he” and “she”), so why would a list designed to get a basic understanding of a language miss these out? Secondly, it missed out auxiliary verbs, which, as seen in the last blogpost, produced some of the most interesting findings once they were included in my updated list. Finally, it didn’t take into account the different environments and cultures of different language speakers; the words chosen were all very generic and didn’t include local food, flora, fauna, or cultural items.

I’m not the first to notice problems with this list. Researchers have previously found that 207 words isn’t enough to capture the entirety of a sound system; they recommend at least 400 words, with the list filled with the most common words found in English. This addressed my concerns about the pronouns and the auxiliary verbs. They also recommend taking local culture, flora, and fauna into account, which I did to address my own concerns. As I see it, these changes paid off massively: not only was I able to analyse the pronouns fully and identify patterns in the modal verbs because of these expansions, but I was also able to identify sounds that the Swadesh List could not capture: /pʰː/, /gʱ/, /ⱱ/, /ia/, and /ɲː/, to name a few. Thus, my wordlist not only gave me a fuller picture of the language; it was the only way I could accomplish the original task of finding out the sound system of Judeo-Malayalam.

Final Thoughts

Well, there you have it. The sounds of Judeo-Malayalam documented, its interesting linguistic phenomena analysed, and a little “behind-the-scenes” look at language documentation in 2021. This is still just the start of work documenting Judeo-Malayalam: oral histories are being recorded and more linguistic data could be collected and analysed from Judeo-Malayalam speakers to flesh out the findings here. Maybe there are more phonemes; maybe some of them are specific to Thapan’s accent; maybe there is a significant difference between younger and older speakers. These are all things we can investigate, and there is so much more to learn. For more information on Judeo-Malayalam, Dr Ophira Gamliel has produced a work on the language and an excellent list of resources can be found at the Jewish Languages site.