Low Resource Languages for Machine Translation — A Governmental Problem…

John Ortega
2 min read · Nov 18, 2019

When animals go extinct, do we worry about their extinction? Are we, as human beings, even worried about our own? It is easy to point the finger at others, but we are all guilty in one way or another of not doing enough to save the human race from climate change and global warming. I’m not sure whether this is due to a lack of caring or a lack of interest, but I am sure that we are not doing enough to preserve our culture as human beings.

Case in point: there are languages spoken by millions of people for which no digital resources exist for translation into a more widely adopted language like English, Spanish, or Mandarin. In some rare cases, like Navajo, the language of the Native American Navajo people, what survives is mostly spoken remnants of a vastly rich history. In my opinion, languages that have nearly no printed material can be considered extinct. So why do they become extinct? Is there a valid reason for us not to preserve languages? Or is there something we can do to make sure that human culture is marked in history, regardless of how poor the community that speaks the language is?

I’m sure I will be corrected on this one, but in the USA there is one major government agency dedicated to low-resource language projects: DARPA. DARPA is a defense agency that, like many other agencies, is there to protect the USA from dangers that are treated as high priority even when they sometimes do not exist. For such an agency, seeking out and documenting a rare language like Quechua, a low-resource language spoken by millions of people in the Andes, is not fathomable. That leaves the preservation of a language to humanity, and we know where that leads. Unfortunately, there is no major agency dedicated to saving the world’s languages at the level of, say, the Paris climate agreement (which the USA recently declined) or the effort to save a threatened bird like the Andean condor. So, at this point in time, it can be considered a nearly lost cause.

Either way, it’s something that has always been on my mind, and I’m only now gathering my thoughts to speak on the topic. I wrote a paper that was accepted in the low-resource workshop last year here:

Yet I feel like that is not enough. I don’t know what more I can do to help push this forward because, at this point in time, even academia has somewhat abandoned Quechua language preservation. Any ideas would be greatly appreciated…

John Ortega

John is an advanced Natural Language Processing research scientist with about 15 years of experience in software development, business, and academia.