The Language Gap: The Overlooked Shadow of the AI Era

itcontentsfactory
3 min read · Aug 6, 2024


While considerable effort has been devoted to examining the disruptions that have accompanied the explosive advances in AI, one important factor has been overlooked: today's dominant AI models are built overwhelmingly around English. Global policy discussions on AI have likewise largely ignored non-English languages.

Stanford researchers put the leading generative AIs to a language test and found that they often failed to correctly recognize languages other than Standard American English. AI use has exploded, but because the models are trained mostly on English data, speakers of other languages are left on the margins. Experts worry that this language divide is creating inequality in AI skills, leaving many regions and cultures behind.

While leading AI models are multilingual, they support only a fraction of the world’s more than 7,000 languages. Approximately 95% of the world’s population is underserved by AI technologies. This exacerbates the existing digital divide and could even increase economic inequality.

Most non-English languages lack enough digital data for AI models to learn from. Many are spoken in low-income countries or by relatively few people, so industry has little incentive to support them. Some languages are also structured in ways that make it difficult for AI models to learn or handle them. For example, low-resource languages like Kazakh are almost non-existent in the digital world, which leads to poor performance of AI models. Similarly, current AI tools translate Amharic poorly.

Recently, there have been efforts to develop AI models that support more languages. Cohere’s Aya project has released multilingual AI models and data that support 101 languages. There are also open-source projects that collect and share data on different languages. The AINA project at the Barcelona Supercomputing Center aims to develop AI techniques for low-resource languages like Catalan. Others are using crowdsourcing to collect language data: Lesan, for example, works with local communities to develop machine translation and speech technologies for Amharic and Tigrinya.

Other measures also have an important role in addressing the language gap, including translation technology and services, cultural sensitivity, and language training programs. A multi-pronged approach that combines them can ensure that more people benefit from AI technology.

Bridging the AI language gap is not just a technology issue; it’s a social, economic, and political challenge. It requires collaboration among diverse stakeholders, increased investment in minority languages, ongoing discussions about AI ethics and fairness, and the creation of policies to preserve linguistic diversity.

We must work to ensure that the benefits of AI technologies are not limited to English speakers but are distributed equitably to people around the world. If AI is to be used for the prosperity of all humankind, as big tech says its mission is, those benefits cannot stop at a language barrier. An AI era in which linguistic diversity is celebrated is the future we should be working towards.


I am a 25-year veteran of the software industry and a technology columnist based in South Korea. I began my career as an IT journalist.