Debunking Some AI Localisation Myths

Published in

Skyscanner Experience

Jul 12, 2024

Is AI taking over translation completely?

Love it or hate it: AI is undoubtedly revolutionising the field of localisation.

For centuries, linguists have played a crucial role in bridging cultural gaps and enabling companies to expand globally. However, recent advances in AI, and tools such as ChatGPT in particular, have captured the imagination with their impressive content creation and translation capabilities. This raises the question: will machines replace human linguists in localisation? The answer: not just yet…

What’s the difference between Hamburg and a hamburger? Humans can tell the two apart instantly, but generative AI tools like ChatGPT can still stumble over this kind of ambiguity. These tools have the potential to take over the world of content, but localisation professionals still have concerns about scaling content production across multiple languages for global strategies.

Machine Translation (MT) & Large Language Models (LLMs)

Machine translation (MT) is the process of automatically translating text from one language to another using a computer application. Early statistical MT systems analysed bilingual text to learn the probabilities of word sequences and phrases in the source and target languages. These systems still struggled with fluency and coherence, especially for language pairs with different syntactic structures. From around 2013, neural machine translation (NMT) systems emerged, built on so-called deep learning techniques. The main difference is that NMT models learn to translate by processing entire sentences or sequences of words, rather than individual words or phrases in isolation, which produces more fluent and accurate translations. Trained on vast amounts of data and regularly fine-tuned, these models can approach human-level translation quality for many language pairs.
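The Python sketch below makes the statistical idea a little more concrete. It is a toy example built on our own assumptions: the function name and the tiny corpus are hypothetical, and real systems were far larger. It counts adjacent word pairs to estimate the probability of one word following another. This is roughly how the language-model half of a statistical MT system scored which word sequences sound fluent. Real systems also learned phrase translation probabilities from aligned bilingual text, and this sketch leaves that part out.

```python
from collections import Counter, defaultdict

def bigram_probabilities(sentences):
    """Estimate P(next_word | word) by counting adjacent word pairs (maximum likelihood)."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        # Sentence boundary markers let us model the first and last words too
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            pair_counts[prev][nxt] += 1
    # Normalise each word's follower counts into probabilities
    probs = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        probs[prev] = {nxt: count / total for nxt, count in counter.items()}
    return probs

# Hypothetical toy corpus standing in for target-language training text
corpus = [
    "the hotel is near the beach",
    "the hotel is cheap",
    "the flight is cheap",
]
model = bigram_probabilities(corpus)
print(model["the"])  # → {'hotel': 0.5, 'beach': 0.25, 'flight': 0.25}
```

With so little data, any word pair the model has never seen gets no probability at all. This is a small-scale version of the data-dependence problem discussed below.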

Large language models (LLMs) are another type of AI program that recognises and generates text. LLMs also use deep learning to model how characters, words, and sentences function together, so they can also translate text from one language to another using the patterns learned from their training data.

Impressive, isn’t it? However, let’s take a step back and consider the potential risks of relying solely on AI for content localisation.

Data dependence

ChatGPT can impress us with its human-like content, but is it truly capable of mastering the nuances of every language?

LLMs don’t actually “write” anything; they generate text based on patterns learned from existing data, much of it taken from the internet. Similarly, NMT models rely heavily on large amounts of training data, so a lack of sufficient data can lead to poor-quality translations. This is a critical factor to consider when using AI for localisation. Some languages, such as English, are very well represented on the internet (“high-resource” languages), with extensive databases and examples. But there are more than 7,000 languages in the world, and many smaller ones still have little online presence (“low-resource” languages), so there is not yet enough data and content available for them. As a result, localising content into a low-resource language (such as Catalan) will often produce low-quality output, mistranslations, and a lack of cultural context.

Image: an X post in English reading “Lana Del Rey with a fan in France”, shown with an automatic French translation, next to a photograph of an electric fan. “Fan” is translated as “ventilateur”, literally the device with rotating blades that creates a current of air for cooling… Correct, but not in this context.

Addressing bias and ensuring accuracy

The issue of inclusive language remains a significant challenge. Many languages have grammatical gender, but English largely does not. Consequently, machine translations from English sources often default to the masculine form and lack the inclusive workarounds that linguists would use. For example, “the doctor” may come out as the masculine “il dottore” in Italian even when the text refers to a woman. Furthermore, these systems depend heavily on the data they are trained on. If that data contains errors or biases, their translations will reproduce them.

Cultural sensitivity

And what about cultural sensitivity?

AI translation tools may not be able to accurately translate content that is culturally or politically sensitive, leading to misunderstandings, inaccuracies, and mistranslations.

How can we ensure that a machine learning model respects the values and beliefs of the target audience? In one experiment here at Skyscanner, we machine-translated a “Things to do” article into Italian for an Italian audience. The source text included a closing sentence like “… and in the evening, enjoy a cappuccino…”, and the machine translated it literally into Italian.

However, no Italian will ever drink a “cappuccino” after 12pm. It would sound very odd, since “cappuccino” is considered a breakfast drink only.

Image: a graphic of a cappuccino with a prohibition sign, next to a photograph of someone holding up a cardboard sign that reads ‘PLEASE NO CAPPUCCINO AFTER 12 PM’.

Translation is an art, not a science

Language isn’t just a way to communicate; it’s a component of culture that makes each culture unique and specific. Languages are always changing and evolving. They vary across time and generations, and words take on new meanings and connotations year after year. Then there are slang, sarcasm, and intonation: qualities that might only be learned through experience. Indeed, AI localisation models still struggle with certain linguistic phenomena, such as idiomatic expressions, ambiguous phrases, and rare language constructs.

As in philosopher Frank Jackson’s famous “Mary’s Room” thought experiment, the question for us is: “Can language and cultural perception truly be comprehended through mere physical description, in the absence of conscious experience?”

True story: while running machine translation on a new article, I stumbled across this translation…

EN The jewel of the city will charm your socks off.
IT Questa città gioiello affascinerà i tuoi calzini.

Back translation: The jewel of the city will charm your (ACTUAL) socks

Ok, but… there are benefits too

Although there are still risks and doubts about the use of AI, the capabilities of these cutting-edge technologies are undeniable. They should not be feared but embraced and used to our advantage.

The future of AI translation is exciting and full of possibilities. Localisation has been through something similar before, when Translation Memories (databases that store previously translated text) and CAT (computer-assisted translation) tools were first introduced.

These technologies turned out to be an incredible opportunity for linguists to improve efficiency and consistency. They also saved companies money by reusing past translations.
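The sketch below gives a rough idea of how a Translation Memory saves effort. It stores a few approved segment pairs and looks up the closest match for a new segment. It rests on a few assumptions. Python’s standard difflib similarity ratio stands in for the “fuzzy match” score, and the 0.75 threshold is hypothetical. The segments are made up. Commercial CAT tools use their own matching algorithms and scoring.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source segment -> approved translation
translation_memory = {
    "Book your flight today": "Prenota il tuo volo oggi",
    "Compare cheap hotels": "Confronta hotel economici",
}

def tm_lookup(segment, memory, threshold=0.75):
    """Return (stored_source, translation, score) for the closest stored segment, or None."""
    best = None
    for source, target in memory.items():
        # Similarity in [0, 1]; 1.0 means the two segments are identical
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    # Only suggest reuse if the match is close enough; otherwise translate from scratch
    if best is not None and best[2] >= threshold:
        return best
    return None

print(tm_lookup("Book your flight today", translation_memory))   # exact match, score 1.0
print(tm_lookup("Book your flights today", translation_memory))  # close fuzzy match
print(tm_lookup("Rent a car", translation_memory))               # no usable match: None
```

An exact match comes back with a score of 1.0 and can be reused as is. A close fuzzy match is offered to the linguist as a starting point, and anything below the threshold is translated from scratch.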

MT and GenAI

At Skyscanner, we are actively exploring, learning, and experimenting with various approaches to maximise the potential of new technology. The integration of AI and machine translation into our product offers numerous advantages, even if it still requires human intervention and assessment.

Some clear benefits include improved speed and efficiency, enhanced scalability, and reduced costs compared to traditional localisation methods. AI tools can generate a substantial volume of initial content or draft translations, though these are not yet final versions. With human expertise and guidance through editing and sense-check reviews, however, the quality of the output can be raised quickly. These reviews also feed back into refining the machine’s capabilities for future iterations.

“The key to learning is feedback. It is nearly impossible to learn anything without it.” — Steven Levitt

When working with machine translation or large language models, some recommendations would be:

Incorporate a set of preferred terminology into your model
Establish quality guidelines for the editing and review stages that define the criteria for a “good” translation
Continuously fine-tune your engine by testing its performance with human input
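Preferred terminology can also be checked automatically after translation. Below is a minimal sketch that assumes a hypothetical English-to-Italian glossary and a simple whole-phrase match. It flags glossary terms that appear in the source when their preferred translation is missing from the machine output. That tells a reviewer where to look. It deliberately ignores inflection (the plural “voli” would not match “volo”), which a real check would need to handle.

```python
import re

# Hypothetical EN -> IT glossary of preferred terms
GLOSSARY = {
    "car hire": "noleggio auto",
    "flight": "volo",
}

def missing_terms(source, translation, glossary):
    """List (source_term, preferred_term) pairs where the source uses a glossary term
    but the translation does not contain its preferred rendering."""
    problems = []
    for src_term, tgt_term in glossary.items():
        # \b word boundaries avoid matching inside longer words
        in_source = re.search(rf"\b{re.escape(src_term)}\b", source, re.IGNORECASE)
        in_target = re.search(rf"\b{re.escape(tgt_term)}\b", translation, re.IGNORECASE)
        if in_source and not in_target:
            problems.append((src_term, tgt_term))
    return problems

source = "Compare car hire deals and book your flight."
draft = "Confronta le offerte di autonoleggio e prenota il tuo volo."
print(missing_terms(source, draft, GLOSSARY))  # → [('car hire', 'noleggio auto')]
```

Here the draft uses “autonoleggio” instead of the preferred “noleggio auto”, so that term is flagged for review, while “volo” passes.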

As we said at the beginning, these systems use deep learning algorithms that learn patterns from their inputs. With the right terminology, guidelines, and feedback in place, they can improve in accuracy and fluency, making the work of content writers and linguists faster and more effective.

In general, I believe there are a few questions you need to ask yourself before using AI:

What’s the goal of your content?
How prominent will this be?
What level of quality is expected?

From there, bearing in mind that quality levels may differ between languages, you can decide on the best approach. That includes whether to opt for an AI solution, and how much input from human linguists would be required.

Conclusion

Will AI take over completely? It doesn’t seem so, or at least not yet, but AI will definitely revolutionise our industry and we need to seize the opportunity to improve our productivity and embrace this new reality.

In the coming years, we should expect improvements in customisation, enhanced creativity, deeper collaborations with humans and more and more direct integrations with localisation workflows.

Keen to understand more about Localisation at Skyscanner? Get in touch! And if you’re looking for design roles, we have a few on our careers hub. Why not check them out?