Localization as a service: Alliance with the AI-Language Models

Tushar Narkhede
Globant
May 2, 2023

Co-Authors: Parijat Sardesai, Anurag Arwalkar


Generative AI as an assistive technology has found many applications in web development by generating contextual content, and it can help UI engineers in some critical areas of user personalization. In this article, we want to showcase how an OpenAI language model can help us expedite the process of localization (l10n) or internationalization (i18n) of web and mobile apps.

Localization and Internationalization

As per the official definition from W3C:

  • Localization (l10n) is the adaptation of a product, application, or document content to meet the language, culture, and other requirements of a specific target market (a locale).
  • Internationalization (i18n) is designing and developing a product, application, or document content that enables easy localization for target audiences that vary in culture, region, or language.

Localization and internationalization can have endless benefits when it comes to amplifying your global presence. Despite having good rankings, visibility, and exposure, you can lose potential customers if you fail to connect and communicate with them in their specific locale. For this study, we keep the focus on machine-based translations and their impact, that is, on l10n and in particular on smart translations.

The customary approach

The translation process for any web application (enterprise or public domain) is usually cumbersome with a lot of manual intervention involved. A typical translation process includes:

  • Creation of key/value pairs (JSON objects) in separate files for each supported language by the developers. These JSON files are used to display the content in different languages on a web page. A simple example of an English JSON file is shown below (a sketch of how such a file is typically consumed follows this list):
{
  "dashboard.hours": "hour(s)",
  "dashboard.minutes": "minute(s)",
  "dashboard.seconds": "second(s)",
  "dashboard.content": "Sunset is the time of day when our sky meets the outer space solar winds"
}
  • Review and addition of the keys by the language translators.
  • Storing the key/value pairs in a repository.
  • And, periodic or runtime updates of any new strings added to the repository.
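
To put these files in context, the sketch below shows how such a key/value file is typically consumed on the front end. It assumes the i18next library with inline resources and the keySeparator: false option (so that dotted keys stay flat); your project may well use a different i18n framework.

// Minimal consumption sketch, assuming the i18next library.
const i18next = require("i18next");

i18next
  .init({
    lng: "en",            // active locale requested by the app
    keySeparator: false,  // keys like "dashboard.hours" are flat strings, not nested objects
    resources: {
      en: { translation: { "dashboard.hours": "hour(s)", "dashboard.minutes": "minute(s)" } },
      de: { translation: { "dashboard.hours": "Stunde(n)", "dashboard.minutes": "Minute(n)" } },
    },
  })
  .then((t) => {
    console.log(t("dashboard.hours")); // -> "hour(s)"
  });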

This amount of manual effort involved in the process invites several challenges while localizing a web application:

  • Limited time: This is the most common challenge that engineering teams face every day. Product owners often expect quick localization without being aware of the technical process involved. Additionally, if it is a technical or legal document full of jargon, or if there is a lot of dialect and colloquialism involved, it can take even longer.
  • Lack of technical knowledge: Translators are linguists. Although they have good knowledge of certain subjects, they are usually not the top experts in those fields. Some texts are heavy on technical jargon or domain-specific words, which can pose a problem for translators.
  • Repairs and maintenance: Subsequent release cycles also take a toll on this process owing to the manual intervention required at each step.
  • Technical compatibility: The process of updating the repository is also susceptible to failure if a website's design or platform changes.

Our Proposal

With the rapid evolution of generative AI tools, this process could reap many benefits. OpenAI's models can remove many of the manual steps in the process mentioned above and help us generate text that is contextual rather than a direct, word-for-word translation between languages.

We’ve trained language models that are much better at following user intentions than GPT-3 while also making them more truthful and less toxic, using techniques developed through our alignment research. These InstructGPT models, which are trained with humans in the loop, are now deployed as the default language models on our API. — OpenAI

We can use one such text completion model from OpenAI, known as text-davinci-003. OpenAI claims this language model generates content that is more accurate, natural, and contextual.

Let’s look at one suggested architecture that shows how we can leverage the tools/services of Generative AI.

For example:

  • Ver condiciones (Spanish) -> G-Translate -> See Conditions
  • Ver condiciones (Spanish) -> Within the context of a disclaimer -> Conditions apply

Quick overview of the architecture:

  1. Source: Any platform can request a localized version of an app, e.g., mobile, web, IoT devices, watches, etc.
  2. Node Server: A middle layer responsible for the basic data manipulation needed to meet OpenAI's specifications for translations, and for triggering retraining of the OpenAI model. The idea of introducing a Node server is to let us connect to any ML translation service in the future: it keeps a single source of truth for every source channel, so none of them has to maintain the logic of configuring different translation services (a sketch of this abstraction follows this list).
  3. OpenAI Engine: The translation engine responsible for doing the actual translations; it also retrains the model based on the feedback provided by the middle layer.
  4. Post-Editing Texts: The OpenAI-generated translation is handed to language experts for a thorough review; the edited output can later be used to retrain the model.
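
To illustrate the single-source-of-truth idea behind the Node server, here is a minimal sketch of what such a provider abstraction might look like. The class and function names (OpenAITranslationProvider, GoogleTranslationProvider, localize) are purely illustrative and not part of any real API.

// Illustrative sketch of the middle layer's provider abstraction; all names are hypothetical.
class OpenAITranslationProvider {
  async translate(entries, targetLanguage) {
    // Call the OpenAI completion endpoint here (see the implementation section below).
    return entries; // placeholder
  }
}

class GoogleTranslationProvider {
  async translate(entries, targetLanguage) {
    // Call a different ML translation service here; the calling channels never notice the swap.
    return entries; // placeholder
  }
}

// Channel-facing code depends only on the translate() contract, so switching translation
// engines requires no change on the mobile/web/IoT side.
async function localize(provider, entries, targetLanguage) {
  return provider.translate(entries, targetLanguage);
}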

Technical Implementation

Now delving into the architecture, there are two major phases:

In the first phase, we need to create a NodeJS service that accepts a JSON object containing the key/value pairs and a target language key (e.g., German) into which you wish to translate the JSON.

This NodeJS wrapper handles the data manipulation and converts it into an OpenAI-supported payload format. Once the translations are received from the underlying translation engine, it serves them back to the channel making the call.
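
As a rough sketch, the wrapper could look like the following. It assumes Express and a translateText helper (shown in the next sketch) that calls the underlying translation engine; the route name and payload shape are illustrative choices, not a prescribed API.

// Phase-1 NodeJS wrapper sketch, assuming Express; translateText is defined in the next sketch.
const express = require("express");
const app = express();
app.use(express.json());

app.post("/translate", async (req, res) => {
  // Illustrative payload: { language: "German", translations: { "dashboard.hours": "hour(s)", ... } }
  const { language, translations } = req.body;
  const result = {};
  for (const [key, text] of Object.entries(translations)) {
    result[key] = await translateText(text, language); // one completion call per string
  }
  res.json(result); // serve the translated key/value pairs back to the calling channel
});

app.listen(3000);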

The underlying OpenAI create completion API needs some additional params which, once set, rarely need to change (they are wired together in the sketch after this list):

  • The model you are using, which, as stated earlier, is text-davinci-003.
  • An appropriate prompt for the GPT model, to set the context.
  • The sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic.
  • max_tokens: the fundamental unit used by OpenAI GPT models to measure a text's length is the token. Tokens are collections of characters that sometimes, but not always, correspond to words. The token count of your prompt plus max_tokens cannot exceed the model's context length.
  • You can decrease the possibility of sampling repetitive token sequences by using the frequency_penalty and presence_penalty parameters. We used the default value (0.0). Guidance on using these efficiently is available in the OpenAI API documentation.
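
Here is a sketch of the translateText helper used above, wiring these parameters into the create completion call. It assumes the openai v3 Node SDK and an OPENAI_API_KEY environment variable; the parameter values mirror the ones used in the completion call further below.

// Sketch of the translateText helper, assuming the openai v3 Node SDK.
const { Configuration, OpenAIApi } = require("openai");
const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

async function translateText(text, language) {
  const { data } = await openai.createCompletion({
    model: "text-davinci-003",                              // the text completion model discussed above
    prompt: `Translate the text into ${language}: ${text}`, // the prompt sets the translation context
    temperature: 0.3,                                       // low temperature keeps the output focused and repeatable
    max_tokens: 100,                                        // prompt tokens + max_tokens must fit the context length
    top_p: 1.0,
    frequency_penalty: 0.0,                                 // defaults: no extra penalty on repeated tokens
    presence_penalty: 0.0,
  });
  return data.choices[0].text.trim();                       // the completion text is the translated string
}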

For the first release, we are good to go with just this first phase.

In the second phase, we need to put in place the entire feedback loop for the OpenAI API. As of this writing, OpenAI only allows fine-tuning of base models such as the original Ada, Babbage, Curie, and Davinci.

Fine-tuning of the base model

Fine-tuning is the process of tailoring a model to your specific data. In the earlier example, the original output we received from the OpenAI API when translating the English JSON into German was as follows:

{
  "dashboard.hours": "Stunde(n)",
  "dashboard.minutes": "Minute(n)",
  "dashboard.seconds": "Sekunde(n)",
  "dashboard.content": "Sonnenuntergang ist die Zeit des Tages, wenn unser Himmel auf die äußeren Raumsonnenwinde trifft."
}

To understand the retraining of this model, let’s assume that we want to change the dashboard.content translation as follows:

  • from: Sonnenuntergang ist die Zeit des Tages, wenn unser Himmel auf die äußeren Raumsonnenwinde trifft
  • to: Sonnenuntergang ist die Tageszeit, zu der unser Himmel auf die Sonnenwinde des Weltraums trifft.

Now we would need to retrain the base model to start returning the string in our desired format.

  • Create a sample file: An example of the prompt and completion text needed for retraining, in the JSONL format (one JSON object per line) expected by the file upload API:
{"prompt": "Translate the text into German: Sunset is the time of day when our sky meets the outer space solar winds", "completion": "Sonnenuntergang ist die Tageszeit, zu der unser Himmel auf die Sonnenwinde des Weltraums trifft"}
{"prompt": "", "completion": ""}
  • Generate a modified dataset: For that, we use the file upload API, which is meant for uploading documents that can be used with features such as fine-tuning; it returns a file id.
  • Fine-tune the base model: The file id generated in step 2 is passed to the fine-tune API as the training_file param, along with the model you want to train, which in our case is davinci.
  • Check the training status: You can use the fine-tune retrieve API to check the status of the fine-tuning job as it moves from pending to succeeded. Once the status is succeeded, the response contains a fine_tuned_model (a sketch of these calls follows this list).
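
Putting the three steps together, a sketch of the calls with the openai v3 Node SDK could look like this; it reuses the openai client instance from the earlier sketch, and the file name translations.jsonl is an illustrative placeholder.

// Sketch of the fine-tuning workflow, assuming the openai v3 Node SDK.
const fs = require("fs");

async function fineTuneTranslationModel() {
  // 1. Upload the JSONL training file; the purpose must be "fine-tune".
  const file = await openai.createFile(
    fs.createReadStream("translations.jsonl"),
    "fine-tune"
  );

  // 2. Start a fine-tune job on the base davinci model with the uploaded file id.
  const fineTune = await openai.createFineTune({
    training_file: file.data.id,
    model: "davinci",
  });

  // 3. Check the job status as it moves from pending to succeeded.
  const job = await openai.retrieveFineTune(fineTune.data.id);
  console.log(job.data.status, job.data.fine_tuned_model); // fine_tuned_model is set once succeeded
}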

The new model davinci:ft-personal-2023-04-17-09-09-23 generated from the above exercise is a superset of the base model and can now be used to translate our next set of texts.

const data = await openai.createCompletion({
  // Use the fine-tuned model returned by the fine-tune job instead of the base model.
  model: "davinci:ft-personal-2023-04-17-09-09-23",
  prompt: `Translate the text into ${req.body.language}: ${text}`,
  temperature: 0.3,
  max_tokens: 100,
  top_p: 1.0,
  frequency_penalty: 0.0,
  presence_penalty: 0.0,
})

Highs and Lows

The concept of using machine-based translations to reduce the localization effort can help us overcome several shortcomings of the lengthy process we spoke about earlier:

  • Reduction in time to market (TTM) for an MVP.
  • Reduction in the product owner/translator dependency.
  • Generic middle tier for reuse in the future.
  • Contextual translations for better coverage.

Currently, there are some critical implementation limitations to this approach.

  • Base Model Training: The first limitation is that we can only fine-tune the base models, so tuning a specific model like text-davinci-003 is not supported as of now.
  • Maturity of Base Models: We need to rely on the maturity of the base language models, and models like Davinci are not mature enough to give good results with only a small number of prompts for fine-tuning.
  • Datasets needed for fine-tuning: Even if we fine-tune the OpenAI model on a huge dataset of 10 GB, that would be a drop in the bucket compared to the roughly 520 GB of data the pre-trained models have already seen.

Epilogue

This approach is already in use in several popular paid services like POEditor and Lokalise, which have moved in the direction of machine-based translations. For example, POEditor lets you choose between Google, Microsoft, or a custom DeepL AI engine for automatic translation, and all of these engines help train the model POEditor uses, giving it better contextual information the next time.

The journey of a thousand miles begins with one step. — Lao Tzu

Being mindful of all the limitations, the prospects for utilizing these capabilities of GPT models seem bright. They serve the primary intent of generative AI solutions: augmenting human capabilities and reducing the need to perform the same mundane, repetitive tasks over and over again.

In follow-up articles, we will get into the details of how we can overcome the limitations of the model training with the help of the new APIs of OpenAI.
