Zero-Shot Learning: Real-Time Language Translator

This article explores the application of zero-shot learning techniques in achieving real-time language translation.

Pradum Shukla
Accredian
5 min read · Sep 25, 2024

Introduction

Language barriers remain a challenge in our interconnected world. Traditional translation systems rely on large parallel datasets, which simply don’t exist for many rare languages. Zero-Shot Learning (ZSL) addresses this by enabling translation without direct training data. In this article, we’ll explore how ZSL works for real-time language translation, look at recent advancements, and walk through a simple implementation example.

What is Zero-Shot Learning?

Zero-Shot Learning (ZSL) allows models to handle tasks or classes they haven’t seen before. Unlike traditional models needing large labeled datasets, ZSL predicts new categories without specific training data, making it ideal for translating language pairs with little or no data.
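To see the general idea in action before turning to translation, the short sketch below uses Hugging Face’s zero-shot-classification pipeline, which assigns labels the model was never explicitly trained on. The model name and candidate labels here are illustrative choices, not requirements.

from transformers import pipeline

# Load a zero-shot classifier backed by an NLI model (illustrative choice of checkpoint).
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# The candidate labels below were never training classes for this model.
result = classifier(
    "The central bank raised interest rates again this quarter.",
    candidate_labels=["economics", "sports", "cooking"],
)
print(result["labels"][0])  # highest-scoring label, e.g. "economics"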

How Does ZSL Work for Translation?

In real-time translation, Zero-Shot Learning (ZSL) enables a model to translate between languages without direct training on specific language pairs. Instead, it uses multilingual embeddings that represent words from various languages in a shared space.

Key Components of ZSL for Translation:

Multilingual Training Data: The model is trained on sentence pairs (e.g., English-French, English-Spanish) to understand sentence structures across languages.

Shared Vector Space: Words and phrases from different languages are mapped into a shared space, allowing the model to recognize similar meanings. For example, it can translate between Italian and Hindi, even without seeing this pair before.

Pivot Language: The model may use a common language like English as a bridge. For instance, it translates Japanese to English, then English to Swahili, without needing direct Japanese-Swahili data (a short sketch of this follows the list).

Transfer Learning: Knowledge from high-resource languages (like English) is transferred to low-resource languages (like Somali or Uzbek).

Neural Machine Translation (NMT): ZSL enhances NMT by allowing translation for unseen language pairs, typically building on multilingual pre-trained models such as mBERT or mBART.
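To make the pivot idea concrete, here is a minimal sketch that reuses the same mBART-50 many-to-many checkpoint introduced later in this article, routing a Japanese sentence through English on its way to Swahili. The example sentence and language codes are illustrative.

from transformers import MBartForConditionalGeneration, MBart50Tokenizer

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50Tokenizer.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

def translate(text, src, tgt):
    # Tell the tokenizer the source language, then force the decoder to start in the target language.
    tokenizer.src_lang = src
    inputs = tokenizer(text, return_tensors="pt")
    tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id[tgt])
    return tokenizer.decode(tokens[0], skip_special_tokens=True)

japanese_text = "おはようございます"  # "Good morning"
english_pivot = translate(japanese_text, "ja_XX", "en_XX")  # Japanese -> English (bridge)
swahili_text = translate(english_pivot, "en_XX", "sw_KE")   # English -> Swahili
print(swahili_text)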

How is Zero-Shot Translation Achieved?

Zero-shot translation is achieved by training a model on multiple languages at once, allowing it to understand their structure without needing to see direct language pairs. For example, if a model learns English-French and English-German, it can translate French-German without seeing those pairs during training.

Step-by-Step Process:

Pre-training: The model is trained on sentence pairs from various languages (e.g., English-Spanish, English-Chinese). This helps it understand relationships between words across languages.

Embedding Creation: Sentences from different languages are mapped into a shared embedding space, so similar sentences (in any language) have similar representations. For instance, “Hello” in English and “Bonjour” in French would have similar embeddings (see the sketch after this list).

Inference on Unseen Pairs: The model can then translate new pairs like Japanese to Swahili using its shared knowledge from other languages (like English or Chinese) without direct training on that pair.
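Before moving to the translation demo, here is a minimal sketch of that shared space, assuming the sentence-transformers library and a multilingual encoder (the specific model name is just one reasonable choice). Sentences that mean the same thing end up close together regardless of language.

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # illustrative multilingual encoder

sentences = [
    "Hello, how are you?",           # English
    "Bonjour, comment ça va ?",      # French, same meaning
    "The stock market fell today.",  # English, unrelated meaning
]
embeddings = encoder.encode(sentences)

# Same meaning across languages -> high cosine similarity; unrelated meaning -> lower.
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))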

We will demonstrate zero-shot learning by translating an English sentence to Arabic using the mBART model, which facilitates seamless many-to-many machine translation.

from transformers import MBartForConditionalGeneration, MBart50Tokenizer
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50Tokenizer.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

We import the MBartForConditionalGeneration class and the MBart50Tokenizer from the transformers library. These classes are necessary for loading the pre-trained model and handling tokenization.

We load the pre-trained mBART model and its corresponding tokenizer. The model is capable of handling many languages and translating between them.

source_lang = 'en_XX'  # English
target_lang = 'ar_AR' # Arabic

We specify the source language as English (en_XX) and the target language as Arabic (ar_AR). These language codes are recognized by the mBART model.

input_text = "What is your name?"  # Example sentence
tokenizer.src_lang = source_lang
inputs = tokenizer(input_text, return_tensors="pt")

We assign the source language to the tokenizer and then tokenize the input text. The return_tensors="pt" argument converts the tokens into PyTorch tensors, which are required for model input.

generated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id[target_lang])
translated_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
print(f"Translated Text (English to Arabic): {translated_text}")

We call model.generate with forced_bos_token_id set to the target language code, which forces the first decoded token to be Arabic and steers the output into the target language. The generated tokens are then decoded back into a human-readable string; the skip_special_tokens=True argument ensures that any special tokens used by the model are excluded from the output.

Finally, we print the translated text.

Expected Output:

Translated Text (English to Arabic): ما اسمك؟

This output demonstrates how a single multilingual model serves many translation directions from one shared representation. The same mechanism extends to language pairs with little or no direct parallel data, which is where zero-shot learning makes real-time translation practical.

Key Advances in Zero-Shot Language Translation

Zero-shot language translation has rapidly progressed with new neural architectures and multilingual datasets. Some key advancements include:

mBART: A sequence-to-sequence model that handles zero-shot translation without fine-tuning on specific language pairs. For example, mBART can translate from Japanese to Swahili without prior training on this pair.

Google’s MNMT: A single massively multilingual model trained on English-centric parallel data covering over 100 languages, showing strong performance even for low-resource languages like Basque or Zulu and enabling zero-shot translation between non-English pairs.

T5 and mT5: Leveraging transfer learning, these models can perform zero-shot translations for languages such as Swahili and Yoruba.

GPT Models: Though built primarily for text generation, OpenAI’s GPT models can translate in a zero-shot fashion simply by being prompted, since their pre-training data spans many languages (a prompt-based sketch follows).
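As a rough illustration of that prompt-based approach, the sketch below asks a general-purpose model to translate via the OpenAI Python client. The model name is an assumption, and an OPENAI_API_KEY environment variable is expected to be set.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute any model you have access to
    messages=[{
        "role": "user",
        "content": "Translate this English sentence into Yoruba: 'Where is the nearest hospital?'",
    }],
)
print(response.choices[0].message.content)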

Conclusion

Zero-Shot Learning represents a significant leap forward in real-time language translation, enabling machine learning models to overcome the limitations of traditional methods by translating between languages without direct training data. By leveraging shared embeddings and advanced neural models, ZSL opens the door to translating low-resource languages and scaling language support globally. As advancements in neural architectures continue, ZSL’s potential to enhance global communication and bridge linguistic divides will only grow, driving more inclusive and accessible technology solutions.

Future Example

Imagine a future where travelers can wear smart glasses powered by ZSL, allowing real-time translation of any spoken language they encounter, from remote indigenous dialects to official global languages. This technology could facilitate meaningful interactions without language barriers, whether in local markets, international conferences, or emergency situations, making the world a more connected and inclusive place.
