From Corpus to Multi-Label Classification

Translating Text with EasyNLP

Bryan
1 min read · Sep 17, 2023

To ground the articles in this series in real-world examples, I put together a Google Colab notebook that translates a subset of Amazon reviews into English. Its parameters are adjustable, so you can change them to suit your interests.

Photo by Tim Photoguy on Unsplash

Specifically, we download a subset of Amazon reviews from Hugging Face’s Datasets, focusing on the “kitchen” product category and five languages: Chinese, French, Spanish, German, and Japanese. We take a random sample of 1,000 records for each language and use EasyNLP (a nifty wrapper for running pre-trained Transformers models for inference) to translate them to English. Using Python’s concurrent.futures library, we parallelize the translation so that batches are processed concurrently rather than one after another, which shortens the overall run time.
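To make the sample-then-parallelize step concrete, here is a minimal sketch of how it could look. It is not the notebook's actual code: `translate_batch` is a hypothetical stand-in for the real translation call (it only tags each string so the example runs without downloading a model), and `batch_size`, `max_workers`, and `seed` are illustrative parameters I am assuming, not values taken from the notebook.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def translate_batch(texts, source_lang):
    """Hypothetical stand-in for the real translation call.

    It only prefixes each text so the sketch runs without a model;
    swap in the actual translation function in practice.
    """
    return [f"[{source_lang}->en] {text}" for text in texts]


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_sample(reviews, source_lang, sample_size=1000,
                     batch_size=50, max_workers=4, seed=42):
    """Randomly sample reviews and translate them in parallel batches.

    Returns a list of (original, translated) pairs.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample = rng.sample(reviews, min(sample_size, len(reviews)))
    batches = list(chunked(sample, batch_size))

    # Executor.map returns results in the order the batches were submitted,
    # so translations stay aligned with their originals.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda batch: translate_batch(batch, source_lang), batches)
        translated = [text for batch in results for text in batch]

    return list(zip(sample, translated))


if __name__ == "__main__":
    german_reviews = [f"Bewertung {i}" for i in range(2500)]
    pairs = translate_sample(german_reviews, "de")
    print(len(pairs), pairs[0])
```

Threads are used here only because they are the simplest option to demonstrate; whether a thread pool or a process pool is the better fit depends on how the underlying model releases the interpreter lock and on the hardware available.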

After running the code, you should have CSVs containing the original and translated text in your Google Drive (or elsewhere if you prefer), ready to use in upcoming notebooks and articles throughout this series.
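As a rough illustration of that output, the sketch below writes (original, translated) pairs to a CSV using only the standard library. The function name `save_translations`, the column names, and the file name are assumptions made for this example; the notebook may name and structure its files differently.

```python
import csv
from pathlib import Path


def save_translations(pairs, path):
    """Write (original, translated) pairs to a CSV file and return its path.

    The column names are illustrative, not taken from the notebook.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings;
    # utf-8 keeps Chinese and Japanese text intact.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["original_text", "translated_text"])
        writer.writerows(pairs)
    return path


if __name__ == "__main__":
    demo_pairs = [("Sehr gut", "Very good"), ("Schlecht, kaputt", "Bad, broken")]
    # In Colab with Drive mounted, a path under /content/drive/MyDrive
    # could be used instead of a local file name.
    print(save_translations(demo_pairs, "reviews_de_translated.csv"))
```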

I hope you find it useful!

Translate Amazon Reviews.ipynb
