Textual data augmentation with back-translation
You are probably familiar with many computer vision augmentation techniques, such as random flipping, rotation, and cropping of images. In the Natural Language Processing (NLP) domain, data augmentation is a much more challenging task, since even small edits to a sentence can change its meaning or break its grammar!
Let’s get familiar with one powerful text augmentation technique: back-translation. This approach uses machine translation to paraphrase a text while retaining its meaning.
The back-translation process is as follows:
- Take a sentence and translate it into another language.
- Translate the output back into the original language.
- Check whether the new sentence differs from the original. If it does, use it as an augmented version of the original text.
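The three steps above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: `back_translate`, `toy_translate` and `TOY_TABLE` are hypothetical names, and the lookup table merely stands in for a real machine translation system (a translation API or a local model) that you would wrap in a function with the same `(text, source, target)` signature.

```python
from typing import Callable, Optional

# A translator takes (text, source_lang, target_lang) and returns the translation.
Translator = Callable[[str, str, str], str]


def back_translate(sentence: str, translate: Translator,
                   pivot: str, source: str = "en") -> Optional[str]:
    """Round-trip `sentence` through `pivot` and back to `source`.

    Returns the paraphrase, or None if the round trip reproduced the original.
    """
    forward = translate(sentence, source, pivot)  # step 1: source -> pivot
    back = translate(forward, pivot, source)      # step 2: pivot -> source
    # Step 3: keep the result only if it differs from the original sentence.
    if back.strip() == sentence.strip():
        return None
    return back


# Hypothetical toy "translator": a lookup table standing in for a real MT system.
TOY_TABLE = {
    ("en", "lv", "Riga is a beautiful city near the Baltic Sea"):
        "Rīga ir skaista pilsēta netālu no Baltijas jūras",
    ("lv", "en", "Rīga ir skaista pilsēta netālu no Baltijas jūras"):
        "Riga is a beautiful city near the Baltic Sea",
}


def toy_translate(text: str, source: str, target: str) -> str:
    return TOY_TABLE[(source, target, text)]


print(back_translate("Riga is a beautiful city near the Baltic Sea",
                     toy_translate, pivot="lv"))  # prints: None
```

With this toy table the Latvian round trip reproduces the original sentence, so the function returns `None` and no new training example is produced.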
If the sentence comes back unchanged, you can chain several intermediate languages, for example:
1. English: Riga is a beautiful city near the Baltic Sea
2. Latvian: Rīga ir skaista pilsēta netālu no Baltijas jūras
3. Russian: Рига — красивый город на берегу Балтийского моря
4. Back to English: Riga is a beautiful city on the shores of the Baltic Sea
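Chaining pivot languages only changes the forward leg: the text passes through each intermediate language in turn before being translated back. The sketch below replays the Riga example; as before, `back_translate_chain` and the lookup-table translator are hypothetical stand-ins, and the table holds only the translations shown in the example.

```python
from typing import Callable, Optional, Sequence

Translator = Callable[[str, str, str], str]


def back_translate_chain(sentence: str, translate: Translator,
                         pivots: Sequence[str], source: str = "en") -> Optional[str]:
    """Translate through each pivot language in order, then back to `source`.

    Returns the paraphrase, or None if it matches the original sentence.
    """
    current, lang = sentence, source
    for pivot in pivots:
        current = translate(current, lang, pivot)  # hop to the next intermediate language
        lang = pivot
    back = translate(current, lang, source)        # final hop back to the source language
    return None if back.strip() == sentence.strip() else back


# Hypothetical lookup-table translator holding only the translations from the example.
TOY_TABLE = {
    ("en", "lv", "Riga is a beautiful city near the Baltic Sea"):
        "Rīga ir skaista pilsēta netālu no Baltijas jūras",
    ("lv", "ru", "Rīga ir skaista pilsēta netālu no Baltijas jūras"):
        "Рига — красивый город на берегу Балтийского моря",
    ("ru", "en", "Рига — красивый город на берегу Балтийского моря"):
        "Riga is a beautiful city on the shores of the Baltic Sea",
}


def toy_translate(text: str, source: str, target: str) -> str:
    return TOY_TABLE[(source, target, text)]


print(back_translate_chain("Riga is a beautiful city near the Baltic Sea",
                           toy_translate, pivots=["lv", "ru"]))
# prints: Riga is a beautiful city on the shores of the Baltic Sea
```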
By automating this process over your training set, you may be able to noticeably improve the performance of your NLP model!
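One way to automate this for a labelled training set is to try a list of pivot chains for each example, in the order given, and keep the first paraphrase that differs from the original, copying its label. This is a sketch under stated assumptions: `augment_dataset`, `round_trip` and the toy translator are hypothetical, the `"travel"` label and the Latvian-to-English table entry are made up for illustration, and carrying the label over assumes the paraphrase preserves the meaning, which is worth spot-checking on real MT output.

```python
from typing import Callable, List, Sequence, Tuple

Translator = Callable[[str, str, str], str]
Example = Tuple[str, str]  # (text, label)


def round_trip(sentence: str, translate: Translator,
               pivots: Sequence[str], source: str = "en") -> str:
    """Translate through each pivot in order, then back to `source`."""
    current, lang = sentence, source
    for pivot in pivots:
        current = translate(current, lang, pivot)
        lang = pivot
    return translate(current, lang, source)


def augment_dataset(examples: Sequence[Example], translate: Translator,
                    pivot_chains: Sequence[Sequence[str]],
                    source: str = "en") -> List[Example]:
    """Return the original examples plus at most one paraphrase per example."""
    augmented = list(examples)  # always keep the originals
    for text, label in examples:
        for chain in pivot_chains:  # chains are tried in the order given
            candidate = round_trip(text, translate, chain, source)
            if candidate.strip() != text.strip():
                # Assumes back-translation preserved the meaning, so the label carries over.
                augmented.append((candidate, label))
                break  # one new example per original is enough here
    return augmented


# Hypothetical lookup-table translator; the "lv" -> "en" entry is made up for illustration.
TOY_TABLE = {
    ("en", "lv", "Riga is a beautiful city near the Baltic Sea"):
        "Rīga ir skaista pilsēta netālu no Baltijas jūras",
    ("lv", "en", "Rīga ir skaista pilsēta netālu no Baltijas jūras"):
        "Riga is a beautiful city near the Baltic Sea",
    ("lv", "ru", "Rīga ir skaista pilsēta netālu no Baltijas jūras"):
        "Рига — красивый город на берегу Балтийского моря",
    ("ru", "en", "Рига — красивый город на берегу Балтийского моря"):
        "Riga is a beautiful city on the shores of the Baltic Sea",
}


def toy_translate(text: str, source: str, target: str) -> str:
    return TOY_TABLE[(source, target, text)]


data = [("Riga is a beautiful city near the Baltic Sea", "travel")]  # hypothetical label
for text, label in augment_dataset(data, toy_translate, [["lv"], ["lv", "ru"]]):
    print(label, "|", text)
```

Here the single-pivot chain `["lv"]` returns the sentence unchanged, so the function falls back to the `["lv", "ru"]` chain, mirroring the Riga example above.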