Science Left Behind

Jul 27, 2022

30 tomatoes - 25 bananas = 5 bananas????

At the beginning of July, Meta AI published an outstanding piece of work: No Language Left Behind (NLLB).

Update (v2): Version 2 of the NLLB paper is now on arXiv. Meta AI’s reaction to my review was very quick and cordial (despite the salt and errors in my review…). The comparisons between spBLEU and BLEU scores, and between chrF and chrF++ scores, have been replaced. Now, for most tables in the automatic evaluation section, NLLB compares its own scores with scores copied from previous work. I believe that they did their best to make these scores as comparable as possible. They follow standard machine translation evaluation practice… and so I still strongly disagree that all these scores are comparable, but I won’t argue further about this particular paper. I will write another, more general, article on why we should stop comparing copied results. Unfortunately, I have many compelling examples from the scientific literature…
The following review was written based on v1 of the NLLB paper. Some of my comments no longer apply to subsequent versions of the paper, but I think it is still worth reading if you want to better understand, or discover, very common pitfalls in machine translation evaluation.
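To illustrate why a chrF score and a chrF++ score are tomatoes and bananas, here is a minimal, simplified sketch of sentence-level chrF in plain Python. chrF averages precision and recall over character n-grams (orders 1 to 6) and combines them into an F-score with beta = 2; chrF++ additionally includes word unigrams and bigrams. This is an assumption-laden illustration, not sacrebleu’s implementation: the function names are hypothetical, word n-grams are split on whitespace only, and corpus-level aggregation is ignored. The point is only that the same translation, scored against the same reference, gets a different number from each metric.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams of `text` with whitespace removed, as chrF does."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Word n-grams of `text` (simplification: split on whitespace only)."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def chrf(hyp: str, ref: str, char_order: int = 6, word_order: int = 0,
         beta: float = 2.0) -> float:
    """Hypothetical, simplified sentence-level chrF (word_order=0)
    or chrF++ (word_order=2), returned on a 0-100 scale."""
    orders = [(char_ngrams(hyp, n), char_ngrams(ref, n))
              for n in range(1, char_order + 1)]
    orders += [(word_ngrams(hyp, n), word_ngrams(ref, n))
               for n in range(1, word_order + 1)]

    precisions, recalls = [], []
    for hyp_counts, ref_counts in orders:
        n_hyp, n_ref = sum(hyp_counts.values()), sum(ref_counts.values())
        if n_hyp == 0 or n_ref == 0:
            continue  # skip orders longer than one of the sentences
        # Counter & Counter keeps the minimum count: clipped n-gram matches.
        matches = sum((hyp_counts & ref_counts).values())
        precisions.append(matches / n_hyp)
        recalls.append(matches / n_ref)

    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # average precision over orders
    r = sum(recalls) / len(recalls)        # average recall over orders
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # beta = 2 weights recall more than precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    hyp = "the cat sat on a mat"
    ref = "the cat sat on the mat"
    print(f"chrF   = {chrf(hyp, ref):.1f}")
    print(f"chrF++ = {chrf(hyp, ref, word_order=2):.1f}")
```

Both numbers live on the same 0 to 100 scale, which makes them look comparable, yet they measure different things; subtracting one from the other says nothing about which system translates better.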

NLLB presents a new translation model and datasets for 200 languages. This is a wonderful initiative that will definitely benefit many people around the world.


Written by Benjamin Marie