Behind the Hype: Models based on T5 (2019) Still Better than Vicuna, Alpaca, MPT, and Dolly

A new study shows that there hasn’t been much progress behind the recent surge of chat models.

Benjamin Marie
Jun 14, 2023

A research team from Alibaba and Singapore University has recently released a new leaderboard for instruction-tuned large language models (LLMs).

All of the recently released chat models, such as Vicuna, Alpaca, Dolly, and ChatGPT, belong to this class of models.

The results on the “problem solving” benchmarks are very interesting.

Source: the leaderboard (June 14th, 2023)

ChatGPT is the best on average. But look at the third rank: Flan-T5. Its base model, T5, was released in 2019 and was then fine-tuned on instructions to become Flan-T5.

Flan-T5 outperforms all of the LLaMA- and OPT-based models, even though they are billions of parameters larger.

This is the first time we see this, because newly published chat models are usually compared only with other recent ones, e.g., Vicuna versus Alpaca.




Ph.D, research scientist in NLP/AI. Medium "Top writer" in AI and Technology. Exclusive articles and all my AI notebooks on