Behind the Hype: Models based on T5 (2019) Still Better than Vicuna, Alpaca, MPT, and Dolly

A new study shows that there hasn’t been much progress behind the recent surge of chat models.

Benjamin Marie
2 min read · Jun 14, 2023

A research team from Alibaba and the Singapore University of Technology and Design has recently released InstructEval, a new leaderboard for instruction-tuned large language models (LLMs).

The recently released chat models, such as Vicuna, Alpaca, Dolly, and ChatGPT, all belong to this class of models.

The results on benchmarks for “problem solving” are very interesting:

Source: https://declare-lab.net/instruct-eval/ (June 14th, 2023)

ChatGPT is the best on average. But if you look at the 3rd rank, you’ll see “Flan-T5”. It is built on T5, a base model released in 2019, which was then fine-tuned with instructions to become Flan-T5.

Flan-T5 outperforms all the LLaMA- and OPT-based models, even though they are billions of parameters larger.

This is the first time we have seen such a comparison, because recently published chat models are usually compared only with other recent ones, e.g., Vicuna versus Alpaca.

Benjamin Marie

Ph.D., research scientist in NLP/AI. Medium "Top writer" in AI and Technology. Exclusive articles and all my AI notebooks on https://kaitchup.substack.com/