Mistral 7B Beats Llama v2 13B on All Benchmarks: Overview and Fine-tuning

Datadrifters
5 min read · Oct 4, 2023

A few days ago, the Mistral AI team released Mistral 7B, which beats Llama 2 13B on all benchmarks, beats Llama 1 34B on many benchmarks, and approaches CodeLlama 7B performance on code tasks.

“Community-backed model development is the surest path to fight censorship and bias in a technology shaping our future” — Mistral AI

Figure (caption from Mistral AI’s announcement): “Performance of Mistral 7B and different Llama models on a wide range of benchmarks. For all metrics, all models were re-evaluated with our evaluation pipeline for accurate comparison. Mistral 7B significantly outperforms Llama 2 13B on all metrics, and is on par with Llama 34B (since Llama 2 34B was not released, we report results on Llama 34B). It is also vastly superior in code and reasoning benchmarks.” Reference: https://mistral.ai/news/announcing-mistral-7b/

Looking good!

If you are running macOS or Linux, you can try it locally using Ollama. Based on user reports, it should give you around 25 tokens/second even on a low-end MacBook, which is already reasonable. We’d love to hear what performance you get on your system, so please drop your numbers in the comments.
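Getting it running is just a couple of commands. A minimal sketch (assumes you have Ollama installed; `mistral` is the model tag Ollama uses for Mistral 7B):

```shell
# Pull the Mistral 7B model weights (a few GB download)
ollama pull mistral

# Start an interactive chat session with the model
ollama run mistral

# Or send a one-off prompt non-interactively
ollama run mistral "Explain grouped-query attention in one paragraph."
```

Ollama also exposes a local HTTP API on port 11434 if you’d rather call the model programmatically.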

Let’s see how Mistral AI team created this beast.

There are three techniques that Mistral 7B leverages to offer faster inference and the ability to handle longer sequences with minimal computational overhead.

1. Grouped-query attention (GQA)

Attention mechanisms, the cornerstone of transformer architectures, have traditionally been computationally intensive, especially when…
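The core idea of GQA is a middle ground between standard multi-head attention (one key/value head per query head) and multi-query attention (a single shared key/value head): several query heads share each key/value head, shrinking the KV cache and speeding up inference. Here is a minimal NumPy sketch of the forward pass; the shapes and helper names are illustrative, not Mistral’s actual implementation (for reference, Mistral 7B uses 32 query heads and 8 key/value heads):

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Single-layer GQA sketch. x: (seq, d_model).
    wq: (d_model, d_model); wk, wv: (d_model, n_kv_heads * head_dim)."""
    seq, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads sharing each KV head

    q = (x @ wq).reshape(seq, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim)

    # Broadcast each KV head to its group of query heads
    k = np.repeat(k, group, axis=1)  # (seq, n_q_heads, head_dim)
    v = np.repeat(v, group, axis=1)

    # Scaled dot-product attention, computed per head
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.einsum("hqk,khd->qhd", weights, v)
    return out.reshape(seq, d_model)
```

The savings show up at inference time: the KV cache stores only `n_kv_heads` heads instead of `n_q_heads`, so with a 32/8 split the cache is 4x smaller, which matters a lot for long sequences and large batches.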
