Mistral 7B — the open source model !!

2 min read · Sep 28, 2023

French startup Mistral AI has released “Mistral 7B”, an open source AI model with 7.3 billion parameters that is reportedly more efficient than Meta’s Llama 2 13B, a model with 13 billion parameters.
Mistral 7B can be downloaded through various channels, including a 13.4 GB torrent file. The company also announced that it has launched a GitHub repository and a Discord channel for collaboration and troubleshooting.
Mistral AI wants to democratize access to AI and reduce the size of language models.

With 7.3 billion parameters, Mistral 7B is smaller than Llama 2 13B, yet it is reportedly much more efficient. Larger foundation models, such as GPT-3 (around 175 billion parameters) and GPT-4 (OpenAI has not disclosed its parameter count), can do much more, but they are far more expensive and difficult to run, which is why they are available only through APIs or remote access.
Mistral 7B aims to offer capabilities similar to those of larger LLMs at a considerably lower computational cost.
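
To get a feel for why the parameter count matters for cost, here is a rough back-of-the-envelope sketch in Python. It only estimates the size of the raw weights and assumes each parameter is stored in 16 bits; the helper name weight_size_gb and the byte table are made up for this illustration and are not part of any Mistral tooling.

```python
# Rough estimate of how much space the raw weights of a model take.
# Assumption: every parameter is stored at the given precision and
# nothing else (optimizer state, activations, file metadata) is counted.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_gb(n_params, dtype="float16"):
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts as quoted in the article.
for name, n_params in [("Mistral 7B", 7.3e9), ("Llama 2 13B", 13e9)]:
    print(f"{name}: about {weight_size_gb(n_params):.1f} GB at float16")
```

Under these assumptions the Mistral 7B weights come out in the same range as the 13.4 GB download mentioned above, and at a bit more than half the size of a 13 billion parameter model at the same precision. The exact figures depend on the true parameter count, the storage format and whether gigabytes are counted in powers of 10 or 2.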

According to Mistral AI:
Mistral 7B outperforms Llama 2 13B on all benchmarks;
Mistral 7B outperforms Llama 1 34B (a 34 billion parameter model) on many benchmarks;
Mistral 7B approaches the performance of CodeLlama 7B on code, while remaining good at English tasks;
Mistral 7B uses grouped-query attention (GQA) for faster inference;
Mistral 7B uses sliding window attention (SWA) to process longer sequences at a lower cost.
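
The last two points can be illustrated with a small Python sketch. This is not Mistral's implementation, only a toy illustration of the two ideas: with sliding window attention each token looks at a limited number of recent tokens, and with grouped-query attention several query heads share one key/value head. All function names and the numbers used below are toy values chosen for the example, not the model's real configuration.

```python
def sliding_window_mask(seq_len, window):
    """Causal mask: position i may attend to j only if i - window < j <= i."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one KV head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_entries(seq_len, n_kv_heads, head_dim, window=None):
    """Cached key and value numbers for one layer (toy cost model)."""
    # With a sliding window, only the last `window` positions need to be kept.
    cached_positions = seq_len if window is None else min(seq_len, window)
    return 2 * cached_positions * n_kv_heads * head_dim


# Toy values: 5 tokens, window of 3, 8 query heads sharing 2 KV heads.
for row in sliding_window_mask(5, 3):
    print("".join("x" if allowed else "." for allowed in row))

print(kv_head_for_query_head(5, n_q_heads=8, n_kv_heads=2))
print(kv_cache_entries(10, n_kv_heads=8, head_dim=4))            # full attention
print(kv_cache_entries(10, n_kv_heads=2, head_dim=4, window=3))  # GQA + SWA
```

In this toy cost model both ideas shrink the key/value cache: fewer key/value heads means less to store per token, and the window caps how many tokens are stored at all. That is the intuition behind "faster inference" and "longer sequences at a lower cost".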

Mistral AI also tries to avoid the licensing issues that come with some open source models.
Most importantly, the model was released under the Apache 2.0 license, a highly permissive license that places no restrictions on use or reproduction beyond attribution.

Mistral Transformer:

https://github.com/mistralai/mistral-src#mistral-transformer

Mistral 7B is a further refinement of other “small” large language models like Llama 2, offering similar capabilities (according to some standard benchmarks) at a considerably smaller compute cost. Foundation models like GPT-4 can do much more, but are far more expensive and difficult to run, leading them to be made available solely through APIs or remote access.

Thanks in advance 💐

Written by Jacek Fleszar