NVIDIA’s Llama 3.1 Nemotron 70B: New Open-Source Model
Takes the lead over GPT-4o and Claude 3.5 Sonnet; official benchmark results and a practical usage example are included below
NVIDIA has just released its Llama 3.1 Nemotron 70B instruct model, a 70-billion-parameter model that, surprisingly, beats the leading closed-source models on key chat benchmarks. Once again, open source has raced ahead, despite closed-source efforts to hold on to the state of the art. There’s a lot to dive into, because NVIDIA also introduced a notable new technique in how this model was produced.
The Llama 3.1 Nemotron 70B instruct model is the leading model on the Arena Hard benchmark from LMArena. Essentially, NVIDIA took the Llama 3.1 70B model as the base and then performed post-training with RLHF (specifically the REINFORCE algorithm), pairing it with a dedicated reward model and the HelpSteer2-Preference dataset. This approach helped the model surpass state-of-the-art closed models.
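If you want to try the model yourself, here is a minimal inference sketch. It assumes the Hugging Face checkpoint nvidia/Llama-3.1-Nemotron-70B-Instruct-HF and enough GPU memory to shard a 70B model; a quantized build or a hosted endpoint works just as well.

```python
# Minimal inference sketch (assumes the Hugging Face checkpoint
# "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF" and hardware able to
# hold a 70B model; quantization or a hosted API are alternatives).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision to reduce memory
    device_map="auto",           # shard across available GPUs
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```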
For those interested in the actual benchmarks, Llama 3.1 Nemotron 70B scores 85.0 on Arena Hard and 57.6 (length-controlled) on AlpacaEval 2, surpassing all previously evaluated models. This is particularly impressive because it beats Claude 3.5 Sonnet and GPT-4o, OpenAI’s latest frontier model, which handles far more than just text.
What’s even more surprising is that it also surpasses the Llama 3.1 405B instruct model, which is significantly larger. The way this model was trained drove its impressive performance, showing that careful post-training can matter as much as raw parameter count.
NVIDIA introduced an advanced reward model aimed at improving the alignment of AI models with human feedback. The researchers weighed two main approaches to reward modeling: the Bradley-Terry style and the regression style. Both methods guide AI models toward more useful and accurate responses by assigning reward scores based on how well they follow instructions.
The Bradley-Terry model compares two responses to the same prompt to identify which is better, while the regression model predicts a numeric quality score for a single response, trained against human ratings.
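To make the distinction concrete, here is a toy sketch of the two objectives in PyTorch. It is illustrative only; the tensors are made-up examples, and it does not reproduce NVIDIA’s actual training setup.

```python
# Toy sketch of the two reward-modeling objectives described above
# (illustrative only; not NVIDIA's actual training code).
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise: maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def regression_loss(r_pred: torch.Tensor, r_label: torch.Tensor) -> torch.Tensor:
    """Pointwise: fit the scalar reward to a human-annotated quality score."""
    return F.mse_loss(r_pred, r_label)

# Bradley-Terry: reward scores for a batch of (chosen, rejected) response pairs.
r_chosen = torch.tensor([2.1, 0.7, 1.5])
r_rejected = torch.tensor([1.3, 0.9, -0.2])
print(bradley_terry_loss(r_chosen, r_rejected))  # pairwise preference loss

# Regression: predicted vs. annotated scores (e.g., a 0-4 helpfulness scale).
r_pred = torch.tensor([3.2, 1.8])
r_label = torch.tensor([4.0, 2.0])
print(regression_loss(r_pred, r_label))          # pointwise regression loss
```

The practical difference: the Bradley-Terry objective only needs annotators to say which of two responses is better, while the regression objective needs an absolute score for each response on its own.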