Is Llama 2 on the victory lap of the future? 🦙

Breaking down the new, fascinating, pre-trained Llama 2

Harshita Sharma
Accredian
5 min read · Jul 31, 2023

--

Introduction

The recently launched Llama 2 is a massive leap for open-source large language models, and of course I had to break it down so we can fangirl over it together!

This development is narrowing the gap between the open-source community and GPT-4-level performance. Llama 2 is open for both research and commercial use. The 77-page paper released by Meta AI breaks down every update, and we’re going to take a deep dive into it in this article.

You can find the paper here; it explains the model details, the training stages, and the entire data pipeline.

The New and The Better

Meta AI claims Llama 2 can be a suitable substitute for closed-source models like GPT-4, making Meta one of the biggest tech companies to fully back open source!

The Model Structure

It comes in two variants, Llama 2 and Llama 2-Chat (which, of course, specializes in dialogue), and three sizes based on the number of trained parameters: 7B, 13B, and 70B. It was trained on a corpus 40% larger than the original Llama 1’s, and the context window was doubled from 2,048 to 4,096 tokens, which improves performance on longer inputs.

Model Training

According to Meta AI, it requires “far less computing power and resources to test new approaches, validate others’ work, and explore new use cases.”

It has been trained on a mix of data from publicly available sources, which does not include data from Meta’s products or services. The company adds that it made an effort to remove data from sites known to contain a high volume of personal information about private individuals.

The model was trained on NVIDIA A100 GPUs, and its larger sizes use a newer technique called Grouped-Query Attention (GQA) to improve the scalability of inference.
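The core idea of GQA can be sketched in a few lines: instead of every query head carrying its own key/value head, several query heads share one, which shrinks the KV cache at inference time. This is a minimal NumPy illustration, not Meta’s implementation; the head counts and dimensions are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each group of n_q_heads // n_kv_heads query heads shares one K/V head,
    so the KV cache is n_kv_heads/n_q_heads the size of full multi-head attention."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads
    # Repeat each K/V head so it serves its whole query group.
    k = np.repeat(k, group, axis=0)                  # (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (n_q_heads, seq, seq)
    return softmax(scores) @ v                       # (n_q_heads, seq, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))   # 8 query heads
k = rng.standard_normal((2, 4, 16))   # but only 2 shared KV heads
v = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 4, 16)
```

With 8 query heads and 2 KV heads, the cache of keys and values is a quarter of the usual size, which is exactly why GQA helps the bigger models serve requests faster.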

They even discussed carbon emissions in the paper, acknowledging the environmental impact of training such huge models, which for a company this big is quite impressive.

The Commercial Usage

We all know how Llama 1, being such a powerful model, was leaked from Meta, and how its fine-tuned versions lit up the open-source community. But it came with one big drawback: it could be used for research, but any commercial use to build products or companies was prohibited.

This is where Llama 2’s sweet advantage comes in: its commercial viability. The only caveat is that if you build a product on this model with more than 700M monthly active users, you’ll need Meta’s permission. But let’s be honest, this is still such a win-win!

Safety

The paper actually devotes a lot of space to safety guardrails, red teaming, and model evaluations, making safety one of its primary themes.

This is one of the main reasons why the 34B-parameter model, even though it was trained, wasn’t publicly released: Meta said it lacked the time to red team it sufficiently.

When it comes to LLMs, safety and helpfulness come with a tradeoff: if the model is rewarded more heavily for safety during training, its helpfulness can drop significantly.

One of the paper’s biggest contributions is training two separate reward models, one for helpfulness and one for safety, and combining their signals during fine-tuning. The reward models themselves haven’t been released yet, but one can hope, right?

Performance

Even with all these advancements, Llama 2 still has a long way to go to catch up with today’s leading models, which are of course not open source (GPT-4, we’re looking at you!).

Conclusion

With a range of 7 billion to 70 billion parameters, Llama 2 has been a treat for the open-source community. Even though it lags behind in performance and coding ability, it’s still quite competitive with the proprietary models.

Llama 2 certainly represents a valuable addition to the field of natural language processing, and its open access and safety-conscious approach set a positive example for future research endeavors, contributing to the wider development of AI technologies that benefit society.

You can check the model out here:

Testing the model — https://www.llama2.ai/

Download Models — https://huggingface.co/models?other=l...
