Llama 2 Chat

Varun Mathur
𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨
5 min read · Aug 3, 2023

Llama 2 is the result of an expanded partnership between Meta and Microsoft, with Microsoft as the preferred partner for the new model. The model is available in the Azure AI model catalog, so developers using Microsoft Azure can integrate it into their projects and use cloud-native tools for content filtering and safety. Llama 2 is also optimized to run locally on Windows, which gives developers a smoother workflow.

According to Meta, Llama 2 was trained on 40% more data than the previous Llama version and outperforms other open language models on reasoning and knowledge tests. Percy Liang, director of Stanford’s Center for Research on Foundation Models, commented, “LLaMA 2 isn’t GPT-4, but for many use cases, you don’t need GPT-4.”

Meta AI pulled the curtain back on Llama 2, the latest addition to their innovative family of AI models.

Handpicked from a buffet of publicly accessible data, the training grounds of Llama 2 have been as diverse as they are extensive. Meta confidently states that this second iteration of Llama models, thanks to its diverse education, presents a substantial performance upgrade over its predecessors. In the fluid and fast-paced world of modern chatbots, this improvement is more than just a step forward — it’s a giant leap towards the future of AI interaction.

Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over one million new human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data.

The Llama 2 release introduces a family of pretrained and fine-tuned LLMs in three sizes: 7B, 13B, and 70B parameters. The pretrained models come with significant improvements over the Llama 1 models, including being trained on 40% more tokens, having a much longer context length (4k tokens), and using grouped-query attention for fast inference of the 70B model.

However, the most exciting part of this release is the fine-tuned models (Llama 2-Chat), which have been optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF). Across a wide range of helpfulness and safety benchmarks, the Llama 2-Chat models perform better than most open models and achieve comparable performance to ChatGPT according to human evaluations.

Llama vs Llama 2

The star of the show, Llama 2, comes in two flavors: Llama 2 and Llama 2-Chat. The latter is specifically optimized for two-way conversations. Both are offered in several sizes, ranging from a 7 billion parameter model to a whopping 70 billion parameter model. If you’re wondering about “parameters,” think of them as the numerical weights a model learns from its training data, which determine how well the model performs a task such as generating text.

When it comes to training, Llama 2 was trained on two trillion tokens, where a token is a piece of raw text such as “fan,” “tas,” and “tic” in the word “fantastic.” This is a significant step up from the original Llama, which was trained on 1.4 trillion tokens. As a general rule of thumb in generative AI, the more tokens, the better. For comparison, Google’s flagship large language model (LLM), PaLM 2, was reportedly trained on 3.6 trillion tokens, while speculation suggests that GPT-4 was also trained on trillions of tokens.
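As a quick sanity check on these figures, the relative increase in training tokens can be computed directly. This is a minimal sketch that assumes the 2 trillion and 1.4 trillion token counts cited in this article; the variable names are purely illustrative.

```python
# Token counts as cited in this article (approximate).
LLAMA_1_TOKENS = 1.4e12  # original Llama
LLAMA_2_TOKENS = 2.0e12  # Llama 2

# Relative increase of Llama 2 over Llama 1, as a percentage.
increase_pct = (LLAMA_2_TOKENS / LLAMA_1_TOKENS - 1) * 100
print(f"Llama 2 saw roughly {increase_pct:.0f}% more tokens than Llama 1")
```

The result lands a little above 40%, which is consistent with Meta's "40% more" figure.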

How to Prompt Llama 2

One of the unsung advantages of open-access models is that you have full control over the system prompt in chat applications. This is essential for specifying the behavior of your chat assistant, and even for giving it some personality, but it is often out of reach with models served behind APIs.

We’re adding this section just a few days after the initial release of Llama 2, as we’ve had many questions from the community about how to prompt the models and how to change the system prompt. We hope this helps!

The prompt template for the first turn looks like this:

<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>
{{ user_message }} [/INST]

This template follows the model’s training procedure, as described in the Llama 2 paper. We can use any system_prompt we want, but it's crucial that the format matches the one used during training.

As you can see, the instructions between the special <<SYS>> tokens provide context for the model so it knows how we expect it to respond. This works because exactly the same format was used during training with a wide variety of system prompts intended for different tasks.
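To make the first-turn template concrete, here is a minimal sketch of a helper that fills it in. The function name `build_first_turn_prompt` is hypothetical, not part of any library. One assumption to flag: the code puts a blank line after `<</SYS>>`, which matches Meta's reference implementation as far as we know, whereas the template as printed above shows a single line break.

```python
def build_first_turn_prompt(system_prompt: str, user_message: str) -> str:
    """Fill the Llama 2 first-turn chat template (hypothetical helper)."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt.strip()}\n"
        "<</SYS>>\n\n"  # assumption: blank line before the user message
        f"{user_message.strip()} [/INST]"
    )


prompt = build_first_turn_prompt(
    "You are a helpful assistant that answers in one sentence.",
    "What is a llama?",
)
print(prompt)
```

If you pass the prompt through a tokenizer that adds the beginning-of-sequence token on its own, you would likely want to drop the literal `<s>` so it is not duplicated.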

As the conversation progresses, all the interactions between the human and the “bot” are appended to the previous prompt, enclosed between [INST] delimiters.
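A multi-turn prompt can be assembled along the same lines. The sketch below is an illustration under stated assumptions rather than a definitive implementation: the helper name `build_chat_prompt` is hypothetical, and the `</s><s>` separators between completed turns and the blank line after `<</SYS>>` reflect our understanding of the format used in training, which this article does not spell out.

```python
from typing import List, Optional, Tuple


def build_chat_prompt(
    system_prompt: str,
    history: List[Tuple[str, str]],
    user_message: str,
) -> str:
    """Build a multi-turn Llama 2 chat prompt (hypothetical helper).

    history holds completed (user, assistant) exchanges, oldest first.
    """
    turns: List[Tuple[str, Optional[str]]] = list(history)
    turns.append((user_message, None))  # the turn the model should answer

    parts = []
    for i, (user, answer) in enumerate(turns):
        if i == 0:
            # The system prompt is folded into the first user message only.
            user = f"<<SYS>>\n{system_prompt.strip()}\n<</SYS>>\n\n{user.strip()}"
        segment = f"<s>[INST] {user.strip()} [/INST]"
        if answer is not None:
            # Assumption: completed turns are closed with an end-of-sequence marker.
            segment += f" {answer.strip()} </s>"
        parts.append(segment)
    return "".join(parts)


chat_prompt = build_chat_prompt(
    "You are a concise assistant.",
    [("Hi", "Hello!")],
    "How are you?",
)
print(chat_prompt)
```

With an empty `history`, the helper reduces to the first-turn template shown earlier.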

Fine-tuning with PEFT

Training LLMs can be technically and computationally challenging. In this section, we look at the tools available in the Hugging Face ecosystem to efficiently train Llama 2 on simple hardware and show how to fine-tune the 7B version of Llama 2 on a single NVIDIA T4 (16GB — Google Colab).
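A rough back-of-the-envelope calculation shows why quantization matters on a 16GB card. The sketch below only counts the memory needed to hold the weights of a model with about 7 billion parameters; it ignores activations, optimizer state, and quantization overhead, so treat the numbers as lower bounds rather than measurements.

```python
NUM_PARAMS = 7e9    # approximate parameter count of Llama 2 7B
T4_MEMORY_GB = 16   # memory of the NVIDIA T4 mentioned above

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(NUM_PARAMS, nbytes)
    share = gb / T4_MEMORY_GB * 100
    print(f"{precision:>8}: {gb:5.1f} GB of weights ({share:.0f}% of a T4)")
```

At 16-bit precision the weights alone nearly fill the card, while 4-bit weights leave most of the memory free for activations and the small set of trainable adapter parameters.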

The trl repository includes a script to instruction-tune Llama 2 using QLoRA and the SFTTrainer from trl.

An example command for fine-tuning Llama 2 7B on the timdettmers/openassistant-guanaco dataset can be found below. When given the merge_and_push argument, the script merges the LoRA weights into the model weights and saves them in the safetensors format. This allows us to deploy our fine-tuned model after training using text-generation-inference and inference endpoints.
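To see why LoRA adapters are cheap to train and store, it helps to count parameters. LoRA replaces the update to a weight matrix of shape d_out x d_in with the product of two small matrices of rank r, which adds r * (d_in + d_out) trainable parameters. The sketch below assumes a 4096 x 4096 projection (4096 being the hidden size of Llama 2 7B) and a hypothetical rank of 16; the rank the script actually uses may differ.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 4096  # hidden size of Llama 2 7B
RANK = 16           # hypothetical LoRA rank, chosen for illustration

full_params = HIDDEN_SIZE * HIDDEN_SIZE
lora_params = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
fraction = lora_params / full_params

print(f"Full projection matrix: {full_params:,} parameters")
print(f"LoRA adapter (r={RANK}):  {lora_params:,} parameters")
print(f"Adapter size relative to the full matrix: {fraction:.2%}")
```

Because the adapter is just a low-rank update, merging amounts to adding the scaled product of the two small matrices back onto the original weight, after which the model can be served without any extra adapter code.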

First, install trl with pip and clone the repository that contains the script:

pip install trl
git clone https://github.com/lvwerra/trl

Then you can run the script:

python trl/examples/scripts/sft_trainer.py \
--model_name meta-llama/Llama-2-7b-hf \
--dataset_name timdettmers/openassistant-guanaco \
--load_in_4bit \
--use_peft \
--batch_size 4 \
--gradient_accumulation_steps 2
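The last two flags interact: with gradient accumulation, the optimizer only steps after several small batches have been processed. The sketch below works out the resulting effective batch size, assuming that `--batch_size` is the per-device batch size and that training runs on a single GPU, as in the T4 setup described above.

```python
batch_size = 4                   # from --batch_size (assumed to be per device)
gradient_accumulation_steps = 2  # from --gradient_accumulation_steps
num_devices = 1                  # assumption: a single T4

# Number of examples that contribute to each optimizer update.
effective_batch_size = batch_size * gradient_accumulation_steps * num_devices
print(f"Effective batch size: {effective_batch_size}")
```

Raising the accumulation steps is a common way to get a larger effective batch without using more GPU memory, at the cost of fewer optimizer updates per epoch.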

Llama vs ChatGPT

When it comes to LLMs capable of generating strikingly human-like text, both LLaMA and ChatGPT have emerged as game-changers. These models’ ability to weave coherent and contextually appropriate language makes them indispensable across diverse applications. Despite their shared strengths, certain key distinctions set them apart.

Presented by Meta, LLaMA (Large Language Model Meta AI) is a fresh face on the LLM stage. Its design emphasizes efficiency and minimal resource demand, making it more accessible to a broader audience. LLaMA’s standout feature is its availability under a non-commercial license, allowing researchers and organizations to easily incorporate it into their work.

In contrast, OpenAI’s ChatGPT holds a reputation as one of the most advanced generative AI systems available today. It’s celebrated for its uncanny ability to generate natural language text that often mirrors human-authored content.
