Llama 3: First Look

Renjith R Krishnan
4 min read · Apr 19, 2024


Llama 3 Prompting with Tom, Dick and Harry :-)

Like many, I am super excited about the much-awaited announcement from Meta of its next-generation open LLM, Llama 3.

IBM’s watsonx platform has already added Llama 3 to its expanding catalog of Enterprise-Ready LLMs. In this article, I intend to call out the highlights of Llama 3, based on my first day of experience using it.

Model Variants

In this first release of Llama 3, Meta offers text-based models in two sizes: 8B and 70B parameters. In addition to the base pre-trained models, Meta has also released instruction-fine-tuned versions of these models, which readily understand prompts and serve use cases like chat assistants. These models are pre-trained primarily on English-language data and can generate text and code out of the box.

Hugging Face Model Catalog
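
Both instruct models are available as gated repos on Hugging Face. Here is a minimal sketch of loading the 8B instruct model with the transformers library; it assumes you have accepted Meta’s license for the meta-llama repos and logged in with huggingface-cli, and the exact output structure may vary slightly across transformers versions.

```python
# A minimal sketch of running Llama 3 8B Instruct via Hugging Face
# transformers. Assumes license acceptance for the gated meta-llama
# repos and a prior `huggingface-cli login`.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,  # halves memory vs float32
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize Llama 3 in one sentence."},
]
output = generator(messages, max_new_tokens=100)
# The pipeline returns the conversation with the generated
# assistant turn appended at the end.
print(output[0]["generated_text"][-1]["content"])
```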

Model Architecture

Like its previous generation, Llama 3 uses a decoder-only transformer architecture, which is well suited for generative tasks.

However, Llama 3 doubles the context window to 8K tokens, certainly a big relief for Llama 2 users! The increased context window will make Llama 3 much more effective in use cases such as code generation, code summarization, and lengthy chat sessions, which were constrained by the shorter 4K context window of Llama 2.
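
You can confirm the context window by reading it straight off the model config. A minimal sketch, again assuming access to the gated meta-llama repo:

```python
# Read the context window from the Hugging Face model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(config.max_position_embeddings)  # 8192 tokens for Llama 3
```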

Llama 3 also improves model performance substantially by switching to a tokenizer based on Tiktoken (openai/tiktoken) with a much larger 128K-token vocabulary, as opposed to the SentencePiece tokenizer (google/sentencepiece) used by Llama 2.
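
The larger vocabulary means the same text typically encodes into fewer tokens. A quick sketch of comparing the two tokenizers side by side, assuming access to both gated meta-llama repos on Hugging Face:

```python
# Compare token counts between the Llama 3 (tiktoken-based) and
# Llama 2 (SentencePiece) tokenizers on the same input text.
from transformers import AutoTokenizer

text = "Llama 3 switched to a tiktoken-based BPE tokenizer with a larger vocabulary."

llama3_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

print("Llama 3 tokens:", len(llama3_tok.encode(text)))
print("Llama 2 tokens:", len(llama2_tok.encode(text)))
```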

Prompt Template

Now comes the interesting part. Llama 3 uses a new prompt template, which takes the following format for a multi-turn conversation.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

As seen here, the Llama 3 prompt template uses a set of new special tokens (defined in the tiktoken-based tokenizer): <|begin_of_text|> marks the start of the prompt, <|start_header_id|> and <|end_header_id|> enclose the role of a turn (system, user, or assistant), and <|eot_id|> marks the end of each turn.

Llama 3 Template — Special Tokens
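
To make the template concrete, here is a minimal sketch of a helper that assembles this format from a list of messages. The function name format_llama3_prompt and the messages structure are my own illustration, not an official API:

```python
# A hypothetical helper that assembles the Llama 3 prompt format from
# a list of {"role": ..., "content": ...} messages. Illustration only.
def format_llama3_prompt(messages):
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += f"{msg['content']}<|eot_id|>"
    # Trailing assistant header cues the model to generate the next turn
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
print(format_llama3_prompt(messages))
```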

Here is a simple example of the results of a Llama 3 prompt in a multi-turn conversation with three roles (system, user, assistant).

Results from Llama 3 model (watsonx.ai)
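
In practice, you rarely need to build this string by hand: the Hugging Face tokenizer ships with the same template built in. A minimal sketch, assuming access to the gated repo:

```python
# The tokenizer's built-in chat template produces exactly the
# Llama 3 prompt format shown above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who wrote Hamlet?"},
    {"role": "assistant", "content": "William Shakespeare wrote Hamlet."},
    {"role": "user", "content": "When was he born?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # begins with <|begin_of_text|><|start_header_id|>system...
```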

Let’s now prompt with Tom, Dick and Harry :-)

I further extended the above example with three different users in a cross-referencing context, and Llama 3 showed off its reasoning power.

Results from Llama 3 model (watsonx.ai)
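
The exact prompts are in the screenshot above, but the shape of the experiment looks roughly like this. The specific wording and names below are my own reconstruction for illustration, not the original prompts:

```python
# A hypothetical reconstruction of the cross-referencing experiment:
# three users state facts about each other, then one asks a question
# that requires combining all three statements. Wording is illustrative.
messages = [
    {"role": "system",
     "content": "You are chatting with three users: Tom, Dick and Harry. "
                "Each message is prefixed with the speaker's name."},
    {"role": "user", "content": "Tom: Dick is two years older than me."},
    {"role": "user", "content": "Dick: Harry is the youngest of us three."},
    {"role": "user", "content": "Harry: Tom is 30 years old. How old am I, at most?"},
]
# Feeding these through the chat template (as shown earlier) lets the
# model cross-reference the three statements in a single answer.
```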

Benchmarks

Meta’s benchmarks of the instruction-tuned models show a huge leap compared to the performance of the previous generation.

Meta’s benchmark of Instruction-Tuned models

Knowledge Cut-off

Llama 3 is trained on a mix of publicly available online data, on the order of 15 trillion tokens. As per Meta, the knowledge cut-off dates for the two models are as below.

Llama 3 8B model has a knowledge cut-off of March 2023.

Llama 3 70B model has a knowledge cut-off of December 2023.

However, in my limited testing of the two instruction-tuned models, they appear to have limited exposure to 2023 data. The quality of knowledge drops considerably for the period after Q1 2023 (even for the 70B model), as shown below.

Llama 3 70B successfully responds to a Feb 2023 event question
Llama 3 70B fails to respond to a July 2023 event question

Future of Llama 3

As per Meta, this is only the beginning of Llama 3 and there is a lot more to come, including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities. Looking forward to that exciting time ahead.
