Build and run llama2 LLM locally
P.S.: These instructions are tailored for macOS and have been tested on a Mac with an M1 chip.
In this guide, we’ll walk through the step-by-step process of running the llama2 language model (LLM) locally on your machine.
The llama2 models are a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. The fine-tuned variants, called Llama 2-Chat, are optimized for dialogue use cases.
This guide will cover the installation process and the necessary steps to set up and run the model.
Prerequisites
Before we begin, make sure you have the following prerequisites installed on your system:
1. Python: You’ll need Python 3.8 or higher. You can check your Python version by running the following command in your terminal:
python3 --version
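If Python is installed, this prints the version, for example:
Python 3.11.6
Anything 3.8 or higher will work, though your exact version will likely differ.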
Python 3.11 is recommended and can be installed with Homebrew:
brew install python@3.11
2. Git: Ensure you have Git installed. If not, you can install it using a package manager like Homebrew:
brew install git
Cloning the Repositories
1. Open your terminal.
2. Navigate to the directory where you want to clone the repositories; let's call this directory llama2.
3. Clone the llama2 repository using the following command:
git clone https://github.com/facebookresearch/llama.git
4. Clone llama.cpp, the C/C++ port of llama:
git clone https://github.com/ggerganov/llama.cpp.git
Now you should have both repositories in your llama2 directory.
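At this point your directory layout should look like this:
llama2/
├── llama/
└── llama.cpp/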
5. Navigate into the llama.cpp repository and build it by running the make command in that directory:
cd llama.cpp
make
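If the build succeeds, the compiled binaries, including main and quantize (both used in later steps), will appear in the repository root. A quick sanity check is to print the help text:
./main -h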
Requesting access to Llama Models
1. Go to the link https://ai.meta.com/resources/models-and-libraries/llama-downloads/
2. Enter your details in the request form.
3. You’ll receive an email with a unique custom URL for downloading the models.
4. Navigate to the llama repository in the terminal (from the llama2 directory):
cd llama
5. Run the download.sh script to download the models using your custom URL:
/bin/bash ./download.sh
6. The script will prompt you to enter the download URL; paste the custom URL received in the email, then select the models you want to download. For example, if you choose the 7B-chat model, it will be downloaded to ./llama2/llama/llama-2-7b-chat.
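Once the download finishes, the model directory should contain the checkpoint and its metadata, along these lines (the file names shown are what the 7B-chat download typically includes):
ls llama-2-7b-chat
checklist.chk  consolidated.00.pth  params.json
The tokenizer.model file, which the conversion step below needs, is placed in the llama repository root itself.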
Converting the downloaded model(s)
1. Navigate into the llama.cpp repository:
cd llama.cpp
2. Create a Python virtual environment using the command below. I've chosen the name llama2 for the virtual environment:
python3.11 -m venv llama2
3. Activate the virtual environment:
source llama2/bin/activate
The name of the active virtual environment will appear in parentheses at the beginning of your command prompt.
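For example, the prompt might look something like this (the exact format depends on your shell configuration):
(llama2) user@mac llama.cpp %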
4. Install the required Python dependencies; they are listed in requirements.txt:
python3 -m pip install -r requirements.txt
5. While still in the llama.cpp directory, run the convert script to convert the model to f16 format:
python3 convert.py --outfile models/7B/ggml-model-f16.bin --outtype f16 ../../llama2/llama/llama-2-7b-chat --vocab-dir ../../llama2/llama
--outfile specifies the output file name (the 7B folder inside the ./llama2/llama.cpp/models directory must exist; see the command below).
--outtype specifies the output type, f16 in this case; the downloaded model directory is then passed as a positional argument.
--vocab-dir specifies the directory containing the tokenizer.model file.
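As noted for --outfile, the output folder must exist before you run the conversion; you can create it with:
mkdir -p models/7B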
The conversion will create a file, ggml-model-f16.bin, which is about 13.5 GB in size.
6. Next, quantize the model to reduce its size:
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
It will create a quantized model, ggml-model-q4_0.bin, which is about 3.8 GB in size.
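You can confirm the size reduction by listing both files; ls -lh models/7B should now show ggml-model-f16.bin (roughly 13.5 GB) alongside ggml-model-q4_0.bin (roughly 3.8 GB).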
7. All set! Now you can run it and try one of the prompt examples inside the ./prompts folder:
./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/chat-with-bob.txt
-m specifies the model file.
-n specifies the number of tokens to generate.
--repeat_penalty controls how strongly repeated token sequences are penalized; 1.0 effectively disables the penalty.
--color formats the output as colored text so your input stands out from the model's generations.
-i runs the program in interactive mode.
-r "User:" specifies a reverse prompt, i.e. a marker indicating the user's turn in the conversation. In this case the marker used is "User:".
-f ./prompts/chat-with-bob.txt specifies the path to the file (chat-with-bob.txt) containing the initial prompt for the program.
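When the model loads, you'll drop into an interactive session seeded with the example persona from chat-with-bob.txt. A session might look something like this (illustrative only; actual generations will vary):
User: What is the capital of France?
Bob: The capital of France is Paris.
User: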