How to install Llama2 on a Mac M1 & M2 (Apple Silicon)?

Mohammad M. Movahedi
4 min read · Sep 8, 2023


An important point to consider regarding Llama2 and Apple silicon is that the official release is not directly compatible with it. However, there is an open-source C++ port (llama.cpp) that lets us run Llama2 on both M1 and M2 Macs. I’ve tried to explain, in the simplest way possible, how to install and run Llama2 on your personal computer in 5 steps, entirely on the CPU.

Remarkable Llama 2

Okay then let’s start our project…

Step 1:

Create a new folder on your desktop specifically for this project. I will name my folder “llama2”. Then clone the Llama2 repository into this folder by opening your command line and running the following commands:

First, type

cd 

Next, simply drag and drop your folder onto the command line, and then press ‘Enter’.
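If you prefer to type the path instead of dragging the folder, and assuming you created the folder on your Desktop and named it “llama2” as above, the equivalent command is:

cd ~/Desktop/llama2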

Now we clone the Llama repository from GitHub by running the following command:

git clone https://github.com/facebookresearch/llama.git

Now, we’re going to clone the C++ port, llama.cpp. This version makes it super easy to run Llama2 on Apple Silicon devices. You can do this by running the following command:

git clone https://github.com/ggerganov/llama.cpp.git

Yes, it’s that simple! You’ve just completed step 1 for Llama2 on your Silicon Mac. Now, go ahead and move on to step 2.

Step 2:

To get started, you’ll need to download the Llama2 models as follows:

First, request access from Meta on their Llama download page. Make sure to choose the Llama2 and Llama2 Chat versions. Once you’ve agreed to the terms and conditions, Meta will send you an email (which may take a little while) containing instructions on how to download the Llama2 models:

First, install wget and md5sum with Homebrew in your command line, then run the download.sh script that ships with the llama repository you cloned in step 1.
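If you don’t already have these tools, a minimal Homebrew invocation looks like this (assumptions: Homebrew is already installed, and md5sum is taken from the md5sha1sum formula):

brew install wget md5sha1sum

Once the tools are in place, change into the cloned llama folder and start the downloader: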

bash download.sh

Paste the URL you received in the email into the command line when the script asks for it. Please note that the download link in the email is only valid for 24 hours. If you don’t complete the download within that time, you’ll need to submit a new request to Meta. I recommend not downloading all versions; instead, focus on getting the Llama2-7B and Llama2-7B-Chat versions.

Create a new folder within your primary Llama2 directory, which you’ve previously set up, and name it “meta_models.” Move all the downloaded Llama2 models into this folder. If you decide to use a different name for the folder, be sure to adjust the code accordingly in the following steps.
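For example, assuming you ran download.sh inside the cloned llama folder and downloaded only the 7B chat model, the move could look like this (the folder and file names here are assumptions; adjust them to whatever the script actually created):

mkdir ../meta_models
mv llama-2-7b-chat tokenizer.model ../meta_models/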

You’ve just completed step 2 for Llama2 on your Silicon Mac. Now, go ahead and move on to step 3.

Step 3:

From the command line, go back to your llama2 project folder and move into the llama.cpp directory:

 cd llama.cpp

Now we build llama.cpp by running:

make
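If you want to speed up the build, you can also let make use several CPU cores:

make -j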

Make sure you have Python3 installed on your device.
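You can quickly check which version is on your PATH with:

python3 --version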

Next, we’ll create a new Python environment with conda (so make sure Anaconda or Miniconda is installed) and give it the name ‘llama2’:

conda create --name llama2

And then we activate it with:

conda activate llama2

And then we install a recent version of Python into it:

conda install python=3.11

Now we install the dependencies listed in the requirements.txt file in the llama.cpp directory by running the following command:

python3 -m pip install -r requirements.txt

Yes! You’ve completed step 3 for Llama2 on your Silicon Mac. Now, go ahead and start with step 4.

Step 4:

In the llama.cpp folder, find and open the “models” folder. Inside “models,” create a new folder called “7B.”
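You can also do this directly from the command line while inside llama.cpp:

mkdir -p models/7B

Afterward, still inside the llama.cpp directory, convert the downloaded Meta weights to an f16 GGML file by entering the following command: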

python3 convert.py --outfile models/7B/ggml-model-f16.bin --outtype f16 ../../llama2/meta_models/llama-2-7b-chat

If you have followed the steps correctly, you should see a new file in your “7B” folder named “ggml-model-f16.bin”. If not, please double-check your previous steps. If you encounter any issues, feel free to contact me, and we can troubleshoot the problem together.

Step 5:

Next, let’s quantize the f16 model so it’s small enough to run comfortably on the CPU. While there are pre-quantized versions of the model available for download, we’ll take the direct approach through the command line for added control. To begin, simply enter the following command, still inside the llama.cpp folder:

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

Once the quantization process is complete, you will find a smaller model in the “7B” folder. It’s only 3.83 GB in size and is named “ggml-model-q4_0.bin.” This is almost 10 GB smaller than the previous version.
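You can confirm the new file and its size from inside the llama.cpp folder:

ls -lh ./models/7B/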

Now let’s give the quantized model a first run in the command line. The following command loads the model and generates up to 1024 tokens:

./main -m ./models/7B/ggml-model-q4_0.bin -n 1024

Now, let’s start an actual chat with the model using one of the example prompts that ship with llama.cpp. Here, -i puts the program in interactive mode, -r "User:" hands control back to you whenever the model prints “User:”, --repeat_penalty 1.0 disables the repetition penalty, and -f loads the “chat with Bob” example prompt. Just enter the following command:

./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/chat-with-bob.txt

And now, you’re ready to have a good old chat with your very own AI buddy, affectionately named Bob. It’s like having a digital pal in your pocket! 😄
