LLMs on Apple Silicon MacBooks: A Simple Guide to Running Llama2-13b with Llama.cpp

Dan Higgins
2 min read · Jan 5, 2024


Photo by Karim MANJRA on Unsplash

Hardware Used for this post
* MacBook Pro 16-Inch 2021
* Chip: Apple M1 Max
* Memory: 64 GB
* macOS: 14.0 (Sonoma)

Note: Navigating through online code samples can be incredibly frustrating when you invest time in downloading and compiling code only to discover that it doesn’t function properly on your hardware!

For those with less powerful hardware, I advise caution. However, if your hardware matches or surpasses these specifications, let’s jump into it!

After hardware, the next requirement is the Xcode Command Line Tools for compiling. If you haven’t set them up yet, this Medium article provides detailed instructions.
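If you are not sure whether the tools are already installed, a quick check from the terminal looks like this (standard Apple commands, not specific to this post):

# Prints the active developer directory if the tools are installed
xcode-select -p

# Otherwise, launch the installer
xcode-select --install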

1. Setting up the Environment:
a. Create a dedicated directory for your LLM activities. For example:

mkdir -p ~/Code/LLM
cd ~/Code/LLM

2. Acquiring the llama.cpp Codebase (choose one of the following):
a. Download the specific code/tag used in this post for reproducibility, or
b. Clone the rapidly evolving repository to get the latest updates.

# Download specific code/tag
wget https://github.com/ggerganov/llama.cpp/archive/refs/tags/b1770.zip
unzip b1770.zip
mv llama.cpp-b1770 llama.cpp


# OR Clone the repository
# git clone https://github.com/ggerganov/llama.cpp

3. Compilation and Configuration:
a. Enable the Apple Silicon GPU (Metal) build by setting LLAMA_METAL=1 and compiling with make:

cd ~/Code/LLM/llama.cpp
LLAMA_METAL=1 make
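The build takes a few minutes. On a multi-core machine you can speed it up with parallel jobs and then confirm the main binary was produced; this is a minor variation on the command above, not part of the original steps:

cd ~/Code/LLM/llama.cpp
# -j lets make use multiple cores
LLAMA_METAL=1 make -j
# the build should leave a ./main executable in this directory
ls -l ./main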

4. Selecting a Model:
a. Familiarize yourself with GGUF, the model file format used by llama.cpp, by referring to this Medium article.
b. Numerous GGUF models are available on Hugging Face.
c. We select llama-2-13b-chat.Q5_K_M.gguf.

mkdir -p ~/Code/LLM/models
cd ~/Code/LLM/models
# use /resolve/ (not /blob/) so wget downloads the model file rather than an HTML page
wget https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q5_K_M.gguf
cd ~/Code/LLM/llama.cpp
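Before moving on, it is worth sanity-checking the download. The Q5_K_M quantization of the 13B model is on the order of 9 GB; a file measured in kilobytes usually means you fetched an HTML page instead of the model:

ls -lh ~/Code/LLM/models/llama-2-13b-chat.Q5_K_M.gguf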

5. Running Llama2 13b:
a. Run ./main --help to see details on all the possible options for running your model.
b. Copy the code below into a file named run_llama.sh.
c. Make it executable: chmod +x ./run_llama.sh
d. Run it: ./run_llama.sh

#!/bin/bash
# Interactive, Alpaca-style chat with Llama 2 13B, offloaded to the Metal GPU
./main --instruct \
--color \
--threads 8 \
--file ./prompts/alpaca.txt \
--n-gpu-layers 1 \
--model ../models/llama-2-13b-chat.Q5_K_M.gguf \
--ctx-size 2048 \
--temp 0.7 \
--repeat-penalty 1.1 \
--seed 789 \
--n-predict -1
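If you prefer a one-shot, non-interactive run instead of the Alpaca-style chat session, a minimal sketch looks like the following. It reuses the same binary and model path as above; the prompt text and the --n-predict value are arbitrary examples:

#!/bin/bash
# Single prompt, then exit (no --instruct or --file needed)
./main \
--model ../models/llama-2-13b-chat.Q5_K_M.gguf \
--n-gpu-layers 1 \
--ctx-size 2048 \
--temp 0.7 \
--n-predict 256 \
--prompt "Explain in two sentences what quantization does to a language model."

Save it next to run_llama.sh and run it the same way.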

6. Have fun exploring this LLM on your Mac!!
