Local LLM eval tokens/sec comparison between llama.cpp and llamafile on the Raspberry Pi 5 (8GB)

Jason TC Chuang
Published in aidatatools
Apr 15, 2024

Results first:

With the newest Raspberry Pi OS (released 2024-03-15), LLMs run much faster than on Ubuntu 23.10. Both llama.cpp and llamafile were tested.

On the same Raspberry Pi OS, llamafile (5.75 tokens/sec) runs slightly faster than llama.cpp (4.77 tokens/sec) with the TinyLlama 1.1B Chat Q8_0 GGUF model.
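To put "slightly faster" in numbers: 5.75 / 4.77 ≈ 1.21, so llamafile generated about 20% more tokens per second in this test. A minimal POSIX shell sketch of that arithmetic (`speedup` is a hypothetical helper name, not part of either tool):

```shell
#!/bin/sh
# speedup NEW OLD: percentage by which NEW tokens/sec exceeds OLD tokens/sec.
# Hypothetical helper for illustration; awk does the floating-point math.
speedup() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f\n", (new / old - 1) * 100 }'
}

# llamafile (5.75) vs llama.cpp (4.77) on Raspberry Pi OS
speedup 5.75 4.77   # prints 20.5
```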

OS preparation

For Ubuntu 23.10, here is what I chose in Raspberry Pi Imager.

Image 1: Ubuntu 23.10 OS

For Raspberry Pi OS, here is what I chose.

Image 2: Raspberry Pi OS Full

Stop any screen recording while running LLMs, because the recorder competes for the same hardware resources. Instead, just take a screenshot once the throughput result (eval tokens/sec) appears; the measured number will be noticeably higher.

llama.cpp

Model file: TinyLlama-GGUF Q8_0 (move it into the llama.cpp models folder)

The command to build and run it:

make -j && ./main -m models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
Image 3: The eval rate for llama.cpp is 4.77 tokens/sec
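A single run on a small board can be noisy (thermal throttling, background processes), so averaging a few runs may give a steadier number. The sketch below is a hypothetical helper pair, `eval_rate` and `bench`, not part of llama.cpp. It assumes the end-of-run timing summary contains a line like `llama_print_timings: eval time = ... (... ms per token, 4.77 tokens per second)`; the wording differs between llama.cpp versions, so check it against your own output first.

```shell
#!/bin/sh
# Assumption: the run ends with a llama.cpp-style timing summary whose
# generation line looks like
#   llama_print_timings:        eval time = ... ms / ... runs ( ... ms per token, 4.77 tokens per second)
# The "prompt eval time" line is skipped so only generation speed is kept.
eval_rate() {
  awk '/ eval time/ && !/prompt eval/ {
    if (match($0, /[0-9.]+ tokens per second/)) {
      rate = substr($0, RSTART, RLENGTH)
      sub(/ tokens per second/, "", rate)
      print rate
    }
  }'
}

# bench N CMD [ARGS...]: run CMD N times and print the mean eval rate.
# stderr is merged into stdout because the timing summary is normally written there.
bench() {
  runs=$1
  shift
  i=0
  while [ "$i" -lt "$runs" ]; do
    "$@" 2>&1 | eval_rate
    i=$((i + 1))
  done | awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}
```

For example, `bench 3 ./main -m models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e` would run the llama.cpp command above three times and print the mean eval rate. The same helper should also work for the llamafile command below, provided it prints the same timing summary.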

llamafile

Llamafile: TinyLlama-GGUF Q8_0 (just download it; the model weights are bundled into the executable)

The command to run it:

chmod u+x TinyLlama-1.1B-Chat-v1.0.Q8_0.llamafile
./TinyLlama-1.1B-Chat-v1.0.Q8_0.llamafile --temp 0.7 -p 'Building a website can be done in 10 simple steps:\nStep 1:'
Image 4: The eval rate for llamafile is 5.75 tokens/sec

The neofetch output on Raspberry Pi OS:

Image 5: neofetch output for Raspberry Pi OS

On Ubuntu 23.10, the throughput (eval tokens/sec) is only around 1 to 1.5 tokens/sec. This is because the screen was also being recorded while the LLM was running.

Image 6: neofetch shows the wrong CPU, BCM2835. On the Raspberry Pi 5 it should be BCM2712.
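neofetch most likely takes the CPU name from /proc/cpuinfo, where Raspberry Pi kernels typically report `Hardware : BCM2835` regardless of the board. The device tree is a more reliable place to check the board and SoC. A minimal sketch, assuming the standard /proc/device-tree paths (`dt_strings` is a hypothetical helper name; the exact strings depend on your firmware and kernel):

```shell
#!/bin/sh
# dt_strings FILE: print a NUL-separated device-tree property, one entry per line.
# Hypothetical helper; device-tree string properties are NUL-terminated.
dt_strings() {
  tr '\0' '\n' < "$1"
}

# Only query the live device tree when it exists (i.e. on the Pi itself).
if [ -r /proc/device-tree/model ]; then
  # Board name; on a Pi 5 this should start with "Raspberry Pi 5".
  dt_strings /proc/device-tree/model
  # SoC "compatible" list; on a Pi 5 this is expected to include brcm,bcm2712.
  dt_strings /proc/device-tree/compatible
fi
```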

Conclusion

With suitable OS support, llamafile can run slightly faster than llama.cpp. Raspberry Pi OS has also recently enabled the Vulkan GPU driver (V3DV) by default (https://www.phoronix.com/news/Raspberry-Pi-OS-Default-V3DV), so hopefully the community can leverage the Raspberry Pi 5's GPU to run LLMs even faster. Let’s wait and watch the news.

Written by Jason TC Chuang

Google Certified Professional Data Engineer. He holds a PhD from Purdue University. He loves solving real-world problems and building better tools with ML/AI.