Run GPTQ, GGML, GGUF… on Apple Silicon?

A community-sharing effort on the topic: GPU does not mean NVIDIA. Mac M1/M2 Metal support and AMD GPUs have their own game too.

Fabio Matricardi
9 min read · Oct 30, 2023
Image by Firmbee from Pixabay

Quantized models are on trend: enabling large language models on consumer hardware
The advent of large language models has revolutionized natural language processing and driven significant advancements in AI-powered applications. However, their resource-intensive nature has posed challenges when it comes to deployment on consumer hardware.
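To make the resource problem concrete, here is a back-of-the-envelope memory calculation (my own illustrative numbers, not from any specific benchmark): a 7-billion-parameter model stored in 16-bit floats versus the same model quantized to roughly 4 bits per weight.

```python
# Rough memory footprint of a 7B-parameter model, ignoring
# activation buffers and quantization metadata overhead.
params = 7_000_000_000

fp16_gb = params * 2 / 1e9    # 2 bytes per weight in fp16
q4_gb = params * 0.5 / 1e9    # ~4 bits (0.5 bytes) per weight

print(fp16_gb, q4_gb)  # 14.0 vs 3.5
```

The 4x shrink is what brings a 7B model within reach of a 16 GB MacBook in the first place.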

The biggest problem is the near-total hegemony of NVIDIA hardware: you can have a powerful Mac Silicon machine or a high-performing AMD graphics card, and yet many LLM libraries lack support for them.

In this article I will report the answers to the many Medium readers asking how we can run quantized models on Mac Silicon computers. Here is what I learned from them and would like to share:

The state of the art and Quantization in general
Feedback from Mac M1/M2 users (Apple Silicon)
How to install and use GGUF/GGML with llama-cpp-python
And what about CTransformers on M1/M2?
Prompt templates as a pro
Conclusions
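As a taste of the llama-cpp-python section, here is a hedged sketch of the Metal-enabled install on an M1/M2 Mac. The `CMAKE_ARGS` flag passes the Metal option through to llama.cpp's build; exact flags change between releases, so treat this as a starting point rather than a definitive recipe.

```shell
# Build llama-cpp-python with the Metal backend on Apple Silicon.
# Assumes Python 3 and the Xcode command-line tools are installed.
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 \
  pip install llama-cpp-python --no-cache-dir
```

After this, loading a downloaded GGUF model with a nonzero `n_gpu_layers` offloads inference to the Mac's GPU.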


Fabio Matricardi

Passionate educator and curious industrial automation engineer, learning leadership and how to build my own AI. Contact me at fabio.matricardi@gmail.com