Run GPTQ, GGML, GGUF… on Apple Silicon?
A community sharing effort on the topic: GPU does not mean NVIDIA. Mac M1/M2 Metal support and AMD GPUs have a game of their own too.
Quantized models are on trend: they are the key to enabling large language models on consumer hardware.
The advent of large language models has revolutionized natural language processing and driven significant advancements in AI-powered applications. However, their resource-intensive nature has posed challenges when it comes to deployment on consumer hardware.
The biggest problem is NVIDIA's near-total hegemony over the hardware: you can have a powerful Apple Silicon Mac or a high-performing AMD graphics card, and yet many LLM libraries lack the support to make use of them.
In this article I will report the answers to the questions of many Medium readers asking how to run quantized models on Apple Silicon computers. Here are the things I learned from them and would like to share:
The state of the art and Quantization in general
Feedback from Mac M1/M2 users (Apple Silicon)
How to install and use GGUF/GGML with llama-cpp-python
And what about CTransformers on M1/M2?
Prompt templates as a pro
Conclusions
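As a preview of the installation step in the outline above, here is a minimal sketch of building llama-cpp-python with Metal acceleration on Apple Silicon. The build flags reflect the project's documented approach at the time of writing; check the llama-cpp-python README for the flags matching your version:

```shell
# Build llama-cpp-python from source with Metal (Apple Silicon GPU) support.
# CMAKE_ARGS passes the flag down to the underlying llama.cpp build;
# FORCE_CMAKE=1 forces a source build instead of using a prebuilt wheel.
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python
```

After installation, offloading layers to the GPU is done at model-load time (the `n_gpu_layers` parameter), which the llama-cpp-python section below covers in more detail.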