In this article I will show you how to run state-of-the-art large language models on your local computer. Yes, you read that right.
For this we will use the dalai library, which allows us to run the foundational language model LLaMA as well as the instruction-following Alpaca model. LLaMA is a foundational (or broad) language model that predicts the next token (word) based on a given input sequence (sentence). Alpaca is a fine-tuned version of LLaMA that is capable of following instructions, which you can think of as ChatGPT-like behaviour. What's even more impressive: both models achieve results comparable to, or even better than, their GPT counterparts while still being small enough to run on your local computer. In this article I will show you that it only takes a few steps (thanks to the dalai library) to run "ChatGPT" on your local computer.
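To preview where we are headed, the steps boil down to installing a model and starting dalai's local web UI. This is a sketch based on the dalai project's README; it assumes you have Node.js and Python installed, and exact commands may differ between dalai versions:

```shell
# Download and set up the 7B LLaMA weights (needs several GB of disk space):
npx dalai llama install 7B

# Alternatively (or additionally), set up the instruction-tuned Alpaca model:
npx dalai alpaca install 7B

# Start the local web UI, then open http://localhost:3000 in your browser:
npx dalai serve
```

We will walk through each of these steps in detail below.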
If you prefer video, feel free to check out my YouTube video accompanying this article:
The LLaMA model is a foundational language model. While language models are, formally, probability distributions over sequences of words or tokens, it is easier to think of them as next-token predictors: given a sequence of words, a language model predicts the most plausible next word. I'm sure you've seen this behaviour before, where you start a sentence and ChatGPT, for example, continues it.
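To make the "next-token predictor" idea concrete, here is a deliberately tiny sketch (my own toy example, not LLaMA itself): a bigram model that counts which word follows which in a small corpus and predicts the most frequent successor.

```python
from collections import Counter, defaultdict

# Toy corpus; a real model like LLaMA is trained on trillions of tokens.
corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each other word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent successor of `word`, or None if unseen."""
    counts = follow_counts.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once -> cat
```

LLaMA does conceptually the same thing, except that instead of raw counts it uses a neural network with billions of parameters to score every possible next token.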
What makes the LLaMA model special? Well, despite being roughly 13x smaller than GPT-3, the 13B-parameter LLaMA model still outperforms GPT-3 on most benchmarks. And we all know how good the GPT-3 and ChatGPT models are. This is truly impressive, and it is also the reason we can run a ChatGPT-like model on a local computer. Someone even managed to run the LLaMA model on a Raspberry Pi, which is insane.
Originally, the LLaMA model was intended to be used for research purposes only, and model checkpoints were to be requested from Meta. But…