Alpaca & LLaMA: Answering All Your Questions

Martin Thissen
6 min read · Mar 21, 2023

In this article, I will answer all the questions that were asked in the comments on my video (and article) about running the Alpaca and LLaMA models on your local computer. If you prefer video, feel free to check out the YouTube video accompanying this article.

Question 1: How do you train these models on your own content (aka fine-tune the model)?

Stanford released it all: the data used, the code for generating the instruction-following data, and the code for fine-tuning the LLaMA model. I would recommend checking out those links because the documentation is very well written. Your biggest task when fine-tuning will be generating or curating the data; running the fine-tuning command itself is pretty straightforward, as the sketch below illustrates.
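To give you a feel for the data side: Stanford's alpaca_data.json is a list of instruction/input/output records that get rendered into a prompt template before training. Here is a minimal sketch of that format; the field names and template wording follow the released repository, while the record contents below are made up for illustration:

```python
# One record in the format of Stanford's alpaca_data.json
# (the contents of this record are invented for illustration).
example = {
    "instruction": "Classify the sentiment of the following sentence.",
    "input": "I really enjoyed this movie.",
    "output": "Positive",
}

# Each record is rendered into a prompt like this before fine-tuning;
# the model is then trained to produce the "output" as its response.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

prompt = PROMPT_TEMPLATE.format(**example)
print(prompt + example["output"])
```

Whether you generate this data with a larger model (as Stanford did) or write it by hand, getting it into this shape is most of the work.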

You might also want to check out the Alpaca-LoRA repository. This could be especially helpful if you want to fine-tune the model on your own (single) GPU. This implementation uses low-rank adaptation (LoRA), a parameter-efficient fine-tuning technique. For example, the authors were able to reduce the VRAM consumption of the GPT-3 175B model from 1.2TB to 350GB during fine-tuning. Also, the checkpoint size was reduced by roughly 10,000× (from 350GB to 35MB), which makes it possible to fine-tune large…
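The core idea is easy to see in code. Below is a minimal sketch of wrapping a LLaMA model with LoRA adapters using the Hugging Face peft library, which Alpaca-LoRA builds on; the model path is a placeholder and the hyperparameters are illustrative, not the repository's exact settings:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Placeholder: point this at your HF-converted LLaMA weights.
base_model = "path/to/hf-converted-llama-7b"

model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# LoRA freezes the base weights and injects small trainable
# low-rank matrices into the chosen attention projections.
config = LoraConfig(
    r=8,                                  # rank of the update matrices
    lora_alpha=16,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

Because only the small adapter matrices are trained (and saved), you get the drastically lower VRAM use and tiny checkpoints described above.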
