How to fine-tune an open-source LLaMA using QLoRA

Anchen
3 min read · Jun 1, 2023

As a developer who often turns to ChatGPT as a trusty pair-programming companion, I end up engaging in many detailed discussions on coding and logical reasoning. Open-source LLaMA models, fine-tuned on general conversations, sometimes fall short in these areas. This got me thinking: could fine-tuning an open-source LLaMA on my previous conversations enhance its capability?

Here’s what I did: I first downloaded all my past conversations with ChatGPT using a handy Chrome extension that I’ve created. Check it out on GitHub: https://github.com/mzbac/chatgpt-backup-extension

With the conversation backup in hand, you can use the provided convert script to transform your conversations into an array of question-and-answer pairs suitable for a QLoRA fine-tuning dataset. Bear in mind that the QLoRA training script expects a structured instruction dataset, so the data must be prepared in a specific format.
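The conversion boils down to pairing each of your messages with the assistant reply that follows it. A minimal sketch of that idea is below; the message schema (`role`/`content` dicts) is an assumption about the backup format, so adapt the field names to whatever your export actually contains:

```python
import json


def conversation_to_pairs(messages):
    """Pair each user message with the assistant reply that follows it.

    Assumes each message is a dict with "role" ("user"/"assistant") and
    "content" keys -- the exact backup schema may differ, so adjust as needed.
    """
    pairs = []
    for prev, cur in zip(messages, messages[1:]):
        if prev["role"] == "user" and cur["role"] == "assistant":
            pairs.append({"instruction": prev["content"], "output": cur["content"]})
    return pairs


if __name__ == "__main__":
    messages = [
        {"role": "user", "content": "What does zip() do in Python?"},
        {"role": "assistant", "content": "It pairs up items from iterables."},
    ]
    # Dump the instruction/output pairs as JSON for the training script.
    print(json.dumps(conversation_to_pairs(messages), indent=2))
```

The `instruction`/`output` key names are illustrative; match whatever keys the fine-tuning script's data loader expects.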

When fine-tuning a model, you typically want a base model trained on data that resembles yours. In my case, the choice was between the 13B WizardLM and Vicuna models derived from open-source LLaMA. I tested both and concluded that WizardLM exhibits superior reasoning capability, so I'll use it as the foundation model for fine-tuning on my dataset.

We’ve selected our base model and have our dataset ready. Now all we need is to clone the repository, which bundles all the necessary scripts in one convenient location, to kickstart the fine-tuning. Here’s the link to the repository: https://github.com/mzbac/qlora-fine-tune

Keep in mind that if you’re using a different dataset or base model, you’ll have to modify the code to accommodate your base model’s prompt format or your dataset format. The relevant code can be found here: https://github.com/mzbac/qlora-fine-tune/blob/main/qlora.py#L521-L527.
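To make that concrete: prompt construction usually amounts to one small formatting function. WizardLM 1.0 broadly follows a Vicuna-style template, but the exact wording below is an assumption — check your base model's card and mirror whatever it specifies in the linked section of qlora.py:

```python
def format_prompt(instruction, system=None):
    """Build a Vicuna-style prompt, which WizardLM-1.0 models broadly follow.

    The exact template text is an assumption -- verify it against your base
    model's card before training, since a mismatched prompt format hurts
    fine-tuning quality.
    """
    system = system or (
        "A chat between a curious user and an artificial intelligence "
        "assistant. The assistant gives helpful, detailed, and polite "
        "answers to the user's questions."
    )
    return f"{system} USER: {instruction} ASSISTANT:"
```

During training, the model's expected answer is appended after `ASSISTANT:`; at inference time you stop at that marker and let the model complete.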

After you’ve installed all dependencies as per the readme, you can begin fine-tuning the model with QLoRA by running the command below:

python qlora.py --model_name_or_path TheBloke/wizardLM-13B-1.0-fp16 --dataset my-data --bf16

Don’t forget to tweak the base model name as per your chosen model.

Once the fine-tuning is complete, you can measure the model’s performance using the provided evaluation script. If you’re content with the outcome, you can merge the adapter back into the base model using the provided merge script:

python merge_peft_adapters.py --device cpu --base_model_name_or_path TheBloke/wizardLM-13B-1.0-fp16 --peft_model_path ./output/checkpoint-2250/adapter_model --output_dir ./merged_models/

Remember to adjust the base model name and the LoRA adapter path (--peft_model_path) as needed.
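Under the hood, merging a LoRA adapter just folds the low-rank update back into the base weights — W' = W + (alpha/r)·BA — so the merged model runs at zero extra inference cost. A toy numpy illustration of that arithmetic (shapes and scaling chosen for the example, not taken from the merge script):

```python
import numpy as np


def merge_lora(W, A, B, alpha, r):
    """Fold a low-rank LoRA update into a base weight matrix.

    W: (out, in) base weights; B: (out, r) and A: (r, in) are the trained
    adapter factors; alpha/r is the standard LoRA scaling factor.
    """
    return W + (alpha / r) * (B @ A)


# Toy shapes purely for illustration.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
A = rng.normal(size=(4, 8))   # rank r = 4
B = rng.normal(size=(8, 4))
merged = merge_lora(W, A, B, alpha=16, r=4)
```

This is why a merged checkpoint is just a plain model again: downstream tools like the quantizer below don't need to know LoRA was ever involved.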

With the merged model ready, it’s common to apply 4-bit GPTQ quantization so the model runs efficiently on consumer-grade hardware. Use the following command to quantize your fine-tuned model:

python llama.py ${MODEL_DIR} c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors {your-model-name}-no-act-order-4bit-128g.safetensors

Note: If you’re dealing with a 33B model, you may want to add the --act-order flag to limit VRAM usage.
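To build intuition for what --wbits 4 --groupsize 128 means, here is a toy round-to-nearest sketch: each group of weights shares one scale, and each weight is stored as a 4-bit integer. Real GPTQ additionally uses second-order (Hessian) information to pick better roundings, which this sketch deliberately omits:

```python
import numpy as np


def quantize_groupwise(w, wbits=4, groupsize=4):
    """Round-to-nearest quantization with one scale per group of weights.

    Illustrates the --wbits/--groupsize idea only; real GPTQ uses Hessian
    information for error-aware rounding. Assumes len(w) is divisible by
    groupsize. Returns the dequantized approximation of w.
    """
    qmax = 2 ** wbits - 1
    w = w.reshape(-1, groupsize)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / qmax
    scale[scale == 0] = 1.0          # avoid divide-by-zero for flat groups
    q = np.round((w - lo) / scale)   # the 4-bit integers that would be stored
    return (q * scale + lo).reshape(-1)


w = np.array([0.0, 0.5, 1.0, 1.5, -2.0, -1.0, 0.0, 2.0])
w_hat = quantize_groupwise(w, wbits=4, groupsize=4)
```

Smaller group sizes give finer-grained scales (less error, slightly more storage), which is the trade-off the `--groupsize 128` flag controls.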

With your fine-tuned GPTQ model in hand, you can host it as an API using the repository here: https://github.com/mzbac/GPTQ-for-LLaMa-API.
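Once the server is up, calling it is an ordinary JSON POST. The endpoint path, port, and payload field names below are illustrative assumptions — check the GPTQ-for-LLaMa-API README for the actual interface:

```python
import json
import urllib.request


def build_request(prompt, max_new_tokens=256):
    """Assemble a JSON payload for a text-generation endpoint.

    The field names here ("prompt", "max_new_tokens") are assumptions about
    the API's schema -- adjust to match the repo's documented interface.
    """
    return {"prompt": prompt, "max_new_tokens": max_new_tokens}


if __name__ == "__main__":
    payload = build_request("### Instruction: Hello\n\n### Response:")
    req = urllib.request.Request(
        "http://localhost:5000/generate",  # assumed host, port, and path
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # urllib.request.urlopen(req) would send the request once the server
    # is running; omitted here so the sketch stays self-contained.
```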

After testing the fine-tuned model with my dataset, I noted a significant improvement in the logic and reasoning department. For instance, the fine-tuned model can now accurately respond to questions that initially puzzled the base model.

Let’s take a look at the responses from the base model:

### Instruction: If I hold three coins in my hand and drop two coins and then pick up one coin from the ground. How many coins do I have now?

### Response: You started with 3 coins, dropped 2 coins, so you are left with 1 coin.

### Instruction: The mom of Michael has four sons. Three of their names are Jon, Jo, Jay, What is the fourth child called?

### Response: The fourth child's name is not given in the problem statement.

Now, check out the responses from the fine-tuned model:

### Instruction: If I hold three coins in my hand and drop two coins and then pick up one coin from the ground. How many coins do I have now?

### Response: You would have 2 coins left in your hand after dropping two and picking up one.

### Instruction: The mom of Michael has four sons. Three of their names are Jon, Jo, Jay, What is the fourth child called?

### Response: The fourth child's name is Michael.

If you have a suitable dataset and are keen on crafting a custom model to assist you in your particular domain, I recommend giving QLoRA a try. You may be pleasantly surprised by the outcome.
