Image: “Jacked Sloth,” generated with Google Gemini

Use Unsloth LoRA Adapter with Ollama in 3 Steps

Use llama.cpp to convert an Unsloth LoRA adapter to GGML (.bin) and run it in Ollama, all on a single GPU

Sarin Suriyakoon
3 min read · Mar 29, 2024


Background

I recently ran my first LLM fine-tuning session with Unsloth, and their option to save only the LoRA adapter is awesome. After a little more research, I found that the adapter can be used directly in Ollama with the ADAPTER instruction.

Overview

  • Download the LoRA adapter
  • Convert the adapter to ggml-adapter-model.bin with llama.cpp
  • Add the ADAPTER instruction to an Ollama Modelfile
  • Usage
  • From Unsloth get_chat_template to Ollama Template
  • Conclusion

Download the LoRA Adapter from Hugging Face

Set up huggingface-cli first; follow this guide.
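
If you haven’t installed it yet, something like the following should work (the login step is only needed for gated or private repos):

pip install -U "huggingface_hub[cli]"
huggingface-cli login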

Then download the adapter. In this example, you can use my pacozaa/tinyllama-alpaca-lora:

huggingface-cli download pacozaa/tinyllama-alpaca-lora

After the download finishes, the CLI prints where the files are saved; note that path down.
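
If you want to control where the files land instead, huggingface-cli accepts a --local-dir flag; the target folder name here is just an example:

huggingface-cli download pacozaa/tinyllama-alpaca-lora --local-dir ./tinyllama-alpaca-lora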

Convert with llama.cpp — convert-lora-to-ggml.py

Run the llama.cpp conversion script:

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
python convert-lora-to-ggml.py [path to LoRA adapter folder]

The script prints the location of the resulting ggml-adapter-model.bin file; note it down.
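
The Modelfile in the next step references ./ggml-adapter-model.bin, so copy the converted file next to your Modelfile first (substitute the folder path from the previous step):

cp [path to LoRA adapter folder]/ggml-adapter-model.bin .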

Add the ADAPTER instruction

Run the following commands:

ollama pull tinyllama
touch ModelfileTinyllama

Then add the content below to ModelfileTinyllama.

NOTE: Models in the Ollama library are usually chat fine-tuned, whereas LoRA training often starts from a pre-trained base model. In this case, tinyllama is already fine-tuned for chat, but we override its template with the Alpaca instruction format we used during fine-tuning.

FROM tinyllama:latest
ADAPTER ./ggml-adapter-model.bin
TEMPLATE """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.


{{ if .System }}### Instruction:
{{ .System }}{{ end }}

{{ if .Prompt }}### Input:
{{ .Prompt }}{{ end }}

### Response:
"""
SYSTEM """Continue the Fibonacci sequence."""
PARAMETER stop "### Response:"
PARAMETER stop "### Instruction:"
PARAMETER stop "### Input:"
PARAMETER stop "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request."
PARAMETER num_predict 200

Now create and run the model:

ollama create tinyadap -f ./ModelfileTinyllama
ollama run tinyadap

Or you can try the one I uploaded to the Ollama library:

ollama run pacozaa/tinyllama-alpaca-lora

Usage

I have set up the .System prompt to fill the ### Instruction: section, and the default is "Continue the Fibonacci sequence."

If you want to change it to something else, you can do so after running ollama run pacozaa/tinyllama-alpaca-lora by using /set system

For example

>>> /set system You're a kitty. Answer using kitty sounds.

From Unsloth get_chat_template to Ollama Template

Let’s take a look at more of Unsloth’s pre-made chat templates and how they map to Ollama templates.
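
As a reminder, on the Unsloth side the template is chosen with get_chat_template during fine-tuning. A minimal Python sketch (argument names may differ slightly between Unsloth versions):

from unsloth.chat_templates import get_chat_template

# Wrap the fine-tuning tokenizer so it formats prompts with the chosen
# template, e.g. "chatml", "vicuna", "unsloth", ...
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml",
)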

The corresponding Modelfile snippets could look something like these…

ChatML Template

Reference: https://ollama.com/library/qwen

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"

Unsloth Template

TEMPLATE """{{ .System }}
>>> User: {{ .Prompt }}
>>> Assistant:
"""
PARAMETER stop ">>> User:"
PARAMETER stop ">>> Assistant:"

Vicuna Template

Reference: https://ollama.com/library/wizard-vicuna

TEMPLATE """{{ .System }}
USER: {{ .Prompt }}
ASSISTANT:
"""
PARAMETER stop "User:"
PARAMETER stop "Assistant:"

Llama 2 Template

Reference: https://ollama.com/library/llama2
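
At the time of writing, the llama2 Modelfile on the Ollama library looks roughly like this (verify against the library page, since templates change over time):

TEMPLATE """[INST] <<SYS>>{{ .System }}<</SYS>>

{{ .Prompt }} [/INST]
"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
PARAMETER stop "<<SYS>>"
PARAMETER stop "<</SYS>>"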

Mistral Template

Reference: https://ollama.com/library/mistral
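
The mistral Modelfile uses the same [INST] format but without the <<SYS>> block, roughly (again, verify on the library page):

TEMPLATE """[INST] {{ .System }} {{ .Prompt }} [/INST]"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"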

Conclusion

Now you can train a model with Unsloth’s Google Colab or Kaggle notebooks and run it locally via the LoRA adapter, which is faster than saving the entire model as a GGUF file.

Source and Credit
