Image: “Jacked Sloth,” generated with Google Gemini

Use Unsloth LoRA Adapter with Ollama in 3 Steps

Use llama.cpp to convert an Unsloth LoRA adapter to GGML (.bin) and run it in Ollama, all on a single GPU

Sarin Suriyakoon
3 min read · Mar 29, 2024


Background

I recently ran my first LLM fine-tuning session with Unsloth, and their option to save only the LoRA adapter is awesome. After a little more research, I found that the adapter can be used directly in Ollama with the ADAPTER instruction.

Overview

  • Download the LoRA adapter
  • Convert the adapter to ggml-adapter-model.bin with llama.cpp
  • Add the ADAPTER instruction to an Ollama Modelfile
  • Usage
  • From Unsloth get_chat_template to Ollama Template
  • Conclusion

Download the LoRA Adapter from Hugging Face

Set up huggingface-cli first; follow this guide.
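
If you haven’t installed it yet, something like the following should work (the login step is only needed for gated or private repos):

pip install -U "huggingface_hub[cli]"
huggingface-cli login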

Then download the adapter. In this example, you can use my pacozaa/tinyllama-alpaca-lora:

huggingface-cli download pacozaa/tinyllama-alpaca-lora

After the download finishes, the CLI prints where the files are saved; note that path down.
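
If you want to control where the files land instead, huggingface-cli accepts a --local-dir flag; the target folder name here is just an example:

huggingface-cli download pacozaa/tinyllama-alpaca-lora --local-dir ./tinyllama-alpaca-lora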

Convert with llama.cpp — convert-lora-to-ggml.py

Run the llama.cpp conversion script:

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
python convert-lora-to-ggml.py [path to LoRA adapter folder]

The script prints the location of the resulting ggml-adapter-model.bin file; note it down.
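
The Modelfile in the next step references ./ggml-adapter-model.bin, so copy the converted file next to your Modelfile first (substitute the folder path from the previous step):

cp [path to LoRA adapter folder]/ggml-adapter-model.bin .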

Add the ADAPTER instruction

Run the following commands:

ollama pull tinyllama
touch ModelfileTinyllama

Then add the content below to ModelfileTinyllama.

NOTE: Models in the Ollama library are usually chat fine-tuned, whereas LoRA training often starts from a pre-trained base model. In this case, tinyllama is already fine-tuned for chat, but we override its template with the Alpaca instruction format we used during fine-tuning.

FROM tinyllama:latest
ADAPTER ./ggml-adapter-model.bin
TEMPLATE """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.


{{ if .System }}### Instruction:
{{ .System }}{{ end }}

{{ if .Prompt }}### Input:
{{ .Prompt }}{{ end }}

### Response:
"""
SYSTEM """Continue the Fibonacci sequence."""
PARAMETER stop "### Response:"
PARAMETER stop "### Instruction:"
PARAMETER stop "### Input:"
PARAMETER stop "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request."
PARAMETER num_predict 200

Now create and run the model:

ollama create tinyadap -f ./ModelfileTinyllama
ollama run tinyadap

Or you can try the one I uploaded to the Ollama library:

ollama run pacozaa/tinyllama-alpaca-lora

Usage

I have set up the .System prompt to fill the ### Instruction: section, and the default is "Continue the Fibonacci sequence."

If you want to change it to something else, you can do so after running ollama run pacozaa/tinyllama-alpaca-lora by using /set system

For example

>>> /set system You're a kitty. Answer using kitty sounds.

From Unsloth get_chat_template to Ollama Template

Let’s take a look at more of Unsloth’s pre-made chat templates and how they map to Ollama templates.
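
As a reminder, on the Unsloth side the template is chosen with get_chat_template during fine-tuning. A minimal Python sketch (argument names may differ slightly between Unsloth versions):

from unsloth.chat_templates import get_chat_template

# Wrap the fine-tuning tokenizer so it formats prompts with the chosen
# template, e.g. "chatml", "vicuna", "unsloth", ...
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml",
)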

The corresponding Modelfile snippets could look something like these…

ChatML Template

Reference: https://ollama.com/library/qwen

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"

Unsloth Template

TEMPLATE """{{ .System }}
>>> User: {{ .Prompt }}
>>> Assistant:
"""
PARAMETER stop ">>> User:"
PARAMETER stop ">>> Assistant:"

Vicuna Template

Reference: https://ollama.com/library/wizard-vicuna

TEMPLATE """{{ .System }}
USER: {{ .Prompt }}
ASSISTANT:
"""
PARAMETER stop "User:"
PARAMETER stop "Assistant:"

Llama 2 Template

Reference: https://ollama.com/library/llama2
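
At the time of writing, the llama2 Modelfile on the Ollama library looks roughly like this (verify against the library page, since templates change over time):

TEMPLATE """[INST] <<SYS>>{{ .System }}<</SYS>>

{{ .Prompt }} [/INST]
"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
PARAMETER stop "<<SYS>>"
PARAMETER stop "<</SYS>>"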

Mistral Template

Reference: https://ollama.com/library/mistral
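
The mistral Modelfile uses the same [INST] format but without the <<SYS>> block, roughly (again, verify on the library page):

TEMPLATE """[INST] {{ .System }} {{ .Prompt }} [/INST]"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"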

Conclusion

Now you can train a model with Unsloth’s Google Colab or Kaggle notebooks and run it locally via the LoRA adapter, which is faster than saving the entire model as a GGUF file.

Source and Credit
