Fine-tune any LLM using your custom dataset 🤯

Mohit Dulani
4 min read · Feb 19, 2024


In this article I will share a fine-tuning template that you can use to train any model of your choice!!

Get ready to learn something new, and make sure to experiment alongside as you read, because coding can't be learnt just by reading!!


I have broken the template down into 7 simple steps. I will list them here and then we will dive into each one!!

  1. Setting up with the imports
  2. Getting the base model and tokenizer
  3. Quantizing the base model
  4. Creating a prompt template
  5. Using the PEFT technique for finetuning
  6. Creating a PEFT model and training it
  7. Generating outputs from the fine-tuned model

Damn simple, isn't it?? The fun part begins now. Let's dive into the code and understand what happens under the hood!!

Setting up with the imports

We will require: transformers, bitsandbytes, peft, trl, datasets, accelerate and torch.

!pip install -q accelerate peft bitsandbytes transformers trl datasets torch

Then import these libraries for use:

import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from trl import SFTTrainer


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Getting the base model and tokenizer

Get your model id, load the model, and build a tokenizer:

base_model_id = 'mistralai/Mixtral-8x7B-v0.1'
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # if your GPU supports it
    bnb_4bit_use_double_quant=True,  # also quantises the quantisation constants for extra memory savings
)

base_model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, device_map="cuda")
# Tokenizer docs:
# https://huggingface.co/docs/transformers/v4.37.2/en/model_doc/auto#transformers.AutoTokenizer.from_pretrained
# https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    truncation_side="right",
    padding_side="right",
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token
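A quick optional sanity check: confirm the 4-bit model actually fits in memory and that the tokenizer adds the BOS/EOS tokens we just configured (the exact numbers depend on your model and hardware):

# Memory footprint of the quantized model, in GB
print(f"Model memory footprint: {base_model.get_memory_footprint() / 1e9:.2f} GB")

# The tokenizer should wrap inputs with <s> ... </s> given the settings above
ids = tokenizer("hello world")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))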

Quantizing the base model

Your dataset should be formatted as JSON/JSONL. Load it, then prepare the quantized base model for training:

train_ds = load_dataset("json", data_files="codes.jsonl", field="train")
test_ds = load_dataset("json", data_files="codes.jsonl", field="test")
base_model.gradient_checkpointing_enable()  # enable gradient checkpointing to trade compute for memory
model = prepare_model_for_kbit_training(base_model)  # prepare the 4-bit quantized model for training
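For reference, here is one hypothetical layout of codes.jsonl that is consistent with the field="train" / field="test" arguments above and with the Input / Output keys the prompt template below expects (despite the .jsonl extension, the field argument implies a single JSON object; adapt this to your actual file):

import json

# Hypothetical example data: a single JSON object with "train" and "test" lists,
# each holding records with "Input" and "Output" keys.
records = {
    "train": [
        {"Input": "Tell me the medical code for cholera disease", "Output": "A00"},
        {"Input": "Tell me the medical code for typhoid fever", "Output": "A01.0"},
    ],
    "test": [
        {"Input": "Tell me the medical code for cholera disease", "Output": "A00"},
    ],
}

with open("codes.jsonl", "w") as f:
    json.dump(records, f)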

Creating a prompt template

The prompt template varies from model to model; you can usually find it in the README on the model's project page, or with a quick search.

This one is from the Mixtral project page:
def createPrompt(example):
    bos_token = '<s>'
    system_prompt = '[INST] You are a medical coding model and your role is to give the medical codes \n'
    input_prompt = f" {example['Input']} [/INST]"
    output_prompt = f"{example['Output']} </s>"

    return bos_token + system_prompt + input_prompt + output_prompt
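To see what the template produces, you can run it on a single hypothetical record:

sample = {"Input": "Tell me the medical code for cholera disease", "Output": "A00"}
print(createPrompt(sample))
# <s>[INST] You are a medical coding model and your role is to give the medical codes 
#  Tell me the medical code for cholera disease [/INST]A00 </s>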

Using the PEFT technique for finetuning

PEFT (parameter-efficient fine-tuning) with LoRA trains only small adapter matrices on top of the frozen, quantized base model. The helper below shows how few parameters actually end up trainable:

def printParameters(model):
    trainable_param = 0
    total_params = 0
    for name, param in model.named_parameters():
        total_params += param.numel()
        if param.requires_grad:
            trainable_param += param.numel()

    print(f"Total params : {total_params} , trainable params : {trainable_param} , trainable % : {100 * trainable_param / total_params}")
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    target_modules=[  # the modules to attach LoRA adapters to
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    task_type="CAUSAL_LM",
)
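The names above are specific to Mistral/Mixtral-style models. If you are adapting the template to a different architecture, one quick way to find candidate target modules is to list the linear layers of the loaded model:

import torch.nn as nn

# Collect the leaf names of all linear layers (bitsandbytes 4-bit linears subclass
# nn.Linear, so they show up too); these are the usual candidates for LoRA.
candidate_names = set()
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        candidate_names.add(name.split(".")[-1])
print(sorted(candidate_names))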

Creating a PEFT model and training it

model = get_peft_model(model, peft_config)
printParameters(model)

if torch.cuda.device_count() > 1:
    model.is_parallelizable = True
    model.model_parallel = True

# https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/training_args.py#L161

# max_steps and num_train_epochs :
# 1 epoch = training_examples / (num_gpus * per_device_batch_size * gradient_accumulation_steps) steps
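# e.g. a hypothetical 4,000-example dataset on 1 GPU with per_device_train_batch_size=4 and
# gradient_accumulation_steps=1 gives 4000 / (1 * 4 * 1) = 1000 steps per epoch, so max_steps=1000 is roughly 1 epoch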


args = TrainingArguments(
    output_dir="mixtral-finetune",
    # num_train_epochs=1000,
    max_steps=1000,  # comment this out and use num_train_epochs if you prefer to train in epochs
    per_device_train_batch_size=4,
    warmup_ratio=0.03,  # ~3% of training steps are used for learning-rate warmup
    gradient_accumulation_steps=1,
    logging_steps=10,
    logging_strategy="steps",
    save_strategy="steps",
    save_steps=10,
    evaluation_strategy="steps",
    eval_steps=10,  # switch evaluation_strategy to "epoch" if you want to evaluate once per epoch
    learning_rate=2.5e-5,
    bf16=True,  # if your GPU supports it
    logging_nan_inf_filter=False,  # keeps nan/inf losses visible in the logs, which usually signal a problem
    # lr_scheduler_type='constant',
    save_safetensors=True,
)

trainer = SFTTrainer(
    model=model,
    peft_config=peft_config,
    max_seq_length=350,
    tokenizer=tokenizer,
    packing=True,
    formatting_func=createPrompt,  # applies the prompt template to every example in the train and eval datasets
    args=args,
    train_dataset=train_ds["train"],
    eval_dataset=test_ds["train"],
)

model.config.use_cache = False
trainer.train()
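Once training finishes, it is worth saving the adapter explicitly as well (the directory name here is just a hypothetical example; checkpoints are already being written every save_steps by the arguments above):

# Save only the LoRA adapter weights, which are tiny compared to the base model
trainer.save_model("mixtral-medical-codes-adapter")
# equivalently: model.save_pretrained("mixtral-medical-codes-adapter")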

Generating outputs from the fine-tuned model

# load the trained model and generate some outputs from it

ft_model = PeftModel.from_pretrained(base_model , 'Checkpoint/base-checkpoint-10') #replace with the actual checkpoint name
eval_prompt = "<s>[INST] You are a coding model and your goal is to correctly tell the medical codes to the user based on the prompt they have entered and you get rewarded for correct output \n Tell me the medical code for cholera disease [/INST]"
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    print(tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=150, repetition_penalty=1.15)[0], skip_special_tokens=True))
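If you want a single standalone model for deployment, you can also merge the adapter into the base weights. A minimal sketch, assuming you reload the base model in bf16 first (merging directly into a 4-bit quantized model is not generally supported) and reusing the checkpoint path placeholder from above; the output directory name is hypothetical:

# Reload the base model in bf16, attach the trained adapter, then merge and save
base_fp = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16, device_map="auto")
merged = PeftModel.from_pretrained(base_fp, 'Checkpoint/base-checkpoint-10').merge_and_unload()
merged.save_pretrained("mixtral-medical-codes-merged")
tokenizer.save_pretrained("mixtral-medical-codes-merged")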

Whew … that was a lot to learn and understand!

This is a template that can help you fine-tune pretty much any open-source LLM out there!!

If you liked this article, give it a clap and follow me!!

If something is not working or you run into a problem, visit the GitHub link below; it has the updated code and should resolve most of your doubts!!

Also, connect with me on LinkedIn for more content: https://in.linkedin.com/in/mohitdulani

If you want to share feedback, feel free to leave a comment, and if you want to help with the code, raise an issue or PR on the GitHub repo below.

Find the GitHub repo for the project here: https://github.com/complete-dope/Fine-tuning-LLMs/
