Fine tune any LLM using your custom dataset 🤯
In this article I will share a fine-tuning template that you can use to train a model of your choice!
Get ready to learn something new, and make sure to experiment alongside, because coding can't be learned by reading alone!
I have broken the template down into 7 simple steps. I will list them here, and then we will dive into each one:
- Setting up with the imports
- Getting the base model and tokenizer
- Quantizing the base model
- Creating a prompt template
- Using the PEFT technique for finetuning
- Creating a PEFT model and training it
- Generating outputs from the fine-tuned model
Simple, isn't it? Now the fun part begins: let's dive into the code and understand what happens under the hood!
Setting up with the imports
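We will need: transformers, bitsandbytes, peft, trl, datasets, accelerate and torch. Note that the snippets target the library versions that were current at the time of writing (the comments link to transformers v4.37.2); newer releases of transformers and trl have renamed or moved some of the arguments used here, so pin older versions if you hit errors.
!pip install -q accelerate peft bitsandbytes transformers trl datasets torch
Then import what we need: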
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from trl import SFTTrainer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Getting the base model and tokenizer
Pick a model id, load the model with a 4-bit quantization config, and build a tokenizer:
base_model_id = 'mistralai/Mixtral-8x7B-v0.1'
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,  # also quantizes the quantization constants (the original passed this argument twice, which is a syntax error)
    bnb_4bit_compute_dtype=torch.bfloat16,  # if your GPU supports it
    bnb_4bit_quant_type="nf4",
)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, device_map="cuda")
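To get a feel for why 4-bit loading matters, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. The parameter count (about 46.7 billion for Mixtral-8x7B) is an approximate figure that I am assuming, and the helper name `weight_memory_gb` is made up for this illustration. Real usage will be higher because of quantization constants, activations and optimizer state.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: roughly 46.7 billion parameters for Mixtral-8x7B.
approx_params = 46.7e9

for label, bits in [("bf16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{label}: ~{weight_memory_gb(approx_params, bits):.1f} GB")
```

The 4-bit figure is a quarter of the bf16 one, which is the difference between needing several large GPUs and possibly fitting on one or two.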
# Training_tokenizer (https://huggingface.co/docs/transformers/v4.37.2/en/model_doc/auto#transformers.AutoTokenizer.from_pretrained)
# https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizer
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    truncation_side="right",
    padding_side="right",
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token
Quantizing the base model
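The model was already quantized to 4-bit when it was loaded with bnb_config; this step loads the dataset and prepares the quantized model for training.
Your dataset should be a JSON file. Because of the field argument used below, the file is expected to be a single JSON object with top-level "train" and "test" keys, each holding a list of records with "Input" and "Output" keys:
train_ds = load_dataset("json", data_files='codes.jsonl', field="train")
test_ds = load_dataset("json", data_files='codes.jsonl', field="test")
base_model.gradient_checkpointing_enable()  # recompute activations during the backward pass to save memory
model = prepare_model_for_kbit_training(base_model)  # prepares the already-quantized model for training (it does not quantize it)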
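If you are unsure what the dataset file should look like, the sketch below writes a tiny file in the layout I am assuming here: one JSON object with "train" and "test" keys, each a list of records with "Input" and "Output". The records and the file name `codes_example.json` are placeholders for illustration, not real medical codes.

```python
import json
import os
import tempfile

# Placeholder records for illustration only; replace them with your own data.
dataset = {
    "train": [
        {"Input": "Tell me the medical code for disease A", "Output": "CODE-A"},
        {"Input": "Tell me the medical code for disease B", "Output": "CODE-B"},
    ],
    "test": [
        {"Input": "Tell me the medical code for disease C", "Output": "CODE-C"},
    ],
}

# Write everything as a single JSON object (hypothetical file name).
path = os.path.join(tempfile.gettempdir(), "codes_example.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(dataset, f, indent=2)

# Read it back and check that each split has the keys the prompt template uses.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

for split in ("train", "test"):
    for record in loaded[split]:
        assert {"Input", "Output"} <= record.keys()
    print(split, len(loaded[split]), "records")
```

You would then point data_files at a file like this and keep field="train" and field="test" in the two load_dataset calls.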
Creating a prompt template
The prompt template varies from model to model. You can usually find it in the README on the model's project page, or with a quick web search.
def createPrompt(example):
    bos_token = '<s>'
    system_prompt = '[INST] You are a medical coding model and your role is to give the medical codes \n'
    input_prompt = f" {example['Input']} [/INST]"
    output_prompt = f"{example['Output']} </s>"
    return bos_token + system_prompt + input_prompt + output_prompt
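It helps to look at one fully formatted example before training, since stray spaces and newlines in a template are easy to miss. Here is a small standalone sketch that uses the same layout as the template in this section; the name `create_prompt` and the sample record are placeholders I made up for the demo.

```python
def create_prompt(example: dict) -> str:
    """Same layout as the prompt template in this section, as a standalone function."""
    bos_token = "<s>"
    system_prompt = "[INST] You are a medical coding model and your role is to give the medical codes \n"
    input_prompt = f" {example['Input']} [/INST]"
    output_prompt = f"{example['Output']} </s>"
    return bos_token + system_prompt + input_prompt + output_prompt


# Placeholder record for illustration only.
sample = {"Input": "Tell me the medical code for disease A", "Output": "CODE-A"}
prompt = create_prompt(sample)

# repr() makes whitespace and newlines visible.
print(repr(prompt))
```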
Using the PEFT technique for finetuning
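LoRA, one of the PEFT techniques, freezes the base weights and trains small low-rank adapter matrices attached to the target modules, so only a small fraction of the parameters is trainable. The helper below prints that fraction:
def printParameters(model):
    trainable_param = 0
    total_params = 0
    for name, param in model.named_parameters():
        total_params += param.numel()
        if param.requires_grad:
            trainable_param += param.numel()
    print(f"Total params : {total_params} , trainable_params : {trainable_param} , trainable % : {100 * trainable_param / total_params} ")
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    target_modules=[  # modules to attach LoRA adapters to; the names must match modules in your model (inspect them with print(model))
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    task_type="CAUSAL_LM"
)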
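To see what r and lora_alpha mean in practice, here is a small arithmetic sketch. For one linear layer, LoRA adds two matrices of shape (r x d_in) and (d_out x r), and the adapter update is scaled by lora_alpha / r. The 4096 x 4096 layer size is an assumption chosen only for illustration, and `lora_param_count` is a made-up helper name.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


# Assumption: a square 4096 x 4096 projection, chosen only for illustration.
d_in = d_out = 4096
r, lora_alpha = 64, 16

full = d_in * d_out
adapter = lora_param_count(d_in, d_out, r)
scaling = lora_alpha / r  # LoRA scales the adapter update by lora_alpha / r

print(f"full layer params  : {full}")
print(f"LoRA adapter params: {adapter} ({100 * adapter / full:.3f}% of the layer)")
print(f"update scaling     : {scaling}")
```

A larger r gives the adapter more capacity and more trainable parameters, while lora_alpha controls how strongly the adapter update is applied.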
Creating a PEFT model and training it
model = get_peft_model(model , peft_config)
printParameters(model)
if torch.cuda.device_count() > 1:
    model.is_parallelizable = True
    model.model_parallel = True
# https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/training_args.py#L161
# max_steps and num_train_epochs :
# 1 epoch = [ training_examples / (no_of_gpu * batch_size_per_device * gradient_accumulation_steps) ] steps
args = TrainingArguments(
    output_dir="LLama-2 7b",
    # num_train_epochs=1000,
    max_steps=1000,  # comment out this line if you want to train in epochs
    per_device_train_batch_size=4,
    warmup_ratio=0.03,  # fraction of the total steps used for warmup (warmup_steps expects an integer number of steps, so 0.03 belongs here)
    gradient_accumulation_steps=1,
    logging_steps=10,
    logging_strategy="steps",
    save_strategy="steps",
    save_steps=10,
    evaluation_strategy="steps",
    eval_steps=10,  # to evaluate once per epoch instead, set evaluation_strategy="epoch" and drop this line
    learning_rate=2.5e-5,
    bf16=True,  # if your GPUs support it
    logging_nan_inf_filter=False,  # log nan/inf loss values instead of filtering them out, so you can spot training problems early
    # lr_scheduler_type='constant',
    save_safetensors=True,
)
trainer = SFTTrainer(
    model=model,
    peft_config=peft_config,
    max_seq_length=350,
    tokenizer=tokenizer,
    packing=True,
    formatting_func=createPrompt,  # applies createPrompt to every example in the training and evaluation datasets
    args=args,
    train_dataset=train_ds["train"],
    eval_dataset=test_ds["train"]
)
model.config.use_cache = False
trainer.train()
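The relation between max_steps and epochs from the comment above the training arguments can be checked with a few lines of arithmetic. This is an approximate sketch: the dataset size is hypothetical, `steps_per_epoch` is a made-up helper, and with packing=True the number of examples really means the number of packed sequences.

```python
import math


def steps_per_epoch(num_examples, num_gpus, batch_size_per_device, grad_accum_steps=1):
    """Approximate number of optimizer steps needed to see every example once."""
    effective_batch = num_gpus * batch_size_per_device * grad_accum_steps
    return math.ceil(num_examples / effective_batch)


# Hypothetical dataset size; the batch size matches the arguments above.
num_examples = 10_000
per_epoch = steps_per_epoch(num_examples, num_gpus=1, batch_size_per_device=4)
max_steps = 1000

print(f"steps per epoch: {per_epoch}")
print(f"max_steps={max_steps} covers about {max_steps / per_epoch:.2f} epochs")
```

If the result is far below one epoch, the model never sees part of your data, so you may want to raise max_steps or switch to num_train_epochs.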
Generating outputs from the fine-tuned model
#load the trained model and generate some outputs from it
ft_model = PeftModel.from_pretrained(base_model , 'Checkpoint/base-checkpoint-10') #replace with the actual checkpoint name
eval_prompt = "<s>[INST] You are a coding model and your goal is to correctly tell the medical codes to the user based on the prompt they have entered and you get rewarded for correct output \n Tell me the medical code for cholera disease [/INST]"
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
ft_model.eval()
with torch.no_grad():
    output_ids = ft_model.generate(**model_input, max_new_tokens=150, repetition_penalty=1.15)[0]
    print(tokenizer.decode(output_ids, skip_special_tokens=True))
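The decoded text contains the prompt as well as the model's answer. If you only want the answer, a small helper like the one below can cut everything up to the last [/INST] marker. This is a sketch that assumes the marker survives decoding as plain text; `extract_answer` and the sample string are made up for illustration.

```python
def extract_answer(decoded: str, end_of_instruction: str = "[/INST]") -> str:
    """Return the text after the last instruction marker, or the whole text if it is absent."""
    _, marker, answer = decoded.rpartition(end_of_instruction)
    return answer.strip() if marker else decoded.strip()


# Hypothetical decoded output, for illustration only.
decoded = "[INST] You are a coding model \n Tell me the medical code for disease A [/INST] CODE-A"
print(extract_answer(decoded))
```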
Whew, that is a lot to learn and understand!
This is a template that can help you fine-tune the open-source LLM of your choice!
If you liked this article, give it a clap and follow!
If you run into a problem or a function is not working, visit the GitHub link below; the updated code lives there and should resolve most of your doubts.
Also connect with me on LinkedIn for more content: https://in.linkedin.com/in/mohitdulani
Feedback is welcome in the comments, and if you want to help with the code, raise an issue or a PR on the GitHub repository below.
Find the GitHub repository for the project here: https://github.com/complete-dope/Fine-tuning-LLMs/