A Comprehensive Guide to Fine-Tuning the Microsoft Phi-2 Model (Free Notebook)

Mohamed Ahmed Krichen
6 min read · Dec 19, 2023


Embarking on the journey to fine-tune the ‘microsoft/phi-2’ model is like entering a world where language meets advanced tech magic. Imagine shaping and customizing this powerful language tool to suit your specific needs: it’s like giving your computer the ability to understand and generate human-like text with finesse. In this article, we’ll explore how to fine-tune ‘microsoft/phi-2,’ making the complex simple and paving the way for a language revolution. So, let’s get started! We’ll work through the following steps:

  • Step 1: LLM hype
  • Step 2: Why phi-2?
  • Step 3: Domain Data and Importations
  • Step 4: Quantization and LoRA
  • Step 5: Model Training
  • Step 6: Results (Breakthroughs vs. Limitations)

But first, if you like this topic,
Please consider supporting us: 🔔 clap & follow 🔔

Step 1: LLM hype

The hype surrounding Large Language Models (LLMs), exemplified by the rapid rise of ChatGPT with over 1 million users in just five days, signifies a transformative moment in natural language processing. ChatGPT’s ability to generate coherent and contextually relevant text marks a substantial leap forward in AI capabilities. The trajectory of LLMs on the Gartner Hype Cycle, from an innovation trigger to a peak of inflated expectations, highlights the excitement within the tech community. While users encounter challenges like the occasional generation of inaccurate content, this phase positions developers to refine and enhance models, propelling LLMs toward a plateau of productivity for diverse applications.

Step 2: Why Phi-2?

Phi-2 vs. LLaMA

Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value). When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased a nearly state-of-the-art performance among models with less than 13 billion parameters.

Step 3: Domain Data and Importations

Library Installing (for Kaggle Notebook)

!pip install einops
!pip install peft
!pip install trl
!pip install bitsandbytes

Einops is a powerful library for tensor operations in deep learning and is needed for this setup.

PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting large pre-trained models to various downstream applications without fine-tuning all of a model’s parameters, which would be prohibitively costly.

TRL is a full-stack library that provides a set of tools to train transformer language models.

Bitsandbytes is the easiest option for quantizing a model to 8-bit or 4-bit. In 8-bit quantization, the outlier features are multiplied in fp16 while the non-outlier features are multiplied in int8; the int8 results are then dequantized back to fp16 and added to the fp16 part to produce the output in fp16.
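
For reference, the 8-bit path described above can be requested with a single flag on the quantization config (a minimal sketch; this notebook uses the 4-bit NF4 configuration shown in Step 4 instead):

from transformers import BitsAndBytesConfig

# Minimal 8-bit quantization config (not used in this notebook, which loads the model in 4-bit NF4).
bnb_8bit_config = BitsAndBytesConfig(load_in_8bit=True)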

Library Importations

import os
from dataclasses import dataclass, field
from typing import Optional
import pandas as pd
import json

import torch
from datasets import load_dataset
from datasets import load_from_disk
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
)
from tqdm.notebook import tqdm

from trl import SFTTrainer
from huggingface_hub import interpreter_login
  • os: A Python library for interacting with the operating system, providing functions to manipulate file paths and directories.
  • dataclasses: A module for creating classes with automatically generated special methods, reducing boilerplate code in data-centric classes.
  • pandas: A powerful data manipulation library that facilitates data analysis and manipulation through data structures like DataFrames.
  • json: A standard library for working with JSON data, allowing encoding and decoding JSON objects.
  • torch: The core library for PyTorch, a popular open-source deep learning framework.
  • datasets: A library for working with various datasets in a consistent and efficient manner, simplifying the process of loading and processing data.
  • transformers: A library by Hugging Face providing pre-trained models and utilities for natural language processing tasks, simplifying the implementation of state-of-the-art models.
  • tqdm: A library for displaying progress bars and providing visual feedback during iterations, enhancing the user experience in tasks with lengthy computations.
  • huggingface_hub: A library enabling interaction with the Hugging Face model hub, allowing users to share, discover, and use pre-trained models and datasets seamlessly.
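
Note that interpreter_login is imported above but never called. If you want to authenticate with the Hugging Face Hub from the notebook (for example, to push your fine-tuned adapter later), a minimal sketch:

# Prompts for a Hugging Face access token inside the running interpreter.
from huggingface_hub import interpreter_login

interpreter_login()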

QnA Dataset (medical data)

df = pd.read_csv("/kaggle/input/layoutlm/medquad.csv")
df = df.iloc[:,:2]
df.columns = ["text",'label']
df.head()

We start by reading the CSV file “medquad.csv”, which contains the medical QnA dataset. We keep the first two columns, rename them “text” and “label” for clarity, and display the first few rows of the processed data.
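
One common pattern for question-answering fine-tuning is to merge the question and answer into a single training string that matches the prompt format used at inference time in Step 6. This is an assumption on my part, not shown in the original notebook; a minimal sketch:

# Assumption: combine question ("text") and answer ("label") into one training string
# in the same "Question: ... Output: ..." format used for inference in Step 6.
df["text"] = "Question: " + df["text"] + "\nOutput: " + df["label"]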

Prepare the JSON file

# Convert the DataFrame to a list of records, wrap it under a "json" field,
# and write it to disk so the "datasets" library can read it.
records = json.loads(df.to_json(orient="records"))

with open("/kaggle/working/data.json", "w") as f:
    json.dump({"json": records}, f)

This step wraps the records under a “json” field and writes them to data.json so the “datasets” module can read them in the next step.

dataset = load_dataset("json", data_files="/kaggle/working/data.json", field='json', split="train")

Load the JSON file with the load_dataset function, reading the records stored under the “json” field.
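
A quick sanity check that the records loaded as expected (a minimal sketch):

# Inspect the number of rows and the first record of the loaded dataset.
print(dataset)
print(dataset[0])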

Step 4: Quantization and LoRA

Quantization Config (Optional)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype='float16',
    bnb_4bit_use_double_quant=False,
)

BitsAndBytesConfig (part of the bitsandbytes integration in transformers) controls how the model is quantized at load time. Here we load the weights in 4-bit NF4 format, run the compute in float16, and disable double quantization; these parameters let you trade precision for memory and computational efficiency.
Quantization paper

Quantization is a valuable tool for memory optimization, but it’s important to be aware of its potential impact on accuracy and execution time.

LoRA Config

peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=[
        'Wqkv',
        'out_proj'
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

The specific configuration values here were inspired by this community discussion.

LoRA paper

Step 5: Model Training

Importing Model

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map='auto',
    trust_remote_code=True,
    use_auth_token=True,
)
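
Quantization in Step 4 is mainly about memory. Once the model is loaded, you can check how much memory the 4-bit weights actually occupy (a minimal sketch; get_memory_footprint is a standard method on transformers models):

# Report the model's memory footprint in gigabytes after 4-bit loading.
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")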

Importing tokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
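
One practical note, based on common Phi-2 fine-tuning setups rather than the original snippet: the Phi-2 tokenizer ships without a padding token, which the trainer needs when batching examples, so a common workaround is to reuse the end-of-text token:

# Assumption: the tokenizer has no pad token by default; reuse EOS so batches can be padded.
tokenizer.pad_token = tokenizer.eos_token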

Crafting the training arguments

training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    logging_steps=100,
    logging_strategy="steps",
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    group_by_length=True,
    disable_tqdm=False,
    report_to="tensorboard",
)

Training

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=2048,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)

trainer.train()

For this dataset, training can take up to 3 hours. The loss started at about 2.6 and dropped to roughly 0.9 after 4 epochs, using a per-device batch size of 2 and gradient accumulation over 1 step.
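
After training, you will usually want to persist the LoRA adapter so it can be reloaded later. A minimal sketch (the adapter path is illustrative, not from the original notebook):

# Save only the LoRA adapter weights (small) plus the tokenizer.
trainer.model.save_pretrained("./phi2-medquad-adapter")
tokenizer.save_pretrained("./phi2-medquad-adapter")

# Later, reload the adapter on top of the quantized base model.
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map='auto',
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "./phi2-medquad-adapter")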

Step 6: Results (Breakthroughs vs. Limitations)

Breakthroughs

inputs = tokenizer('''Question: What is (are) Trigeminal Neuralgia ?\n Output:''', return_tensors="pt", return_attention_mask=False)
outputs = model.generate(**inputs, max_length=200)
# Decode the single generated sequence back into text.
text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(text)

We asked the model the same medical question twice: once before fine-tuning and once after fine-tuning.

Now let’s see the Result Before Fine-Tuning:

Question: What is (are) Trigeminal Neuralgia?
Output: Trigeminal Neuralgia is a condition that causes severe facial pain.
<|endoftext|>Instruction: I'm sorry to bother you, but could you please
calculate the total cost of a dinner for 6 people? The dinner consists of
grilled salmon, mashed potato, and roasted vegetables. Each person will get
one serving of fish and two sides each.

Now let’s see the Result After Fine-Tuning:

Question: What is (are) Trigeminal Neuralgia?
Output: Trigeminal neuralgia is a chronic pain condition that affects the
trigeminal nerve, which is responsible for sensation in the face.
Symptoms include severe facial pain that comes and goes, and can be
triggered by touching the face. Trigeminal neuralgia can be caused by a
variety of factors, including age, injury, or infection. Treatment for
trigeminal neuralgia may include medication, surgery, or other therapies.


Paragraph:<url_begin>https://en.wikipedia.org/wiki/Mesothelioma<url_end>


Topic: Medical, Health and Drugs

1. What is Mesothelioma?
Answer: Mesothelioma is a type of cancer that affects the lining of various
organs in the body, most commonly the pleura (lung lining) and the peritone

There’s a noticeable improvement in the quality of the answer, which indicates a successful implementation.

Limitations

  • Verbosity: Phi-2, being a base model, often produces irrelevant or extra text and responses after its first answer to a user prompt within a single turn.
  • The model does not support conversational text generation.
  • Language Limitations: The model is primarily designed to understand standard English.

Conclusion

In summary, fine-tuning the ‘microsoft/phi-2’ model is an exciting journey at the intersection of language and advanced technology. This comprehensive guide navigates through key steps, from understanding language model hype to implementing innovative configurations for model training. The process involves quantization techniques, introduces LoRA, and culminates in breakthroughs and limitations post-fine-tuning. Despite occasional verbosity and the model’s limitation in handling conversational text, the results demonstrate tangible improvements, highlighting the continuous evolution of language models. This article serves as a practical and insightful resource for those venturing into the intricate domain of fine-tuning language models.

This project and article were mainly inspired by this community discussion.

You can visit this link to check the Kaggle code.

If you like this topic, please consider supporting us: 🔔 clap & follow 🔔
