Parameter-Efficient Fine-Tuning (PEFT): a novel approach for fine-tuning LLMs

Jul 25, 2023

Parameter-Efficient Fine-Tuning (PEFT) is a novel approach for fine-tuning large language models (LLMs) that reduces computational and memory requirements compared to traditional full fine-tuning, in which every weight of the model is updated.

PEFT fine-tunes only a small subset of the model’s parameters (often a small set of newly added ones) while keeping most of the pre-trained network frozen. This helps mitigate catastrophic forgetting and significantly cuts computational and storage costs, since only the small set of trained weights has to be stored for each task. I’ve written about traditional methods in another article.
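The idea of freezing most of the network can be sketched in a few lines of PyTorch. This is a minimal illustration, not a recipe: the tiny `backbone` and `head` modules are hypothetical stand-ins for a pre-trained model and a task-specific layer, and the sizes are arbitrary.

```python
import torch
from torch import nn

# Hypothetical stand-ins: "backbone" plays the pre-trained network,
# "head" plays the small task-specific part that we actually train.
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
head = nn.Linear(128, 2)
model = nn.Sequential(backbone, head)

# Freeze every backbone weight so the optimizer never updates it.
for param in backbone.parameters():
    param.requires_grad = False


def count_parameters(module):
    """Return (trainable, total) parameter counts for a module."""
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    total = sum(p.numel() for p in module.parameters())
    return trainable, total


trainable, total = count_parameters(model)
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
# trainable: 258 / 66178 (0.39%)
```

Only the parameters that still have `requires_grad=True` receive gradients, so both the optimizer state and the per-task checkpoint shrink accordingly.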

Various PEFT methods have been developed, such as:

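Task-Guided Prompt Tuning: This technique learns a small set of task-specific prompt embeddings (“soft prompts”) that are prepended to the input to guide the LLM’s output, obviating the need to retrain the entire model for a specific task.
Low-Rank Adaptation (LoRA): LoRA freezes the pre-trained weights and learns a low-rank update for selected weight matrices, expressed as the product of two small matrices. This sharply decreases the number of fine-tuned parameters while typically staying close to the quality of full fine-tuning.
Adapters: These small trainable layers are inserted into the otherwise frozen LLM for task adaptation. Each task gets its own lightweight set of adapters, which provides flexibility at a low storage cost.
Task-Relevant Prefix Tuning: This method prepends trainable continuous vectors (prefixes) to the model’s internal representations, typically the attention keys and values of every layer, and trains only those prefixes while the LLM itself stays frozen.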
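To make the LoRA item more concrete, here is a minimal sketch of a LoRA-style linear layer in plain PyTorch. The class name `LoRALinear` and the values `r=8` and `alpha=16` are illustrative assumptions, not the API of any library; the point is only to show the frozen base weight plus a trainable low-rank update.

```python
import math

import torch
from torch import nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: a frozen nn.Linear plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # the pre-trained weights stay fixed
        # Low-rank factors: A is (r x in_features), B is (out_features x r).
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen projection plus the scaled low-rank correction x A^T B^T.
        update = x @ self.lora_A.T @ self.lora_B.T
        return self.base(x) + update * self.scaling


layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in layer.parameters() if not p.requires_grad)
print(trainable, frozen)  # 12288 590592
```

Because `lora_B` starts at zero, the wrapped layer initially produces exactly the same output as the frozen base layer, and training moves it away from that starting point only through the two small matrices.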

PEFT techniques have demonstrated efficacy in various tasks, including natural language inference, question answering, and text summarization. Researchers and practitioners can leverage PEFT to efficiently utilize LLMs across diverse tasks.

Here are more insights into the mentioned PEFT methods:

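Task-Guided Prompt Tuning: Task-specific prompts are added to the input so that the LLM performs the desired task. A hand-written prompt such as “What is the capital of Brazil?” steers the model toward the response “Brasilia” without any training; prompt tuning goes one step further and learns the prompt itself as a short sequence of trainable embeddings while the model’s weights stay frozen.
Low-Rank Adaptation (LoRA): For a frozen weight matrix W of shape d × k, LoRA trains two small matrices B (d × r) and A (r × k), with the rank r much smaller than d and k, and uses W + BA in the forward pass. Only A and B are trained, which cuts the number of trainable parameters considerably, and the update can be merged into W after training.
Adapters: Adapters are small bottleneck layers (a down-projection, a nonlinearity, and an up-projection with a residual connection) integrated into the LLM’s transformer blocks. Only the adapters are trained, which keeps task-specific tuning cheap.
Task-Relevant Prefix Tuning: To fine-tune the LLM for question answering, a sequence of trainable prefix vectors is prepended to the model’s internal representations and optimized on question-answer examples, so that the prefix comes to encode the task.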
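Two of the methods above are small enough to sketch directly in PyTorch. The class names `BottleneckAdapter` and `SoftPrompt`, the bottleneck width of 64, and the 20 virtual tokens are assumptions chosen for illustration. The soft prompt shown here is the input-level variant used in prompt tuning; prefix tuning applies the same idea inside every layer and is not shown.

```python
import torch
from torch import nn


class BottleneckAdapter(nn.Module):
    """Hypothetical adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden_size)
        # Zero-init the up-projection so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))


class SoftPrompt(nn.Module):
    """Hypothetical soft prompt: trainable embeddings prepended to the input."""

    def __init__(self, num_virtual_tokens: int, hidden_size: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds):
        batch_size = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)


hidden = torch.randn(2, 10, 768)  # (batch, sequence length, hidden size)
adapter = BottleneckAdapter(768)
soft_prompt = SoftPrompt(num_virtual_tokens=20, hidden_size=768)

adapted = adapter(hidden)
prompted = soft_prompt(hidden)
print(adapted.shape)   # torch.Size([2, 10, 768])
print(prompted.shape)  # torch.Size([2, 30, 768])
```

In both cases the pre-trained model would be kept frozen, and only the adapter weights or the prompt embeddings would be passed to the optimizer.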

Below is a Python code example that fine-tunes a pre-trained BERT model on the IMDb dataset for sentiment analysis using the Hugging Face Transformers library. To keep the fine-tuning parameter-efficient in the simplest possible way, the BERT encoder is frozen and only the classification head is trained. The TextClassificationPipeline is then used to run inference with the saved model.

import torch
from torch.optim import AdamW  # the AdamW formerly exported by transformers is deprecated
from transformers import BertTokenizer, BertForSequenceClassification, pipeline
from datasets import load_dataset

# Load the IMDb dataset
dataset = load_dataset("imdb")

# Load the pre-trained BERT model and tokenizer
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)  # Binary sentiment analysis

# Freeze the pre-trained encoder so that only the classification head is trained
for param in model.bert.parameters():
    param.requires_grad = False

# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_dataset = dataset.map(tokenize_function, batched=True)

# Shuffle the training set (IMDb is stored grouped by label) and hold out 10% of the test split for validation
train_dataset = tokenized_dataset["train"].shuffle(seed=42)
val_dataset = tokenized_dataset["test"].train_test_split(test_size=0.1, seed=42)["test"]

# Hyperparameters and Training Configuration
learning_rate = 2e-5
batch_size = 16
num_epochs = 3

# Define the optimizer over the trainable parameters only
optimizer = AdamW([p for p in model.parameters() if p.requires_grad], lr=learning_rate)

# Fine-tuning function using PEFT
def fine_tune(model, train_dataset, optimizer, num_epochs, batch_size):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    for epoch in range(num_epochs):
        total_loss = 0
        model.train()

        for i in range(0, len(train_dataset), batch_size):
            batch = train_dataset[i:i+batch_size]
            input_ids = torch.tensor(batch["input_ids"]).to(device)
            attention_mask = torch.tensor(batch["attention_mask"]).to(device)
            labels = torch.tensor(batch["label"]).to(device)

            optimizer.zero_grad()
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

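        # Average over the number of batches, since total_loss sums one mean loss per batch
        num_batches = (len(train_dataset) + batch_size - 1) // batch_size
        print(f"Epoch: {epoch+1}/{num_epochs}, Average Loss: {total_loss / num_batches}")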

# Fine-tune the model using PEFT
fine_tune(model, train_dataset, optimizer, num_epochs, batch_size)

# Save the fine-tuned model
output_dir = "fine_tuned_model/"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

# Load the fine-tuned model using the TextClassificationPipeline
classifier = pipeline("text-classification", model=output_dir, tokenizer=output_dir)

# Test the fine-tuned model on a sample text
sample_text = "This movie was fantastic! I loved every bit of it."
result = classifier(sample_text)
print(result)

This code fine-tunes a BERT model for sentiment analysis on the IMDb dataset. The fine-tuned model is saved with save_pretrained and then reloaded for inference through pipeline("text-classification"), which returns a TextClassificationPipeline.

PEFT can also be described as parameter-efficient transfer learning: a novel approach to transfer learning that is designed to be more efficient than traditional fine-tuning. Taking adapters as the example, PEFT does this in a two-step process:

A small number of adapter layers are added to the pre-trained model. These adapter layers are responsible for adapting the model to the new task.
The adapter layers are trained using data from the new task, while the pre-trained weights stay frozen.

This two-step process allows PEFT to achieve good performance on the new task while training significantly fewer parameters than traditional fine-tuning.

In contrast, transfer learning is a more general approach to using a pre-trained model on a new task. Transfer learning can be used in a variety of ways, including:

Feature-based transfer learning: In feature-based transfer learning, the features extracted from the pre-trained model are used as input to a new model that is trained on the new task.
Fine-tuning: In fine-tuning, the weights of the pre-trained model are updated using a small amount of data from the new task.
Pretrained adapters: Pretrained adapters are adapter layers that have already been trained on another dataset or task before being reused. Starting from them, rather than from randomly initialized adapters, can improve the performance of PEFT on a new task.
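The difference between feature-based transfer learning and fine-tuning is easiest to see in code. In this minimal sketch, the `pretrained` module and the synthetic data are hypothetical stand-ins: features are extracted once without gradients, and only a new classifier (`probe`) is trained on top of them.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pre-trained feature extractor.
pretrained = nn.Sequential(nn.Linear(16, 32), nn.ReLU())

# Synthetic data for the "new task": the label depends on the first input feature.
inputs = torch.randn(64, 16)
labels = (inputs[:, 0] > 0).long()

# Feature-based transfer: compute the features once, with no gradient tracking.
with torch.no_grad():
    features = pretrained(inputs)

# Train only a new classifier on the fixed features.
probe = nn.Linear(32, 2)
optimizer = torch.optim.SGD(probe.parameters(), lr=0.1)
weights_before = [p.clone() for p in pretrained.parameters()]

for _ in range(20):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(probe(features), labels)
    loss.backward()
    optimizer.step()

# The pre-trained extractor is untouched; only the probe has learned anything.
extractor_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(weights_before, pretrained.parameters())
)
print(extractor_unchanged)  # True
```

Fine-tuning would instead pass the extractor’s parameters to the optimizer as well, and PEFT sits between the two: the extractor stays frozen, but a small number of trainable parameters are placed inside it.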

So, is PEFT novel? Yes: it is a novel approach to transfer learning, designed to be more efficient than traditional fine-tuning. Transfer learning itself, however, is the broader idea of reusing a pre-trained model on a new task, and PEFT is one way of doing it.

Written by Tales Matos