Things to know before fine-tuning LLM models

Luc Nguyen
5 min read · Jan 28, 2024


Although I’ve used the OpenAI API for over a year, fine-tuning open-source models like Llama-2 posed new challenges: dealing with unfamiliar libraries and concepts can be tricky. After some research, here’s what I think newbies like me should know before attempting fine-tuning.

1. Basic Terminology

Understanding basic terminology can save you a lot of time when searching for the right documents to solve your problem. Here are some key terms that I believe are important to know.

1.1. Prompt and Prompt Engineering

Prompt
In simple terms, a prompt is what you give to a language model as input. The output generated by the model is called a completion.

Prompt Engineering
To improve the model’s response, the simplest method is to adjust the prompt. There are various techniques for this, such as zero-shot, few-shot, and Chain of Thought (CoT) prompting. All of these fall under the umbrella of prompt engineering.
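To make these concrete, here is a small sketch of what each prompting style might look like; the task and wording are my own illustrative examples, not from any particular library or paper.

```python
# Zero-shot: the task is described, with no examples.
zero_shot = (
    "Classify the sentiment of this review as Positive or Negative.\n"
    "Review: The battery dies after an hour.\nSentiment:"
)

# Few-shot: a handful of solved examples are shown before the real input.
few_shot = (
    "Review: I love this phone. Sentiment: Positive\n"
    "Review: The screen cracked on day one. Sentiment: Negative\n"
    "Review: The battery dies after an hour. Sentiment:"
)

# Chain of Thought: the example demonstrates step-by-step reasoning.
chain_of_thought = (
    "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
    "A: Let's think step by step. 12 pens is 4 groups of 3, and 4 x $2 = $8. "
    "The answer is $8.\n"
    "Q: A shop sells pens at 5 for $3. How much do 20 pens cost?\n"
    "A: Let's think step by step."
)
```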

Context Window
The amount of text the model can handle at once, covering both the prompt and the generated output, is called the context window; it is measured in tokens.
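A rough way to check how much of the context window a prompt will consume is to count its tokens, for example with OpenAI’s tiktoken library. The snippet below is my own illustration, not from the article.

```python
# Count how many tokens a prompt uses, so it fits in the model's context window.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt = "Summarize the following customer review in one sentence: ..."
n_tokens = len(enc.encode(prompt))
print(f"Prompt uses {n_tokens} tokens")
# If n_tokens plus the expected completion length exceeds the model's
# context window, the prompt has to be shortened or split.
```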

1.2. In-context Learning

Traditional models, such as sentiment classifiers, are trained for a single task, for example classifying user comments, and they respond based only on the data they were trained on.

(Figure: a traditional ML model trained for a single task)

When using an LLM for a specific task like sentiment analysis, we instead adjust the prompt, for example by adding instructions or labeled examples; this is called in-context learning (ICL).
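As a concrete illustration of ICL, here is a sketch of a few-shot sentiment prompt sent through the OpenAI client; the model name, messages, and examples are my own choices.

```python
# In-context learning: the task is specified entirely in the prompt,
# with no change to the model's weights.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are a sentiment classifier. Answer with Positive or Negative."},
        {"role": "user",
         "content": "Review: I love this phone. Sentiment: Positive\n"
                    "Review: The battery dies after an hour. Sentiment:"},
    ],
)
print(response.choices[0].message.content)
```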

2. Overall process of building LLMs

(Figure: the overall process of building an LLM; source linked in the original post)

Understanding the overall process of building LLM models is crucial. Each step addresses different issues with specific solutions. Having a clear understanding makes it easier to focus on the concepts and methods needed to solve your particular problem.

Building LLMs can be separated into three steps:

  • Pretraining
  • Supervised Fine-Tuning
  • Reinforcement Learning from Human Feedback (RLHF)

2.1 Pretraining

In the pre-training step, the model is trained with text data from books, the internet, etc. It learns to predict the next word, helping the model understand the context of sentences and the meaning of each word.
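To make the next-word objective concrete, here is a small sketch that asks a pretrained causal LM (GPT-2 via the transformers library, an illustrative choice) for its most likely next tokens.

```python
# Next-word prediction is the objective used during pretraining.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the vocabulary for the next token
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.decode([token_id])!r}: {p:.3f}")
```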

However, pre-training is just the initial step, as it solely teaches the model how to predict the next word in a sentence. This proficiency falls short when addressing specific requirements such as responding politely, avoiding toxicity, or performing specialized tasks like sentiment analysis.

To harness the full potential of the model, two methods are employed: supervised fine-tuning and Reinforcement Learning from Human Feedback (RLHF). Let’s explore how each is used.

2.2 Supervised Fine-tuning

(Figure: using an LLM for a specific task)

While in-context learning is one of the simplest methods to instruct the model to perform specific tasks, it still encounters certain problems:

  • The limited context window prevents us from adding enough information to precisely guide the model in performing the desired task.
  • Language models don’t always respond in the exact format we desire.

To address these challenges, supervised fine-tuning emerges as one of the most promising solutions.

For example, to instruct the model to generate SQL code that runs correctly on a Teradata DB, we need to prepare data with the following fields (a small example record is sketched after the list):

  • Instruction: The specific task or instruction you want the LLM to perform.
  • Prompt: The user’s question.
  • Completion: The expected output from the LLM.
(Figure: data used for fine-tuning)
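For illustration, one such training record might look like the snippet below; the field names, question, and SQL are my own example rather than a required schema.

```python
# One illustrative fine-tuning record for the Teradata SQL task.
import json

record = {
    "instruction": "Generate SQL that runs on a Teradata database.",
    "prompt": "Show the 10 customers with the highest total order value.",
    "completion": (
        "SELECT TOP 10 customer_id, SUM(order_value) AS total_value\n"
        "FROM orders\n"
        "GROUP BY customer_id\n"
        "ORDER BY total_value DESC;"
    ),
}

# Training data is commonly stored as one JSON object per line (JSONL).
with open("sft_data.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```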

2.3 Reinforcement Learning from Human Feedback

RLHF is applied to tasks where defining a clear, algorithmic solution is challenging, but humans can readily assess the quality of the model’s output. For instance, we may want the model to respond politely and avoid bias, yet there are no clear rules we can encode for that; in such cases, RLHF is well suited to address the issue.

Various reward models are built based on human feedback to guide the model’s behavior on aspects like safety, helpfulness, avoidance of toxic answers, and reduction of bias. These models serve as a reinforcement signal that helps the model improve over time. For more detail, please refer to this link.
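To make this concrete, a reward model is typically trained on human preference comparisons, roughly like the sketch below; the fields and wording are my own illustration.

```python
# A minimal sketch of the human-preference data a reward model is trained on.
preference_example = {
    "prompt": "My package arrived late. What should I do?",
    "chosen": "I'm sorry about the delay. You can contact support with your "
              "order number, and they can track the package or issue a refund.",
    "rejected": "Not my problem. Deal with it.",
}
print(preference_example)

# The reward model learns to score the "chosen" answer higher than the
# "rejected" one; during RLHF, that score is used as the reinforcement signal
# that nudges the LLM toward helpful, polite, non-toxic responses.
```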

3. Delving into Supervised Fine-Tuning

After grasping fundamental terminology and the overarching process of constructing LLM models, it’s time to delve into supervised fine-tuning — a pivotal step in leveraging your LLM models to perform specific tasks.

3.1 Fine-Tuning Method

There are numerous methods available for fine-tuning models, which can be broadly categorized into two groups:

  • Full Fine-Tuning
    This approach involves updating all the weights of LLM models during the fine-tuning process.
  • Parameter Efficient Fine-Tuning (PEFT)
    Instead of updating all the weights of LLM models, this method focuses on updating only a subset of weights or adding a new layer and updating the weights of that specific layer. Such an approach is beneficial for running the fine-tuning process on hardware with limited capabilities.
(Figure: overview of fine-tuning methods)

3.2 Parameter-Efficient Fine-Tuning (PEFT)

Parameter-Efficient Fine-Tuning (PEFT) methods adapt pre-trained language models (PLMs) to new tasks without tweaking all parameters, saving on computational costs. Recent PEFT techniques match the performance of full fine-tuning while updating fewer model parts.

Within the PEFT group, various methods exist, including LoRA, QLoRA, adapters, etc. Each method has its own set of pros and cons.

Huggingface offers a library for Parameter-Efficient Fine-Tuning (PEFT) at this link. In the next article, I’ll provide a straightforward demo on how to use PEFT for fine-tuning models.
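As a small preview, this is roughly what configuring LoRA with the peft library looks like; the base model and hyperparameters below are illustrative choices, not a recommendation.

```python
# A minimal LoRA setup with the Hugging Face peft library.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Loading Llama-2 requires accepting its license on the Hugging Face Hub.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Typically well under 1% of the parameters are trainable, which is why
# PEFT fits on much smaller hardware than full fine-tuning.
```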

Conclusion

This blog serves as notes from my study, highlighting essential knowledge to have before delving into fine-tuning models for specific tasks. However, if there are any concepts I’ve overlooked or important content you believe should be emphasized, please feel free to correct or add them in the comments section.
