Real-World Example: Building a Fine-Tuned Model with OpenAI

Sriram Parthasarathy
Published in GPTalk
9 min read · Aug 26, 2024

What is fine-tuning?

Fine-tuning is the process of taking a pre-trained machine learning model and training it further on a smaller, specific dataset. This helps the model learn more about a particular topic or task, making it better at understanding and responding to related questions. Instead of starting from scratch, fine-tuning uses the model’s existing knowledge, saving time and resources while improving its performance for specific needs.

Key Benefits

Here are the key benefits of fine-tuning:

  • Saves Time and Money: Fine-tuning is cheaper and faster than training a model from scratch.
  • Improves Performance: It helps the model do better on specific tasks by using what it already knows.
  • Needs Less Data: You can get good results with less task-specific data.
  • Easier Customization: Fine-tuning allows you to adapt powerful models for your unique needs.

Real-world fine-tuning use case

Harvey is an AI company specializing in legal technology. They fine-tune large language models using legal data and law firm-specific information to create customized AI assistants for legal professionals. Harvey’s models are trained on complex legal tasks and can handle queries like:

“Summarize the key points of this contract and identify any potential risks.”
“What are the relevant precedents for this intellectual property dispute in California?”

This tailored approach allows Harvey to provide more accurate and relevant responses to legal inquiries.

High-level steps to fine-tuning using OpenAI

The most important part of fine-tuning is preparing the dataset. Once that is done, the rest of the steps are straightforward.

  • Prepare Training Data: Format your data as a JSONL file in which each line is one example conversation (system, user, and assistant messages), and ensure it’s high-quality and diverse.
  • Start Fine-tuning: Log in to OpenAI, go to the “Fine-tuning” section, click “Create New,” choose a model, upload your JSONL file, and monitor the process until it finishes.
  • Use the Fine-tuned Model: Select your model in the playground, test it with prompts, and use it in your applications.

Data preparation

When preparing data for fine-tuning an OpenAI model, the format of the training data is crucial. Here are the key points about the data format:

  • The training data must be in JSONL (JSON Lines) format.
  • Each line in the file represents a single training example as a JSON object.
  • Each line must be a valid JSON object.
  • No line breaks within individual JSON objects.
  • The file extension should be .jsonl.
  • Ensure high-quality, diverse examples that represent your desired task well.
  • Include at least 10 examples, but more is generally better for improved performance.
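The checklist above can be turned into a quick local check before uploading. This is a minimal sketch using only the Python standard library; the function name `check_jsonl` and the wording of the messages are my own choices for illustration, not part of any OpenAI tooling.

```python
import json

MIN_EXAMPLES = 10  # minimum number of examples from the checklist above


def check_jsonl(filename, text):
    """Return a list of problems found in JSONL content (empty list = looks fine)."""
    problems = []
    if not filename.endswith(".jsonl"):
        problems.append("file extension should be .jsonl")
    valid = 0
    for number, line in enumerate(text.strip("\n").split("\n"), start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # An object that was pretty-printed across several lines fails here.
            problems.append(f"line {number}: not a complete JSON value")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        valid += 1
    if valid < MIN_EXAMPLES:
        problems.append(f"only {valid} valid examples, need at least {MIN_EXAMPLES}")
    return problems
```

To use it, read your file with `open("train.jsonl", encoding="utf-8").read()` and pass the file name and the text; an empty list means none of the listed format problems were found. It does not judge the quality or diversity of the examples.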

Data structure for Chat Models

For chat models like GPT-4o, the structure should be:

{"messages": [{"role": "system", "content": "System message"}, 
{"role": "user", "content": "User message"},
{"role": "assistant", "content": "Assistant response"}]}
  • Each example is a complete conversation.
  • The “messages” array contains the conversation flow.
  • Roles can be “system”, “user”, or “assistant”.
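One way to avoid formatting mistakes is to generate each line with a JSON library instead of typing it by hand. The sketch below uses only the Python standard library; `make_example` is a hypothetical helper name introduced for illustration.

```python
import json


def make_example(system, user, assistant):
    """Serialize one system/user/assistant conversation as a single JSONL line."""
    record = {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }
    # json.dumps emits no raw line breaks by default (a newline inside a
    # string is escaped as \n), so the result is always exactly one line.
    return json.dumps(record, ensure_ascii=False)


line = make_example("System message", "User message", "Assistant response")
print(line)
```

Writing one such line per example to a file, each followed by a newline, produces a file in the JSONL layout described above.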

Dataset used for fine-tuning

In this example, I’m using a cancer dataset to demonstrate the process. The goal is to classify patient documentation from Electronic Health Records (EHR) to determine if a patient currently has cancer, had cancer in the past, does not have cancer, or if a relative has cancer. I’ve provided a simplified dataset for illustration. You can follow these steps to fine-tune a model on your own dataset.

The dataset includes text examples where each entry is classified based on the cancer status described. The categories are:

  1. Cancer_present: Texts describing ongoing cancer diagnoses or treatments. Example: “She is currently undergoing chemotherapy for breast cancer.”
  2. Cancer_past: Texts referring to past cancer diagnoses or remissions. Example: “She had cancer as a teenager, but she’s been in remission ever since.”
  3. Cancer_relative: Texts related to someone else’s cancer experience. Example: “My mom battled breast cancer for two years but is now in remission.”
  4. No_Cancer: Texts indicating that the patient does not have cancer. Example: “She tested negative for colon cancer.”

Here is one row of data (wrapped here for readability; in the actual file each example must be on a single line):

{"messages": [{"role": "system", "content": "Your task is to classify a piece 
of text into the following category labels: [\"Cancer_present\",
\"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."},
{"role": "user", "content": "She was diagnosed with stage 3 colon cancer
last month."},
{"role": "assistant", "content": "Cancer_present"}]}

It’s important to get good-quality data examples. For fine-tuning you need at least 10 rows of data. Here are the 10 rows I have labelled for the model to learn from.

Here is that data in the JSONL format:

{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "She was diagnosed with stage 3 colon cancer last month."}, {"role": "assistant", "content": "Cancer_present"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "The oncologist confirmed that his symptoms were related to cancer."}, {"role": "assistant", "content": "Cancer_present"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "Her recent scans revealed malignant tumors in her lungs."}, {"role": "assistant", "content": "Cancer_present"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "She is currently undergoing chemotherapy for breast cancer."}, {"role": "assistant", "content": "Cancer_present"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "He received a positive test result confirming prostate cancer."}, {"role": "assistant", "content": "Cancer_present"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "She had cancer as a teenager, but she’s been in remission ever since."}, {"role": "assistant", "content": "Cancer_past"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "After five years of being cancer-free, he still goes for regular checkups."}, {"role": "assistant", "content": "Cancer_past"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "She survived cervical cancer and has been in good health since."}, {"role": "assistant", "content": "Cancer_past"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "His test results confirmed that he’s fully recovered from skin cancer."}, {"role": "assistant", "content": "Cancer_past"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "My mom battled breast cancer for two years but is now in remission."}, {"role": "assistant", "content": "Cancer_relative"}]}

A validation dataset is a smaller set of examples used to test the model’s performance during training. It helps tune hyperparameters and select the best model by providing feedback on its ability to generalize to unseen data.

Here is the validation dataset I used:

{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "His father passed away from lung cancer, which makes him worried about his own risk."}, {"role": "assistant", "content": "Cancer_relative"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "Her cousin was recently diagnosed with early-stage cervical cancer."}, {"role": "assistant", "content": "Cancer_relative"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "She’s worried because her sister was diagnosed with breast cancer last year."}, {"role": "assistant", "content": "Cancer_relative"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "His mother is currently fighting colon cancer, so he decided to get checked."}, {"role": "assistant", "content": "Cancer_relative"}]}
{"messages": [{"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."}, {"role": "user", "content": "Her father’s history of prostate cancer has made her more cautious about her health."}, {"role": "assistant", "content": "Cancer_relative"}]}

Now we are ready to fine-tune. This can be done through the UI or through code. Here is how to trigger fine-tuning via code:

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Start a fine-tuning job using the OpenAI SDK.
# "file-abc123" is a placeholder for the ID of an uploaded training file.
client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-4o-mini",
)

# Retrieve the state of a fine-tuning job ("ftjob-abc123" is a placeholder job ID)
client.fine_tuning.jobs.retrieve("ftjob-abc123")

# Cancel a job
client.fine_tuning.jobs.cancel("ftjob-abc123")

# List up to 10 events from a fine-tuning job
client.fine_tuning.jobs.list_events(fine_tuning_job_id="ftjob-abc123", limit=10)
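The calls above use placeholder IDs. A fuller flow would upload the training and validation files first, then poll the job until it finishes. The sketch below shows one way to do that; `start_job` and `wait_for_job` are hypothetical helper names, the file paths are assumptions, and the 30-second polling interval is arbitrary. The client is passed in as an argument, so the helpers themselves do not import the SDK.

```python
import time

# Job statuses after which polling can stop.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def start_job(client, training_path, validation_path, model="gpt-4o-mini"):
    """Upload both JSONL files, then create a fine-tuning job that uses them."""
    with open(training_path, "rb") as f:
        training = client.files.create(file=f, purpose="fine-tune")
    with open(validation_path, "rb") as f:
        validation = client.files.create(file=f, purpose="fine-tune")
    return client.fine_tuning.jobs.create(
        training_file=training.id,
        validation_file=validation.id,
        model=model,
    )


def wait_for_job(client, job_id, poll_seconds=30, sleep=time.sleep):
    """Poll the job until it reaches a terminal status, then return it."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)


# Example usage (requires the openai package and an API key):
# from openai import OpenAI
# client = OpenAI()
# job = start_job(client, "train.jsonl", "validation.jsonl")
# finished = wait_for_job(client, job.id)
# print(finished.status, finished.fine_tuned_model)
```

If the job succeeds, `fine_tuned_model` on the returned job holds the model ID to use for inference.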

In the next section, I walk you through the process of fine-tuning using the dashboard, which may be the best way to do it for beginners or first-time users.

Fine-tuning using the OpenAI interface

1. Access Fine-tuning Interface:

  • Log in to OpenAI and go to the “Fine-tuning” section.

2. Create Fine-tuning Job:

Choose a base model, upload your JSONL files, and give your job a name.

  • Pick the model you would like to fine-tune. I used GPT-4o-mini.
  • Upload your training and validation datasets.
  • Set the hyperparameters. When fine-tuning, adjust the learning rate, batch size, and number of epochs carefully. Start with the recommended defaults and experiment to find the best balance for your task.

3. Set Hyperparameters

Here are the three main hyperparameters for training in OpenAI:

  1. Learning Rate:
    This controls how quickly the model learns. A higher rate means faster learning, but it might miss important details. A lower rate learns more carefully but takes longer.
  2. Batch Size:
    This is how many examples the model looks at before updating what it knows. A bigger batch can help it learn better patterns, but it needs more computer memory.
  3. Number of Epochs:
    This is how many times the model goes through all the training data. More epochs mean more practice, but too many can make the model memorize instead of learn. Range: Usually 1 to 10. I used 10.
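When starting a job from code, these three settings can be passed alongside the training file. The sketch below assumes the `hyperparameters` argument of `client.fine_tuning.jobs.create` with the keys `n_epochs`, `batch_size`, and `learning_rate_multiplier` (the API expresses the learning rate as a multiplier rather than an absolute value), where `"auto"` leaves the choice to OpenAI. `build_hyperparameters` is a hypothetical helper, and the validation rules in it are my own sanity checks, not documented limits.

```python
def build_hyperparameters(n_epochs="auto", batch_size="auto",
                          learning_rate_multiplier="auto"):
    """Build a hyperparameters dict, sanity-checking anything that is not "auto"."""
    if n_epochs != "auto" and (not isinstance(n_epochs, int) or n_epochs < 1):
        raise ValueError("n_epochs must be a positive integer or 'auto'")
    if batch_size != "auto" and (not isinstance(batch_size, int) or batch_size < 1):
        raise ValueError("batch_size must be a positive integer or 'auto'")
    if learning_rate_multiplier != "auto" and learning_rate_multiplier <= 0:
        raise ValueError("learning_rate_multiplier must be positive or 'auto'")
    return {
        "n_epochs": n_epochs,
        "batch_size": batch_size,
        "learning_rate_multiplier": learning_rate_multiplier,
    }


# Example usage (placeholder file ID; requires the openai package and an API key):
# client.fine_tuning.jobs.create(
#     training_file="file-abc123",
#     model="gpt-4o-mini",
#     hyperparameters=build_hyperparameters(n_epochs=10),
# )
print(build_hyperparameters(n_epochs=10))
```

Leaving a value at `"auto"` matches the advice to start from the recommended defaults and change one setting at a time.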

4. Monitor Progress

Watch the fine-tuning process on the dashboard until it’s done.

My first run failed because of a problem with my training data: the JSONL file was incorrect because each JSON object was not on a single line. I fixed that and re-uploaded the data, and everything worked after that.

Loss metrics

Here are the hyperparameter values I had chosen

Hyperparameters used

Here are the key loss metrics to track as the training progresses

Monitor training and validation loss during fine-tuning. Decreasing losses indicate good learning. Watch for overfitting if validation loss increases while training loss decreases.
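The overfitting pattern described above (validation loss rising while training loss keeps falling) can also be checked programmatically once you have the two loss series. This is a rough sketch: `first_overfitting_step` is a hypothetical helper, the `patience` threshold is an arbitrary choice, and the loss values are invented for illustration, not taken from my run.

```python
def first_overfitting_step(train_loss, valid_loss, patience=2):
    """Return the index where validation loss starts rising for `patience`
    consecutive steps while training loss keeps falling, or None."""
    run = 0
    for i in range(1, min(len(train_loss), len(valid_loss))):
        if valid_loss[i] > valid_loss[i - 1] and train_loss[i] < train_loss[i - 1]:
            run += 1
            if run >= patience:
                return i - patience + 1
        else:
            run = 0
    return None


# Made-up loss curves: training keeps improving, validation turns upward.
train = [1.0, 0.6, 0.4, 0.3, 0.2]
valid = [1.1, 0.7, 0.6, 0.65, 0.7]
print(first_overfitting_step(train, valid))
```

A result of `None` means the pattern was not seen. Real loss curves are noisy, so a larger `patience` or some smoothing may be needed before trusting the signal.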

It took 15 minutes to finish the fine-tuning job.

5. Test the model

Now let’s use the fine-tuned model in the Playground.

You can also query it via code:

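from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

completion = client.chat.completions.create(
    # Replace with your fine-tuned model ID (shown on the job page once training succeeds)
    model="Enter_the_fine_tuned_model_IDhere",
    messages=[
        {"role": "system", "content": "Your task is to classify a piece of text into the following category labels: [\"Cancer_present\", \"Cancer_past\", \"Cancer_relative\", \"No_Cancer\"]."},
        {"role": "user", "content": "She tested negative for colon cancer!"},
    ],
)
# Print only the text of the reply, which should be one of the four labels
print(completion.choices[0].message.content)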
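Beyond spot-checking single prompts, you can score the model against labelled examples such as the validation set. The sketch below is one way to do that: `accuracy` is a hypothetical helper that takes the JSONL text and any `predict` function mapping a text to a label, so it can be tried with a stand-in before wiring it to the API.

```python
import json

LABELS = {"Cancer_present", "Cancer_past", "Cancer_relative", "No_Cancer"}


def accuracy(jsonl_text, predict):
    """Score `predict` against the assistant labels stored in JSONL examples."""
    correct = total = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        messages = json.loads(line)["messages"]
        user_text = next(m["content"] for m in messages if m["role"] == "user")
        expected = next(m["content"] for m in messages if m["role"] == "assistant")
        predicted = predict(user_text).strip()
        # Anything outside the four labels counts as a wrong answer.
        if predicted in LABELS and predicted == expected:
            correct += 1
        total += 1
    return correct / total if total else 0.0


# Example usage with the API (model ID is a placeholder; assumes `client` and a
# SYSTEM_PROMPT string holding the same system message used in training):
# def predict(text):
#     reply = client.chat.completions.create(
#         model="Enter_the_fine_tuned_model_IDhere",
#         messages=[{"role": "system", "content": SYSTEM_PROMPT},
#                   {"role": "user", "content": text}],
#     )
#     return reply.choices[0].message.content
```

With only a handful of validation rows the resulting number is a rough indicator rather than a reliable estimate.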

The fine-tuned model is now ready to use.

The Economics of Fine-Tuning

While the initial training tokens are free, it’s important to consider the full cost structure. The prices below are as listed at the time of writing:

GPT-4o

- Training: $25 per million tokens
- Inference (input): $3.75 per million tokens
- Output: $15 per million tokens

GPT-4o-mini

- Training: $3 per million tokens
- Inference (input): $0.30 per million tokens
- Output: $1.25 per million tokens
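To get a feel for these numbers, here is a small cost estimate. It is a sketch under two assumptions: that training is billed on the tokens in the training file multiplied by the number of epochs, and that the inference price listed above applies to input tokens. The token counts in the example are made up, and the prices are simply copied from the list above, so check current pricing before relying on the result.

```python
# Prices in USD per million tokens, copied from the list above.
PRICES = {
    "gpt-4o": {"training": 25.00, "input": 3.75, "output": 15.00},
    "gpt-4o-mini": {"training": 3.00, "input": 0.30, "output": 1.25},
}


def training_cost(model, tokens_in_file, n_epochs):
    """Estimated training cost: price x tokens in the file x number of epochs."""
    return PRICES[model]["training"] * tokens_in_file * n_epochs / 1_000_000


def inference_cost(model, input_tokens, output_tokens):
    """Estimated cost of using the fine-tuned model for a given token volume."""
    price = PRICES[model]
    return (price["input"] * input_tokens + price["output"] * output_tokens) / 1_000_000


# Hypothetical workload: a 100,000-token training file trained for 10 epochs.
print(f"${training_cost('gpt-4o-mini', 100_000, 10):.2f}")
```

Under these assumptions the number of epochs multiplies the training bill directly, which is one more reason not to set it higher than needed.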

Conclusion

Fine-tuning OpenAI models is a great way to make AI work better for specific tasks. By preparing good training data, choosing the right model, and training it well, businesses can create custom AI solutions that perform better than general models. While fine-tuning takes time and resources, it can lead to better results and save money.

However, fine-tuning isn’t always needed. Sometimes, adjusting prompts can give good results without extra training. It’s important to think about your specific needs and data before deciding to fine-tune. When done right, fine-tuning can open up new opportunities for using AI effectively.
