[Machine Learning] Fine Tuning OpenAI GPT-3.5-Turbo: Crafting an “Olympic-Ready” Knowledge Bot

Keshav Singh
18 min readSep 18, 2023

--

Fine Tuning OpenAI GPT-3.5-Turbo

Continuing from my last blog “Unleashing the Power of RAG: Crafting an “Olympic-Ready” Knowledge Bot with OpenAI’s LLMs”, we will look to explore fine-tuning OpenAI’s LLM.

In case you are interested in the tradeoffs and intricacies of LLM finetuning vs. RAGs, I would highly recommend reading my detailed blog explaining it “Fine Tuning Large Language Models on Azure Machine Learning”. It will expose & enrich you on the topic with a hands-on demonstration of open-source LLMs from HuggingFace.

Fine-tuning LLM is the process of further training additional layers on top of the existing LLM (in our case gpt-3.5-turbo) with specific datasets and behavior. This dataset is static, meaning is “point in time” and if the dataset evolves in the future additional re-training would be required to create a new version adapting data. It is tailored to the application’s specific needs and requires extensive labeled data for robust training and effectiveness. As depicted in the design above it illustrates LLM fine-tuning which once finetuned helps seamless inferencing for the use case.

RAG / Fine Tuning oftentimes appears to be a linear progression but is not so. Each of them has a very specific use case and needs fully evaluated.

Considerations

  • Does your application warrant external data probing for each knowledge query? Certainly, RAG meets this.
  • Is the data of a dynamic and evolving nature? Retraining/Updating LLMs is an expensive and time-consuming process. RAGs are suited for such.
  • Do you have an exhaustive labeled dataset, that is long-lived static, and nuanced to a tone and nature of response? Fine Tuning would be ideal.
  • Would you intend to mitigate and minimize Hallucination? When opting for FineTuning for this situation, make sure to understand you have factually labeled large-scale and precise data and are open to retraining as new facets of data surface. It would be ideal to help the cause. Subsequently, RAGs are well suited yet costly since they would derive the context and make large token calls but offer full knowledge to the LLMs to derive and frame responses. Make sure you take 360 consideration. In my experience RAGs are more purpose-built to minimize hallucination.
  • Do you need an explainable LLM solution? RAGs can be fairly well explained over fine-tuned LLMs. Fine-tuned LLMs are quite a black box without much control over hallucinations and/or explainability. Effective prompt engineering is absolutely necessary to an extent to ground fine-tuned LLMs and limit their run wild.
  • Do you need a high degree of toning, branding, and linguistic specificity? Fine Tuning helps train specialized tailored solutions and can provide your solutions feel a part of a larger organizational umbrella fully attuned.

Design

A word on design. When designing an LLM solution scale, complexity, latency, robustness, reliability, UX, frugality aspects, and data privacy concerns are some paramount areas of evaluating the operationalization, maintenance, and change management of a solution. The whole life cycle must be thoroughly vetted before crafting a futuristic solution.

With the awareness of the tradeoffs between RAGs and finetuning, I would assert — Fine Tuned LLMs simplify the effectiveness of the business purpose, once finetuned the LLM needs limited contextual knowledge for querying. It helps save cost and latency. There are much fewer tokens required for querying. I highly recommend starting with few-shot, prompt engineering RAG-based approaches before considering finetuning. Techniques such as prompt chaining help much in building a focused solution aimed to accelerate faster iterations.

Without reinventing the wheel, I would love to borrow and endorse an excellent writeup by Raphael MansuyAI: RAG vs Fine-tuning — Which Is the Best Tool to Boost Your LLM Application?” I highly recommend learning more.

Flowchart depicting key factors in choosing between Retrieval-Augmented Generation (RAG) and fine-tuning for optimizing large language models. Decision nodes assess needs for external knowledge, model adaptation, reducing hallucinations, training data availability, data dynamics, and interpretability. Paths lead to recommending either RAG or fine-tuning based on specific use case requirements.

In this blog, we will continue to build our knowledge on the series and illustrate how OpenAI empowers finetuning its LLM models with specific datasets and needed grounding. We will continue to leverage the Olympic dataset (used on our OAI RAG solution). The code below demonstrates steps to -

  • prepareContextData class — Prepare Dataset, Transform, prepare training and validations
  • oaifinetuning class — uploads the jsonl files, creates and tracks the OAI finetuning job
  • oaiChatCompletion class — leverages the finetuned model to inference based on the query prompts.

The Data preparation has 3 strategies in this example, first we provide several examples of the pattern of prompt querying for a positive situation, we then also provide some non-contextual examples to illustrate best handling it and lastly we provide some grounding rules for the knowledge bot. In summary this helps us build a effective and precise bot, for a real scenario adding on more training and validation data over several new facets helps in eliminating biases, adding required additional groundings.

Our aim on this blog is to build a bot exclusively for the Olympics.

#https://www.kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results
import pandas as pd
import openai
import os, json
from dotenv import load_dotenv
load_dotenv()
import time

class prepareContextData:
def __init__(self):
self.training_file_name = "olympics_finetune_training.jsonl"
self.validation_file_name = "olympics_finetune_validation.jsonl"
self.olympics_data=[
{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "What was the age of Atakan Alaftargil who participated in olympic games held in the year 2002?"}, {"role": "assistant", "content": "Atakan Alaftargil a Male, was aged 25.0, height Unknown centimeters, Weight Unknown kilograms, from team Turkey , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Alpine Skiing sporting category for the event Alpine Skiing Mens Slalom but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Who won Gold Medal in olympic games held in the year 2002 in the Biathlon sporting category for the event Biathlon Mens 4 x 7.5 kilometres Relay?"}, {"role": "assistant", "content": "Frode Andresen a Male, aged 28.0, height 191.0 centimeters, Weight 76.0 kilograms, from team Norway , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Biathlon sporting category for the event Biathlon Mens 4 x 7.5 kilometres Relay won a Gold medal"}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Did Frode Andresen win any medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Biathlon sporting category for the event Biathlon Mens 12.5 kilometres Pursuit?"}, {"role": "assistant", "content": "Frode Andresen a Male, aged 28.0, height 191.0 centimeters, Weight 76.0 kilograms, from team Norway , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Biathlon sporting category for the event Biathlon Mens 12.5 kilometres Pursuit but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Who won the Bronze medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Cross Country Skiing sporting category for the event Cross Country Skiing Womens 4 x 5 kilometres Relay?"}, {"role": "assistant", "content": "Brigitte Albrecht-Loretan a Female, aged 31.0, height 175.0 centimeters, Weight 64.0 kilograms, from team Switzerland , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Cross Country Skiing sporting category for the event Cross Country Skiing Womens 4 x 5 kilometres Relay won a Bronze medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Did Frode Andresen win any medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Biathlon sporting category for the event Biathlon Mens 10 kilometres Sprint?"}, {"role": "assistant", "content": "Frode Andresen a Male, aged 28.0, height 191.0 centimeters, Weight 76.0 kilograms, from team Norway , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Biathlon sporting category for the event Biathlon Mens 10 kilometres Sprint but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Which events did Meelis Aasme participate in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City?"}, {"role": "assistant", "content": "Meelis Aasme a Male, aged 29.0, height 182.0 centimeters, Weight 73.0 kilograms, from team Estonia , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Cross Country Skiing sporting category for the event Cross Country Skiing Mens 15 kilometres but did not win any medal.Meelis Aasme a Male, aged 29.0, height 182.0 centimeters, Weight 73.0 kilograms, from team Estonia , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Cross Country Skiing sporting category for the event Cross Country Skiing Men's 50 kilometres but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Which events did Julianne Anne Abernathy participate in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City?"}, {"role": "assistant", "content": "Julianne Anne Abernathy a Female, aged 48.0, height 165.0 centimeters, Weight 75.0 kilograms, from team United States Virgin Islands , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Luge sporting category for the event Luge Womens Singles but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Did Ronny Ackermann win any medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Nordic Combined sporting category for the event Nordic Combined Mens Team?"}, {"role": "assistant", "content": "Ronny Ackermann a Male, aged 24.0, height 184.0 centimeters, Weight 69.0 kilograms, from team Germany , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Nordic Combined sporting category for the event Nordic Combined Mens Team won a Silver medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Did Maksim Sergeyevich Afinogenov win any medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Ice Hockey Mens?"}, {"role": "assistant", "content": "Maksim Sergeyevich Afinogenov a Male, aged 22.0, height 183.0 centimeters, Weight 88.0 kilograms, from team Russia , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Ice Hockey sporting category for the event Ice Hockey Mens Ice Hockey won a Bronze medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Did Janne Petteri Ahonen win any medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Ski Jumping Mens Large Hill?"}, {"role": "assistant", "content": "Janne Petteri Ahonen a Male, aged 24.0, height 184.0 centimeters, Weight 66.0 kilograms, from team Finland , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Ski Jumping sporting category for the event Ski Jumping Mens Large Hill, Team won a Silver medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Who won Bronze Medal in olympic games held in the year 2002 in the Biathlon Women's 4 x 7.5 kilometres Relay?"}, {"role": "assistant", "content": "Albina Khamitovna Akhatova a Female, aged 25.0, height 160.0 centimeters, Weight 55.0 kilograms, from team Russia , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Biathlon sporting category for the event Biathlon Women's 4 x 7.5 kilometres Relay won a Bronze medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Did Karsten Klaus Albert win any medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Luge Mens Singles?"}, {"role": "assistant", "content": "Karsten Klaus Albert a Male, aged 33.0, height 186.0 centimeters, Weight 90.0 kilograms, from team Germany , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Luge sporting category for the event Luge Mens Singles but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "What was the age of Kilian Albrecht who participated in olympic games held in the year 2002?"}, {"role": "assistant", "content": "Kilian Albrecht a Male, aged 28.0, height 181.0 centimeters, Weight 72.0 kilograms, from team Austria , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Alpine Skiing sporting category for the event Alpine Skiing Mens Slalom but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "What was the age of Brigitte Albrecht-Loretan who participated in olympic games held in the year 2002?"}, {"role": "assistant", "content": "Brigitte Albrecht-Loretan a Female, aged 31.0, height 175.0 centimeters, Weight 64.0 kilograms, from team Switzerland , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Cross Country Skiing sporting category for the event Cross Country Skiing Womens 15 kilometres but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "What was the age of Chimene Mary Chemmy Alcott (Crawford) who participated in olympic games held in the year 2002?"}, {"role": "assistant", "content": "Chimene Mary Chemmy Alcott (Crawford) a Female, aged 19.0, height 169.0 centimeters, Weight 69.0 kilograms, from team Great Britain , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Alpine Skiing sporting category for the event Alpine Skiing Womens Downhill but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Who was the actor of the movie Top Gun Maverick?"}, {"role": "assistant", "content": "I am sorry, I can only answer questions on Olympics."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Who won the last cricket world cup?"}, {"role": "assistant", "content": "I am sorry, I can only answer questions on Olympics."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "What is the GDP of United States of America?"}, {"role": "assistant", "content": "I am sorry, I can only answer questions on Olympics."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "What is the distance of the Earth from the Moon?"}, {"role": "assistant", "content": "I am sorry, I can only answer questions on Olympics."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "What was the age of Lyubov Alekseyeva who participated in olympic games held in the year 2002, did she win any medal in Ice Hockey Womens Ice Hockey?"}, {"role": "assistant", "content": "Lyubov Alekseyeva a Female, aged 17.0, height 154.0 centimeters, Weight 47.0 kilograms, from team Kazakhstan , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Ice Hockey sporting category for the event Ice Hockey Womens Ice Hockey but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Who won Gold Medal in olympic games held in the year 2002 in the Cross Country Skiing sporting category for the event Cross Country Skiing Mens 10/10 kilometres Pursuit?"}, {"role": "assistant", "content": "Thomas Alsgaard (Alsgrd-) a Male, aged 30.0, height 190.0 centimeters, Weight 82.0 kilograms, from team Norway , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Cross Country Skiing sporting category for the event Cross Country Skiing Mens 10/10 kilometres Pursuit won a Gold medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Who was the actor of the movie Interstellar?"}, {"role": "assistant", "content": "I am sorry, I can only answer questions on Olympics."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Who won the last football world cup?"}, {"role": "assistant", "content": "I am sorry, I can only answer questions on Olympics."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "What is the GDP of India?"}, {"role": "assistant", "content": "I am sorry, I can only answer questions on Olympics."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "What is the distance of the Earth from the Sun?"}, {"role": "assistant", "content": "I am sorry, I can only answer questions on Olympics."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Did Simon Ammann win any medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Ski Jumping sporting category for the event Ski Jumping Mens Normal Hill, Individual?"}, {"role": "assistant", "content": "Simon Ammann a Male, aged 20.0, height 171.0 centimeters, Weight 55.0 kilograms, from team Switzerland , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Ski Jumping sporting category for the event Ski Jumping Mens Normal Hill, Individual won a Gold medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Who won the Silver medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Ice Hockey sporting category for the event Ice Hockey Mens Ice Hockey?"}, {"role": "assistant", "content": "Anthony Lewis Tony Amonte a Male, aged 31.0, height 183.0 centimeters, Weight 91.0 kilograms, from team United States , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Ice Hockey sporting category for the event Ice Hockey Mens Ice Hockey won a Silver medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Which events did Frode Andresen participate and win a Gold in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City?"}, {"role": "assistant", "content": "Frode Andresen a Male, aged 28.0, height 191.0 centimeters, Weight 76.0 kilograms, from team Norway , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Biathlon sporting category for the event Biathlon Mens 4 x 7.5 kilometres Relay won a Gold medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Did Aleksey Nikolayevich Andryunin win any medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the event Bobsleigh Mens Four?"}, {"role": "assistant", "content": "Aleksey Nikolayevich Andryunin a Male, aged 25.0, height 189.0 centimeters, Weight 100.0 kilograms, from team Russia-2 , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Bobsleigh sporting category for the event Bobsleigh Mens Four but did not win any medal."}]}
,{"messages": [{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."}, {"role": "user", "content": "Did Tobias Angerer win any medal in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the event Cross Country Skiing Mens Sprint?"}, {"role": "assistant", "content": "Tobias Angerer a Male, aged 24.0, height 179.0 centimeters, Weight 72.0 kilograms, from team Germany , participated in olympic games held in the year 2002 , for the Winter season, hosted in the city Salt Lake City in the Cross Country Skiing sporting category for the event Cross Country Skiing Mens Sprint but did not win any medal."}]}
]
def write_jsonl(self,data_list: list, filename: str) -> None:
with open(filename, "w") as out:
for ddict in data_list:
jout = json.dumps(ddict) + "\n"
out.write(jout)
return

def transform(self):
training_data = self.olympics_data[:20]
validation_data = self.olympics_data[21:]
self.write_jsonl(training_data, self.training_file_name)
self.write_jsonl(validation_data, self.validation_file_name)
return

class oaifinetuning:
def __init__(self):
self.training_file_name = "olympics_finetune_training.jsonl"
self.validation_file_name = "olympics_finetune_validation.jsonl"
self.openai_api_key = os.getenv("OPENAI_API_KEY")

def uploadTuningFiles(self):
training_response = openai.File.create(file=open(self.training_file_name, "rb"), purpose="fine-tune", api_key=self.openai_api_key)
training_file_id = training_response["id"]
validation_response = openai.File.create(file=open(self.validation_file_name, "rb"), purpose="fine-tune", api_key=self.openai_api_key)
validation_file_id = validation_response["id"]
print(validation_response)
print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)
while openai.File.retrieve(id=training_file_id,api_key=self.openai_api_key).status!="processed" and openai.File.retrieve(id=validation_file_id,api_key=self.openai_api_key).status!="processed":
time.sleep(60)
return training_file_id,validation_file_id

def createFineTuningJob(self,training_file_id,validation_file_id):
response = openai.FineTuningJob.create(
training_file=training_file_id,
validation_file=validation_file_id,
model="gpt-3.5-turbo",suffix="olympics",
api_key=self.openai_api_key)
job_id = response["id"]
print("FineTuning Job ID:", response["id"])
print("Fine Tuning Job Status:", response["status"])
return job_id

def getFineTuningJobEvents(self,jobId):
while openai.FineTuningJob.retrieve(jobId, api_key=self.openai_api_key).status != "succeeded":
response = openai.FineTuningJob.retrieve(jobId, api_key=self.openai_api_key)
print("Job ID:", response["id"])
print("Status:", response["status"])
print("Trained Tokens:", response["trained_tokens"])
fine_tuned_model_id = response["fine_tuned_model"]
print("Fine-tuned model ID:", fine_tuned_model_id)

event_response = openai.FineTuningJob.list_events(id=jobId, limit=50, api_key=self.openai_api_key)
events = event_response["data"]
events.reverse()
for event in events:
print(event["message"])
time.sleep(180)
response = openai.FineTuningJob.retrieve(jobId, api_key=self.openai_api_key)
print(response)
return response["id"],response["status"],response['fine_tuned_model']

class oaiChatCompletion:

def __init__(self,fine_tuned_model_id):
self.fine_tuned_model_id=fine_tuned_model_id
self.openai_api_key = os.getenv("OPENAI_API_KEY")

def fineTunedChatCompletion(self,userQuery):
completion = openai.ChatCompletion.create(
api_key=self.openai_api_key,
model=self.fine_tuned_model_id,
messages=[
{"role": "system", "content": "Maverick is a factual chatbot. I will answer your queries only on Olympics and no other topic."},
{"role": "user", "content": userQuery}
]
)
print(completion.choices[0].message)
chatResponse = completion.choices[0].message
return chatResponse

if __name__=="__main__":
prepftContext = prepareContextData().transform()
promptQuery = "Which Actory played Maverick in the movie Top Gun?"
#"What was the age of Atakan Alaftargil who participated in olympic games held in the year 2002?"
oaift = oaifinetuning()
training_file_id,validation_file_id = oaift.uploadTuningFiles()
jobid = oaift.createFineTuningJob(training_file_id,validation_file_id)
jobId,jobStatus,fine_tuned_model = oaift.getFineTuningJobEvents(jobid)
print("Fine Tuned Model Ready for Use!")
oaiChatCom = oaiChatCompletion(fine_tuned_model)
oaiChatCom.fineTunedChatCompletion(promptQuery)

GitHub Link [https://github.com/keshavksingh/oai-olympics-finetuning] contains all the necessary code for this example.

OpenAI Fine Tuning

To begin, the code carves out training and validation data in the form of 2 files

  • olympics_finetuning_training.jsonl
  • olympics_finetuning_validation.jsonl

The data files are uploaded and once the upload succeeds, we create a finetuning job. The job subsequently trains on the data adding an additional layer with the context provided on top of “gpt-3.5-turbo”.

Fine Tuning OAI

The model finetuning is complete and is now ready for inferencing.

Progress Status

LLM Fine Tuned & Ready For Use

We test this finetuned model with a positive inference.

Inferencing a Fine Tuned OAI LLM

Followed by non-contextual data to confirm a negative condition. Looks good!

Inferencing a Fine Tuned OAI LLM (Negative Testing)

At this point we have finetuned the model however we may need a RestAPI endpoint for its integration across software.

We can leverage AML’s fully managed endpoints for such.

Exposing as AML Endpoint

AML Endpoint
Deploy managed AML Endpoint
AML Endpoint
Test AML Endpoint API (Score.py)

We have demonstrated a AML Managed Endpoint to serve our needs.

Additionally, we can also use Open API (FastAPI) to create a dockerized container and push the container on Azure Container Registry (ACR) and leverage solutions such as Azure Container Instance (ACI) or Azure API App Service to serve it.

Exposing as Azure API APP Service

Azure API Service

We create a docker container.

Docker Container

Serve it on Azure API App Service.

API App Service

The app service is now ready to test.

Positive Test
Negative Test

This was a quick and neat MVP demonstration of operationalizing an OpenAI model finetuning. There is loads more to discuss on operationalization in the next interactions. I hope you enjoyed it!

References:

--

--

Keshav Singh

Principal Engineering Lead, Microsoft Purview Data Governance PG | Data ML Platform Architecture & Engineering | https://www.linkedin.com/in/keshavksingh