Part III (Fine-Tuning) — Beyond the Buzz: Highlighting the Impact of AI in Modernizing Applications

Fine-Tuning the LLM

Now that we’ve successfully created a working POC in Part II (with some helpful tweaks), it’s time to dive into fine-tuning. As in Part II, I had a set of goals for this phase:

  1. Make sure the final result runs smoothly on my MacBook.
  2. Align the training dataset with the XBC scenarios discussed in Part II.
  3. Keep the costs of fine-tuning to a minimum.

To achieve these goals, I followed these key steps:

  1. Build XBC Training Dataset
  2. Fine-Tune Llama2 7B Chat Model
  3. Convert the Fine-Tuned Model to GGUF Format for local execution
  4. Integrate the Fine-Tuned Model with XBC Application

In the upcoming sections, we’ll delve into each of these steps, providing a detailed guide to ensure a smooth and successful fine-tuning process.

Build XBC Training Dataset:

To create the training dataset, I started by understanding the Llama2 chat model prompt template, which I found in this blog. Using this blog as my guide, I crafted 175 different training prompts covering scenarios like transferring money, subscribing, and adding or removing users from an account, as explained in Part II. No surprises: I utilized my existing Llama2 setup to generate prompts for these cases and employed Google Sheets functions to structure the template. For instance, the following prompt illustrates adding Allan to the account:

<s>[INST] <<SYS>> You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don’t know the answer to a question, please don’t share false information. <</SYS>> Would you be open to attaching Allan to my account? [/INST] The addition of Allan to your account is complete! </s>

The training set is available in the Git repository as the train.csv file.
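
To make the structure concrete, here is a minimal Python sketch of how such rows can be assembled into train.csv. The build_prompt helper, the abbreviated system prompt, and the single text column are illustrative assumptions, not the exact mechanics I used (I built my dataset with Google Sheets functions, as noted above).

import csv

# Abbreviated for readability; use the full system prompt shown in the example above.
SYSTEM = ("You are a helpful, respectful and honest assistant. "
          "Always answer as helpfully as possible, while being safe.")

def build_prompt(instruction, response):
    # Wrap one instruction/response pair in the Llama2 chat template.
    return f"<s>[INST] <<SYS>> {SYSTEM} <</SYS>> {instruction} [/INST] {response} </s>"

pairs = [
    ("Would you be open to attaching Allan to my account?",
     "The addition of Allan to your account is complete!"),
    # ... the remaining transfer, subscribe, add, and remove examples
]

with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["text"])  # a single text column, as consumed by the training script
    for instruction, response in pairs:
        writer.writerow([build_prompt(instruction, response)])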

Fine-Tune Llama2 7B Chat Model:

For the fine-tuning process, I utilized the Google Colab notebook created by Maxime Labonne, which is explained in detail in this blog. To tailor the script to our needs, I modified it to use our training dataset in CSV format instead of the sample dataset originally included. I also extended the training to 3 epochs for more comprehensive learning.

Here are some key points and configurations worth highlighting:

  1. We applied Supervised Fine-Tuning (SFT), training the model on a dataset of instruction and response pairs (our train.csv).
  2. For efficient and resource-conscious fine-tuning, we employed Quantized Low-Rank Adaptation (QLoRA), which loads the base model in quantized form and trains only a small set of newly introduced low-rank adapter weights (see the sketch after this list).
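
For readers adapting the notebook themselves, the following is a minimal sketch of the core SFT plus QLoRA setup using the transformers, peft, trl, and datasets libraries. Exact arguments shift between library versions (newer trl releases move options like dataset_text_field into an SFTConfig), and the hyperparameters shown are illustrative rather than the notebook’s verbatim values.

import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

model_name = "NousResearch/Llama-2-7b-chat-hf"

# Load the base model quantized to 4-bit precision (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             quantization_config=bnb_config,
                                             device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA trains only small low-rank adapter matrices; the base weights stay frozen.
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

dataset = load_dataset("csv", data_files="train.csv", split="train")
training_args = TrainingArguments(output_dir="./results", num_train_epochs=3,
                                  per_device_train_batch_size=4, learning_rate=2e-4)

trainer = SFTTrainer(model=model, train_dataset=dataset, peft_config=peft_config,
                     dataset_text_field="text", tokenizer=tokenizer, args=training_args)
trainer.train()
trainer.model.save_pretrained("llama-2-7b-xbc-adapter")  # saves the LoRA adapter weights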

To ensure faster and more consistent execution during training, I opted for a Google Colab Pay As You Go compute subscription. Despite several rounds of trial and error to align the script with the XBC training dataset, the overall training cost remained below $2 USD. Once the training was successful, I uploaded the fine-tuned model to my Hugging Face repository.
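
If you run the training yourself, one common way to publish the result is to merge the LoRA adapter back into the base model and push it to the Hub. A minimal sketch follows; the adapter directory and repository name are hypothetical placeholders.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-chat-hf",
                                            torch_dtype=torch.float16)
# Attach the trained LoRA adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base, "llama-2-7b-xbc-adapter").merge_and_unload()
model.push_to_hub("your-username/llama-2-7b-chat-xbc")  # hypothetical repo name
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
tokenizer.push_to_hub("your-username/llama-2-7b-chat-xbc")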

Image 1: Fine-tuned model from Hugging Face Repository

If you would like to skip the fine-tuning step, you can use the fine-tuned chat model directly from my Hugging Face repository.

Convert the Fine-Tuned Model to GGUF Format for local execution:

As the next step, the fine-tuned model needs to be converted to GGUF format for local execution with llama.cpp. Llama.cpp simplifies this process with its convert script (convert.py). I generated two versions of the GGUF file: a lightweight quantized version ideal for local execution and another with higher quality. Both converted models are now accessible in my Hugging Face repository, and you can locate them here.
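
For reference, the commands below sketch the conversion flow from a llama.cpp checkout. Script names and flags have changed across llama.cpp releases (newer versions ship convert-hf-to-gguf.py and a llama-quantize binary), and the model directory and output file names are assumptions, so treat these as indicative rather than exact.

# Convert the merged Hugging Face model into a full-precision GGUF file.
python3 convert.py ./llama-2-7b-chat-xbc --outtype f16 --outfile xbc-f16.gguf

# Quantize it down to a lightweight 4-bit variant for local execution.
./quantize xbc-f16.gguf xbc-q4_k_m.gguf Q4_K_M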

Integrate the Fine-Tuned Model with XBC Application:

Now, we’re entering the final phase — integrating the fine-tuned model into our XBC Application and witnessing it in action. To achieve this, I crafted a new app named app-ft.py, implementing two crucial changes:

  1. Enabling the app to make use of the fine-tuned model (see the sketch after this list).
  2. Eliminating the post-processing hack from Part II to observe the fine-tuning impact.
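
To illustrate the first change, here is a minimal sketch of loading the quantized GGUF model through the llama-cpp-python bindings. The file name, context size, and generation parameters are assumptions; app-ft.py in the repository remains the authoritative version.

from llama_cpp import Llama

# Load the locally stored, quantized fine-tuned model (file name is an assumption).
llm = Llama(model_path="./xbc-q4_k_m.gguf", n_ctx=2048)

prompt = ("<s>[INST] <<SYS>> You are a helpful, respectful and honest assistant. "
          "<</SYS>> Would you be open to attaching Allan to my account? [/INST]")
output = llm(prompt, max_tokens=128, stop=["</s>"])
print(output["choices"][0]["text"])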

With these adjustments, I tested various prompts covering transfer, subscribe, add, and remove scenarios discussed in Part II. The fine-tuning demonstrated impressive results, producing LLM responses of notably improved quality and alignment compared to the initial chat model. Consequently, the post-processing hack can now be safely retired.

Key learnings:

  1. Fine-tuning is a crucial step to ensure LLMs are suitable for enterprise use.
  2. The success of fine-tuning relies heavily on the quality of the training dataset, so invest in creating high-quality training data.
  3. Be cautious of overtraining or overfitting issues; it’s wise to keep the number of training epochs relatively low, ideally around 5.
  4. While one initial fine-tuning might be enough for many enterprises, it’s advisable to conduct periodic fine-tuning, like once every quarter, incorporating new insights from LLM responses, including potential hallucinations.

Try it Yourself!

If you would like to give it a spin yourself, head over to my GitHub repository at https://github.com/AhilanPonnusamy/LLM-and-AppModernization/tree/main for a detailed guide and all the source code magic. Enjoy the journey, and I’m eagerly looking forward to hearing about your experiences!

References:

Just like in Part II, I found it difficult to find a single reliable source to make this work end to end. However, the following are some of the documents I relied on heavily, from initial ideas to baseline scripts:
