LoRA fine-tuning of Poro-34B with publicly available S Group data
Customization of Large Language Models (LLMs) has become increasingly popular. With a small amount of additional training, or fine-tuning, their performance can improve significantly. This may be of particular interest to companies that want to develop their own LLM that understands company data. In this article, we show how to fine-tune the Poro 34B LLM using publicly available S Group data. Fine-tuning is one of the first steps in deploying an LLM. S Group is a customer-owned Finnish network of companies in the retail and service sectors, with approximately 2,000 outlets in Finland.
There is no single "right" way to do fine-tuning. The final result depends on the model: the weights of the base model and the data it was trained on. It depends on the algorithm and hyperparameters used for fine-tuning. It depends on the format of the fine-tuning data (the prompts). And on many other things. How do you choose the right way of doing the fine-tuning? It's a combination of intuition, knowledge and experience. You need to learn and know the model. Here we show one possible approach to fine-tuning. If I were to do the same fine-tuning a second time, I would do some things differently, but the approach below works and can be used as a starting point.
Poro is a 34B-parameter decoder-only transformer pretrained on Finnish, English and code. It is based on the BLOOM architecture, and the latest fully trained checkpoint was recently released on Hugging Face. As an example of fine-tuning, we use S-Kanava's "Frequently Asked Questions": 185 questions with answers in total. The data set is modest, but it gives an idea of how fine-tuning is done and of the resources and costs needed to develop a fully functional LLM with a comprehensive data set. Full fine-tuning of Poro 34B is resource-intensive, which is why we use the so-called PEFT (Parameter-Efficient Fine-Tuning) family of techniques, specifically LoRA (Low-Rank Adaptation). In PEFT, most of the parameters (weights) of the pre-trained LLM are frozen, and fine-tuning is done on a small subset. LoRA is one PEFT method. The basic idea is to train low-rank matrices that are added to the original weight matrices. An adapter is a collection of such low-rank matrices that, when added to a base model, produces a fine-tuned model. The accuracy isn't as good as with full fine-tuning, but it can often be good enough, and this training approach is significantly cheaper than most alternatives.
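To make the idea concrete, here is a minimal NumPy sketch of a low-rank update (the dimensions and rank are illustrative, not Poro-34B's actual values):

```python
import numpy as np

# Illustrative only: a 4096x4096 weight matrix W, as found in a large
# transformer layer. W stays frozen during LoRA fine-tuning.
d, r = 4096, 16           # r is the LoRA rank, much smaller than d
W = np.random.randn(d, d)

# LoRA trains two small matrices A and B whose product is a low-rank
# update to W. Only A and B receive gradients.
A = np.random.randn(r, d) * 0.01
B = np.zeros((d, r))      # initialized to zero, so training starts from W

W_adapted = W + B @ A     # the effective weight of the fine-tuned model

# Parameter comparison for this one matrix: full fine-tuning vs. LoRA.
print(f"full: {W.size:,} params, LoRA: {A.size + B.size:,} params")
# full: 16,777,216 params, LoRA: 131,072 params
```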
Fine-tuning was done with Amazon SageMaker; other public cloud providers offer similar capabilities. A GPU-powered instance is needed for fine-tuning to run smoothly. We tested an ml.p3.8xlarge instance with 244 GiB of memory and 4 NVIDIA V100 GPUs (64 GiB of GPU memory in total), but this was not enough. That's why we chose the ml.r5.24xlarge instance. It has no GPU, but it does have 96 vCPUs and 768 GiB of memory; fine-tuning requires at least about 200 GiB of memory. With this setup fine-tuning takes somewhat longer, but with a small data set that doesn't matter much.
Training data
The fine-tuning training data contains 185 question and answer pairs (from S-Kanava).
This data is converted to .json format using the Preparedata.ipynb script (this script and all the other code can be found in the Hugging Face repository).
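As a sketch of what the conversion amounts to, assuming the raw FAQ is a list of question-answer pairs and a plain prompt template (both are assumptions, not the exact script):

```python
import json

# Hypothetical input: question-answer pairs collected from the S-Kanava FAQ.
qa_pairs = [
    ("Example question from the FAQ?", "Example answer from the FAQ."),
    # ... the remaining pairs
]

# Each pair becomes one JSON record in the prompt format used for training.
records = [{"text": f"Kysymys: {q}\nVastaus: {a}"} for q, a in qa_pairs]

with open("prompts.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```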
SageMaker setup
The prerequisite is that a user profile has been created for SageMaker. Go to the Amazon SageMaker home page, select your user profile, and open SageMaker Studio. Create a JupyterLab space with an ml.r5.24xlarge instance and 100 GB of storage.
Start the instance and wait for it to come up, then press "Open JupyterLab". Download the Poro34B-*.ipynb and prompts*.json files from Hugging Face. The SageMaker directory structure looks like this:
Then we are ready to execute the notebook cells.
First training iteration
Every LLM is different, and Poro-34B is a new model whose characteristics are not yet well known. That's why we start with a single practice question and see how fine-tuning works: what the computational performance and the output accuracy are. We use the notebook Poro-34B-Lora-1QA.ipynb.
First, the Python environment is initialized. AWS comes with a comprehensive default Python 3 environment, and only the peft package needs to be installed on top of it. The exact environment is listed in the requirements.txt file.
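In the notebook this is a single cell (a minimal sketch; the pinned package versions are in requirements.txt):

```python
# The default SageMaker Python 3 kernel already ships torch and
# transformers; only peft needs to be installed on top.
%pip install peft
```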
Then we load the Poro-34B model.
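A minimal loading sketch (the Hugging Face model id is an assumption; on the CPU-only ml.r5.24xlarge the weights go into system memory):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LumiOpen/Poro-34B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Loads in float32 by default: roughly 136 GiB of system memory for the
# weights alone, which fits within the ml.r5.24xlarge's 768 GiB.
model = AutoModelForCausalLM.from_pretrained(model_name)
```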
The LoRA configuration can be seen below.
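The exact values are in the notebook; a representative PEFT configuration, wrapping the model loaded above, looks roughly like this (the rank, alpha and dropout here are assumptions; the target module follows BLOOM's fused attention naming):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                                # LoRA rank (assumed)
    lora_alpha=32,                       # scaling factor (assumed)
    target_modules=["query_key_value"],  # BLOOM-style attention projection
    lora_dropout=0.05,                   # assumed
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the trainable share of all weights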
Note that of the 34B parameters we started with, LoRA touches only about 4% of them.
The training data is read from the json file and processed into a usable format.
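A sketch of this step, reusing the tokenizer loaded above and assuming the prompts*.json format produced by the preparation script:

```python
import json

with open("prompts.json", encoding="utf-8") as f:
    records = json.load(f)

# Tokenize each prompt; for causal LM training the labels are the inputs.
def tokenize(record):
    tokens = tokenizer(record["text"], truncation=True, max_length=512)
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

train_data = [tokenize(r) for r in records]
```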
Then we set up the training parameters and run the training.
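A minimal training sketch with the Hugging Face Trainer (the hyperparameters are assumptions; the 20-step cap matches the runs described below):

```python
from transformers import (DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

training_args = TrainingArguments(
    output_dir="poro-lora-output",
    max_steps=20,                   # 20 fine-tuning steps, as in the runs below
    per_device_train_batch_size=1,  # assumed
    learning_rate=2e-4,             # assumed
    logging_steps=1,                # log the training loss at every step
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```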
We run 20 fine-tuning steps. The training loss decreases, which looks promising. This run took about 40 minutes.
Then we save the LoRA parameters.
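Saving stores only the small adapter matrices, not the full 34B base weights (the directory name is made up here):

```python
# Writes only the LoRA adapter weights, not the frozen base model.
model.save_pretrained("poro-34b-lora-adapter")
```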
Finally, we are ready for testing. We give a trained question as a prompt to both the original Poro-34B model and the fine-tuned LoRA model. The question we ask, and the answer we expect, are:
The results are shown below.
As we can see, the fine-tuned model gives the correct answer. It repeats the same answer multiple times, but this is common with LLMs; tuning the hyperparameters, for example adding a repetition penalty, can be used to remove the extra lines.
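As a sketch, the comparison can be run by attaching the saved adapter to a freshly loaded base model and generating from both; a repetition_penalty in the generation call is one way to suppress the repeated lines (the paths and settings are assumptions):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("LumiOpen/Poro-34B")
tokenizer = AutoTokenizer.from_pretrained("LumiOpen/Poro-34B")

# Attach the saved LoRA adapter on top of the frozen base weights.
finetuned = PeftModel.from_pretrained(base, "poro-34b-lora-adapter")

prompt = "Kysymys: ...\nVastaus:"  # one of the trained questions
inputs = tokenizer(prompt, return_tensors="pt")

output = finetuned.generate(
    **inputs,
    max_new_tokens=100,
    repetition_penalty=1.2,  # penalize tokens that have already appeared
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```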
We can conclude that the first training iteration worked well, and we can now add complexity.
Second training iteration
Next, we fine-tune the base model with two questions. We want to find out how this affects training loss, computation time and model performance.
Now we use the notebook Poro-34B-Lora-2QA.ipynb. Many of the steps are identical to the previous case. The training loss is now
Again, with 20 steps, the loss decreases nicely. The fine-tuning task takes about 50 minutes.
We now have two training question-answer pairs: one is the same as before, and the other is
Testing the base and fine-tuned models gives
Looks good! We are ready to try the full training data set.
Third training iteration
Finally, we test fine-tuning with all 185 question-answer pairs. The notebook is Poro-34B-Lora-185QA.ipynb. The execution of 20 steps takes about 9–10 hours, and the training loss is
The computation ran overnight and the AWS console went idle, so only the first 17 loss values are visible. Still, the curve shows that although the loss decreases during the iterations, it starts from a higher value than in the smaller runs. We can therefore expect that some answers may be incomplete.
We randomly select two question-answer pairs for testing from the full list of 185. These are
and
Next, we check what the models produce.
We can see that the fine-tuned responses are closer to the desired responses, but still not quite what we are looking for. To achieve better performance, we should run more training steps.
Conclusions
We have shown by example how to fine-tune the Poro-34B LLM with LoRA. The fine-tuning works and produces accurate completions.
I encourage everybody to experiment, test and improve.