Fine-Tuning on Vertex AI

Mine Kaya
Published in BosphorusISS
Jan 24, 2024

Hi, I am Mine. We were building an AI assistant for our Handbook to help new joiners. In the first part, we implemented the RAG pattern and actually achieved a really good end result. You can find the details in the previous post; go here.

But how do we improve this AI assistant to production level? The AI has to behave in a certain way: answers should be formal, and it shouldn't answer questions that are outside the context we provide. Of course, we can achieve this by giving instructions, as we know from prompting.

We were using text-bison@001 as our LLM, and we will fine-tune our model with Vertex AI. But before that, I want to start with how we guide LLMs through prompt engineering. After that, you will see why we need to fine-tune rather than just prompt.

Prompt Engineering 101

Prompting is basically providing context within user prompts to guide LLMs to answer correctly. There are three different methods: zero-shot, one-shot, and few-shot prompting (a small example follows the list below).

  • Zero-Shot Prompting: Relying on the LLM’s pre-training knowledge.
  • One-Shot Prompting: Giving one example to the LLM.
  • Few-Shot Prompting: Giving a few examples to the LLM.
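
To make the difference concrete, here is a minimal few-shot sketch (the example questions and answers are made-up placeholders, not from our real dataset): we prepend a couple of example Q&A pairs to the user's question so the model imitates their tone and refusal behavior.

from vertexai.preview.language_models import TextGenerationModel

# A few-shot prompt: two example Q&A pairs, then the real question.
few_shot_prompt = """Answer formally, and only about the BISS Handbook.
Q: How many vacation days do I have?
A: Hi! You can find the leave policy in the Handbook's Leave section.
Q: How is the weather in Istanbul?
A: I am only here to help you with questions about BISS.
Q: How should I add a commit message?
A:"""

model = TextGenerationModel.from_pretrained("text-bison@001")
print(model.predict(few_shot_prompt, max_output_tokens=256).text)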

However, we can’t give hundreds of examples; every LLM has a context window, the maximum number of tokens that the model can take as input. text-bison’s input limit is currently 8,192 tokens, and it won’t be enough.

It’s also not a clever way to do it cost-wise. For each question (aka API call), Google counts your tokens and charges you based on the number of input tokens + output tokens. Output tokens are the AI’s answers; you can limit them in your API call, of course. But the problem here is that if your prompt is too long, you pay more.

The RAG architecture already causes a big prompt because of the context that we provide to the LLM. If we start giving examples too, the prompt will get even bigger. So when you have a fat prompt, your cost is fat too.

Here comes fine-tuning.

How does model tuning work?

Model tuning works by providing a training dataset containing many examples to the model. The training dataset should be structured as pairs of input and output examples. After you have prepared your dataset, you start a training job. When it finishes, you deploy your fine-tuned model and go from there.

Vertex AI supports the following methods to tune language models: supervised tuning and RLHF tuning.

Supervised tuning: Supervised tuning improves the performance of a model by teaching it a new skill. Data that contains hundreds of labeled examples is used to teach the model to mimic a desired behavior or task. Each labeled example demonstrates what you want the model to output during inference.

You can fine-tune both chat and text models. For code models, supervised tuning is the only option.

Reinforcement learning: Reinforcement learning from human feedback (RLHF) uses preferences specified by humans to optimize a language model. By using human feedback to tune your models, you can make the models better align with human preferences and reduce undesired outcomes in scenarios where people have complex intuitions about a task.

You can use reinforcement learning on text models and Flan-T5 models.

For Vertex AI docs, go here.

I will start with supervised tuning and see its impact; it’s also relatively easy to prepare a training dataset for it. After we launch the demo, we can start collecting human feedback and train the LLM with reinforcement learning.

Use cases for supervised tuning on text models:

  • Classification: The expected response is a specific word or phrase.
  • Summarization: The summary follows a specific format.
  • Extractive question answering: The question is about a context and the answer is a substring of the context.
  • Chat: The model needs to follow a persona, role, or character.

Let’s start fine-tuning text-bison with supervised tuning.

Tuning Guide

The model tuning workflow is like this:

Prepare a dataset -> Import it to Google Cloud Storage -> Start a tuning job with the related params -> Deploy your model to a Vertex AI endpoint.

Dataset Preparation

The dataset format must be JSONL, with each line containing two basic pieces of information: input_text and output_text.

{"input_text": "How should I add a commit message?", "output_text": "Hi! Of course, I can help you with that. Here's how you can add a commit message:1. Create a branch and commit your changes.2. Use the imperative mood in the summary line......"}
{"input_text": "How is the weather in Istanbul?", "output_text": "I am only here to help you with questions about BISS"}

*The format is different for chat-bison.
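
If you want to script this step, here is a rough sketch (the bucket name and file paths are placeholders I made up) that writes the examples as JSONL and uploads the file to Cloud Storage with the google-cloud-storage client:

import json
from google.cloud import storage

examples = [
    {"input_text": "How should I add a commit message?",
     "output_text": "Hi! Of course, I can help you with that. ..."},
    {"input_text": "How is the weather in Istanbul?",
     "output_text": "I am only here to help you with questions about BISS"},
]

# JSONL: one JSON object per line.
with open("tuning_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# Upload the file so the tuning job can read it from Cloud Storage.
bucket = storage.Client().bucket("your-tuning-bucket")  # placeholder bucket name
bucket.blob("datasets/tuning_data.jsonl").upload_from_filename("tuning_data.jsonl")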

Start a Training Job

On your GCP Vertex AI dashboard, find Vertex AI Studio in the left menu and go to the Language page.

From here, go to the Tune and Distill tab, click Create Tuned Model, and fill in the information for Step 1.

In the second step, you will select your dataset from Cloud Storage or upload it directly from here. There is no option other than Cloud Storage for the dataset.

To see the details for each setting, go here and click on the Console tab.

And after you click Start Tuning, it will create a pipeline for you, more or less something like this. You can watch your pipeline from the Pipelines page; find it in the left menu.
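
If you prefer code over the console, the Vertex AI SDK can kick off roughly the same pipeline. This is only a sketch; the project ID, bucket, regions, and step count are placeholders you would adjust:

import vertexai
from vertexai.preview.language_models import TextGenerationModel

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

model = TextGenerationModel.from_pretrained("text-bison@001")

# Starts the tuning pipeline; training_data points to the JSONL file in Cloud Storage.
model.tune_model(
    training_data="gs://your-tuning-bucket/datasets/tuning_data.jsonl",
    train_steps=100,                      # adjust to your dataset size
    tuning_job_location="europe-west4",   # tuning runs in specific regions
    tuned_model_location="us-central1",
)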

After your pipeline finishes, you will have a registered model, which you’ll be able to see in the Model Registry.

From the Deploy & Test tab, you can deploy your fine-tuned model to an endpoint, but the tuning pipeline has already deployed it for you; go to the Online Predictions page.

You can find your model ID on the Version Details tab.

How to use a fine-tuned model?

For LangChain:

from langchain.llms import VertexAI

def tunedLLM():
    return VertexAI(
        model_name="text-bison@001",
        tuned_model_name="projects/{your-project-id}/locations/{your-model-region}/models/{your-tuned-model-id}",
        max_output_tokens=1024,
    )

def LLM():
    return VertexAI(model_name="text-bison@001", max_output_tokens=1024)
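
Calling the tuned model then works like any other LangChain LLM; for example, with the helper above:

llm = tunedLLM()
print(llm.predict("How should I add a commit message?"))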

For the PaLM API:

from vertexai.preview.language_models import TextGenerationModel

# TUNED_MODEL_NAME is the full resource name, e.g.
# "projects/{your-project-id}/locations/{your-model-region}/models/{your-tuned-model-id}"
model = TextGenerationModel.get_tuned_model(TUNED_MODEL_NAME)
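
And querying it looks something like this:

response = model.predict("How should I add a commit message?", max_output_tokens=1024)
print(response.text)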

With tuning, we can achieve better results because our model learns how to respond from hundreds of examples, an amount we could never feed it through the prompt alone. For me, fine-tuning is a must for task specialization.

Hope it helped, see you on the next adventure!
