A Guide to Tune Language Foundation Models in Google Cloud Generative AI Studio

Paulina Moreno
Google Cloud - Community
9 min read · Jul 19, 2023


The Generative AI Studio platform gives you access to Google’s large generative AI models, allowing you to test, tune, and deploy language foundation models. In this post, we explore in depth how to tune a foundation model on the platform to improve its performance on specific tasks.

Whether you’re an experienced practitioner or just starting your journey in Generative AI, this guide will provide you with valuable insights and tips for tuning foundation models.

Tune language foundation models on Generative AI Studio

The Generative AI Studio model tuning feature for foundation models is different from the traditional concept of “fine-tuning” a model. The goal of model tuning is to enhance performance on a particular task. During tuning, the model gains additional parameters that encode the information needed to carry out the intended task or learn the desired behavior, without changing the foundation model itself.

During the tuning process, the original model is presented with a training dataset that includes various examples of the task. With just a few examples, this training step can significantly improve model performance when tuning for a specific task.

Model tuning workflow.

At the time of writing this article, the model tuning feature is in public preview and only supports the text-bison model. For more information, please visit the Generative AI Studio Public Preview Terms.

Let’s begin this exciting journey right away, shall we?

Step 1- Prepare your model tuning dataset

We need to prepare a training dataset that showcases the task. The dataset must be in JSONL format, where each line contains a single training example. Each example is composed of an input_text field and an output_text field:

{"input_text": "question: When did the Shiba Inu first arrive in the US? context: Shiba Inu first arrived in the US in the 1950s, but only gained American Kennel Club recognition in 1993. They have become increasingly popular not just in America but around the world for their cat-like personalities and lovable features.", "output_text": "The Shiba Inu first arrived in the US in the 1950s."}

Keep in mind that the training data should align with the traffic you expect in production.

The same format should be used for production traffic to keep the model’s inputs consistent. In the example above, within the input_text field, “question:” is followed by “context:”. Your production traffic must adhere to the same structure for input_text.

We know we need a few task examples in our training dataset, but how many is “a few” when it comes to producing significant results? There is no definitive answer to that question, as it hinges on the task, but it is recommended to provide a minimum of 100 examples. You can refer to these recommendations for the number of examples — Recommended Configuration for Model Tune

For our working scenario, I opted for a classification task. We will tune the model to classify food products into two categories based on their allergen content, distinguishing between those that contain allergens and those that do not.

The dataset we are using for the classification task is sourced from Kaggle and consists of information on food ingredients and allergens. If you’re interested, you can view it here.

The Food Allergens Dataset contains detailed information about allergens found in a variety of foods. There are 400 records in the dataset, each corresponding to a food item and listing its associated allergens. I reshaped the dataset to fit the requirements of the JSONL tuning format, and it looks like this:

{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:Almond Cookies, Main Ingredient:Almonds, Sweetener:Sugar, Fat[oil]:Butter, Seasoning:Flour", "output_text": "Contains"}
{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:Chicken Noodle Soup, Main Ingredient:Chicken broth, Sweetener:None, Fat[oil]:None, Seasoning:Salt", "output_text": "Contains"}
{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:Chicken Noodle Soup, Main Ingredient:Chicken broth, Sweetener:None, Fat[oil]:None, Seasoning:Salt", "output_text": "Contains"}
{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:Cheddar Cheese, Main Ingredient:Cheese, Sweetener:None, Fat[oil]:None, Seasoning:Salt", "output_text": "Contains"}
{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:Ranch Dressing, Main Ingredient:Buttermilk, Sweetener:Sugar, Fat[oil]:Vegetable oil, Seasoning:Garlic, herbs", "output_text": "Contains"}
...

A Google Cloud Storage bucket must be created for uploading the training dataset before or during the tuning process.
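If you want to script the dataset preparation, a minimal sketch of the conversion and upload is shown below. The CSV column names (food_product, main_ingredient, sweetener, fat_oil, seasoning, contains_allergens) and the bucket name are assumptions for illustration; adjust them to match the actual Kaggle file and your own bucket.

import csv
import json

from google.cloud import storage

PROMPT = (
    "Given the following Food Product information classify it into one of the "
    "following classes: [Contains, Does not contain] allergens "
)

def build_input_text(row):
    # Assumed column names; rename these to match the actual Kaggle CSV.
    return (
        PROMPT
        + f"Food Product:{row['food_product']}, "
        + f"Main Ingredient:{row['main_ingredient']}, "
        + f"Sweetener:{row['sweetener']}, "
        + f"Fat[oil]:{row['fat_oil']}, "
        + f"Seasoning:{row['seasoning']}"
    )

# Convert the CSV records into one JSON object per line (JSONL).
with open("food_allergens.csv") as src, open("train.jsonl", "w") as dst:
    for row in csv.DictReader(src):
        example = {
            "input_text": build_input_text(row),
            "output_text": row["contains_allergens"],  # "Contains" / "Does not contain"
        }
        dst.write(json.dumps(example) + "\n")

# Upload the JSONL file to a Cloud Storage bucket (assumed bucket name).
bucket = storage.Client().bucket("my-tuning-bucket")
bucket.blob("food_allergens/train.jsonl").upload_from_filename("train.jsonl")

Reusing the same build_input_text helper when you later construct production prompts is an easy way to keep the training and serving formats aligned.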

Step 2- Create a Tuned Model

We can start the model tuning process through the Google Cloud console, the API, or the Vertex AI SDK for Python. Our go-to tool for this example will be the Google Cloud console. We’ll later cover how to invoke our tuned model from a Colab notebook.

To use Vertex AI features in your project for the first time, you must enable the Vertex AI API. Go to the Vertex AI Dashboard and click the “Enable all recommended APIs” button.

Vertex AI — Dashboard

Within the Generative AI Studio tool, navigate to Language and select “Tune a model.”

Vertex AI — Generative AI Studio / Language

You can either upload the JSONL tuning dataset to an existing bucket or create a new one by selecting the “Upload JSONL file to Cloud Storage” option.

Tune Model — Tuning dataset

In the Model details section, you can name your tuned model; keep in mind that “text-bison” is the only option for the base model (for now). The recommended number of training steps for a classification task is 100–500.

Tune Model — Model details

For the training steps, I decided to go with 400 and left the “Learning Rate” field at its default value.

Once you click Start Tuning, a pipeline job will be created.

Tuning Model — List of tuned models
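The same job can also be kicked off from the Vertex AI SDK for Python instead of the console. Here is a minimal sketch, where the project ID and Cloud Storage path are placeholders and the parameter names reflect the preview SDK at the time of writing, so they may change:

import vertexai
from vertexai.preview.language_models import TextGenerationModel

# Tuning jobs run in europe-west4; the tuned model is deployed in us-central1.
vertexai.init(project="your-project-id", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")
model.tune_model(
    training_data="gs://my-tuning-bucket/food_allergens/train.jsonl",
    train_steps=400,
    tuning_job_location="europe-west4",
    tuned_model_location="us-central1",
)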

To check the status of your model tuning job, click the link under the Pipeline run column; it will take you to the Vertex AI Pipelines page, which displays the pipeline’s runtime graph.

For now, tuning can only happen in the “europe-west4” region, and the new model can only be deployed in the “us-central1” region.

Vertex AI Pipelines

The pipeline summary shows basic information for each pipeline step, including execution details, the input parameters that were passed to the step, and any output parameters the step passed back to the pipeline.

The model tuning process takes some time to finish. You can monitor it continuously, or alternatively, configure email notifications for Vertex AI Pipelines so you are notified by email when the model tuning job finishes or fails.

Vertex AI Pipeline graph
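If you prefer to check the job from code rather than the console, here is a small sketch using the google.cloud.aiplatform client (the project ID is a placeholder):

from google.cloud import aiplatform

# Tuning pipelines run in europe-west4.
aiplatform.init(project="your-project-id", location="europe-west4")

# List recent pipeline runs and print their current state.
for job in aiplatform.PipelineJob.list():
    print(job.display_name, job.state)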

To view your tuned models in the Google Cloud console, go to the Vertex AI Model Registry page.

Vertex AI Model Registry
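The tuned models can also be listed from the SDK. As a sketch, the preview TextGenerationModel class exposes a list_tuned_model_names() helper for this (the exact method surface may change while the feature is in preview):

import vertexai
from vertexai.preview.language_models import TextGenerationModel

vertexai.init(project="your-project-id", location="us-central1")

# Resource names of the models tuned from the text-bison base model.
tuned_model_names = TextGenerationModel.from_pretrained("text-bison@001").list_tuned_model_names()
print(tuned_model_names)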

Step 3 — Use your Tuned Model

The tuned model is ready, and now it’s time to use it. Navigate to the Language section of Generative AI Studio, where you can craft a new text prompt. Our tuned model will be available under the Model dropdown. Later, we will also explore calling our model with the Vertex AI SDK for Python.

Language text prompt UI

If you’re unfamiliar with text prompts, Generative AI Studio documentation is an excellent resource to learn more about this remarkable feature.

Choosing our model is just the beginning. There are other configuration parameters to consider. Click the question mark for a comprehensive explanation of each parameter, and if that’s not enough, the Introduction to prompt design documentation will offer you additional insights.

Let’s try our model with some examples.

Roughly 20% of the total data from the Food Allergens dataset was separated into a “testing” subset. We will use the following examples from those records for our next text prompts:

{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:Chocolate Chip Pancakes, Main Ingredient:Flour, Sweetener:Sugar, Fat[oil]:Butter, Seasoning:Chocolate chips", "output_text": "Contains"}

{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:Chicken Biryani, Main Ingredient:Chicken, Sweetener:None, Fat[oil]:Ghee, Seasoning:Basmati rice, spices", "output_text": "Does not contain"}

{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:Hawaiian Pizza, Main Ingredient:Pizza dough, Sweetener:None, Fat[oil]:None, Seasoning:Pineapple, ham", "output_text": "Contains"}

{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:Chocolate Chip Pancakes, Main Ingredient:Flour, Sweetener:Sugar, Fat[oil]:Butter, Seasoning:Chocolate chips", "output_text": "Contains"}

{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:Beef Wellington, Main Ingredient:Beef, Sweetener:None, Fat[oil]:Butter, Seasoning:Mushrooms, puff pastry", "output_text": "Contains"}

{"input_text": "Given the following Food Product information classify it into one of the following classes: [Contains, Does not contain] allergens Food Product:", "output_text": "Does not contain"}
Example 1
Example 2

Since our model is now tuned to detect allergens in food products, we didn’t pass any examples in the UI, and we lowered the Temperature parameter. Lower temperatures are best suited for classification tasks that require concrete, non-creative responses; to produce the most accurate results, low temperature and low top-K values are recommended because of the task’s deterministic nature.

Take into account: the way you construct your prompt can influence the model towards producing the intended result (or not).

We can enhance our text prompt by providing a list of food allergens instead of individual ones.

Example 3

The output from our model matches the expected results based on our previous test examples. Our next step is to recreate these examples using the Python Vertex AI SDK.

From your text prompt, you have a “<>VIEW CODE” option on the top right side. After selecting it, choose “PYTHON COLAB”, and you will be shown sample Python code to use in a Colaboratory notebook.

Python Colab sample code

Using the sample code as a starting point, I added a few lines and created a Colab notebook. To illustrate its functionality, I included the last four records we tested before.

Tune Model — colab notebook

Below is the full code from the previous notebook.

# Install the Vertex AI SDK (shapely is pinned for compatibility), then restart the kernel.
!pip install google-cloud-aiplatform
!pip install "shapely<2.0.0"

import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

# Authenticate the Colab runtime against Google Cloud.
from google.colab import auth as google_auth
google_auth.authenticate_user()

import vertexai
from vertexai.preview.language_models import TextGenerationModel

vertexai.init(project="1055236240165", location="us-central1")

# Low temperature and low top-k suit this deterministic classification task.
parameters = {
    "temperature": 0.1,
    "max_output_tokens": 256,
    "top_p": 0.8,
    "top_k": 3,
}

# Load the tuned model by its Model Registry resource name.
model = TextGenerationModel.from_pretrained("text-bison@001")
model = model.get_tuned_model("projects/1055236240165/locations/us-central1/models/546019673376817152")

response = model.predict(
    """Given the following list of food products information classify each one of them into one of the following classes: [Contains, Does not contain] allergens .Give results in JSON with Food product and Allergens values.

list [
input: Food Product:Hawaiian Pizza, Main Ingredient:Pizza dough, Sweetener:None, Fat[oil]:None, Seasoning:Pineapple, ham
input: Food Product:Chocolate Chip Pancakes, Main Ingredient:Flour, Sweetener:Sugar, Fat[oil]:Butter, Seasoning:Chocolate chips
input: Food Product:Beef Wellington, Main Ingredient:Beef, Sweetener:None, Fat[oil]:Butter, Seasoning:Mushrooms, puff pastry
input: Food Product:Greek Lemon Potatoes, Main Ingredient:Potatoes, Sweetener:None, Fat[oil]:Olive oil, Seasoning:Lemon juice, herbs
]
""",
    **parameters
)
print(f"Response from Model: {response.text}")

Summing up, with Google Cloud Generative AI Studio model tuning, even those with limited AI expertise can easily improve the performance of a foundation model for particular tasks in just a few steps. Being a part of the journey during the product’s public preview is an exciting time, and providing feedback can make the experience even more rewarding.

Thank you for reading and happy model tuning!
