Vertex AI Controlled Generation with Gemini

Respond reliably with JSON and other formats

Sascha Heyer
Google Cloud - Community
4 min read · Jun 26, 2024


Have you ever been in a situation where you begged the LLM to return valid JSON? Those times are finally over. Controlled Generation, or JSON mode as some call it, makes this a thing of the past.

LLM responses are typically used in downstream tasks, which is why a structured response is so common. We must ensure the LLM responds consistently with valid JSON.

The Gemini API provides capabilities to constrain the model output to a specific format. JSON and enum are publicly available, and more formats, including custom ones, will follow.

During the I/O conference, Google announced Controlled Generation together with more features like context caching.

Jump Directly to the Notebook and Code

All the code for this article is ready to use in a Google Colab notebook. If you have questions, don’t hesitate to contact me via LinkedIn.

Usage

Google added two additional parameters to GenerationConfig.

  • response_mime_type
    You can use response_mime_type either standalone (as shown in the sketch further below) or combined with response_schema.
  • response_schema
    A struct object that defines the response format more programmatically.

generation_config = GenerationConfig(
    temperature=1.0,
    max_output_tokens=8192,
    response_mime_type="application/json",
    response_schema=_RESPONSE_SCHEMA_STRUCT,
)
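
If you only need syntactically valid JSON without enforcing a particular structure, response_mime_type can, as mentioned above, be used on its own. A minimal sketch:

generation_config = GenerationConfig(
    temperature=1.0,
    max_output_tokens=8192,
    # JSON mode only: the model returns valid JSON,
    # but no specific schema is enforced
    response_mime_type="application/json",
)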

Since September 5th, Gemini 1.5 Pro and Flash fully support Controlled Generation.

Full example based on a recipe prompt:

import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

# Response schema describing the expected recipe structure
_RESPONSE_SCHEMA_STRUCT = {
    "type": "object",
    "properties": {
        "recipe_title": {
            "type": "string",
            "description": "The recipe title."
        },
        "recipe_description": {
            "type": "string",
            "description": "The recipe description."
        },
        "ingredients": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "The name of the ingredient."
                    },
                    "quantity": {
                        "type": "string",
                        "description": "The quantity of the ingredient."
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit of measurement for the ingredient."
                    }
                },
                "required": ["name", "quantity"]
            }
        }
    },
    "required": ["recipe_title", "recipe_description"]
}


def generate():
    vertexai.init(project="sascha-playground-doit", location="us-central1")
    model = GenerativeModel("gemini-1.5-pro-001")

    generation_config = GenerationConfig(
        temperature=1.0,
        max_output_tokens=8192,
        response_mime_type="application/json",
        response_schema=_RESPONSE_SCHEMA_STRUCT,
    )

    responses = model.generate_content(
        ["generate a recipe"],
        generation_config=generation_config,
        stream=False,
    )

    generation = responses.candidates[0].content.parts[0].text
    print(generation)


generate()
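
Because the output is guaranteed to be valid JSON, downstream code can parse it directly, with no fence-stripping or retry logic. A minimal sketch, assuming generation holds the text produced by the example above:

import json

recipe = json.loads(generation)  # guaranteed to parse thanks to JSON mode
print(recipe["recipe_title"])
# recipe_title and recipe_description are required by the schema;
# ingredients is optional, so we use .get with a default
for ingredient in recipe.get("ingredients", []):
    print(ingredient["quantity"], ingredient.get("unit", ""), ingredient["name"])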

And here is another example, this time with an enum. This is particularly useful for classification use cases. Notice how simple the prompt is: we don’t even instruct the model what we want to achieve; we simply define the output structure and provide the input as a prompt.

import vertexai

from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project="sascha-playground-doit", location="us-central1")

model = GenerativeModel("gemini-1.5-pro")

response_schema = {"type": "STRING", "enum": ["high", "medium", "low"]}

prompt = """
The server is down, and our entire team is unable to work.
This needs to be fixed immediately
"""

response = model.generate_content(
    prompt,
    generation_config=GenerationConfig(
        response_mime_type="text/x.enum", response_schema=response_schema
    ),
)

print(response.text)
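
The same configuration scales to classifying many inputs in a loop. A short sketch, reusing model and response_schema from above on a few hypothetical tickets:

tickets = [
    "The server is down, and our entire team is unable to work.",
    "The font on the settings page looks slightly off.",
    "Password reset emails arrive a few minutes late.",
]

for ticket in tickets:
    response = model.generate_content(
        ticket,
        generation_config=GenerationConfig(
            response_mime_type="text/x.enum", response_schema=response_schema
        ),
    )
    # response.text is constrained to one of: high, medium, low
    print(f"{response.text}: {ticket}")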

For the geeks

I usually look for pull requests within the Vertex AI SDK to understand how new features work before the documentation is released. This pull request was created for Controlled Generation at the end of May 2024: https://github.com/googleapis/python-aiplatform/pull/3772.

This helped me reverse engineer the usage, and it’s always fun to better understand the SDKs we use frequently.

Since June 25th, it has been officially released, and there is excellent documentation.

Conclusion

Controlled Generation with the Gemini API represents a significant leap forward in ensuring the reliability and consistency of LLM responses, especially when structured formats like JSON or enums are required. Developers no longer need to struggle to get valid JSON out of their models. With Gemini, the output can be tightly controlled and formatted as required.

Now you can create reliable recipes and much more =)


Thanks for reading

Your feedback and questions are highly appreciated. You can find me on LinkedIn or connect with me via Twitter @HeyerSascha. Even better, subscribe to my YouTube channel ❤️.
