Generate structured output with the Gemini API
Supplying a schema as text in the prompt for accurate content generation with the Gemini API
Recently, while working on a project, I discovered that you can supply a schema as text within the prompt when using the Gemini API, and it’s mind-blowing: the results are much more controlled and predictable than before.
Take a look at this example from the docs:
```javascript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.API_KEY);
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-flash",
});

const prompt = `List a few popular cookie recipes using this JSON schema:
Recipe = {'recipeName': string}
Return: Array<Recipe>`;

const result = await model.generateContent(prompt);
console.log(result.response.text());
```
The output looks like this:

```json
[{"recipeName": "Chocolate Chip Cookies"}, {"recipeName": "Oatmeal Raisin Cookies"}, {"recipeName": "Snickerdoodles"}, {"recipeName": "Sugar Cookies"}, {"recipeName": "Peanut Butter Cookies"}]
```
In this snippet from the documentation, the prompt is structured to guide the AI toward consistent, predictable output. Simply asking the AI to list cookie recipes could produce a wide range of formats. Instead, the prompt defines a JSON schema: each recipe must follow the `{'recipeName': string}` format, and the response must be an array of such objects. The AI is effectively "taught" the exact structure of the output, which shows why schema-based prompting matters for reliable AI-generated content.
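To make the prompt format concrete, here is a minimal sketch that renders a field map into the same `Name = {'field': type}` notation the docs prompt uses. The helpers `describeType` and `schemaPrompt` are hypothetical. They are not part of the Gemini SDK, and they only handle flat objects with primitive fields.

```typescript
// Hypothetical helpers (not part of the Gemini SDK) that produce the
// "Name = {'field': type}" notation used in the docs prompt.
type FieldType = "string" | "number" | "boolean";

// Renders one flat type, e.g. Recipe = {'recipeName': string}
function describeType(typeName: string, fields: Record<string, FieldType>): string {
  const body = Object.keys(fields)
    .map((key) => `'${key}': ${fields[key]}`)
    .join(", ");
  return `${typeName} = {${body}}`;
}

// Builds a full prompt asking for an array of the described type.
function schemaPrompt(
  task: string,
  typeName: string,
  fields: Record<string, FieldType>,
): string {
  return `${task} using this JSON schema:\n${describeType(typeName, fields)}\nReturn: Array<${typeName}>`;
}

const recipePrompt = schemaPrompt("List a few popular cookie recipes", "Recipe", {
  recipeName: "string",
});
console.log(recipePrompt);
```

Generating the schema text from one place keeps the prompt in sync if you later add fields such as a difficulty level.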
The docs also show how to supply a schema through the model configuration instead. Declaring the JSON schema there gives you more precise control than relying on text in the prompt alone.
Take a look at the code snippet:
```javascript
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.API_KEY);

const schema = {
  description: "List of recipes",
  type: SchemaType.ARRAY,
  items: {
    type: SchemaType.OBJECT,
    properties: {
      recipeName: {
        type: SchemaType.STRING,
        description: "Name of the recipe",
        nullable: false,
      },
    },
    required: ["recipeName"],
  },
};

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
  generationConfig: {
    responseMimeType: "application/json",
    responseSchema: schema,
  },
});

const result = await model.generateContent(
  "List a few popular cookie recipes.",
);
console.log(result.response.text());
```
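The output has the same structure as before. The difference is that the schema now lives in the model’s generation config (`responseMimeType` and `responseSchema`) rather than in the prompt text. As a result, the model is constrained to return JSON matching the schema instead of merely being asked to.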
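Even with `responseSchema` set, `result.response.text()` still returns a string, so you parse it yourself. Here is a minimal sketch, assuming you want a typed `Recipe[]` with a runtime check before trusting the data. The names `Recipe`, `isRecipe`, and `parseRecipes` are hypothetical, and the check covers only the single `recipeName` field. The same check is more important with the prompt-only approach, where nothing enforces the schema.

```typescript
// Assumed TypeScript mirror of the responseSchema above; the SDK does not
// derive this type for you.
interface Recipe {
  recipeName: string;
}

// Type guard: true when a parsed item has a string recipeName.
function isRecipe(value: unknown): value is Recipe {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { recipeName?: unknown }).recipeName === "string"
  );
}

// Parses response text (for example, result.response.text()) into typed
// recipes, throwing if the JSON does not have the expected shape.
function parseRecipes(text: string): Recipe[] {
  const data: unknown = JSON.parse(text);
  if (!Array.isArray(data) || !data.every(isRecipe)) {
    throw new Error("Response does not match the Recipe[] schema");
  }
  return data;
}

const sample = '[{"recipeName": "Snickerdoodles"}, {"recipeName": "Sugar Cookies"}]';
console.log(parseRecipes(sample).map((r) => r.recipeName));
```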
Back to my project: I’m using Gemini to generate as much useful content as possible for automatically building a website landing page. Several sections are “fixed,” meaning they are essential regardless of the type or category of website being generated. Other sections are ones you can expect for certain types of websites, e.g., a pricing section or a testimonials section.
That was my initial line of thought, so I started refactoring my code and writing schemas for these sections:
```typescript
import { GoogleGenerativeAI, SchemaType, Schema } from '@google/generative-ai';

const testimonialsSchema: Schema = {
  type: SchemaType.ARRAY,
  items: {
    type: SchemaType.OBJECT,
    properties: {
      userName: {
        type: SchemaType.STRING,
        description: 'reviewer name',
        nullable: false,
      },
      review: {
        type: SchemaType.STRING,
        description: 'review content',
        nullable: false,
      },
    },
    required: ['userName', 'review'],
  },
};

const pricingSchema: Schema = {
  type: SchemaType.ARRAY,
  items: {
    type: SchemaType.OBJECT,
    properties: {
      plan: {
        type: SchemaType.STRING,
        description: 'the name of the plan',
        nullable: false,
      },
      price: {
        type: SchemaType.NUMBER,
        description: 'price of plan',
        nullable: false,
      },
    },
    required: ['plan', 'price'],
  },
};
```
In these schemas, `required` lists the properties that must be present in each object, while `nullable` determines whether a field can accept a `null` value. Together they enforce the essential fields for each section (a name and review for every testimonial, a plan name and price for every pricing tier) while leaving the rest of the generated content flexible.
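To show how `required` and `nullable` differ, here is a small standalone validator that applies both rules to parsed data. This is a simplified local check written for illustration, not the SDK’s or the model’s actual enforcement logic. The `MiniSchema` type stands in for the SDK’s `Schema`, and its lowercase type strings are an assumption that keeps the sketch runnable without the SDK installed.

```typescript
// Simplified, SDK-free stand-in for Schema objects (an assumption for this
// sketch; real code would use SchemaType.ARRAY, SchemaType.OBJECT, etc.).
type MiniSchema =
  | { type: "string" | "number"; nullable?: boolean }
  | { type: "array"; items: MiniSchema; nullable?: boolean }
  | {
      type: "object";
      properties: Record<string, MiniSchema>;
      required?: string[];
      nullable?: boolean;
    };

// Returns a list of problems; an empty list means the value conforms.
function validate(value: unknown, schema: MiniSchema, path: string = "$"): string[] {
  // nullable: null is accepted only when the schema allows it.
  if (value === null) {
    return schema.nullable ? [] : [`${path} must not be null`];
  }
  if (schema.type === "array") {
    if (!Array.isArray(value)) return [`${path} must be an array`];
    const errors: string[] = [];
    for (let i = 0; i < value.length; i++) {
      errors.push(...validate(value[i], schema.items, `${path}[${i}]`));
    }
    return errors;
  }
  if (schema.type === "object") {
    if (typeof value !== "object" || Array.isArray(value)) {
      return [`${path} must be an object`];
    }
    const obj = value as Record<string, unknown>;
    const errors: string[] = [];
    // required: every listed key must be present, whatever its value.
    for (const key of schema.required ?? []) {
      if (!(key in obj)) errors.push(`${path}.${key} is required`);
    }
    // Each property that is present is checked against its own schema.
    for (const key of Object.keys(schema.properties)) {
      if (key in obj) {
        errors.push(...validate(obj[key], schema.properties[key], `${path}.${key}`));
      }
    }
    return errors;
  }
  // Remaining case: a primitive "string" or "number" schema.
  return typeof value === schema.type ? [] : [`${path} must be a ${schema.type}`];
}

// Mirrors testimonialsSchema from the post.
const testimonials: MiniSchema = {
  type: "array",
  items: {
    type: "object",
    properties: {
      userName: { type: "string", nullable: false },
      review: { type: "string", nullable: false },
    },
    required: ["userName", "review"],
  },
};

console.log(validate([{ userName: "Ana", review: "Great cookies!" }], testimonials)); // no problems
console.log(validate([{ userName: null }], testimonials)); // review is required, userName must not be null
```

The second call fails both rules. `review` is missing, which violates `required`, and `userName` is present but `null`, which violates `nullable: false`.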
Since most sections are still very flexible, I further guide the AI to return specific fields. This is my prompt:
```typescript
`
I want to create a landing page for ${checkedQuery} with at least six sections.
Must include a Hero section using the predefined JSON schema:
Hero = {'heading': string, 'subheading': string, 'cta': string}
Return: Array<Hero>
Must include a Footer section using only the predefined JSON schema:
Footer = {'links': array, 'copyright': string}
Return: Array<Footer>
For other sections, it should contain a heading (limited to 20 characters), subheading, body text (around 250 characters with meaningful content), and a CTA button (limited to 60 characters).
Each section must have a sectionType property, either section or one of the predefined names: Hero, Testimonials, Pricing, Footer.
If a section includes Testimonials, only use testimonialsSchema for that section (exclude heading, subheading, body, and CTA). Similarly, if a section includes Pricing, only use pricingSchema for that section.
Output the data as an array without using the word json or backticks. Ensure all text generated is meaningful and relevant to the section's content.
`
```
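The prompt refers to `testimonialsSchema` and `pricingSchema` by name, but the model only receives what is in the request, so it cannot see those definitions unless their structure appears in the text. The post does not show how this is handled. One option, sketched here under that assumption, is to serialize the schema objects with `JSON.stringify` and append them to the prompt. `schemaText` and `buildLandingPrompt` are hypothetical helpers, the schemas are trimmed copies using plain lowercase type strings, and the prompt is abbreviated.

```typescript
// Trimmed, SDK-free copies of the two schemas above (plain objects with
// lowercase type strings stand in for SchemaType values).
const testimonialsSchema = {
  type: "array",
  items: {
    type: "object",
    properties: {
      userName: { type: "string" },
      review: { type: "string" },
    },
    required: ["userName", "review"],
  },
};

const pricingSchema = {
  type: "array",
  items: {
    type: "object",
    properties: {
      plan: { type: "string" },
      price: { type: "number" },
    },
    required: ["plan", "price"],
  },
};

// Hypothetical helper: writes "label = <schema as JSON>" so the model can see
// what a name like testimonialsSchema refers to.
function schemaText(label: string, schema: object): string {
  return `${label} = ${JSON.stringify(schema)}`;
}

// Abbreviated version of the prompt above, with the schema text appended.
function buildLandingPrompt(checkedQuery: string): string {
  return [
    `I want to create a landing page for ${checkedQuery} with at least six sections.`,
    "If a section includes Testimonials, only use testimonialsSchema for that section.",
    "If a section includes Pricing, only use pricingSchema for that section.",
    schemaText("testimonialsSchema", testimonialsSchema),
    schemaText("pricingSchema", pricingSchema),
  ].join("\n");
}

console.log(buildLandingPrompt("a coffee shop"));
```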
This is the output:

As you can see, when a testimonials or pricing section is generated, its content follows the `testimonialsSchema` or `pricingSchema` I predefined.
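Because this output format is only requested in the prompt, it is worth parsing defensively. Here is a minimal sketch that assumes only what the prompt asks for: the response is a JSON array, and every section has a string `sectionType`. It strips a stray markdown code fence in case the model ignores the "no backticks" instruction, and it checks that the mandatory Hero and Footer sections are present. `Section`, `stripFences`, `parseSections`, and `missingRequiredSections` are hypothetical names, and the sample string is made up for illustration, not taken from the output above.

```typescript
// Only the sectionType field the prompt requires is assumed; everything else
// is left untyped.
interface Section {
  sectionType: string;
  [field: string]: unknown;
}

// Removes a leading ``` or ```json fence and a trailing ``` fence, if present.
function stripFences(text: string): string {
  return text
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/\s*```$/, "");
}

// Parses the response into sections, rejecting anything without a sectionType.
function parseSections(text: string): Section[] {
  const data: unknown = JSON.parse(stripFences(text));
  if (!Array.isArray(data)) throw new Error("Expected a JSON array of sections");
  for (const item of data) {
    if (typeof item !== "object" || item === null || typeof item.sectionType !== "string") {
      throw new Error("Every section needs a string sectionType");
    }
  }
  return data;
}

// The prompt marks Hero and Footer as mandatory; report any that are missing.
function missingRequiredSections(sections: Section[]): string[] {
  const present = sections.map((s) => s.sectionType);
  return ["Hero", "Footer"].filter((t) => present.indexOf(t) === -1);
}

// Made-up sample for illustration only.
const raw = '```json\n[{"sectionType":"Hero","heading":"Hi"},{"sectionType":"Pricing"}]\n```';
const sections = parseSections(raw);
console.log(sections.length, missingRequiredSections(sections)); // 2 sections, Footer missing
```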
That’s it: a brief exploration of supplying a schema as text within your prompt.