Building Out 🍨 Ice Cream 🍦 Product Assets at Scale with Gemini

Alok Pattani
Google Cloud - Community
9 min read · Aug 28, 2024

In the realm of modern retail, where digital experiences reign supreme, the quality of product assets in an online catalog holds the key to enhancing user engagement. Imagine working at a major consumer products retailer, with thousands of products, each with images, descriptions, and a wealth of customer reviews in a data lake.

You might wonder: can generative AI help in crafting more captivating assets that will enrich the user experience? The answer is an emphatic yes! Let’s see how generative AI can help transform our product catalog to be more readable, accessible, and useful — at scale, and without leaving our data environment!

Ice Cream!

For this example, we’ll use this well-curated Kaggle dataset of 240+ ice cream products — including images, descriptions, and reviews — from four different brands: Ben & Jerry’s, Häagen-Dazs, Breyers, and Talenti. (Coming from someone who tracks a lot of data about every ice cream place I’ve been to, this choice should come as no surprise 🙂.)

Our goal is to create three new pieces for each product:

  • a shorter and more consistent description
  • alternative text (alt text) to convey the same essential information as the product image
  • a summary of reviews to help users understand customer feedback more quickly

If you want to skip ahead to the results, feel free to peruse this Looker Studio dashboard, which features an interactive table with all 241 products and their various outputs generated by Gemini, Google’s multimodal foundation model.

Screenshot of Ice Cream Product Assets Looker Studio dashboard

To learn how we got there and see how you can replicate the process yourself, read on!

Setup

To get started, we moved two CSV files and all the image files in the “combined” section of that Kaggle dataset into the type of environment we might have for this ice cream data at our company, namely:

  • a products table and a reviews table in BigQuery, Google Cloud’s fully managed AI-ready data analytics platform
  • a Cloud Storage bucket with all product images, with file names that are referenced within the BigQuery tables

We ended up with a BigQuery dataset named “ice_cream_products” that contains the two tables, with schemas as shown below:

Schemas for ice cream products and reviews tables in BigQuery

The Cloud Storage bucket with the images is publicly available and looks like this in the Cloud console:

Cloud Storage bucket with ice cream product image files

It’s straightforward to compute things like average star ratings by product using SQL, but we are aiming for outputs that go beyond common analytics workflows, e.g. shorter summaries from longer-form text and new text generated from image files. Traditionally, this would require moving the structured data out of our database and the images out of our lake, perhaps reading them into a Jupyter notebook or Python script, running them through the models or APIs of interest, and then writing the results back to our original environment.

But what if we could just call Gemini models like we do other SQL functions, and have them operate on image files — right in our existing data warehouse?

This major workflow simplification comes from BigQuery’s integration across Google Cloud, including with Vertex AI, Google Cloud’s unified AI development platform. That integration lets analysts query unstructured data (like images in Cloud Storage) and also access Vertex AI remote models and functions — including the Gemini API on Vertex AI.

To set this up, we followed these instructions to create a Cloud resource connection in BigQuery called “vertex_connection” (referenced as `us.vertex_connection` in the queries below) and grant the associated service account access to Cloud Storage and Vertex AI.

Next, we can create a remote model in BigQuery over a Vertex AI model, specifically Gemini 1.5 Flash — a great multimodal model for speed, efficiency, high volume, and cost effectiveness. Following the example here:

CREATE OR REPLACE MODEL `ice_cream_products.gemini-1dot5-flash`
REMOTE WITH CONNECTION `us.vertex_connection`
OPTIONS (endpoint = 'gemini-1.5-flash');

To be able to analyze image files in BigQuery, we need to create an object table that contains the images in our Cloud Storage bucket. Following the documentation here:

CREATE OR REPLACE EXTERNAL TABLE `ice_cream_products.product_images`
WITH CONNECTION `us.vertex_connection`
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://ice_cream_products/images/*']
);
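
Once created, the object table can be queried like any other table, which is a handy way to confirm the connection and images are wired up correctly. A quick sanity check might look like the following (uri, content_type, and size come from the standard object table metadata columns):

SELECT
  uri,
  content_type,
  size
FROM
  `ice_cream_products.product_images`
LIMIT 5;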

This gets us set up for creating each type of new asset we want. From this point forward, we won’t show all the code details behind each step, but rather point you to this GitHub repository with the BigQuery SQL code used to create and output the new assets for all products.

Creating Consistent Short Product Descriptions

An example of short product description generation using Gemini

First, we want to create shorter product descriptions to fit better into integrations where space is at a premium. Also, the provided descriptions from the four different brands’ websites (see original product descriptions in the “Single Product Overview” element toward the bottom of the dashboard) can be somewhat different in structure, tone, and length. So it would be ideal if we can both shorten the product descriptions and make them a bit more consistent in formatting across products.

We prompt Gemini to generate these descriptions with some very specific instructions (full prompt here):

“You are an ice cream product short description creator, who takes in names, subheadings, and long descriptions of ice cream products and shortens them for use in apps/websites, on social media, etc.

[Some other instructions]

The response should be no more than 20 words, and can just be a phrase (no need for sentences). Do not use the ice cream product’s brand or product name in the description, just words that illustrate what this product is and how it’s unique to ice cream-loving customers.

Product name: {name}

Subhead: {subhead}

Product Description: {description}”

…where the name, subhead, and description are filled in accordingly to get a different prompt specific to each product.
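
The prompt assembly itself is just string concatenation in SQL (the exact version lives in the repo). A minimal sketch of the idea, using hypothetical name, subhead, and description columns from the products table and an abbreviated template:

SELECT
  products.*,
  CONCAT(
    'You are an ice cream product short description creator... ',  -- abbreviated template text
    '\n\nProduct name: ', name,
    '\n\nSubhead: ', subhead,
    '\n\nProduct Description: ', description
  ) AS product_short_description_prompt
FROM
  `ice_cream_products.products` AS products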

The actual call to Gemini is made using the ML.GENERATE_TEXT function, which references the Gemini 1.5 Flash model created above and points at a table with a field named “prompt” that contains those product-specific prompts on each row. Here’s a snippet of the code for that piece (see more here):

ML.GENERATE_TEXT(
  MODEL `ice_cream_products.gemini-1dot5-flash`,
  (
    SELECT
      ProductsWithInfo.* EXCEPT (product_short_description_prompt),
      product_short_description_prompt AS prompt
    FROM
      ProductsWithInfo
  ),
  STRUCT(
    0 AS temperature,
    60 AS max_output_tokens
  )
)
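
The function returns the raw model response in an ml_generate_text_result column alongside the original input columns, so one extra step pulls out just the generated text. A sketch of that extraction, assuming the call above is wrapped in a CTE named short_description_results and the default unflattened JSON output:

SELECT
  * EXCEPT (ml_generate_text_result, ml_generate_text_status),
  JSON_VALUE(
    ml_generate_text_result,
    '$.candidates[0].content.parts[0].text'
  ) AS product_short_description
FROM
  short_description_results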

Flash generates these descriptions for all of our ice cream products in less than two minutes! You can see the full results in the dashboard.

One good example is for Breyers No Sugar Added Caramel Swirl, where it takes the product’s original 150+ word text and turns it into the short description of “Vanilla ice cream with salted caramel swirls and toffee pieces, no added sugar.” That’s pretty efficient: telling you what’s inside in less than 15 words!

Creating Alt Text from Images

An example of image alt text generation using Gemini

Next, we aim to create alt text to help with situations where the product image is not available to the user, perhaps because they don’t have images in their web/app interface or they are using a screen reader due to a visual impairment. Creating such text manually would be time-consuming at scale, so we can take advantage of Gemini’s multimodal capabilities to understand each image and caption it appropriately.

The alt text prompt template is similar to, but shorter than, the one above, and it references only the product image rather than the various text fields, staying truer to how alt text is meant to work (see the full prompt in the repo). Gemini 1.5 Flash, this time running over our BigQuery object table pointing to the images in Cloud Storage, generates alt text for every image in less than two minutes.
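
The call itself looks a bit different from the text-prompt version above: when ML.GENERATE_TEXT runs over an object table, a single prompt is passed in the options STRUCT and applied to every image. A simplified sketch of that shape (the placeholder prompt here stands in for the full one in the repo):

SELECT
  uri,
  ml_generate_text_llm_result AS product_image_alt_text
FROM
  ML.GENERATE_TEXT(
    MODEL `ice_cream_products.gemini-1dot5-flash`,
    TABLE `ice_cream_products.product_images`,
    STRUCT(
      'Write concise alt text describing this ice cream product image.' AS prompt,
      0 AS temperature,
      TRUE AS flatten_json_output
    )
  )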

Checking some of the outputs, Gemini does reasonably well describing the different types of images by brand. For example, it’ll say something more specific like “A pint of Ben & Jerry’s Coffee Coffee Buzz Buzz Buzz! ice cream.” when looking at text-heavy carton images, while returning something like “Two scoops of rich chocolate ice cream in a white bowl.” for brands providing much more generic images.

Big picture, using Gemini to create alt text in instances like this where it may not exist can be a big win for various accessibility goals you or your company may have.

Summarizing Product Reviews

An example of product review summarization using Gemini

We know that consumers often rely on ratings and reviews from other people who have tried a product when making their decisions. One issue that can arise is that there are simply too many reviews, some of them lengthy, for a consumer to read efficiently — e.g. a few of the products in our data have several hundred reviews each.

To help with this, we concatenate each product’s set of reviews into one long string, and rely on Gemini’s ability to summarize across long context — including reviews from all sorts of different voices for the same product — to get one review summary per product. We ask Gemini to emphasize feedback mentioned across multiple reviews and to respond with bullet points that take up no more than 75 words (see full prompt in the repo).
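
The per-product concatenation is a simple aggregation in SQL. A minimal sketch of that step, assuming hypothetical key and review_text columns in the reviews table:

SELECT
  key,
  STRING_AGG(review_text, '\n\n') AS all_reviews_text
FROM
  `ice_cream_products.reviews`
GROUP BY
  key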

Gemini 1.5 Flash delivers all these summaries — looking across many thousands of words — in about a minute! Here’s an example summary of nearly 900 reviews for Ben & Jerry’s Half Baked®, one of my all-time favorite flavors:

“* This ice cream is a combination of chocolate and vanilla ice cream with brownie and cookie dough chunks.

* Many reviewers mention the brownie and cookie dough chunks being a highlight of the product.

* Some reviewers have noticed a decline in the quality of the brownie and cookie dough chunks, describing them as dry, chalky, and crumbly.

* Several reviewers mention that this is their favorite ice cream flavor.

* Some reviewers have experienced pints with very few or no brownie or cookie dough chunks.”

The summary appears to do a nice job collecting opinions prevalent across multiple reviewers and capturing both positives and negatives from them. Once again, play with the dashboard to see review summaries for all ice cream products.

Putting It All Together

While we went through each piece sequentially, we can actually leverage BigQuery’s scripting capabilities and the Vertex AI connection to generate all of the LLM outputs and merge them together within one SQL script, as you can see here. An advantage of having the results live in BigQuery is that they can be surfaced fairly directly in a dashboard like the one here, which is great for exploring and debugging outputs. (In the real world, these results would likely eventually be surfaced on a more polished front end like the company’s website, app, etc.)
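
A rough sketch of what that final merge step can look like, assuming each generation step has written its output to its own table keyed by product (all table and column names here are illustrative):

CREATE OR REPLACE TABLE `ice_cream_products.products_with_gen_ai_assets` AS
SELECT
  products.*,
  short_descriptions.product_short_description,
  alt_text.product_image_alt_text,
  review_summaries.product_review_summary
FROM
  `ice_cream_products.products` AS products
LEFT JOIN `ice_cream_products.product_short_descriptions` AS short_descriptions USING (key)
LEFT JOIN `ice_cream_products.product_image_alt_text` AS alt_text USING (key)
LEFT JOIN `ice_cream_products.product_review_summaries` AS review_summaries USING (key)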

The ability to seamlessly call an LLM with a parametrized prompt over every row of a BigQuery table is extremely powerful. In this case, it enabled us to get three distinct new Gemini-generated outputs — short descriptions, alt text, and review summaries — for 240+ products in a matter of a few minutes, using only SQL.

Get started with your own journey of applying generative AI to data and images in tables by checking out the SQL code for this post, or this poster image analysis Jupyter notebook, which also shows how to use Gemini models in BigQuery. Unleash your creativity and discover the amazing things you can achieve with the combined power of BigQuery, Vertex AI, and Gemini today!
