Inside Golf with Gemini: How Gemini Pro Transformed Mini-Golf with AI Storytelling

Hyunuk Lim
Google Cloud - Community
7 min read · May 23, 2024

This post is part of a series exploring the development of “Golf with Gemini,” an AI-powered mini-golf experience. The first part, which covers data collection and processing, is available here.

The Art of AI Storytelling

In the previous post, we explored the journey of capturing and analyzing mini-golf data using OpenCV and BigQuery. But what truly brings “Golf with Gemini” to life is the AI-powered storytelling of Gemini on Vertex AI.

While Gemini 1.5 Pro can analyze video footage, its responses can be made even richer with structured data and detailed prompts. By combining pre-defined context, shot information, and generative AI, Gemini can create a more compelling narrative than if it relied solely on the video.

Using this context and data, I instruct Gemini to generate commentary for the mini-golf game. Guided by these instructions and the extracted shot data, Gemini describes each shot with detail and insight, resulting in a richer and more engaging experience for the player.

Guiding the Storyteller: Defining Context and Prompts

To craft effective prompts for Gemini, I needed a way to easily access and analyze the shot data stored in BigQuery. BigQuery Studio — a unified, collaborative workspace for Google Cloud’s data analytics suite that helps accelerate data to AI workflows — offered the perfect solution. Its embedded Python notebooks, powered by Colab Enterprise, were crucial to my process, which I will describe below.

These notebooks provided direct access to the shot tracking data in BigQuery, meaning I could analyze the data and create prompts without needing to export or move it. Furthermore, BigQuery Studio’s notebooks offer a collaborative environment, allowing multiple users to interact with the same notebook simultaneously. To demonstrate this collaborative feature during the demo, I ran the notebook on three Chromebooks. Any changes made to the code or prompts on my laptop were instantly reflected on all three Chromebooks, demonstrating how BigQuery Studio facilitates real-time collaboration on data analysis tasks.
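For reference, here is a minimal sketch of what reading that shot data into a dataframe from a BigQuery Studio notebook can look like. The dataset, table, and column names below are placeholders, not the demo’s actual schema:

from google.cloud import bigquery

# BigQuery Studio notebooks run with the project's credentials,
# so no explicit authentication is needed here.
client = bigquery.Client()

query = """
SELECT shot_number, start_px, end_px
FROM `my-project.golf_demo.shot_tracking`  -- placeholder dataset and table
ORDER BY shot_number
"""
df = client.query(query).to_dataframe()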

To illustrate how I guided Gemini’s commentary, here’s a glimpse of the code I used to define the prompts:

PRE_DEFINED_TEXT = """
You are a professional golf announcer and you must broadcast the match
in a formal and informative tone. You should use the following context.

- The match is 'Google Cloud Next - Minigolf Championship final',
and the venue is Mandalay Bay, Las Vegas.
- The competitor has already completed the game and if the player completes
this hole within three shots, the player wins.
- If the hole is completed over four shots, THE COMPETITOR WINS.
- Even though the competitor wins, you must broadcast the game
until the player finishes the last shot.
- You should not mention anything about the players' appearances or
personal lives.
- The broadcast must be done in colloquial language and no additional text
other than the announcer's comments
(e.g., cheers from the audience must not be included).
- The course is a rectangle measuring 7 feet by 20 feet,
and there are no obstacles or slopes on the course.
- Describe each shot in detail.
- This text will be shown in a markdown format,
so make sure to add some markdowns as you emphasize.
"""

This code snippet shows how I prompted Gemini. I specify the role of the announcer, the context of the game, and crucial details like the venue and rules. I also provide specific instructions on the desired tone, language style, and information to include. By carefully crafting these prompts, I guide Gemini to generate relevant, engaging, and informative commentary.

From Data to Narrative: Gemini in Action

Now, let’s see how I used Gemini to generate a captivating story from the raw shot data.

To guide Gemini’s commentary, I first needed to process the shot data into a structured summary that Gemini could easily understand. I created a Python function called generate_commentary(df) to do that.

This generate_commentary(df) function takes the raw shot data as input (df), which looks like the following dataframe:
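As a rough illustration (the column names here are placeholders inferred from the sample output further below, not the actual schema), the dataframe contains one row per shot with the start and end distances measured in pixels:

   shot_number  start_px  end_px
0            1    684.42  133.33
1            2    127.58    6.32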

The function then processes this data to produce a formatted string (commentary) that gives Gemini clear instructions and context. Here’s a sample output:

- Here's the analytics of each shot extracted from the video. 
Use it as a reference:
- Shot 1 started from 684.42 pixels from the hole
- Shot 1 stopped 133.33 pixels from the hole
- Shot 2 started from 127.58 pixels from the hole
- Shot 2 stopped 6.32 pixels from the hole
- The ball MADE the hole after the shot number 2.
- The measurement of the distance is in pixels. The distance is measured
from the center of the ball to the center of the hole.
- DO NOT USE pixels as a unit of measurement. USE ONLY feet and yards.
Convert pixels to feet and yards appropriately.
- After the final shot, if the remaining pixel is less than 50 pixels,
then consider it as a hole-in.

As you can see, the output summarizes each shot, indicates whether the ball went in, and provides instructions for Gemini, like converting pixels to feet and yards.

Here’s the code for the generate_commentary(df) function:

def generate_commentary(df: pd.DataFrame) -> str:
    # … code to process shot data and create a structured summary …
    commentary = f"""
- Here's the analytics of each shot extracted from the video.
Use it as a reference: {shot_details}
- The ball {result.upper()} the hole after the shot number {shot_number}.
- The measurement of the distance is in pixels. The distance is measured
from the center of the ball to the center of the hole.
- DO NOT USE pixels as a unit of measurement. USE ONLY feet and yards.
Convert pixels to feet and yards appropriately.
- After the final shot, if the remaining pixel is less than 50 pixels,
then consider it as a hole-in.
"""
    return commentary
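The processing step itself is elided above. As one possible sketch (not the demo’s actual code), shot_details, result, and shot_number could be derived from the placeholder columns shown earlier like this:

import pandas as pd

def build_shot_summary(df: pd.DataFrame) -> tuple[str, str, int]:
    # One pair of lines per shot, mirroring the sample output shown earlier.
    lines = []
    for _, row in df.iterrows():
        n = int(row["shot_number"])
        lines.append(f"- Shot {n} started from {row['start_px']:.2f} pixels from the hole")
        lines.append(f"- Shot {n} stopped {row['end_px']:.2f} pixels from the hole")
    shot_details = "\n".join(lines)

    # Reuse the 50-pixel threshold stated in the prompt to decide the result.
    last = df.iloc[-1]
    result = "made" if last["end_px"] < 50 else "missed"
    shot_number = int(last["shot_number"])
    return shot_details, result, shot_number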

Once the shot data is processed, I feed it to Gemini, along with the video and the pre-defined context established earlier:

# … code to extract shot data from dataframe and generate commentary …
# … code to generate configuration and safety settings …

responses = model.generate_content(
    [VIDEO, PRE_DEFINED_TEXT, generate_commentary(df)],
    generation_config=generation_config,
    safety_settings=safety_settings,
    stream=False,
)
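The model, VIDEO, generation_config, and safety_settings objects referenced above are assumed to be created elsewhere in the notebook. A rough sketch of that setup with the Vertex AI Python SDK, where the project ID, Cloud Storage path, and parameter values are placeholders rather than the event configuration, might look like this:

import vertexai
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    Part,
)

vertexai.init(project="my-project", location="us-central1")  # placeholder project

# Gemini 1.5 Pro handles the multimodal input: the video plus the text prompts.
model = GenerativeModel("gemini-1.5-pro")

# The recorded hole, uploaded to Cloud Storage ahead of the call.
VIDEO = Part.from_uri("gs://my-bucket/minigolf/hole_1.mp4", mime_type="video/mp4")

generation_config = GenerationConfig(temperature=0.7, max_output_tokens=2048)

safety_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
}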

This code demonstrates how Gemini receives the video, the pre-defined context, and the analyzed shot data. Utilizing its generative capabilities, Gemini processes this information and crafts a narrative, transforming raw numbers into a compelling story.

Highlighting Success: A Thriving Interactive Experience

“Golf with Gemini” exceeded expectations at Google Cloud Next ’24, captivating attendees and engaging numerous active participants. Beyond the high participation rate, what truly excited me was the quality of Gemini’s commentary. It consistently delivered insightful and engaging narratives, accurately reflecting the game’s flow and adding a new layer of excitement for the players. Many participants remarked on how Gemini’s commentary enhanced their enjoyment of the game, making it feel more like a real sporting event.

Of course, when working with large language models, there’s always a concern about hallucinations — generating inaccurate or fabricated information. To mitigate this risk, I focused on providing Gemini with very specific instructions and structured data. By clearly defining the game’s context, rules, and desired tone, and by providing precise shot details, I was able to guide Gemini towards generating commentary that was both creative and grounded in reality.

The success of “Golf with Gemini” demonstrates that compelling, AI-driven sports commentary is within reach. Imagine the possibilities: a basketball game where Gemini analyzes player stats, predicts shot outcomes, and provides historical context in real time. By combining Gemini’s capabilities with creative prompt engineering, real-time data analysis, and readily available tools like BigQuery, developers can unlock immersive and engaging sports experiences.

Learning and Iterations: Refining the Experience

The demo highlighted the importance of clear and concise instructions for Gemini to generate accurate and engaging commentary. By refining the prompts and providing specific details about the game’s context, rules, and desired tone, I was able to guide Gemini towards producing more compelling and relevant narratives. However, the quality of Gemini’s commentary depends heavily on the accuracy of the underlying data.

While OpenCV’s CSRT tracker provided a practical starting point, it revealed limitations during the demo. For instance, if a player’s body obscured the ball, the tracker would often lose its target and fail to resume tracking. Similarly, fast swings sometimes caused the tracker to lose the ball, resulting in inaccurate shot data. These issues occasionally led to Gemini misinterpreting the number of shots taken, even describing hole-in-ones when multiple shots occurred. This highlights a key takeaway for building real-world applications: accurate tracking data is essential for realistic commentary. Exploring more robust tracking approaches, such as custom-trained AutoML Vision models, could improve accuracy by handling occlusion and fast motion more effectively.
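To make that failure mode concrete, here is a minimal OpenCV sketch (not the demo’s actual tracking code; the video path is a placeholder). When the CSRT tracker’s update() call returns False, the ball’s position is unknown for that frame, and without a re-detection step the downstream shot data, and therefore the commentary, degrades:

import cv2

cap = cv2.VideoCapture("hole_1.mp4")            # placeholder video path
ok, frame = cap.read()
bbox = cv2.selectROI("Select the ball", frame)  # initial ball bounding box

tracker = cv2.TrackerCSRT_create()              # requires opencv-contrib-python
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)
    if not found:
        # The ball was lost (e.g., occluded by the player or moving too fast).
        # Without re-detecting the ball and re-initializing the tracker here,
        # every later distance measurement becomes unreliable.
        continue
    # Otherwise, measure the ball-to-hole distance from bbox for shot analytics.

cap.release()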

Additionally, to further enhance the experience, future implementations can incorporate features such as:

  • Real-time data visualization: Dynamically display player statistics and shot data alongside the AI commentary, providing deeper insights and adding a visual component. During the event, uploading and processing the video typically took 60–90 seconds; optimizing the data pipeline and exploring lower-latency processing options could reduce this delay in future versions.
  • Multi-language support: Expand beyond English (potentially leveraging Gemini’s existing multilingual capabilities) to cater to a global audience.
  • Custom TTS (text-to-speech): After each shot is completed, the commentary for that shot would be read aloud by an announcer voice built from a custom TTS model trained with a streamlined data process.

“Golf with Gemini” served as a valuable learning experience, highlighting the potential of AI to transform sports and entertainment. By addressing the areas for improvement and adding new features, future versions can make the experience even more fun, informative, and accessible to everyone. And to help you tap into this exciting world of AI-powered mini-golf, I’m creating a YouTube video that will guide you step-by-step through the process of “Building your own Golf with Gemini”. Get ready to experience the excitement firsthand!
