Inside Golf with Gemini: From Pixels to BigQuery with OpenCV

Hyunuk Lim
Google Cloud - Community
6 min read · May 23, 2024

This post is part of a series exploring the development of “Golf with Gemini,” an AI-powered mini-golf experience. The next part in this series will dive into the AI storytelling of Gemini. Read it here.

Google Cloud Next — Minigolf Championship Final: The Deciding Moment
Welcome back to the Mandalay Bay in Las Vegas, where the tension is palpable as we enter the final stages of the Google Cloud Next — Minigolf Championship final! Our competitor has already completed their round, setting a high bar for our remaining player. It all comes down to this: a single hole with championship glory on the line. […]

Second Putt: This is it, the moment of truth. A successful putt here means victory, while anything else spells defeat. The player takes a deep breath, focuses, and executes a delicate stroke. The ball glides across the green, seemingly drawn to the hole as if by destiny… IT’S IN! […]

You just witnessed the climax of “Golf with Gemini”, an interactive AI-powered mini-golf experience that I showcased at Google Cloud Next ’24, which uses Gemini to automatically provide commentary on a player’s game. Through this project, I learned a lot about AI and Cloud technologies, while creating an experience that went beyond typical mini-golf, offering real-time insights and enhancing gameplay. But how did I create it? It all started with a quest to capture and analyze the game’s raw data.

From Pixels to Insights: Fueling Gemini with Data

My goal with this demo was to use Gemini to deliver insightful and engaging commentary during a sports activity — mini-golf in this case! Gemini, with its multimodal capabilities, can process and understand information from multiple sources, such as video and text.

But you might be wondering: Can’t Gemini understand video directly? You’re absolutely right! Gemini is capable of analyzing video footage. For this project, however, I added a pre-processing step using OpenCV, a popular computer vision library. This allowed me to extract specific data points, like the number of shots taken, to provide even richer context to Gemini, resulting in more colorful and detailed commentary. While Gemini could potentially handle this pre-processing step as well, I wanted to explore combining traditional computer vision approaches with the power of large language models.

To test Gemini's commentary on mini-golf, I first needed some real-world footage. So I embarked on a mini-golf expedition with a putter, a ball, a portable green, and a GoPro camera mounted on a selfie stick (thanks to a helpful friend!). We set up shop in an indoor basketball court, capturing footage of various putts and swings. Then, using OpenCV's object tracking algorithms, I tracked the golf ball within each video frame, extracting its x and y coordinates. This allowed me to calculate metrics like the ball's distance from the hole, whether it was moving or stationary, and, most importantly, the number of shots taken. These extra details would enable Gemini to generate more detailed and engaging commentary than simply describing the visuals.

This code snippet demonstrates how I use OpenCV’s TrackerCSRT algorithm to track the golf ball throughout the video. By initializing the tracker with the ball’s position in the first frame, we can efficiently locate it in subsequent frames, capturing its movement across the green.

Real-World Refinements

My initial tests with OpenCV highlighted the importance of camera stability for accurate data. Even slight camera movements could impact the calculations, especially for determining the number of shots taken. I found that mounting the GoPro on a stable surface, like a column in my apartment’s dog park (which had artificial turf!), provided much more reliable tracking data.

Continuing with OpenCV, I was able to develop a way to accurately detect the number of shots taken, which I could then feed to Gemini. Think of it like a “movement alarm” — if the ball’s distance changed significantly, it meant a new shot had begun.

This is a visual representation of the concept. Each dot represents the ball’s position in a frame, and the color signifies the shot number. You can see how clusters of dots of the same color represent periods where the ball is stationary, while a change in color and spacing between dots clearly signal the start of a new shot.

Here’s the core logic of my shot detection algorithm:
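What follows is a sketch reconstructing that logic from the description. The frame rate, window size, and movement threshold are assumptions, not the tuned values from the demo.

```python
from collections import deque

FPS = 30
WINDOW = FPS // 2          # about half a second of footage
MOVEMENT_THRESHOLD = 15.0  # pixels; an assumed value, tuned in practice


def check_if_moving(recent_distances, current_distance, threshold=MOVEMENT_THRESHOLD):
    """Return True if the ball's distance to the hole has shifted
    significantly compared to its average over the recent window."""
    if len(recent_distances) < WINDOW:
        return False  # not enough history yet
    average = sum(recent_distances) / len(recent_distances)
    return abs(current_distance - average) > threshold


def count_shots(distances):
    """Walk the per-frame distances and count rest-to-motion transitions."""
    shots = 0
    moving = False
    history = deque(maxlen=WINDOW)
    for distance in distances:
        if check_if_moving(history, distance):
            if not moving:
                shots += 1  # the "movement alarm" just went off: a new shot
            moving = True
        else:
            moving = False
        history.append(distance)
    return shots
```

For example, a trace that sits at 200 pixels from the hole, rolls down to 100, and settles there registers as a single shot, while a perfectly stationary trace registers none.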

This check_if_moving function analyzes the ball’s distance from the hole over a brief period (about half a second of footage). By comparing the current distance to the average distance over this window, it can detect when the ball undergoes a significant shift in position, signaling a new shot.

Furthermore, while a pre-existing tracker like CSRT offered a convenient starting point, it also revealed its limitations. Factors such as the ball being momentarily obscured or moving at high speed could occasionally cause the tracker to lose its target. With more time, I would have opted for a more robust solution, such as a custom-trained AutoML Vision model that could handle these edge cases using my own training data.

Storing Insights: Integrating with BigQuery

To make this tracking data accessible for analysis and AI integration, I chose BigQuery, Google Cloud’s serverless data warehouse. BigQuery efficiently stores and retrieves large datasets, making it perfect for handling the constant stream of data generated by analyzing mini-golf videos.

I automated this process using Cloud Functions to create an event-driven pipeline. Whenever a new video was uploaded to Cloud Storage, it automatically triggered a Cloud Function that analyzed the video with OpenCV and extracted the shot data. This data was then streamed directly into BigQuery, making it immediately available for use with Gemini. This serverless approach ensured a scalable and efficient way to handle the game videos.
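A sketch of how such a trigger can be wired, assuming a 1st-gen background Cloud Function bound to the bucket's "finalize" event. The `extract_shot_data` and `stream_to_bigquery` helpers are hypothetical stand-ins for the OpenCV analysis and BigQuery insert steps.

```python
def parse_gcs_event(event):
    """Pull the bucket and object name out of a Cloud Storage trigger payload."""
    return event["bucket"], event["name"]


def extract_shot_data(local_path):
    """Placeholder for the OpenCV analysis described above (hypothetical)."""
    raise NotImplementedError


def stream_to_bigquery(rows):
    """Placeholder for the BigQuery streaming insert (hypothetical)."""
    raise NotImplementedError


def process_video(event, context):
    """Entry point: fires once per completed upload to the watched bucket."""
    bucket_name, blob_name = parse_gcs_event(event)
    if not blob_name.lower().endswith((".mp4", ".mov")):
        return  # ignore non-video uploads

    # Imported lazily so the pure helpers above are usable without the SDK.
    from google.cloud import storage

    # /tmp is the writable scratch space inside Cloud Functions.
    local_path = "/tmp/" + blob_name.rsplit("/", 1)[-1]
    storage.Client().bucket(bucket_name).blob(blob_name).download_to_filename(local_path)

    stream_to_bigquery(extract_shot_data(local_path))
```

Deploying with a `google.storage.object.finalize` trigger on the upload bucket is what makes the pipeline fully event-driven.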

Here’s how the tracking data is prepared and inserted into BigQuery:
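The snippet below is a sketch of that step. The field names in the row dictionary are illustrative, not the exact production schema, and `table_id` is a placeholder for the real dataset and table.

```python
from datetime import datetime, timezone


def build_row(video_id, frame_number, x, y, distance, shot_number, is_moving):
    """Package one frame's tracking data as a JSON-serializable dict.
    Field names here are illustrative, not the exact production schema."""
    return {
        "video_id": video_id,
        "frame_number": frame_number,
        "ball_x": x,
        "ball_y": y,
        "distance_to_hole": distance,
        "shot_number": shot_number,
        "is_moving": is_moving,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def stream_rows(table_id, rows):
    """Stream rows into BigQuery; insert_rows_json returns a list of
    per-row errors, which is empty on success."""
    # Imported lazily so build_row stays testable without the SDK installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery streaming insert failed: {errors}")
```

Because `insert_rows_json` performs streaming inserts rather than load jobs, each frame's row is queryable almost as soon as it is written.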

This code snippet shows how the tracking data from each frame is packaged into a dictionary and then streamed directly into a BigQuery table using the client's insert_rows_json method. This method performs streaming inserts, allowing data to be written to BigQuery in near real time.

With OpenCV's object tracking, BigQuery's storage, and this automated data pipeline, I had a reliable system for analyzing mini-golf games. I could now focus on using Gemini to transform this data into engaging narratives, creating a more compelling user experience.

The Next Chapter: AI Storytelling with Gemini

With the foundation of data collection and processing in place, we can now turn our attention to the heart of “Golf with Gemini” — The AI storyteller, Gemini. In the next blog, we’ll explore how Gemini transforms this data into engaging narratives, creating a more immersive experience. Stay tuned!

If you’re a coding enthusiast or just curious about the behind-the-scenes stuff, feel free to check out the GitHub repository!
