Moneyball with GenAI: Using Vertex AI Search to Find the Next Generation of Baseball Stars

Alok Pattani
Google Cloud - Community
May 6, 2024

We’re into the spring months in North America, which means it’s time for longer days, more time outdoors, and…baseball! The Major League Baseball (MLB) season began just over a month ago, and many fans are excited about their teams early in the year.

For some, though, hope for this season no longer “springs eternal” — looking at you, White Sox, Rockies, and Marlins fans — and the focus shifts to what might portend greater success in future seasons. Beyond fans, all 30 MLB teams’ front offices are always looking toward the future in terms of managing their rosters for the rest of this season and beyond.

A big part of every MLB team’s future is its prospects: up-and-coming players in the minor leagues, college, or international leagues who could be the big league stars of tomorrow.

Generative AI and advanced search tools can help baseball personnel and fans find those “diamonds in the rough” with much greater efficiency and scale. How so? Let’s dig into a specific use case of synthesizing information from a large amount of baseball-specific text data using Vertex AI Search and Gemini from Google Cloud.

Baseball Scouting Reports

As part of the popular Moneyball movement over the last couple of decades, MLB teams have invested heavily in using data and advanced baseball statistics to improve evaluation of these prospects. In addition to that, baseball scouts carefully watch prospects and write reports highlighting their strengths and weaknesses, which help teams decide which players to draft or sign.

Scouting reports often have rich information in long-form text, but generally have not been as easy to mine for insights at scale as the more traditional structured baseball data: box scores, pitch-by-pitch results, tracking data, etc. But with the power of search indexing and large language models, it’s now possible to query and search those reports with efficiency similar to more traditional approaches designed for numerical and categorical data.

To replicate what the scouting report infrastructure might look like within an MLB team or scouting service, we created PDFs using scouting reports from MLB.com, which publishes various lists of top prospects and detailed reports on each one. Below is an example report for Paul Skenes, the top pitching prospect who could be playing for the Pirates in the majors very soon.

Example PDF scouting report for Paul Skenes, using data from MLB.com

In total, we have reports for more than 1000 current MLB prospects. Reading through a few of these reports is reasonable, but going through many hundreds — and coming up with meaningful conclusions across them — is quite the challenge.

Creating a Search App in Vertex AI Agent Builder

Vertex AI Search, part of Vertex AI Agent Builder, is a fully managed platform for developers to build Google-quality search experiences for websites as well as for structured and unstructured data. In this case, we’ll show how to make searching through proprietary scouting reports as easy as finding public information with Google Search.

Some of the key steps to create this Search app are illustrated below; see the documentation for a full step-by-step guide. The main prerequisite is having PDF scouting reports — or whatever text data files you want to search over — in a Cloud Storage bucket.

First, we create a “Search” app in Vertex AI Agent Builder.

Creating a Search App in Vertex AI Agent Builder

We’ll create a “Generic” search app called “mlb-scouting-reports” without Enterprise or Advanced LLM features for now.

Configuration for MLB scouting reports Search app

In the next step, we’ll create a Data Store that points to our Cloud Storage bucket with the scouting report PDFs (tip: make sure to point to the innermost directory with files in it).

Creating a Data Store in Vertex AI Agent Builder

Select that new Data Store in the next step and then create the app. Once data has been processed, we are ready to use our app to search scouting reports.
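
For teams that would rather script this setup than click through the console, the same three steps (create a Data Store, import the PDFs, create the app) can be sketched as REST request payloads for the Discovery Engine API that backs Vertex AI Search. This is a minimal sketch, not what we ran for this post: the project ID, bucket path, and Data Store ID are hypothetical placeholders, and the endpoint paths and field names reflect our understanding of the v1 API, so check them against the current documentation before relying on them. The code only builds the requests; it does not send them.

```python
import json

API_ROOT = "https://discoveryengine.googleapis.com/v1"

# Hypothetical placeholders: substitute your own values.
PROJECT_ID = "my-project"
LOCATION = "global"
DATA_STORE_ID = "mlb-scouting-reports-store"
ENGINE_ID = "mlb-scouting-reports"
# Point at the innermost directory that actually contains the PDFs.
GCS_PDF_PATTERN = "gs://my-bucket/scouting-reports/pdfs/*.pdf"


def collection_path(project_id: str, location: str) -> str:
    """Return the parent resource URL shared by data stores and engines."""
    return (
        f"{API_ROOT}/projects/{project_id}/locations/{location}"
        "/collections/default_collection"
    )


def build_setup_requests() -> list:
    """Build (but do not send) the three setup requests, in order."""
    parent = collection_path(PROJECT_ID, LOCATION)
    return [
        {   # 1. Create a Data Store for unstructured documents.
            "method": "POST",
            "url": f"{parent}/dataStores?dataStoreId={DATA_STORE_ID}",
            "body": {
                "displayName": "MLB scouting reports",
                "industryVertical": "GENERIC",
                "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
                "contentConfig": "CONTENT_REQUIRED",
            },
        },
        {   # 2. Import the PDFs from Cloud Storage into the Data Store.
            "method": "POST",
            "url": f"{parent}/dataStores/{DATA_STORE_ID}/branches/0/documents:import",
            "body": {
                "gcsSource": {
                    "inputUris": [GCS_PDF_PATTERN],
                    "dataSchema": "content",
                },
                "reconciliationMode": "INCREMENTAL",
            },
        },
        {   # 3. Create the Search app on the standard tier (no LLM add-on yet).
            "method": "POST",
            "url": f"{parent}/engines?engineId={ENGINE_ID}",
            "body": {
                "displayName": ENGINE_ID,
                "solutionType": "SOLUTION_TYPE_SEARCH",
                "industryVertical": "GENERIC",
                "dataStoreIds": [DATA_STORE_ID],
                "searchEngineConfig": {"searchTier": "SEARCH_TIER_STANDARD"},
            },
        },
    ]


if __name__ == "__main__":
    for request in build_setup_requests():
        print(request["method"], request["url"])
        print(json.dumps(request["body"], indent=2))
```

Each payload would be sent with an authenticated HTTP client. Creating the Data Store and importing documents are long-running operations, so a real script would also need to wait for them to finish before searching.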

Search Scouting Reports “Like Google”

Now, we can go to “Apps” under “Agent Builder” and see our “mlb-scouting-reports” app. Clicking on it leads to a Search Preview screen where you can start typing in queries to run over all our scouting reports.

Let’s start with something an MLB fan or front office member alike would love to have: “five-tool players.” These are rare position players who excel in five key aspects of the game: hitting for average, hitting for power, running speed, throwing, and fielding. Within fractions of a second, Vertex AI Search serves up some results:

Vertex AI Search results for “five-tool players”

Each of these players has something in their report that fits the bill! Some of the snippets highlight where a report mentions the five tools, and you can also click into the specific PDFs to see the full write-up on each potentially versatile prospect.
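
The Search Preview screen is the quickest way to try queries, but the same search can also be issued programmatically. Below is a minimal sketch of the request for the “five-tool players” query, assuming the Discovery Engine v1 REST `search` method and a serving config named `default_search`. The project ID is a hypothetical placeholder, and the field names are our best understanding of the API rather than something shown in this post. The code builds the URL and JSON body without sending anything.

```python
import json

API_ROOT = "https://discoveryengine.googleapis.com/v1"


def search_url(project_id: str, engine_id: str, location: str = "global") -> str:
    """Return the search endpoint for an app (engine); path is an assumption."""
    return (
        f"{API_ROOT}/projects/{project_id}/locations/{location}"
        f"/collections/default_collection/engines/{engine_id}"
        "/servingConfigs/default_search:search"
    )


def build_search_body(query: str, page_size: int = 10) -> dict:
    """Build a search request body that asks for snippets with each result."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return {
        "query": query,
        "pageSize": page_size,
        # Snippets are the short highlighted excerpts shown under each result.
        "contentSearchSpec": {"snippetSpec": {"returnSnippet": True}},
    }


if __name__ == "__main__":
    # "my-project" is a hypothetical project ID.
    print(search_url("my-project", "mlb-scouting-reports"))
    print(json.dumps(build_search_body("five-tool players"), indent=2))
```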

Let’s try out another search, a more detailed one focusing on the pitching side: teams looking to shore up their bullpen (which is pretty much… everyone!) might try to find “relievers with significant movement on their fastball”:

Vertex AI Search results for “relievers with significant movement on their fastball”

Verifying these results takes a bit more work, since it means reading through the PDFs rather than simply matching keywords, but that is also what makes them impressive. Vertex AI Search’s natural language understanding recognizes that scouting talk like “allowing the fastball to play up”, “lot of run”, and “ride and tail” refers to aspects of fastball movement, and it returns the pitchers who have those types of phrases in their reports.
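
To see why plain keyword matching falls short here, consider this small sketch. The report snippets are invented for illustration (they are not taken from the actual scouting reports); only the quoted scouting phrases come from the results above. A literal search for “movement” finds nothing, and matching the scouting phrases only works if someone already knows to list them by hand.

```python
# Invented snippets for illustration only; not real scouting report text.
SNIPPETS = {
    "report_a": "His fastball has a lot of run and gets on hitters quickly.",
    "report_b": "The heater shows ride and tail at the top of the zone.",
    "report_c": "A plus changeup keeps hitters off balance.",
}

# Phrases that scouts use for fastball movement, per the results above.
MOVEMENT_PHRASES = ["play up", "lot of run", "ride and tail"]


def keyword_matches(snippets: dict, terms: list) -> list:
    """Return the names of snippets containing any term (case-insensitive)."""
    lowered_terms = [term.lower() for term in terms]
    return sorted(
        name
        for name, text in snippets.items()
        if any(term in text.lower() for term in lowered_terms)
    )


if __name__ == "__main__":
    # A literal keyword search misses every relevant report.
    print(keyword_matches(SNIPPETS, ["movement"]))
    # A hand-maintained phrase list finds them, but someone has to build it.
    print(keyword_matches(SNIPPETS, MOVEMENT_PHRASES))
```

The point of the contrast is that Vertex AI Search makes this connection without a hand-maintained phrase list.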

Gemini Helps “Bring Home” the Results

Now that we’ve seen a couple of searches return relevant results, let’s go further and use Gemini to synthesize our results and provide a summary that responds to our queries more directly. This involves making two changes to the configuration of the search app:

1) In the “UI” tab of the “Configurations” menu of our mlb-scouting-reports app, modify “Search type” to “Search with an answer”, change the summarization LLM to Gemini, and (optionally) add instructions to customize the summary:

Modifying Vertex AI Search to add summaries from Gemini to results

2) In the “Advanced” tab of that “Configurations” menu, enable “Enterprise edition” and “Advanced LLM” features.

Turning on Enterprise edition and Advanced LLM features in Vertex AI Search
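
The console settings above control the Preview widget. When calling the search API directly, a summary can be requested per query instead. The sketch below assumes the Discovery Engine v1 `summarySpec` fields as we understand them (`summaryResultCount`, `includeCitations`, `modelPromptSpec.preamble`, `modelSpec.version`). The preamble text is a hypothetical example of custom instructions, and the model version is left for the caller to supply because valid values should be taken from the documentation. The app still needs the Enterprise edition and Advanced LLM features enabled.

```python
import json
from typing import Optional


def build_summary_search_body(
    query: str,
    summary_result_count: int = 5,
    preamble: Optional[str] = None,
    model_version: Optional[str] = None,
) -> dict:
    """Build a search body that also requests a generated summary with citations."""
    if summary_result_count < 1:
        raise ValueError("summary_result_count must be at least 1")

    summary_spec = {
        # How many top results the summary is generated from.
        "summaryResultCount": summary_result_count,
        # Ask for citations so each claim can be traced to a report.
        "includeCitations": True,
    }
    if preamble:
        # Optional custom instructions for the summary.
        summary_spec["modelPromptSpec"] = {"preamble": preamble}
    if model_version:
        # Optional: which summarization model version to use.
        summary_spec["modelSpec"] = {"version": model_version}

    return {
        "query": query,
        "pageSize": 10,
        "contentSearchSpec": {"summarySpec": summary_spec},
    }


if __name__ == "__main__":
    body = build_summary_search_body(
        "top prospects with injury concerns",
        # Hypothetical instruction text.
        preamble="Answer as a baseball scout and name each player's team.",
    )
    print(json.dumps(body, indent=2))
```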

The features become available after a few minutes, and then we can go back to the “Preview” screen and try a new query, this time looking at some top prospects with potentially concerning injuries:

Generative AI summary from Vertex AI Search results for “top prospects with injury concerns”

With the new configuration, we get a generative AI-based summary ahead of the search results. It does a great job of summarizing information about five prospects with injuries, all within a few seconds. While a human could do this manually, imagine how long it would take to scour through hundreds of reports to find such players, take the relevant info out of each player’s report, and put it into this succinct form!

If you’re worried about Gemini’s summaries hallucinating — which can happen, though grounding answers in the scouting reports makes this much less frequent — the citations provided have links to click through to the original scouting reports to verify at the source.
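
If summaries are consumed through the API rather than the Preview screen, the same verification step can be automated by collecting the documents each citation points to. The sketch below assumes the summary portion of a search response has the shape we understand from the Discovery Engine v1 API (`summaryWithMetadata` containing `citationMetadata` and `references`). The sample payload is entirely hypothetical, including the player, the bucket, and the file names.

```python
# Hypothetical summary payload for illustration; not a real API response.
SAMPLE_SUMMARY = {
    "summaryText": "Prospect A is recovering from elbow surgery [1].",
    "summaryWithMetadata": {
        "summary": "Prospect A is recovering from elbow surgery.",
        "citationMetadata": {
            "citations": [
                {
                    "startIndex": "0",
                    "endIndex": "44",
                    "sources": [{"referenceIndex": "0"}],
                }
            ]
        },
        "references": [
            {"title": "Prospect A", "uri": "gs://my-bucket/reports/prospect-a.pdf"},
            {"title": "Prospect B", "uri": "gs://my-bucket/reports/prospect-b.pdf"},
        ],
    },
}


def cited_sources(summary: dict) -> list:
    """Return the URIs of references that at least one citation points to."""
    metadata = summary.get("summaryWithMetadata", {})
    references = metadata.get("references", [])
    citations = metadata.get("citationMetadata", {}).get("citations", [])

    uris = []
    for citation in citations:
        for source in citation.get("sources", []):
            # 64-bit integers may arrive as strings in JSON, and a zero
            # value may be omitted entirely, so normalize defensively.
            index = int(source.get("referenceIndex", 0))
            if 0 <= index < len(references):
                uri = references[index].get("uri", "")
                if uri and uri not in uris:
                    uris.append(uri)
    return uris


if __name__ == "__main__":
    for uri in cited_sources(SAMPLE_SUMMARY):
        print("Verify against:", uri)
```

References that no citation points to, like the second one in the sample, are left out, which keeps the list to reports that actually support a statement in the summary.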

Let’s finish with one last query, really channeling our inner general manager and asking directly which third-base prospects we could trade for or sign to help our team’s defense now:

Generative AI summary from Vertex AI Search results for “Which MLB-ready 3B prospects would help us shore up our defense?”

This is a very thorough answer that finds a few third-base prospects with solid defense that reports suggest might be ready to contribute soon. The GM can take this answer and start putting together offers for one of these guys right away!

Of course, it’s highly unlikely any baseball decision maker is going to make decisions based purely on generative AI, but you can see how Vertex AI Search and Gemini can help a GM (or fan) find in seconds what might otherwise take hours of detailed research. And bringing it back to Moneyball, a more efficient process to search through hundreds of prospects increases the chances of finding those undervalued players that might be keys to your team’s success.

Zoom out from the baseball world, and you can see how this use case applies in any industry where there’s rich information that is critical to decision making contained in a large volume of text documents.

Vertex AI Search and Gemini abstract away a lot of the difficulty of building your own RAG architecture: no manual chunking of text, no iterative summarization process, no need for even a topic-specific model. Simply point a Search app at your corpus of PDFs, use Gemini to help summarize results, integrate into your organization, and start streamlining your own enterprise’s decisions!
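
For a sense of what is being abstracted away, here is a minimal sketch of just one of those steps, manual chunking, as it might look in a do-it-yourself RAG pipeline. This is a generic illustration with arbitrary default sizes, not a description of how Vertex AI Search processes documents internally. It splits text into overlapping word windows so that a sentence cut at a chunk boundary still appears intact in a neighboring chunk.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Split text into chunks of up to chunk_size words, overlapping by overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

A full pipeline would still need to extract text from the PDFs, embed and index each chunk, retrieve the relevant ones per query, and prompt a model with them, which is the work the managed service takes on.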

Major League Baseball trademarks and copyrights are used with permission of Major League Baseball. Visit MLB.com.

See here to learn about how MLB is using Google Cloud to modernize the fan experience.

Special thanks to Jacob Danovitch for inspiration from his “Trouble With the Curve” MLB scouting reports analysis and Andrew Marcum for help conceptualizing this use case.
