Generative Agents with Structured Data

Simon Margolis
Google Cloud - Community
5 min read · Jul 2, 2024

Businesses often look for generative AI agents that can converse with employees about domain- or role-specific knowledge. While generalist LLMs can help you think about your next vacation or what to make for dinner, businesses need models that are knowledgeable about their specific data and domain.

In the early days of generative agents, this was accomplished by performing Retrieval-Augmented Generation (RAG) over salient content and leveraging ReAct-style reasoning found in libraries like LangChain. This approach was a huge step in the right direction, allowing agents to access information related to the user’s prompt and to reason through it in multiple stages to give the user the best response.
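To make the pattern concrete, here is a minimal sketch of the retrieve-then-generate loop. The vector store and LLM helpers are hypothetical stand-ins, not any specific library or the stack described later in this post:

```python
# Minimal RAG sketch: embed the query, retrieve the closest chunks,
# and stuff them into the prompt. All clients here are illustrative.
from my_vector_store import VectorStore   # hypothetical vector store client
from my_llm import embed, generate        # hypothetical embedding/LLM helpers

store = VectorStore("enterprise-docs")

def answer(question: str, k: int = 5) -> str:
    # 1. Retrieve: find the k chunks most similar to the question.
    chunks = store.similarity_search(embed(question), top_k=k)
    context = "\n\n".join(c.text for c in chunks)

    # 2. Generate: ask the model to answer using only the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```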

With the benefits of this approach came technical challenges. It was hard to retrieve the right information and ensure it was small enough to fit in the model’s context window. Embedding the data to be retrieved came with its own complexities: decisions like chunk size and the choice of embedding model made a big difference in the outcome.
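As one example of why chunk sizing matters, here is a naive fixed-size chunker with overlap; the sizes are arbitrary illustrations, and real pipelines often split on semantic boundaries instead:

```python
# Naive fixed-size chunking with overlap. Chunks that are too large may
# not fit in the context window alongside the prompt; chunks that are too
# small strip away the surrounding context the model needs.
def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
    return chunks
```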

To solve this, Google released Vertex Agent Builder (née Vertex Search & Conversation) to take the guesswork out of embeddings and RAG. Agent Builder also streamlined the process of providing enterprise-specific data to the agent and deploying it for use. In many cases, this was done without any code!

With Vertex Agent Builder, the heavy lifting in building a RAG-based generative application is offloaded to Google. However, there is still work required to arrive at a solution that fully addresses a given enterprise use case. For unstructured data such as PDFs or a collection of text files, Agent Builder provides generative responses, and even follow-up questions, out of the box. It also provides the JavaScript code for the application so it can easily be dropped into your favorite web page. A bit more work is required, however, when customers want the ability to interact with their structured data, such as data from a CRM, ERP, SIEM, or other enterprise platform.

The introduction of structured data creates more complexity for a good reason: with the dozens or hundreds of fields in enterprise data platforms, it can be difficult to determine which specific data should be referenced to address a user’s query. Historically, this has been solved with engineering work: providing examples of the types of queries expected from users, custom embeddings of the underlying data, and a few other techniques to coax a more favorable output. While successful, these approaches mean that developers and users need to agree ahead of time on the specific types of questions that will be asked of the data. Furthermore, developers need to repeat a fair amount of engineering work whenever the underlying data or the scope of the queries changes.
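To illustrate the maintenance burden, here is a hedged sketch of that older few-shot approach; the table and field names are hypothetical, and any question outside the anticipated patterns forces developers to write a new example:

```python
# Few-shot prompt mapping anticipated questions to structured queries.
# The table and field names (deals, region, close_date, ...) are
# hypothetical CRM columns used purely for illustration.
FEW_SHOT = """\
Q: Which deals closed in EMEA last quarter?
SQL: SELECT * FROM deals WHERE region = 'EMEA' AND close_date >= '2024-01-01'

Q: Who owns the largest open opportunity?
SQL: SELECT owner FROM deals WHERE status = 'open' ORDER BY amount DESC LIMIT 1
"""

def to_sql(question: str) -> str:
    # Questions outside the anticipated patterns degrade quickly, which is
    # exactly the maintenance burden described above.
    prompt = FEW_SHOT + f"\nQ: {question}\nSQL:"
    return generate(prompt)  # hypothetical LLM call from the earlier sketch
```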

To solve this dilemma, we combined two powerful tools under the Agent Builder umbrella into a simple solution with the maintainability and broad scope we needed. We started by pointing the Agent Builder Search tool at our CRM data (stored in BigQuery). This gave us a no-code means of embedding and storing our data for later retrieval. We only had to modify flags on the schema to tell the agent what the various fields meant, which were important, and which to display back to users. With this, we had a basic search application allowing natural language queries across our structured data. This only solved part of the problem, however, since the tool returned structured data rather than an LLM-generated response.
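For illustration, Vertex AI Search data store schemas let you annotate fields with flags such as searchable, indexable, and retrievable. The sketch below shows the idea on a made-up CRM field; treat the exact shape as an assumption to verify against the current Discovery Engine API reference:

```python
# Illustrative schema annotation for a Vertex AI Search data store.
# The resource names and the "opportunity_name" field are placeholders.
import json
from google.cloud import discoveryengine_v1 as discoveryengine

schema_json = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        # Made-up CRM field, annotated with Discovery Engine flags.
        "opportunity_name": {
            "type": "string",
            "searchable": True,   # include in search matching
            "indexable": True,    # allow filtering on this field
            "retrievable": True,  # return this field in results
        },
    },
}

client = discoveryengine.SchemaServiceClient()
client.update_schema(
    request=discoveryengine.UpdateSchemaRequest(
        schema=discoveryengine.Schema(
            name="projects/PROJECT/locations/global/collections/default_collection"
                 "/dataStores/DATA_STORE/schemas/default_schema",
            json_schema=json.dumps(schema_json),
        )
    )
)
```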

To give users a chat-based experience for interfacing with this data, we needed to make use of another Agent Builder tool: Generative Agents. With generative agents, we were able to design our chatbot’s behavior, control the user flow, and allow it to use specific data sources. Not only that, but we were able to do all of that without writing any code!

With Generative Agents, the conversation flow is described in English. We simply told the agent what its job was and how to help answer users’ queries. To ensure that the agent provided accurate information, we deployed a simple API on Google Cloud Run that called the Agent Builder Search API, parsed some of the data into a format we liked, and returned the structured results we wanted as JSON. We then used Gemini 1.5 Pro to write an OpenAPI YAML spec for the API, detailing how to call it, what input it required, and what it could be expected to return.
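As a rough sketch (not the author’s actual service), a minimal Flask app on Cloud Run might wrap the Discovery Engine search client like this; the project, data store, and response parsing are placeholders meant to show the shape of the wrapper, and they assume flat CRM records:

```python
# Minimal sketch of a Cloud Run service wrapping Vertex AI Search.
# PROJECT and DATA_STORE are placeholders.
from flask import Flask, jsonify, request
from google.cloud import discoveryengine_v1 as discoveryengine

app = Flask(__name__)
client = discoveryengine.SearchServiceClient()
SERVING_CONFIG = (
    "projects/PROJECT/locations/global/collections/default_collection"
    "/dataStores/DATA_STORE/servingConfigs/default_config"
)

@app.route("/search")
def search():
    query = request.args.get("q", "")
    response = client.search(
        discoveryengine.SearchRequest(
            serving_config=SERVING_CONFIG, query=query, page_size=5
        )
    )
    # Flatten each result's structured payload into plain JSON the
    # generative agent can consume as a tool response (flat fields assumed).
    results = [dict(r.document.struct_data) for r in response]
    return jsonify({"results": results})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # Cloud Run's default port
```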

With this in hand, we simply provided the Gemini-generated YAML to our Generative Agent as its tool definition. We then gave the agent instructions (in plain English!) on how and when to use the tool.
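For reference, here is a trimmed illustration of what such an OpenAPI spec can look like; the server URL, path, and schemas are placeholders rather than the actual generated spec:

```yaml
# Illustrative OpenAPI spec for the Cloud Run search wrapper sketched above.
openapi: 3.0.0
info:
  title: CRM Search API
  version: 1.0.0
servers:
  - url: https://example-service.a.run.app   # placeholder Cloud Run URL
paths:
  /search:
    get:
      operationId: searchCrm
      summary: Search CRM records with a natural-language query.
      parameters:
        - name: q
          in: query
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Matching CRM records as JSON.
          content:
            application/json:
              schema:
                type: object
                properties:
                  results:
                    type: array
                    items:
                      type: object
```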

Now we have a Generative Agent that can converse with a user in plain English, understand the intent of the user’s query, retrieve salient information from our CRM data to help answer the question, and then put all of those pieces together to answer in the same conversational style. Better still, the user can ask follow-up questions and interact with the agent just as they would with a colleague. And the kicker: a few clicks later, this application was deployed in my Google Workspace organization as a Google Chat bot.

What’s especially exciting about this approach is that it was built in only a few hours and, excluding the Cloud Run API, without writing any code: all of the “programming” was done in English!

With these barriers to entry rapidly dropping, we’re excited to see what the future holds for generative AI applications and their developers. What will you build next?!
