Reliable Generative AI Multi-Agent Solutions with Gemini, RAG and Grounding in MongoDB Atlas

Kaijun Xu
Google Cloud - Community
14 min read · Aug 15, 2024

With Vertex AI Agent Apps, we can easily create virtual agents grounded in external and private data repositories like MongoDB Atlas.

In my last post, I wrote about how function calling in Gemini enables us to add domain-specific and private knowledge to our model to improve the quality and relevance of its responses. With the introduction of Vertex AI Agent Apps in preview, we can now combine the power of function calling with virtual agents to create interactive conversational experiences for users without getting bogged down in Dialogflow CX flows, pages, intents, and transitions.

In this post, I will demonstrate how to build a virtual agent grounded in domain-specific knowledge (represented by sample data collections in MongoDB Atlas) using Vertex AI Agent Apps. We can then easily extend the agent's capabilities with additional tools, such as other private data repositories or public APIs.

The agent app takes user input, provided tools, prepared instructions and examples to generate tool invocations for retrieval-augmented generation.
A logical view of the agent app that we will be building.

What we’ll be doing:

  • Creating a MongoDB Atlas cluster, loading sample data and configuring Atlas App Services.
  • Defining a tool using the OpenAPI specification for the MongoDB Data API, so that our agent can query collections in the Atlas cluster.
  • Creating a specialized agent that will use this tool to search for movies in the sample_mflix.movies collection.
  • Editing the default agent so that it transfers the user to the specialized agent when asked about movies.
  • Testing the app and using our initial interactions with the agent to create examples to make the agent behavior more consistent.

Prerequisites:

  • A MongoDB Atlas account. Signing up is free, and the free-tier M0 cluster is sufficient for this exercise.
  • A Google Cloud account with access to Vertex AI Agent Builder.

Setting up MongoDB:

We will be using the sample data (sample_mflix.movies) provided by MongoDB Atlas to represent the proprietary data that we eventually want our agent to be retrieving — this is a convenient and consistent data set that makes it easy for us to compare our results as we experiment.

For our agent to talk to the data in MongoDB, we will be leveraging the Data API in Atlas App Services. I chose the us-central1 region, where all the services I need (an M0 cluster, App Services and Vertex AI Agent Apps) are available.

  1. Deploy an M0 cluster and load sample data.
  2. (Optional) For optimal security, allow IP access only from App Services IP addresses.
  3. Create an app using the “Build your own App” option in App Services and note the App ID.
  4. Generate an API key with Project Owner permissions — we will use this with the App Services CLI to configure the App Services application.

Setting up the App Services application:

The App Services CLI allows us to programmatically manage the application, and can easily be installed and used from Cloud Shell.

  1. Install the CLI and authenticate using the API key we generated earlier.
  2. Pull the current app configuration (which should be the default) using appservices pull --remote=<APP ID>.
  3. Create or update the following configuration files, and push the updated configuration to App Services using appservices push --local ./<APP NAME>/ -y.
  4. Create an API key for our agent to authenticate to App Services.

auth/providers.json (enabling API keys as an authentication method for App Services)

{
  "api-key": {
    "name": "api-key",
    "type": "api-key",
    "disabled": false
  }
}

data_sources/mongodb-atlas/sample_mflix/movies/rules.json (allowing access to the sample collection)

{
  "database": "sample_mflix",
  "collection": "movies",
  "roles": [
    {
      "name": "readAll",
      "apply_when": {},
      "document_filters": {
        "read": true,
        "write": false
      },
      "insert": false,
      "delete": false,
      "search": true,
      "read": true,
      "write": false
    }
  ]
}

https_endpoints/gen_endpoints_config.json (enabling the data API)

{
  "versions": [
    "v1"
  ],
  "run_as_user_id": "",
  "run_as_user_id_script_source": "",
  "log_function_arguments": false,
  "disabled": false,
  "validation_method": "NO_VALIDATION",
  "secret_name": "",
  "fetch_custom_user_data": false,
  "create_user_on_auth": false,
  "return_type": "JSON"
}
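
Once the configuration has been pushed and the API key from step 4 exists, it can be worth confirming that the Data API answers before wiring it into an agent. The sketch below is one way to do that with only the Python standard library. The endpoint and key values are placeholders, and the names build_headers, find_documents and SAMPLE_BODY are made up for illustration. It assumes the /action/find route and the apiKey header that the agent tool will use later.

```python
import json
import urllib.request

# Placeholders (assumptions): copy the real endpoint from
# App Services > HTTPS Endpoints > Data API, and use the API key from step 4.
DATA_API_ENDPOINT = "<DATA API ENDPOINT>"
API_KEY = "<API KEY>"

# A small request body: fetch at most one matching movie from the sample data.
SAMPLE_BODY = {
    "dataSource": "mongodb-atlas",
    "database": "sample_mflix",
    "collection": "movies",
    "filter": {"title": "Jurassic Park"},
    "projection": {"_id": 0, "title": 1, "year": 1},
    "limit": 1,
}


def build_headers(api_key):
    """Headers for a JSON request authenticated with an App Services API key."""
    return {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "apiKey": api_key,
    }


def find_documents(endpoint, api_key, body, timeout=10):
    """POST a find request to the Data API and return the parsed JSON response."""
    request = urllib.request.Request(
        url=endpoint.rstrip("/") + "/action/find",
        data=json.dumps(body).encode("utf-8"),
        headers=build_headers(api_key),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With real values in place, calling find_documents(DATA_API_ENDPOINT, API_KEY, SAMPLE_BODY) should return a JSON object with a documents list. An HTTP error at this point usually points at the rules, the endpoint configuration or the key rather than at the agent.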

Creating the Agent App — MongoDB Query Tool

Follow the Vertex AI Agent Apps quick start documentation up to the “Create the Application” step. If possible, select a region close to where the Atlas cluster and App Services are located.

Since we want our agent to connect to MongoDB, we need to provide the agent with a tool. At the time of writing, we have the choice of defining an external API by providing the OpenAPI schema. Very conveniently, MongoDB has provided the OpenAPI schema for the Data API for download.

Our agent’s tools can be found under the wrench icon on the left of the Agent Console. Click on “Create”, specify the tool name as “movies” and check that the “OpenAPI” type is selected.

The tool list can be accessed from the left menu bar.
Where to find the tools list and create a new tool.

We need to provide a short description of the tool as well as the API schema. The description tells our agent how the tool can be used:

Query MongoDB for movies information by genre, cast or year. Specify the dataSource, collection and database to be queried.

Scroll down to the schema section, ensure that the “JSON” radio button is selected, then paste in the contents of the data API schema.

We will only need the “find documents” path (/action/find), so we can cut down the schema from 1,500+ lines of JSON to something a lot simpler using the in-UI editor (which also allows us to switch between JSON and YAML):

  • Shorten info.description.
  • Provide our data API endpoint (which can be found in App Services > HTTPS Endpoints > Data API) as servers.url.
  • Remove the definition for security entirely (we will configure tool authentication later on).
  • Remove all paths that we will not use, except for /action/find.
  • Remove all components that are not referred to in /action/find, such as securitySchemes, schemas.FindOneRequestBody and schemas.FindOneResponseBody.

From our remaining path, we can further remove:

  • Error responses and all related components, since we will not deal with error handling for now.
  • x-codeSamples and example, as we will be preparing some examples for the agent later.
  • EJSON schemas, because the agent requires the request and response body to be either empty or JSON.

To help our agent along, we can provide a partial schema of the documents in sample_mflix.movies in components.schemas.FindManyResponseBody.properties.documents.items.properties. In particular, we provide an enumerated list of movie genres that helps our agent use the correct search term when querying the collection. For example, “science fiction”, “scifi” and “sci-fi” should all be mapped to “Sci-Fi” in our collection.
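
The agent performs this mapping through the model, guided by the enum, rather than through code. Purely to illustrate the intended behavior, here is a sketch of the same normalization done deterministically in Python. The function name, the shortened genre list and the alias table are assumptions for illustration and are not part of the agent.

```python
import re

# A subset of the genre values listed in the tool schema's enum.
GENRES = ["Action", "Comedy", "Drama", "Sci-Fi", "Film-Noir", "Documentary"]

# Hypothetical aliases for phrasings that do not match a genre name directly.
ALIASES = {"sciencefiction": "Sci-Fi"}


def _key(text):
    """Lowercase and drop everything except letters, so 'sci-fi' equals 'scifi'."""
    return re.sub(r"[^a-z]", "", text.lower())


# Map each simplified key to the canonical spelling used in the collection.
_LOOKUP = {_key(genre): genre for genre in GENRES}
_LOOKUP.update(ALIASES)


def normalize_genre(user_text):
    """Return the canonical genre for user_text, or None if it is unknown."""
    return _LOOKUP.get(_key(user_text))
```

A lookup table like this only covers phrasings someone thought of in advance, which is part of why letting the model map free text onto the enum is attractive here.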

This leaves us with a much more tractable tool definition:

{
  "openapi": "3.1.0",
  "info": {
    "version": "v1",
    "title": "MongoDB Atlas Data API",
    "description": "A fully-managed API to read, write, and aggregate data in MongoDB Atlas."
  },
  "servers": [
    {
      "url": "< DATA API ENDPOINT >"
    }
  ],
  "paths": {
    "/action/find": {
      "post": {
        "operationId": "find",
        "summary": "Find Documents",
        "description": "Find multiple documents that match a query.",
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "allOf": [
                  {
                    "$ref": "#/components/schemas/FindManyRequestBody"
                  }
                ]
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "Found",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/FindManyResponseBody"
                }
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "Namespace": {
        "type": "object",
        "required": [
          "dataSource",
          "database",
          "collection"
        ],
        "properties": {
          "dataSource": {
            "type": "string",
            "description": "The name of a linked MongoDB Atlas data source. This is\ncommonly `\"mongodb-atlas\"` though it may be different in\nyour App if you chose a different name when you created the\ndata source.\n"
          },
          "database": {
            "type": "string",
            "description": "The name of a database in the specified data source."
          },
          "collection": {
            "type": "string",
            "description": "The name of a collection in the specified database."
          }
        }
      },
      "Filter": {
        "type": "object",
        "properties": {
          "filter": {
            "type": "object",
            "description": "A MongoDB query filter that matches documents. For a list of all query operators that the Data API supports, see [Query Operators](https://www.mongodb.com/docs/atlas/app-services/mongodb/crud-and-aggregation-apis/#query-operators)."
          }
        }
      },
      "Projection": {
        "type": "object",
        "properties": {
          "projection": {
            "type": "object",
            "additionalProperties": {
              "type": "number",
              "enum": [
                0,
                1
              ]
            },
            "description": "A [MongoDB projection](https://www.mongodb.com/docs/manual/tutorial/project-fields-from-query-results/) for matched documents returned by the operation."
          }
        }
      },
      "Sort": {
        "type": "object",
        "properties": {
          "sort": {
            "type": "object",
            "description": "A [MongoDB sort expression](https://www.mongodb.com/docs/manual/reference/method/cursor.sort/) that indicates sorted field names and directions."
          }
        }
      },
      "Limit": {
        "type": "object",
        "properties": {
          "limit": {
            "type": "number",
            "description": "The maximum number of matching documents to include in the response."
          }
        }
      },
      "Skip": {
        "type": "object",
        "properties": {
          "skip": {
            "type": "number",
            "description": "The number of matching documents to omit from the response."
          }
        }
      },
      "FindManyRequestBody": {
        "title": "FindManyRequestBody",
        "required": [
          "filter"
        ],
        "allOf": [
          {
            "$ref": "#/components/schemas/Namespace"
          },
          {
            "$ref": "#/components/schemas/Filter"
          },
          {
            "$ref": "#/components/schemas/Projection"
          },
          {
            "$ref": "#/components/schemas/Sort"
          },
          {
            "$ref": "#/components/schemas/Limit"
          },
          {
            "$ref": "#/components/schemas/Skip"
          }
        ]
      },
      "FindManyResponseBody": {
        "title": "FindManyResponseBody",
        "type": "object",
        "description": "The result of a find operation.",
        "properties": {
          "documents": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "_id": {
                  "type": "string"
                },
                "title": {
                  "type": "string",
                  "description": "Title of the movie."
                },
                "year": {
                  "type": "integer",
                  "description": "Year of release of the movie."
                },
                "plot": {
                  "type": "string",
                  "description": "The plot of the movie."
                },
                "cast": {
                  "type": "array",
                  "description": "A list of cast members appearing in the movie.",
                  "items": {
                    "type": "string"
                  }
                },
                "genres": {
                  "type": "array",
                  "description": "Genre or genres of the movie.",
                  "items": {
                    "type": "string",
                    "enum": [
                      "Action",
                      "Comedy",
                      "Short",
                      "Adventure",
                      "Drama",
                      "Romance",
                      "Crime",
                      "Sci-Fi",
                      "Musical",
                      "Family",
                      "War",
                      "History",
                      "Film-Noir",
                      "Mystery",
                      "Thriller",
                      "Biography",
                      "Fantasy",
                      "Western",
                      "Animation",
                      "Horror",
                      "Sport",
                      "Music",
                      "Documentary"
                    ]
                  }
                }
              }
            },
            "description": "A list of documents that match the specified filter."
          }
        }
      }
    }
  }
}
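
When trimming a large schema by hand, it is easy to delete a component that a remaining path still references. As a hedged aid, the Python sketch below walks a parsed OpenAPI document and lists local $ref targets that no longer resolve. The function names are illustrative, only references beginning with "#/" are checked, and JSON Pointer escape sequences are not handled.

```python
def iter_refs(node):
    """Yield every "$ref" string found anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def dangling_refs(spec):
    """Return the sorted local $ref targets that the spec no longer defines."""
    missing = set()
    for ref in iter_refs(spec):
        if not ref.startswith("#/"):
            continue  # only local references are checked in this sketch
        node = spec
        # Follow the reference one path segment at a time from the root.
        for part in ref[2:].split("/"):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                missing.add(ref)
                break
    return sorted(missing)
```

To use it, save the trimmed schema to a file, parse it with json.load, and pass the resulting dictionary to dangling_refs. An empty list means every remaining reference still has a definition.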

Finally, scroll down to the authentication section and set up the tool to authenticate to the data API using an API key by specifying apiKey as the header name and providing the API key that was created in step 4 of “Setting up the App Services application”.

Configure API key authentication for the tool and provide the header name and API key secret.

Click on the “save” button at the top of the screen to save the tool. We can now create our agent that will use this tool.

Creating the Agent App — Specialized Movies Agent

Return to the agent list by clicking on the clipboard icon on the left menu and create a new agent. Since this will be the agent specialized in finding movies from our MongoDB collection, let’s call this agent “Movies”.

The agent list can be accessed from the left menu bar.
Where to find the agent list and create a new agent.

We need to provide both a high-level goal to this agent, and a more detailed set of instructions that the agent is expected to use to accomplish this goal.

Help customers find movies by genre, cast or year.

We make sure to include instructions to handle errors or empty tool results (for example when a query does not return any matching documents from our collection) to mitigate hallucinations.

- Collect the user information on movies that they are interested in and use ${TOOL:movies} to process the query.
- If users have not already provided any search criteria, let the user know that you can search for movies by genre, cast or year.
- Create a search_query and use ${TOOL:movies} to search for movies in the database.
- ALWAYS use ${TOOL:movies} to get results for the user.
- NEVER provide results without using ${TOOL:movies}.
- When you call ${TOOL:movies}, use collection called movies, database called sample_mflix and dataSource called mongodb-atlas.
- When you call ${TOOL:movies}, you can filter by the following fields:
    - genres which is a list of movie genres.
    - cast which is a list of people starring in the movie.
    - year which is an integer representing the year the movie was released. You can use range comparison (greater than or equal $gte, greater than $gt, less than $lt, less than or equal $lte) for this field.
- You can call ${TOOL:movies} by specifying at least one of the filter fields.
- if the API did not return SUCCESS, tell the user that you ran into an error, apologize and ask them to try a different query.
- if the API returns SUCCESS with an empty list, then display "No movies found matching your criteria" to the user.
- if the API returns SUCCESS with a list of documents, present the list to the customer showing only the title, year of release, genres, cast list and plot.
- after you have provided a response to the user, emit the output summary "Movie information has been provided".

The final line of the instructions handles how the specialized agent can return the flow to the default agent. This allows us to add more specialized agents later, for example to connect to different external tools or handle specific tasks.
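
The three result-handling rules in the instructions amount to a simple branch on the tool outcome. The agent follows them through the model rather than through code, so the Python sketch below is only an illustration of the intended behavior. The function name, the error wording and the one-line output format are assumptions.

```python
def render_tool_result(success, documents):
    """Mirror the three result-handling rules given to the Movies agent."""
    if not success:
        # Rule 1: the tool call failed, so apologize and ask for another query.
        return "Sorry, I ran into an error. Please try a different query."
    if not documents:
        # Rule 2: the call worked but nothing matched.
        return "No movies found matching your criteria"
    # Rule 3: show only title, year, genres, cast and plot for each movie.
    lines = []
    for doc in documents:
        genres = ", ".join(doc.get("genres", []))
        cast = ", ".join(doc.get("cast", []))
        lines.append(
            f"{doc.get('title')} ({doc.get('year')}) | {genres} | {cast} | {doc.get('plot')}"
        )
    return "\n".join(lines)
```

Spelling out the empty-result case separately is what keeps the agent from inventing movies when the query matches nothing.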

If you continue scrolling down, you can see that the UI has automatically selected the movies tool from the list of available tools, because our instructions include references to this tool in the form of ${TOOL:movies}.

The Movies agent has automatically selected the movies tool.

Now that we have our first specialized agent, we can go back to our default agent and set it up to greet our users and direct them to the appropriate specialists.

Creating the Agent App — Default Agent

The default agent is the starting point for conversations. When we created our app, the default agent was created automatically and is marked with a star on the agent list.

The default agent is marked with a star on the list of agents.
Our list of agents at this point — the default agent, and the Movies agent we just created.

Our goal for this agent is relatively simple — greet the user, tell them what our agent can do, and direct them to a specialist agent.

Route users directly to a specialist for assistance to answer questions on movies.

The instructions go into a little more detail, including mitigating hallucinations by remaining within the scope of what our specialist agent can do:

- Greet the customer, tell them you can help with providing information on Movies, ask them how you can help.
- DO NOT attempt to help the user directly.
- ALWAYS transfer them directly to a specialist.
- There is 1 specialist you can choose from:
    - To provide information on Movies call ${AGENT:Movies}
- For information not related to the above specialists, respond saying you're unable to determine what the user wants and let them know what you can do.

Unlike our specialist agent earlier, notice that the default agent does not have any of the available tools selected. Save the default agent.

Creating the Agent App — Providing Examples

At this point, our agents are functional — but the quality of responses could vary greatly. We need to provide examples to ensure users receive quality responses.

We could create these examples manually using the Examples tab of each agent, but it is easier to have a conversation with the agent and use that as a base to create our examples. To do this, we use the simulator panel on the right of the screen, ensuring that the default agent is selected.

The simulator panel includes drop downs to select the agent and generative model, and a text box for user input.
The simulator panel with the default agent selected.

We will focus on a few examples illustrating tool use by the Movies agent, but before that, here are a few scenarios where we should create examples for the default agent:

  • Generic greetings e.g. “Hi”, “Hello”, etc.
  • Out-of-scope requests e.g. “I am looking for delicious vegetarian recipes”.
  • Handing off to the Movies agent when appropriate, e.g. “I am looking for a movie”.

For the latter scenario, we may need to manually create the example as shown below:

  • Start with the user input.
  • Add an “Agent invocation” action after the user input and specify the Movies agent. Provide a suitable summary for the invocation input.
  • Specify the invocation output as “Movie information has been provided”, as this is what we instructed the Movies agent to do earlier.
  • Set the agent state to “OK”.
  • Add an “Agent response” action after the agent invocation and specify a suitable prompt, such as “Anything else?”

The example includes user input, an agent invocation action and an agent response as described in the preceding steps.
A manual example for our default agent to transfer the user to the Movies agent.

With examples for these three scenarios, our default agent can now redirect us to the Movies agent correctly. We can see that the Movies agent has taken over the highlighted portion of the conversation and is ready to use our provided tools to query MongoDB.

A conversation example of the default agent covering basic greetings, an out-of-scope request and a transfer to the Movies agent. The highlighted portion of the conversation is now being handled by the Movies agent.
The default agent now hands us to the Movies agent correctly.

Providing Tool Usage Examples

At this point, our agent has no examples of how to query MongoDB. Thankfully, Gemini does a fairly decent job at writing a MongoDB query out of the box, as we can see if we try to search for Jurassic Park (1993):

Searching for “science fiction films starring Richard Attenborough” results in the agent successfully retrieving the 1993 movie, Jurassic Park.
For a simple query, our agent has managed to write a working MongoDB query.

The combination of our tool definition and instructions earlier means that the agent has correctly mapped our input (“science fiction”) to the genre in our database (“Sci-Fi”) and specified the correct database and collection for the data API.

However, there are a couple of points that could be improved — the agent has not used either projection or limit, so we could get a large number of results containing fields that are not useful to us, such as the _id field. We can tweak the tool input and save the example with a more elegant query:

{
  "limit": 5,
  "database": "sample_mflix",
  "projection": {
    "genres": 1,
    "title": 1,
    "plot": 1,
    "_id": 0,
    "cast": 1,
    "year": 1
  },
  "filter": {
    "genres": "Sci-Fi",
    "cast": "Richard Attenborough"
  },
  "dataSource": "mongodb-atlas",
  "collection": "movies"
}

Before saving the example, update the summary of the agent execution result and conversation state to “Movie information has been provided” and “OK” respectively.

We can also make other improvements on the example, like removing the repeated greeting from the Movies agent and formatting the response containing the list of movies. After saving the example, we can undo the last two conversation steps and re-run them to see the difference:

After updating our example, the Movies agent no longer gives a repeated greeting to the user, generates a query which retrieves only the fields of interest and provides a neater, more readable response.
The improved flow after tweaking our example.

With this example, the Movies agent no longer greets us again, and the query response and agent response are both more readable. Let’s challenge the agent with a more complex query including a date range:

Continuing the conversation with a more complex query of “movies from between 1995 and 2020 starring Leonardo DiCaprio”. Our agents have retrieved the list of movies correctly, but there is some unintended behavior.
Continuing with a more complex query — some good, some not so good.

Let’s break down what happened:

  • Our default agent understood that we were looking for more movies, and passed along the query to the Movies agent as the input “User is looking for movies from between 1995 and 2020 starring Leonardo DiCaprio”.
  • The Movies agent generated a query including the correct use of the range terms $gte and $lte (“greater than or equal to” and “less than or equal to” respectively):
    {
      "collection": "movies",
      "projection": {
        "title": 1,
        "cast": 1,
        "year": 1,
        "genres": 1,
        "_id": 0,
        "plot": 1
      },
      "database": "sample_mflix",
      "filter": {
        "year": {
          "$lte": 2020,
          "$gte": 1995
        },
        "cast": "Leonardo DiCaprio"
      },
      "dataSource": "mongodb-atlas",
      "limit": 5
    }
  • The Movies agent presented the list of matching movies in a format consistent with the example created earlier. The list of movies includes Titanic (1997), which incidentally took the title of “highest grossing film of all time worldwide” from Jurassic Park.
  • The Movies agent handed us back to the default agent, which told us it would transfer us to a specialist after we already received the information we wanted.

Our default agent has shown some unintended behavior in the last step, but we can fix this using more examples. In fact, we can derive two more examples from this last conversation segment:

  • Default agent handling of queries which include details — rather than prompting the user again for information they have already provided, the Movies agent can process the query directly and return relevant results. The default agent should then continue the conversation with another prompt to the user.
  • Movies agent processing a query with different filters (cast and year instead of cast and genre) and using date range.
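
The two tool inputs shown in this section share one shape: a fixed namespace, a projection of the five fields we display, a limit, and a filter assembled from whichever criteria the user supplied. The Python sketch below builds such a body, which can be handy for drafting further examples by hand. The function name and its parameters are assumptions for illustration, and the agent itself generates these bodies through the model.

```python
def build_find_body(genre=None, cast=None, year_from=None, year_to=None, limit=5):
    """Assemble a Data API find body from optional search criteria."""
    query = {}
    if genre is not None:
        query["genres"] = genre  # matches movies whose genres list contains it
    if cast is not None:
        query["cast"] = cast  # matches movies whose cast list contains it
    year_range = {}
    if year_from is not None:
        year_range["$gte"] = year_from
    if year_to is not None:
        year_range["$lte"] = year_to
    if year_range:
        query["year"] = year_range
    if not query:
        # The instructions require at least one filter field per tool call.
        raise ValueError("at least one of genre, cast or year is required")
    return {
        "dataSource": "mongodb-atlas",
        "database": "sample_mflix",
        "collection": "movies",
        "filter": query,
        "projection": {"_id": 0, "title": 1, "year": 1, "genres": 1, "cast": 1, "plot": 1},
        "limit": limit,
    }
```

For instance, build_find_body(cast="Leonardo DiCaprio", year_from=1995, year_to=2020) produces the same filter, projection and limit as the query the Movies agent generated above.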

Adding Capabilities with Additional Agents and Tools

Now that we have a single specialist agent working, we can easily expand our agent’s capabilities by adding more specialist agents and tools. For the APAC sessions of MongoDB.local 2024, we prepared an agent which uses the sample_mflix.movies and sample_restaurants.restaurants collections in MongoDB Atlas, along with the Google Maps Places API. You will find a snippet of the conversation with this agent showcasing the two new tools at the end of this post.

In this post, we demonstrated how you can easily leverage your data in MongoDB Atlas for AI-powered applications with the combined features of Vertex AI Agents and MongoDB Atlas. Get started today for free with both Google Cloud and MongoDB Atlas and see what you can build!

A conversation with our expanded agent querying a different collection (sample_restaurants.restaurants) and system (the Google Maps Places API). The agent correctly retrieves a list of Thai restaurants in Queens, New York, information about the Googleplex and tourist attractions near the Googleplex.
Expanding capabilities with two new tools for the sample_restaurants.restaurants collection and Google Maps Places API.

Kaijun Xu
Google Cloud - Community

Partner Customer Engineer, Data Analytics Specialist at Google Cloud