Adding Context to Retrieval-Augmented Generation with Gemini Function Calling and MongoDB Atlas

Kaijun Xu
Google Cloud - Community
8 min read · May 28, 2024

Add domain knowledge, optimize cost and improve relevance for vector search.

When working with LLMs, retrieval-augmented generation (RAG) is commonly used to enable the model to answer questions using specific domain knowledge. This often takes the form of data stored in an operational database such as MongoDB Atlas, which provides a single, unified, fully managed platform with integrated vector search capabilities.

In a typical RAG scenario, the user query is converted to an embedding, which is then searched against a vector index to return a set of semantically similar results. Function calling takes this one step further by enabling the model to delegate data processing tasks, maximizing the value we get from a unified platform like MongoDB Atlas, including:

  • Handling user queries in a more conversational and flexible manner.
  • Optimizing API usage and cost.
  • Leveraging operational data and metadata stored alongside vectors in MongoDB for filtering and enrichment.

We will demonstrate each of these points using the Vertex AI Gemini API and embeddings API with MongoDB Atlas, which allow us to build and experiment quickly without having to manage any underlying infrastructure or deployment.

For this demonstration, we use Python and the MongoDB sample_mflix.embedded_movies dataset from Hugging Face. We have replaced the plot_embedding field with an embedding field containing embeddings generated using the Vertex AI text embeddings API.

A simplified query flow with function calling, where Gemini delegates querying operational data on MongoDB to a provided function and uses the function output to complete its answer to the user query. We will demonstrate this query flow, using function calling with Gemini to answer a user query.

More conversational user queries

Without function calling, we can search for movies based on the plot using the embedding field. The user would be expected to provide their desired plot in their query as shown below.

A typical RAG query flow using MongoDB Atlas and the Vertex AI text embeddings API: the user query is embedded, and the embedding is used to query a vector index in MongoDB Atlas.
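As a minimal sketch of this basic flow (assuming MONGO_URI points at your Atlas cluster, an Atlas Vector Search index named vector_index exists on the embedding field, and database and collection names match those used later in this post):

from pymongo import MongoClient
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

# Assumption: MONGO_URI is set for a cluster holding the movies data
collection = MongoClient(MONGO_URI)['movies']['movie_collection']

# Embed the raw user query with the Vertex AI text embeddings API
embedding_model = TextEmbeddingModel.from_pretrained('textembedding-gecko@003')
query = 'a heist pulled off by a crew of unlikely specialists'
query_embedding = embedding_model.get_embeddings(
    [TextEmbeddingInput(query, 'RETRIEVAL_QUERY')])[0].values

# Search the vector index for semantically similar plots
results = collection.aggregate([
    {'$vectorSearch': {
        'index': 'vector_index',
        'queryVector': query_embedding,
        'path': 'embedding',
        'numCandidates': 150,
        'limit': 5,
    }},
    {'$project': {'_id': 0, 'title': 1, 'plot': '$fullplot'}},
])
print(list(results))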

In a production system, this approach could be restrictive and inflexible for a user who may be trying to:

  • Specify query parameters other than the plot — for example, their favorite genre (the genres field) or a cast member (the cast field).
  • Provide an unexpected or irrelevant query, such as “delicious vegetarian recipes”.

In the latter case, a naïve vector retrieval using the unexpected user query would return equally unexpected results, as shown below.

An example response when we perform a vector search for “delicious vegetarian recipes” on our movies collection in MongoDB: a seemingly random list of movies, only one of which is obviously food-related.

Optimizing API usage and cost

In many cases, it would not be useful or necessary to embed the entire user query. For example, users may respond to a chatbot with full sentences such as “I want recommendations for movies about alien visitors, starring Will Smith”. We only need to embed two words (“alien visitors”) from the entire query to match the plot embedding in our vector index.

The user query may not even specify a plot — for example, the request for “movie recommendations starring Will Smith” could be answered with a simple match against the cast field without any embedding. Being able to handle such queries appropriately would not only improve the user experience, but also reduce cost by optimizing our API usage.
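As a quick illustration, such a request reduces to a plain query against the cast field (a sketch reusing the collection handle from the earlier snippet; no embeddings API call is involved):

# Plain match on operational data: no embedding is generated
for movie in collection.find({'cast': 'Will Smith'},
                             {'_id': 0, 'title': 1, 'genres': 1}).limit(5):
    print(movie)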

Leveraging the unified MongoDB Atlas platform

With vector search in MongoDB Atlas, operational data, metadata and vectors are stored in a single place, enabling simple, efficient and consistent access to data that can be used to enrich our vector query. For example, we can add a pre-filter to our vector search if the user query specifies a movie genre or cast member.
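Note that pre-filtering only works on fields declared as filter fields in the vector index definition. Below is a sketch of one way to create such an index with PyMongo (assuming PyMongo 4.7+; the numDimensions value of 768 matches textembedding-gecko@003, cosine similarity is an illustrative choice, and the same index can be created in the Atlas UI instead):

from pymongo.operations import SearchIndexModel

vector_index_model = SearchIndexModel(
    name='vector_index',
    type='vectorSearch',
    definition={
        'fields': [
            # Vector field: dimensions must match the embedding model
            {'type': 'vector', 'path': 'embedding',
             'numDimensions': 768, 'similarity': 'cosine'},
            # Filter fields enable pre-filtering in $vectorSearch
            {'type': 'filter', 'path': 'genres'},
            {'type': 'filter', 'path': 'cast'},
        ]
    },
)
collection.create_search_index(model=vector_index_model)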

Some of the other advantages of vector search with MongoDB Atlas include:

  • No need to sync data between our operational database and a separate vector store.
  • Vector search output can include rich context from other fields that is always fresh.
  • Atlas triggers can be used to automatically keep embeddings updated as new documents are inserted or existing ones updated (a client-side sketch of this pattern follows below).
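Triggers themselves are configured within Atlas, but the underlying pattern can be illustrated client-side in Python with a change stream. This is a sketch of an alternative approach, not the Atlas trigger mechanism itself, and it reuses the get_embedding helper defined later in this post:

# Illustrative client-side alternative to an Atlas trigger: watch the
# collection for inserts and updates, then refresh the embedding field
pipeline = [{'$match': {'operationType': {'$in': ['insert', 'update']}}}]
with collection.watch(pipeline, full_document='updateLookup') as stream:
    for change in stream:
        updated = change.get('updateDescription', {}).get('updatedFields', {})
        if 'embedding' in updated:
            continue  # Skip our own embedding writes to avoid a loop
        doc = change.get('fullDocument')
        if doc and doc.get('fullplot'):
            collection.update_one(
                {'_id': doc['_id']},
                {'$set': {'embedding': get_embedding(doc['fullplot'],
                                                     'RETRIEVAL_DOCUMENT')}},
            )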

Function calling with Gemini

With function calling, we can provide domain knowledge — such as an understanding of our document structure — to enable Gemini to access real-time, updated operational data stored in MongoDB. We start by providing a function declaration that describes our function and providing this function to the model as a tool:

from vertexai.generative_models import (
    FunctionDeclaration,
    GenerationConfig,
    GenerativeModel,
    Tool,
)

# Function declaration for the model to query data in MongoDB
# based on the user prompt
search_for_movies_func = FunctionDeclaration(
    name='search_for_movies',
    description='Search for movies by plot, actor and genre.',
    parameters={
        'type': 'object',
        'properties': {
            'plot': {
                'type': 'string',
                'description': 'The plot or location of the movie.'
            },
            'actor': {
                'type': 'string',
                'description': ('The name of an actor in the movie,'
                                ' with word capitalization.')
            },
            'genre': {
                'enum': ['Action', 'Comedy', 'Short', 'Adventure',
                         'Drama', 'Romance', 'Crime', 'Sci-Fi',
                         'Musical', 'Family', 'War', 'History',
                         'Film-Noir', 'Mystery', 'Thriller', 'Biography',
                         'Fantasy', 'Western', 'Animation', 'Horror',
                         'Sport', 'Music', 'Documentary'],
                'type': 'string',
                'description': 'The genre of the movie.'
            }
        }
    }
)

# Define a tool that includes the above function declaration
search_for_movies_tool = Tool(
    function_declarations=[search_for_movies_func],
)

# Initialize the Gemini model and provide it with the tool
model = GenerativeModel(
    model_name='gemini-1.0-pro-001',
    generation_config=GenerationConfig(
        temperature=0,
        max_output_tokens=2048
    ),
    tools=[search_for_movies_tool],
)

When providing a function declaration to the model, there are some best practices to keep in mind. You can see some of these in our function declaration above:

  • Provide clear and verbose descriptions of the function and parameters.
  • Use a strongly typed genre parameter based on the genres that exist in our dataset.
  • Use a low temperature parameter (0) for the model generation configuration.

After invoking the function, we will need to return the function output to the model along with the original user prompt. To facilitate reuse of the user prompt, we can define a Content object to store it:

from vertexai.generative_models import (
    Content,
    Part,
)

user_prompt_content = Content(
    role='user',
    parts=[
        Part.from_text('I want recommendations for movies about alien'
                       ' visitors, starring Will Smith'),
    ],
)

# Send the user prompt to the model
response = model.generate_content(user_prompt_content)

Creating the function

The function declaration above describes our function to the model, but does not actually implement any data processing or query functionality (yet). To create our actual function, we need to understand how the model could propose a function call.

If you are just getting started, Vertex AI Studio is particularly useful: we can simply select the model and generation configuration, then provide our function declaration. Note that when provided in Vertex AI Studio, the function declaration must be valid JSON.
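For reference, this is the same declaration expressed as JSON, abridged here to the plot and actor parameters:

{
  "name": "search_for_movies",
  "description": "Search for movies by plot, actor and genre.",
  "parameters": {
    "type": "object",
    "properties": {
      "plot": {
        "type": "string",
        "description": "The plot or location of the movie."
      },
      "actor": {
        "type": "string",
        "description": "The name of an actor in the movie, with word capitalization."
      }
    }
  }
}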

Using Vertex AI Studio, we can test user prompts for each of the scenarios described previously to see how the model responds when provided with our function declaration:

An irrelevant query:

delicious vegetarian recipes

I am sorry, I cannot fulfill this request. The available tools lack the desired functionality.

A conversational query specifying plot and also cast member:

I want recommendations for movies about alien visitors, starring Will Smith

search_for_movies({"actor": "Will Smith", "plot": "alien visitors"})

A query that does not specify plot, only cast member and genre:

Sigourney Weaver action movies

search_for_movies({"actor": "Sigourney Weaver", "genre": "Action"})

Considering the above, we want our application and function to handle each kind of response:

  • Provide the model response (response.text) if a function call is not proposed.
  • Perform a vector search using the MongoDB Atlas vector index if only plot is specified.
  • Add a pre-filter stage to the vector search if plot and either or both of genre and actor are defined.
  • Perform a simple match against the collection if plot is not specified.

We can use PyMongo to connect to our MongoDB cluster and perform any required query. For example:

import pymongo
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

def get_mongo_client(mongo_uri):
    """Establish a connection to the MongoDB Atlas cluster."""
    try:
        client = pymongo.MongoClient(mongo_uri)
        print('Connection to MongoDB successful.')
        return client
    except pymongo.errors.ConnectionFailure as e:
        print(f'Connection failed: {e}')
        return None

# Connect to MongoDB
mongo_client = get_mongo_client(mongo_uri)
db = mongo_client['movies']
collection = db['movie_collection']

# Extract the function call arguments from the model proposal
if 'function_call' in response.candidates[0].content.parts[0].to_dict():
    function_call = response.candidates[0].content.parts[0].function_call
    movie_query = {}
    if 'plot' in function_call.args:
        movie_query['plot'] = function_call.args['plot']
    if 'genre' in function_call.args:
        movie_query['genre'] = function_call.args['genre']
    if 'actor' in function_call.args:
        movie_query['actor'] = function_call.args['actor']

# Helper functions for search_for_movies
def get_embedding(
    text: str,
    task: str,
) -> list[float]:
    """Get the embedding for the given text and task."""
    if not text.strip():
        print('Attempted to get embedding for empty text.')
        return None
    model = TextEmbeddingModel.from_pretrained(
        'textembedding-gecko@003')
    embeddings = model.get_embeddings(
        [TextEmbeddingInput(text, task)])
    return embeddings[0].values

def mongo_exp(genre, actor):
    """Build the filter expression for genre and actor."""
    if genre and actor:
        return {
            '$and': [
                {'cast': actor},
                {'genres': genre},
            ]
        }
    elif genre:
        return {'genres': genre}
    elif actor:
        return {'cast': actor}

def mongo_vector_search(plot, genre='', actor=''):
    """Perform a vector search using MongoDB Atlas."""

    # Generate an embedding for the user query
    query_embedding = get_embedding(plot, 'RETRIEVAL_QUERY')
    if query_embedding is None:
        print('Invalid or empty query, or embedding generation failed.')
        return None

    # Define the MongoDB vector search pipeline
    pipeline = [
        {
            '$vectorSearch': {
                'index': 'vector_index',
                'queryVector': query_embedding,
                'path': 'embedding',
                'numCandidates': 150,  # Consider 150 nearest neighbors
                'limit': 5,  # Return the top 5 matches
            }
        },
        {
            '$project': {
                '_id': 0,
                'plot': '$fullplot',
                'cast': 1,
                'title': 1,
                'genres': 1,
            }
        },
    ]

    # Add a pre-filter if either or both of genre and actor are specified
    if genre or actor:
        pipeline[0]['$vectorSearch']['filter'] = mongo_exp(genre, actor)

    # Execute the search and return the results as a list
    return list(collection.aggregate(pipeline))

def mongo_match(genre='', actor=''):
    """Perform a simple match when plot is not specified."""

    # Define the MongoDB match pipeline
    pipeline = [
        {
            '$match': mongo_exp(genre, actor)
        },
        {
            '$sample': {'size': 5}
        },
        {
            '$project': {
                '_id': 0,
                'plot': '$fullplot',
                'cast': 1,
                'title': 1,
                'genres': 1,
            }
        },
    ]

    # Execute the query and return the results as a list
    return list(collection.aggregate(pipeline))

def search_for_movies(plot='', genre='', actor=''):
    """Search for movies using the provided plot, genre and actor."""
    if not (plot or genre or actor):
        print('No search criteria specified.')
        return None
    elif plot:
        return mongo_vector_search(plot, genre, actor)
    else:
        return mongo_match(genre, actor)

Putting it all together

When the model proposes a function call, we will also need to provide the function output back to the model. In our case, this output would be a list of movies returned by the query against our MongoDB collection.

# List of query results from MongoDB
movie_list = search_for_movies(**movie_query)

# Return the results to Gemini together with the user prompt and
# function call proposal
response = model.generate_content(
    [
        user_prompt_content,
        response.candidates[0].content,
        Content(
            parts=[
                Part.from_function_response(
                    name='search_for_movies',
                    response={
                        'content': movie_list,
                    },
                ),
            ],
        ),
    ]
)

We can now test our various scenarios and see how the model’s response is relevant to each query. With the provided tool, Gemini can efficiently generate relevant answers to a variety of queries using a combination of vector search and simple matching with MongoDB Atlas.

With function calling, Gemini provides relevant responses across a wider range of queries, including irrelevant queries and queries that specify parameters other than plot.

As a bonus, we can see that when given a query that does not specify any parameters (e.g., a simple request for movie recommendations), Gemini will respond with a request for further details.
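To wrap up, here is a sketch of a single helper that ties the pieces above together, from user prompt to final answer (it assumes the model, tool and query functions defined earlier in this post):

def answer_user_query(prompt: str) -> str:
    """End-to-end sketch: prompt -> optional function call -> answer."""
    user_content = Content(role='user', parts=[Part.from_text(prompt)])
    response = model.generate_content(user_content)
    part = response.candidates[0].content.parts[0]

    # No function call proposed: return the model's direct answer
    if 'function_call' not in part.to_dict():
        return response.text

    # Run the proposed query against MongoDB and hand the output back
    movie_list = search_for_movies(**dict(part.function_call.args))
    response = model.generate_content([
        user_content,
        response.candidates[0].content,
        Content(parts=[Part.from_function_response(
            name='search_for_movies',
            response={'content': movie_list},
        )]),
    ])
    return response.text

print(answer_user_query('Sigourney Weaver action movies'))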

Next steps

Get started on Vertex AI Studio with $300 in free credits for new customers, and on MongoDB Atlas for free. You can also read more about the concepts covered above in the respective documentation for function calling in Vertex AI and MongoDB Atlas Vector Search.
