Retrieval Augmented Generation (RAG) with Cloud SQL for MySQL

Julia Offerman
Google Cloud - Community

How to build a Generative AI application with your Cloud SQL for MySQL database

Background

Cloud SQL for MySQL’s native support for vector embedding storage & search allows you to build generative AI applications with your existing operational MySQL database. One common use for vector embedding similarity search is retrieval augmented generation (RAG) applications, which provide information from your database to a large language model (LLM). Google Cloud Platform provides a sample application & code lab showing a RAG application built on Cloud SQL for MySQL. This guide provides a deep dive into how the sample app is implemented, as well as some background on RAG applications & vector search.

What is vector embedding ANN search?

A vector embedding is a list of numbers that represents a datapoint. Vector embeddings can be generated (with various libraries) to capture many qualities of each datapoint in a single numerical representation. This allows computational systems (like a machine learning model) to holistically understand & compare your complex data.

Comparison between vector embeddings is a powerful tool in searching your data, as a shorter vector distance indicates greater similarity. Finding similar vector embeddings via vector distance becomes computationally expensive as the number of datapoints increases. This is where approximate-nearest-neighbor (ANN) search comes into the picture.
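
Before getting to ANN, here is a toy Python sketch of the underlying idea that a smaller vector distance means greater similarity (tiny made-up vectors stand in for real embeddings, which often have hundreds of dimensions):

import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance: close to 0 means very similar direction."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Illustrative stand-ins for real embeddings.
cat    = np.array([0.9, 0.1, 0.0])
kitten = np.array([0.8, 0.2, 0.1])
plane  = np.array([0.0, 0.2, 0.9])

print(cosine_distance(cat, kitten))  # small distance -> similar datapoints
print(cosine_distance(cat, plane))   # large distance -> dissimilar datapoints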

ANN search allows you to find similar data points without having to scan your entire dataset. This is achieved with a vector index built using your embeddings. ANN search is popular in a variety of use-cases, from recommending products to customers to powering a search engine.

Cloud SQL for MySQL now provides support for doing all of this within the MySQL engine. You can store your embeddings in a new vector data type and search them with built-in functions for ANN search, as well as exact nearest-neighbor (KNN) search. For a complete list of Cloud SQL for MySQL’s vector capabilities, check out the public docs.

What is a RAG application?

Here’s a detailed explanation of retrieval-augmented generation (RAG) and its benefits, but the general idea is that RAG applications allow a large language model (LLM) to use information from your database in its responses. This helps resolve key issues you might face when trying to use an LLM in your application, such as:

  • The LLM you are using struggles with hallucinations, and you want to ground it with factual, relevant data
  • You have a large amount of specialized data that you want the LLM to use to answer questions, and it doesn’t fit in a typical context window
  • The underlying data that you want the LLM to use changes over time, so a model pre-trained on your data will become stale

Storing vector embeddings of your data streamlines the RAG process by allowing your application to use vector ANN search to sift through your data for contextual information relevant to the prompt the LLM is given.
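
To make that flow concrete, here is a minimal, hypothetical sketch of the retrieve-then-generate loop. The embed_text, ann_search, and generate helpers are placeholders for illustration only, not part of the sample app:

# A minimal RAG loop, assuming three placeholder helpers:
#   embed_text(text)       -> vector embedding of the text
#   ann_search(vector, k)  -> k most similar rows from the database
#   generate(prompt)       -> LLM completion for the prompt
def answer(question: str) -> str:
    query_embedding = embed_text(question)             # 1. embed the user question
    context_rows = ann_search(query_embedding, k=10)   # 2. ANN search over stored embeddings
    context = "\n".join(row["content"] for row in context_rows)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)                            # 3. ground the LLM's response in the context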

Why use Cloud SQL for MySQL with RAG applications?

If you are already using Cloud SQL for MySQL as your operational database, its native support for vector embedding storage and search is a great way to start using generative AI with your application. This is especially easy if the data that you want to provide to an LLM is already stored in MySQL. You can simply add a vector embedding column to the existing tables storing your data to allow it to be easily searched.
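
For example, here is a sketch of adding an embedding column to a hypothetical existing products table. It reuses the vector column syntax shown later in this guide and assumes you already have a SQLAlchemy engine connected to your instance (for example via the Cloud SQL Python Connector); verify the exact DDL against the public docs:

import sqlalchemy

def add_embedding_column(engine: sqlalchemy.engine.Engine) -> None:
    with engine.begin() as conn:
        # Same column definition style as the policies table shown below;
        # 768 matches the dimensionality of Vertex AI text embeddings.
        conn.execute(sqlalchemy.text(
            "ALTER TABLE products "
            "ADD COLUMN embedding VECTOR(768) USING VARBINARY"
        ))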

Demo

Architecture of the demo application (via code lab)

Now let’s put this information to use and build a generative AI application with Cloud SQL for MySQL! This Python-based sample application implements an LLM-powered airline customer service chatbot. It accesses a Cloud SQL for MySQL database storing relevant information (flight times, terms & conditions, etc.) to give customers accurate responses. We use LangChain to interact with an LLM and the Cloud SQL Python Connector to connect to the database.

Setup

Google Cloud offers a code lab that walks you through creating the resources needed for the application, running the application, and disposing of the resources when you’re done.

Follow steps 1–6 in the code lab to create a client Google Compute Engine (GCE) VM, the necessary service accounts/permissions, and a Cloud SQL for MySQL instance with the demo database created. Take note of the database flag used when creating the instance, cloudsql_vector=ON, which enables vector support.

A closer look at Cloud SQL for MySQL vector setup

In the database setup script you ran in step 6 (run_database_init.py), we initialize various tables to store airline information, including vector embedding columns.

To allow our app to easily search through airline terms & conditions, as well as airport amenities, we included vector embedding columns in our policies and amenities tables.

Let’s check out the database query used to create the policies table:

CREATE TABLE policies(
  id INT PRIMARY KEY,
  content TEXT NOT NULL,
  embedding VECTOR(768) USING VARBINARY NOT NULL
);

We use Cloud SQL for MySQL’s new vector data type, supported in MySQL 8.0.36 and later, to define a column that stores vector embeddings for similarity search. After inserting the data, our script creates a vector index on the embedding column.

CALL mysql.create_vector_index('policies_index', '{self.__db_name}.policies', 'embedding', '');

This vector index enables us to use Cloud SQL for MySQL’s approximate-nearest-neighbor (ANN) search functionality. Performing ANN search on policy embeddings can give us more meaningful and accurate results than a typical keyword search, because a vector embedding represents not only the keywords in a given policy, but also its semantic meaning and nuances.

We also define some queries that we want our LLM to leverage in its responses. This way the model has access to our current, business-specific data. For our policies table, we provide a database query using ANN search:

SELECT content
FROM policies
WHERE NEAREST(embedding) TO (string_to_vector(:query),'num_neighbors=10');

The :query parameter will contain a vector embedding of the customer’s question, and we indicate that we want 10 results with the num_neighbors parameter.
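
As a rough sketch of how a query like this might be executed from Python (the string format passed to string_to_vector and the connection handling here are assumptions; the sample app’s datastore code is the source of truth):

import sqlalchemy

def search_policies(engine: sqlalchemy.engine.Engine,
                    query_embedding: list[float], k: int = 10) -> list[str]:
    # Assumes string_to_vector accepts a JSON-style array string like "[0.1, 0.2, ...]".
    embedding_str = "[" + ", ".join(str(x) for x in query_embedding) + "]"
    stmt = sqlalchemy.text(
        "SELECT content FROM policies "
        f"WHERE NEAREST(embedding) TO (string_to_vector(:query), 'num_neighbors={k}')"
    )
    with engine.connect() as conn:
        rows = conn.execute(stmt, {"query": embedding_str})
        return [row[0] for row in rows]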

How we process textual data and generate vector embeddings

Since terms & conditions are typically a large document, we source our airline policies from a single text passage. We use LangChain text splitters to chunk the text into pieces that will be useful for our model to work with.
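
A minimal sketch of this chunking step, assuming LangChain’s RecursiveCharacterTextSplitter and illustrative chunk sizes (the sample app’s splitter settings and source file may differ):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the long policies document into overlapping chunks small enough
# to embed and to fit comfortably into the LLM's context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
policies_text = open("policies.txt").read()  # hypothetical source file
chunked = [{"content": chunk} for chunk in splitter.split_text(policies_text)]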

After splitting our policies text into chunks, we generate vector embeddings for each of them using Vertex AI’s text embedding API:

for i in range(0, len(chunked), batch_size):
    request = [x["content"] for x in chunked[i : i + batch_size]]
    response = retry_with_backoff(embed_service.embed_documents, request)
    # Store the retrieved vector embeddings for each chunk back.
    for x, e in zip(chunked[i : i + batch_size], response):
        x["embedding"] = e

The embed_service.embed_documents function calls Vertex AI’s embedding generation. We call it in batches for efficiency and then associate each returned embedding with its corresponding text chunk.
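
The retry_with_backoff helper isn’t shown above; here is a hypothetical sketch of what a retry wrapper like it could look like (a simple exponential backoff, not the sample app’s exact implementation):

import time

def retry_with_backoff(func, *args, retries: int = 5, base_delay: float = 1.0):
    """Call func(*args), retrying on failure with exponential backoff."""
    for attempt in range(retries):
        try:
            return func(*args)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...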

Now that we have split our policies into chunks with vector embeddings, we can insert each text chunk/embedding pair as a row in our policies table.
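
A sketch of that insert step, under the same assumptions as before about the SQLAlchemy engine and the array-string format expected by string_to_vector:

import sqlalchemy

def insert_policy_chunks(engine: sqlalchemy.engine.Engine, chunked: list[dict]) -> None:
    stmt = sqlalchemy.text(
        "INSERT INTO policies (id, content, embedding) "
        "VALUES (:id, :content, string_to_vector(:embedding))"
    )
    with engine.begin() as conn:
        for i, chunk in enumerate(chunked):
            embedding_str = "[" + ", ".join(str(x) for x in chunk["embedding"]) + "]"
            conn.execute(stmt, {"id": i,
                                "content": chunk["content"],
                                "embedding": embedding_str})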

Connecting your database queries to an LLM

LangChain offers an array of tools for working with LLMs in your application, helping your code interact with models and use powerful techniques like prompt engineering.

LangChain provides Agents that use LLMs to reason through logical steps and take actions before generating a response to a user query. Agents can be provided with tools, each of which specifies an action the agent can take and the function to call to perform that action. The Agent then chooses from the available tools, determining which are applicable to the given user query.

In our case, each database query we provided is a tool for the LLM agent. This process is an example of ReAct prompting, as we leverage the LLM’s reasoning abilities and provide it with actionable steps.
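
Conceptually, the wiring looks roughly like the following sketch. It uses LangChain’s classic structured-chat agent and Vertex AI chat model as an illustration, not necessarily the exact agent type or model used in the sample app:

from langchain.agents import AgentType, initialize_agent
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model_name="gemini-pro")  # illustrative model choice
tools = []  # populate with the StructuredTool database-query tools shown below

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
response = agent.run("Can I bring my pet on board?")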

We give the Agent some formatting instructions for using the tools:

FORMAT_INSTRUCTIONS = """Use a json blob to specify a tool by providing an action key (tool name)
and an action_input key (tool input).
Valid "action" values: "Final Answer" or {tool_names}
Provide only ONE action per $JSON_BLOB, as shown:
```
{{{{
"action": $TOOL_NAME,
"action_input": $INPUT
}}}}
```
Follow this format:
Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
… (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{{{{
"action": "Final Answer",
"action_input": "Final response to human"
}}}}
```"""

We then package our policies ANN search database query for the LLM as a tool (check out the other tools in this file for our other database queries):

StructuredTool.from_function(
    coroutine=generate_search_policies(client),
    name="Search Policies",
    description="""
        Use this tool to search for cymbal air passenger policy.
        Policy that are listed is unchangeable.
        You will not answer any questions outside of the policy given.
        Policy includes information on ticket purchase and changes,
        baggage, check-in and boarding, special assistance, overbooking,
        flight delays and cancellations.
        Input of this tool must be in JSON format and include one `query` input.
    """,
    args_schema=QueryInput,
),

Notice in the above tool definition how we give the model a clear description of not only what the tool does, but also how it should be used. We pass args_schema=QueryInput so that the LLM knows what arguments the tool needs. When the Agent detects that the user is asking a question about airline policies, it will select this tool for searching policies in our Cloud SQL for MySQL database, pass the user query as input to the function, and use the output in its response.
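
The QueryInput schema isn’t shown here; a minimal sketch of what such an argument schema could look like, using a Pydantic model as LangChain expects (field name follows the tool description, the docstring and description text are illustrative):

from pydantic import BaseModel, Field

class QueryInput(BaseModel):
    """Arguments the LLM must supply when calling the search tools."""
    query: str = Field(description="Natural-language search query from the user")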

The policies search function is then defined as follows:

@routes.get("/policies/search")
async def policies_search(query: str, top_k: int, request: Request):
    ds: datastore.Client = request.app.state.datastore

    embed_service: Embeddings = request.app.state.embed_service
    query_embedding = embed_service.embed_query(query)

    results = await ds.policies_search(query_embedding, 0.5, top_k)
    return results

We embed the user query as a vector embedding, and then perform ANN search with our database query to get policies similar to what the user is looking for.
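
The generate_search_policies(client) coroutine referenced in the tool definition above is what connects the agent to this endpoint. Here is a hypothetical sketch of what it might look like; the endpoint path and parameters follow the route above, while the HTTP client details and result formatting are assumptions:

import aiohttp

def generate_search_policies(client: aiohttp.ClientSession):
    """Return a coroutine the agent can call as the 'Search Policies' tool."""
    async def search_policies(query: str) -> str:
        # Call the retrieval service's /policies/search route shown above.
        async with client.get(
            "/policies/search", params={"query": query, "top_k": 10}
        ) as response:
            results = await response.json()
        return "\n".join(results)
    return search_policies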

After getting the results of the policies ANN search database query, the model uses this data as context and formulates a response using it. With this workflow, we have a RAG app!

Deploy the database retrieval service & application

Go ahead and continue with steps 7 & 8 in the code lab, which get the app up and running and walk you through examples.

Sample chat bot conversation demonstrating the sample app’s functionalities (via code lab)

We can see that the LLM is providing a conversational interface that interprets user queries and provides information from the Cloud SQL for MySQL database in its responses.

Wrap-up

If you’ve made it this far, congrats! Step 9 in the code lab shows you how to clean up your resources (make sure to do this to avoid ongoing charges 🙂). The resources used throughout this guide, such as Cloud SQL for MySQL’s vector documentation, the code lab, and the sample application, may also be helpful for your own business needs.
