How to Build a Multi-Agent RAG System (MARS) with OpenAI Swarm

Madhukar Kumar
Published in madhukarkumar
7 min read · Nov 8, 2024

A few weeks ago, I wrote about Multi-Agent RAG Systems (MARS) and spoke about this proposed architecture at a series of conferences, including TechCrunch Disrupt. During the talk, I shared how MARS could revolutionize enterprise AI by seamlessly blending structured and unstructured data processing. After the presentation, I had people coming up to me, enthused but puzzled, asking one pressing question: “How do I get started?”

This article is an attempt to answer that question by providing a quick refresher and walking through starter code to help anyone interested in building MARS-based enterprise AI applications.

If you are impatient like me, you can jump straight to the code repository and start hacking.

For all others, let’s look at what we are building. Our example is a simple workflow that uses OpenAI’s Swarm library to orchestrate an agent that queries a database (SingleStore), plus NVIDIA’s NeMo Guardrails, a powerful library that provides input/output validation and query guardrails in an easy-to-use, configurable manner.

Why an agent to query a database, you ask?

Because a lot of enterprises have both structured (SQL) and unstructured data, and we want to demonstrate that you can run select, group by, aggregates, vector search, and exact keyword match in one SQL statement. Given that we are talking about enterprise apps, we cannot rely solely on text-to-SQL generation models, because large enterprises have massive SQL statements (often several pages long) that LLMs cannot deterministically and accurately generate (yet).
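To make this concrete, here is a hedged sketch of what such a single hybrid SQL statement could look like. The `movies` table and its columns (`genre`, `plot_text`, `plot_embedding`) are hypothetical, not from the repository; `DOT_PRODUCT` and `MATCH ... AGAINST` are SingleStore's vector-similarity and full-text operators.

```python
# A single hypothetical SQL statement that combines filtering, GROUP BY,
# aggregates, full-text keyword match, and vector search. The schema
# (movies: genre, plot_text, plot_embedding) is illustrative only.
HYBRID_QUERY = """
SELECT
    genre,
    COUNT(*) AS n_movies,                                      -- aggregate + GROUP BY
    MAX(MATCH(plot_text) AGAINST ('space opera')) AS kw_score, -- exact keyword match
    MAX(DOT_PRODUCT(plot_embedding, %s)) AS vec_score          -- vector search
FROM movies
WHERE MATCH(plot_text) AGAINST ('space opera')
GROUP BY genre
ORDER BY vec_score DESC
LIMIT 5
"""

def build_hybrid_query() -> str:
    """Return the combined aggregate/keyword/vector query string."""
    return HYBRID_QUERY
```

You would execute this with a cursor, passing the query embedding as the parameter, exactly as the full listing below does for its simpler query.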

But first, let’s start with the basics.

What is an Agent?

In its simplest form, we can think of an agent as a program capable of making decisions and taking actions based on its environment and inputs. We can deconstruct this notion by imagining an entity which, similar to humans, has access to three key things — 1) Intelligence (through an LLM), 2) Tools (for example, the ability to invoke other applications through APIs, browse the internet, etc.), and optionally 3) Access to specific knowledge (both structured and unstructured data).

To understand this better, let’s double click on each of these key components:

1. Intelligence — LLM with system prompt

An agent often starts with an LLM, typically equipped with a system prompt that provides instructions and a specific persona for how it should operate. The system prompt sets the foundation — whether the agent should be informative, persuasive, or even playful. This allows developers to craft the tone and style of how an agent interacts with users. To make the agent even more specialized, different agents could have access to different fine-tuned LLMs.
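As a quick sketch of that idea, the snippet below pairs a system prompt (the persona) with a user query in the standard chat-completions message format; the persona text is illustrative, and swapping it is all it takes to change the agent's tone.

```python
def build_agent_messages(persona: str, user_query: str) -> list:
    """Pair a system prompt (the agent's persona and instructions)
    with the user's query, in the chat-completions message format."""
    return [
        {"role": "system", "content": persona},
        {"role": "user", "content": user_query},
    ]

# A playful persona vs. an informative one: same model, different agent.
playful = build_agent_messages(
    "You are a witty movie buff. Answer with humor and one recommendation.",
    "What should I watch tonight?",
)
```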

2. Tools — Functions and access to APIs

LLMs are powerful, but without connecting them to the real world, they are just fancy chatbots. Tools provide an agent with functionality that extends its intelligence beyond responding to queries to performing actions, hence the word “agentic.” These tools could include anything from calling APIs and browsing the internet to custom functions for retrieving data from a database (our example). Think of tools as the limbs that allow an agent to interact with the external environment, thus making it more actionable.
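A minimal sketch of the tools idea, independent of any particular library: tools are plain functions, and the orchestrator dispatches to them by name, just as it would when an LLM emits a function call. The function names and the registry here are illustrative stand-ins, not a specific framework's API.

```python
def get_weather(city: str) -> str:
    """A stand-in tool; a real one would call a weather API."""
    return f"It is sunny in {city}."

def recommend_movie(keyword: str) -> str:
    """A stand-in tool; a real one would query the database."""
    return f"Top match for '{keyword}': Blade Runner"

# Registry mapping tool names to callables, as an orchestrator would hold.
TOOLS = {fn.__name__: fn for fn in (get_weather, recommend_movie)}

def dispatch(tool_name: str, **kwargs) -> str:
    """Look up a tool by name (as an LLM's function call would) and run it."""
    return TOOLS[tool_name](**kwargs)
```

Swarm's `functions=[...]` parameter, used in the full listing below, plays the role of this registry.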

3. Knowledge (Optional)

Agents can also be equipped with knowledge — in other words, highly specific system prompts along with specialized data that makes them effective in certain domains. This knowledge can take a few different forms:

a) Unstructured Data

Unstructured data like PDFs, markdown files, etc., can be used to enrich an agent’s responses. An apt example is OpenAI’s Assistants, which can be given access to knowledge from documents, web pages, or PDFs that is consulted first to answer questions, before falling back on the LLM’s built-in knowledge.
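The "documents first" pattern can be sketched in a few lines: stuff the retrieved chunks into the prompt and instruct the model to prefer them over its own knowledge. The function and prompt wording below are illustrative, not a specific product's API.

```python
def answer_with_documents(question: str, retrieved_chunks: list) -> list:
    """Build chat messages that ask the model to answer from the supplied
    document chunks first, falling back to its own knowledge only if needed."""
    context = "\n\n".join(retrieved_chunks)
    return [
        {"role": "system",
         "content": "Answer using the provided documents when possible. "
                    "If they don't contain the answer, say so."},
        {"role": "user",
         "content": f"Documents:\n{context}\n\nQuestion: {question}"},
    ]

msgs = answer_with_documents(
    "What is MARS?",
    ["MARS stands for Multi-Agent RAG System."],
)
```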

b) Memory/Session to Keep Track of History

Memory allows agents to keep track of user interactions and remember key details over sessions. For example, an agent could recall that a user prefers action movies or remembers details from past questions, making interactions more seamless and personal.
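An in-memory sketch of that idea, assuming a single session; a real system would persist this keyed by session or user id. The class and its fields are illustrative.

```python
class SessionMemory:
    """Track a user's preferences and past turns so the agent can
    personalize later responses within (or across) sessions."""

    def __init__(self):
        self.preferences = {}   # e.g. {"favorite_genre": "action"}
        self.history = []       # past conversation turns

    def remember(self, key: str, value: str) -> None:
        self.preferences[key] = value

    def add_turn(self, text: str) -> None:
        self.history.append(text)

    def recall(self, key: str, default: str = "unknown") -> str:
        return self.preferences.get(key, default)

memory = SessionMemory()
memory.remember("favorite_genre", "action")
memory.add_turn("User asked for weekend movie ideas.")
```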

Let’s Build a Simple Agent

To bring all of this into practice, let’s build a simple agent that recommends movies based on user preferences by querying a database. This agent will take advantage of SingleStore, a database that allows you to retrieve SQL data, run analytical queries, and even use vector and keyword search — all in one place.

Why SingleStore?

SingleStore is a natural fit because it combines multiple capabilities into a single SQL query. Instead of piecing together different databases, search engines, and analytics solutions, SingleStore can handle them all — including advanced vector and keyword searches. This makes it ideal for our movie recommendation agent.

Next, for any enterprise application, security and compliance are key concerns. That’s why we also integrate guardrails to provide input and output validation, ensuring a safe user experience. As mentioned above, we will be using NVIDIA’s NeMo Guardrails for this: it provides robust input/output validation, is designed to work seamlessly with LLMs, and its rules are declared in simple, configurable files.

Implementation Steps

Process flow for the code

We will take the following steps to build our movie recommendation agent:

1. We install NeMo Guardrails and drop our policies/rules in a file called ‘rails.co’ in a folder called nemo-configs (you can name the folder anything you want, as long as you load it correctly when instantiating the guardrails).
2. We then create a couple of helper functions that use the singlestoredb library to connect to SingleStore and run a query. The example below uses a very simplistic select statement, but you can customize it for your own database schema and data. We call this function search_movies.
3. Next, we create an agent and provide it with a name, instructions (the system prompt), and finally give it access to our function, search_movies. In this example we have not given the agent a specific model, but you can set your own by adding a parameter such as model="your_model_name".
4. We finally put this all together so that when a query comes in, we pass it to the guardrails first. Based on our rails.co file, the guardrails either decline inappropriate queries without taking any further action, route the query to the SingleStore agent if it asks for a movie recommendation, or fall back to a direct LLM response for everything else.
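The exact rules depend on your policies, but a minimal rails.co sketch in NeMo Guardrails' Colang dialect might look like the following. The user utterances and flow names are illustrative; the `bot inform using singlestore` message matches the marker string the code below checks for when deciding whether to call the agent.

```
define user ask movie recommendation
  "recommend a movie"
  "what should I watch tonight"

define flow movie recommendation
  user ask movie recommendation
  bot inform using singlestore

define user ask inappropriate question
  "tell me something offensive"

define flow inappropriate question
  user ask inappropriate question
  bot refuse to respond
```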

That is it. Simple and straightforward.

You can now use this to create other agents and keep adding them to your Swarm object. However, keep in mind that as you scale the number of agents, managing their interactions and ensuring optimal performance can become challenging. It is crucial to monitor the system for latency issues and potential conflicts, and consider strategies like load balancing or modular architecture to address these challenges. In other words, the same principles of software engineering apply to this example as well.

Code for this App

Here is the full implementation of our movie recommendation agent:

```python
from swarm import Swarm, Agent
import singlestoredb as s2
import os
from nemoguardrails import LLMRails, RailsConfig
from openai import OpenAI

# Initialize OpenAI
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Initialize NeMo Guardrails
config = RailsConfig.from_path("nemo-configs/")
rails = LLMRails(config)

# Global connection and query
singlestore_conn = None
current_query = ""


def connect_to_singlestore():
    """Establish a connection to SingleStore."""
    try:
        # Get connection parameters from the environment
        host = os.getenv("SINGLESTORE_HOST")
        user = os.getenv("SINGLESTORE_USER")
        password = os.getenv("SINGLESTORE_PASSWORD")
        database = os.getenv("SINGLESTORE_DATABASE")

        return s2.connect(host=host,
                          port=3306,
                          user=user,
                          password=password,
                          database=database)
    except Exception as e:
        print(f"Error connecting to SingleStore: {e}")
        return None


def search_movies() -> str:
    """Core function to search movies in SingleStore."""
    global singlestore_conn, current_query

    try:
        if not singlestore_conn:
            singlestore_conn = connect_to_singlestore()
            if not singlestore_conn:
                return "Failed to connect to database"

        # singlestoredb uses %s-style placeholders for query parameters
        sql_query = """
            SELECT title, MATCH(title) AGAINST (%s) AS relevance
            FROM movies
            WHERE MATCH(title) AGAINST (%s)
            ORDER BY relevance DESC
            LIMIT 10
        """

        cursor = singlestore_conn.cursor()
        cursor.execute(sql_query, (current_query, current_query))
        results = cursor.fetchall()
        cursor.close()

        if not results:
            return "No movie recommendations found for your query."

        response = "Here are some movie recommendations:\n"
        for title, relevance in results:
            response += f"- {title} (Relevance: {float(relevance):.2f})\n"

        return response

    except Exception as e:
        if singlestore_conn:
            singlestore_conn.close()
            singlestore_conn = None
        return f"Error getting recommendations: {e}"


def direct_llm_response(query: str) -> str:
    """Get a response directly from the LLM for non-SingleStore queries."""
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": query}
            ]
        )
        return response.choices[0].message.content or "No response generated"
    except Exception as e:
        return f"Error getting LLM response: {e}"


def is_singlestore_query(response) -> bool:
    """Check if the guardrails response indicates the need for SingleStore."""
    try:
        # rails.generate(messages=...) returns a message dict like
        # {"role": "assistant", "content": "..."}; check it for the
        # indicator phrases defined in our rails.co rules
        response_text = response["content"].lower()
        return ("inform using singlestore" in response_text
                or "delegate to agent" in response_text)
    except (TypeError, KeyError):
        # If we can't read the response content as expected,
        # default to treating it as a non-SingleStore query
        return False


def main():
    global current_query

    # Initialize the Swarm client
    swarm_client = Swarm()

    # Initialize the agent with the movie recommendation function
    agent = Agent(
        name="MovieRecommendationAgent",
        instructions="You are a helpful movie recommendation agent.",
        functions=[search_movies]
    )

    print("Welcome! You can ask me anything. Type 'exit' to quit.")

    while True:
        # Get user input
        user_query = input("\nYou: ").strip()

        if user_query.lower() == 'exit':
            print("Goodbye!")
            if singlestore_conn:
                singlestore_conn.close()
            break

        try:
            # First pass the query through guardrails
            guardrails_response = rails.generate(
                messages=[{"role": "user", "content": user_query}]
            )

            # Check if the query needs SingleStore
            if is_singlestore_query(guardrails_response):
                # Update the current query for the search function
                current_query = user_query

                # Use the SingleStore agent for movie recommendations
                messages = [{"role": "user", "content": user_query}]
                response = swarm_client.run(agent=agent, messages=messages)
                print("Bot:", response.messages[-1]["content"])
            else:
                # Use a direct LLM response for general queries
                response = direct_llm_response(user_query)
                print("Bot:", response)

        except Exception as e:
            print(f"An error occurred: {e}")


if __name__ == "__main__":
    main()
```

Conclusion

In this article we saw a simple example of how to build a MARS-based enterprise AI application by combining OpenAI’s capabilities with SingleStore’s versatile database features and adding a layer of security with NeMo Guardrails. My hope is that you can use this as a starter to build out a full-fledged application specific to your company’s requirements.

Happy hacking ✌️
