Part 2 of the “AI Multi-Agents for Everyone Series”

Retrieval Augmented Generation (RAG) In 2 Minutes

What’s the fuss all about, and how to build your own custom agent or “chatbot” easily and quickly

Anton Antich
Superstring Theory

--

This is Part 2 of the “AI Multi-Agents for Everyone” series. Part 1, an introduction to what AI Multi-Agents are, is available here. Part 3, “Anatomy of Complex Multi-Agents”, is here.

I am convinced we can build what I call “pragmatic AGI” (artificial general intelligence) with today’s level of technology: no need to wait for GPT-5 or for some uber-consciousness to magically emerge from adding more layers to the matrix-multiplication architecture of today’s ANNs (artificial neural networks).

“Pragmatic AGI” does not mean it is “smarter” than me or has consciousness (we don’t even know what consciousness is); it simply means it can do various tasks for me from a loosely formulated verbal request. The key to it is AI Multi-Agents: various generative AI models, plus the ability to use existing tools and software on the web, connected into a single “pipeline”.

Do read the previous article for more details and basic definitions, and today let’s focus on the KEY factor for creating any useful AI multi-agent: RAG, or retrieval augmented generation.

Picture of a typical LLM (large language model) architecture

In essence, RAG is what the diagram above depicts: when a user asks your agent (chatbot, or LLM, large language model) something, the agent first searches for relevant context and then passes this context to the model along with the user’s request. Why? Because otherwise current LLMs tend to “hallucinate”, that is, invent imaginary answers that are often incorrect. Give them the right context, however, and they produce very solid factual responses.

It works a bit like human memory: before answering a question, our agent “recalls” what it knows about the subject and then formulates the answer based on this recollection.
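
To make this concrete, here is a minimal Python sketch of the idea (not Integrail Studio internals): a toy in-memory “knowledge base”, a naive keyword retrieval step, and a prompt that puts the retrieved context in front of the question. The llm_complete() call at the end is a hypothetical placeholder for whatever LLM API you use.

```python
# Minimal sketch of the RAG flow described above: retrieve relevant context first,
# then hand it to the model together with the user's question.
# Everything here is a toy stand-in: the "knowledge base" is a hard-coded list and
# llm_complete() is a hypothetical placeholder for a real LLM API call.

KNOWLEDGE_BASE = [
    "Ootbi is the product of ObjectFirst.com, a company co-founded by Ratmir Timashev.",
    "RAG means retrieving relevant context before the LLM generates an answer.",
]

def retrieve(question: str, docs: list[str]) -> list[str]:
    """Naive keyword retrieval: keep documents that share at least one word with the question."""
    words = set(question.lower().split())
    return [d for d in docs if words & set(d.lower().split())]

def build_prompt(question: str, context: list[str]) -> str:
    """Put the retrieved context in front of the question so the model answers from facts."""
    return (
        "Answer the question using ONLY the context below.\n\n"
        "Context:\n" + "\n".join(context) +
        f"\n\nQuestion: {question}\nAnswer:"
    )

question = "What is ootbi?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
print(prompt)
# answer = llm_complete(prompt)  # hypothetical: send the assembled prompt to your LLM of choice
```

The “recall” step can be anything from a plain keyword search (as in this toy) to a full vector database; the important part is that the model never answers from thin air.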

Case in point: a comparison of answers to the very simple question “what is ootbi” from the top LLMs on the market, GPT-4 and Claude Opus, versus a very simple RAG-enabled multi-agent we created with Integrail Studio in 2 minutes:

Answers to the question “what is ootbi?”

Ootbi is the name of the product of ObjectFirst.com, another company of my current co-founder, serial entrepreneur Ratmir Timashev. As you can see, both top models give a plausibly “correct” answer, but clearly not the one we are looking for. Our RAG-enabled multi-agent, on the other hand, provides a 100% correct and detailed response.

So, how do you enable RAG? There are many ways, with two of the most popular being:

  • Simple textual / web search
  • Conversion of the text into a vector representation via so-called embeddings, followed by a vector search on top of it

Today we will look at the first one, since it is the simplest; in the next parts we will look at building a much more efficient vector-search RAG in detail.
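
Before we get there, here is a toy Python sketch of the second approach, just to make the idea tangible: each text becomes a vector, and the vector closest to the query becomes the context. A real system would use a proper embedding model and a vector database; the bag-of-words “embedding” below is only a stand-in so the example runs with nothing but numpy.

```python
import numpy as np

# Toy illustration of the embeddings + vector search approach.
# Real systems use a proper embedding model; a bag-of-words vector stands in here
# so the example has no dependencies beyond numpy.

DOCS = [
    "Ootbi is a product of ObjectFirst.com.",
    "RAG retrieves relevant context before asking the LLM to answer.",
    "Integrail Studio lets you build multi-agents without writing code.",
]

def embed(text: str, vocab: list[str]) -> np.ndarray:
    """Stand-in embedding: count how often each vocabulary word appears in the text."""
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors (higher means more similar)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

vocab = sorted({w for d in DOCS for w in d.lower().split()})
doc_vectors = [embed(d, vocab) for d in DOCS]

query = "what is ootbi"
q_vec = embed(query, vocab)
best = max(range(len(DOCS)), key=lambda i: cosine(q_vec, doc_vectors[i]))
print(DOCS[best])  # the most similar document becomes the context for the LLM
```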

Here is how easy it is to create what we call a “poor man’s RAG” multi-agent using Integrail Studio:

Scheme of the “Poor Man’s RAG” multi-agent

All you need to do is connect three “boxes” following very simple logic:

  • The user’s request is searched on the given website, in this case https://objectfirst.com (it can be YOUR website, your knowledge base, or anything else you like)
  • The resulting page is “read” and converted from HTML to Markdown (so that we use far fewer tokens while keeping the quality up)
  • The results are combined in the LLM (large language model) box, using any cost-efficient model with a significant context size (e.g., a 2nd-generation Claude model): we simply ask the LLM to answer the user’s request using the context we just “read”

That’s it: no need to write any code or build expensive infrastructure, and you have just created your first useful multi-agent!
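
If you are curious what those three boxes roughly do under the hood, here is a hedged Python sketch (not the actual Integrail Studio implementation): the search endpoint URL and the llm_answer() function are hypothetical placeholders, and the HTML-to-Markdown step uses the html2text library as one possible choice.

```python
import requests   # pip install requests
import html2text  # pip install html2text

# Rough sketch of the three "boxes" above, NOT the actual Integrail Studio internals.
# Assumptions: the target site exposes some search endpoint (SEARCH_URL below is a
# hypothetical pattern used only for illustration), and llm_answer() stands in for a
# call to any large-context LLM API.

SEARCH_URL = "https://objectfirst.com/search"  # hypothetical endpoint, for illustration only

def llm_answer(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. via the Anthropic or OpenAI SDK)."""
    raise NotImplementedError("wire this up to your LLM provider of choice")

def poor_mans_rag(user_request: str) -> str:
    # Box 1: search the website for the user's request
    page = requests.get(SEARCH_URL, params={"q": user_request}, timeout=30)

    # Box 2: "read" the result page and convert the HTML to Markdown to save tokens
    context_md = html2text.html2text(page.text)

    # Box 3: ask the LLM to answer the request using only the retrieved context
    prompt = (
        "Using only the context below, answer the user's request.\n\n"
        f"Context:\n{context_md}\n\nRequest: {user_request}\nAnswer:"
    )
    return llm_answer(prompt)
```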

For it to be useful in most real-world scenarios, we have to improve it further. For instance, not every user request translates well into a search query, so we will need to build an additional “converter” for that. Another improvement would be to analyze several of the found pages, not just one.
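
As a sketch of what those two improvements might look like, again with hypothetical helper names rather than a real API: search_site() stands in for the website search box, fetch_as_markdown() for the HTML-to-Markdown step, and llm_answer() for the LLM call from the previous sketch.

```python
# Two improvements to the "poor man's RAG": query rewriting and multi-page context.
# All helpers below are hypothetical placeholder stubs, shown only to make the idea concrete.

def search_site(query: str) -> list[str]: ...   # would return a list of result-page URLs
def fetch_as_markdown(url: str) -> str: ...     # would fetch a page and convert it to Markdown
def llm_answer(prompt: str) -> str: ...         # would call your LLM provider

def rewrite_as_query(user_request: str) -> str:
    """Improvement 1: let the LLM turn a chatty request into a compact search query."""
    prompt = (
        "Rewrite the following request as a short keyword search query:\n"
        f"{user_request}\nQuery:"
    )
    return llm_answer(prompt)

def multi_page_context(query: str, top_n: int = 3) -> str:
    """Improvement 2: gather context from several found pages instead of just one."""
    urls = search_site(query)[:top_n]
    return "\n\n---\n\n".join(fetch_as_markdown(url) for url in urls)
```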

We will look at how to create such a production-ready multi-agent next time. For now, we encourage you to follow this publication (Superstring Theory) or me (Anton Antich) as an author: everyone who starts following before the public launch of Integrail Studio in May 2024 will receive $10 worth of token credits, more than enough to run lots of fun experiments of your own.

In the next installments we’ll cover increasingly complex agents that are not just able to have a conversation with you, but can actually do things on your behalf, such as send an email, schedule a meeting, or write a summary of the news articles that interest you. The journey couldn’t be more exciting; join us!
