AI Financial Assistant (Part 2: RAG)

Marat Pekker
5 min read · Nov 28, 2023


A couple of weeks ago, I published a post about an AI Financial Assistant with data streaming using Redpanda and SingleStore DB pipelines.

Today, we are going to look into Retrieval-Augmented Generation (RAG) options, their limitations, and alternatives.

In the generative AI space, things move so fast that you can go to sleep and wake up to new projects and better ways to implement something. But let's take it one step at a time, and I will help you understand RAG (Retrieval-Augmented Generation).

What is RAG?

Retrieval-Augmented Generation (RAG) has become a popular topic and a buzzword in the Gen AI space. If you still need to catch up on what it is and how to use it, read on!

In simple terms, RAG is like having an assistant for AI models. Just as a journalist might consult various sources to fact-check an article, RAG allows AI models to fetch and cite relevant facts from external sources, ensuring that the responses they generate are grounded in accurate and up-to-date information. This helps in providing trustworthy and well-informed answers to user queries, making the AI more reliable and reducing the need for constant retraining and updating of models.

For example, think of RAG as a knowledgeable friend who can quickly look up and provide accurate information from reliable sources when you have a question.

For financial analysis we need:

To have:

  • Financial data management
  • Conversation engine
  • Financial coaching

To know:

  • What happened, when it happened and why? (Reporting and analysis)
  • What is happening now? (Monitoring)
  • What is going to happen in the future? (Predictive analysis)

Using RAG, LLMs can pull the latest financial data or news snippets from a constantly updated database, and then generate a comprehensive analysis.

This ensures businesses and analysts receive current insights, allowing them to make informed decisions.
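As a rough sketch of that flow (not the exact Part 1 setup), the snippet below uses the same LangChain SingleStoreVectorStore integration shown in the filtering examples later in this post: freshly streamed snippets are embedded and appended to the store, and the most relevant ones are retrieved to ground the answer. The sample text, metadata fields, and question are placeholders.

import { SingleStoreVectorStore } from "langchain/vectorstores/singlestore";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { Document } from "langchain/document";

export const run = async () => {
  // Open the vector store that already holds our financial documents
  const vectorStore = new SingleStoreVectorStore(new OpenAIEmbeddings(), {
    connectionOptions: {
      host: process.env.SINGLESTORE_HOST,
      port: Number(process.env.SINGLESTORE_PORT),
      user: process.env.SINGLESTORE_USERNAME,
      password: process.env.SINGLESTORE_PASSWORD,
      database: process.env.SINGLESTORE_DATABASE,
    },
  });

  // A freshly streamed news snippet (placeholder) is embedded and appended,
  // so the next retrieval already reflects it, without retraining the model
  await vectorStore.addDocuments([
    new Document({
      pageContent: "The company raised full-year guidance after strong Q3 results.",
      metadata: { source: "news" },
    }),
  ]);

  // Retrieve the snippets most relevant to the analyst's question,
  // which are then passed to the LLM as grounding context
  const context = await vectorStore.similaritySearch(
    "How is the company performing this quarter?",
    3
  );
  console.log(context);
  await vectorStore.end();
};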

In Part 1, we covered Redpanda streaming and SingleStore DB pipelines.

Why do we need RAG?

  • Minimizing Hallucinations — RAG reduces the chances of AI models generating incorrect or misleading information, enhancing the reliability of their responses
  • Adapting to New Data — RAG enables AI models to stay current and effective without the need for continuous retraining, lowering computational and financial costs
  • Improving Auditability — RAG allows for the grounding of AI models on verifiable external facts, reducing the risk of leaking sensitive data and enhancing transparency
  • Providing Added Context — RAG ensures AI models provide accurate and contextually rich responses, improving user experiences and information accuracy

Simple (Naive) RAG

Naive RAG typically splits documents into chunks, embeds them, and retrieves the chunks with the highest semantic similarity to a user question; a minimal code sketch follows the steps below.

Step 1. Loading Documents

Step 2. Parsing Documents into Text Chunks (Nodes)

Step 3. Select Embedding Model and LLM

Step 4. Create the Index, Retriever, and Query Engine
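To make those four steps concrete, here is a minimal sketch using LangChain JS (the same library as the examples below). The file path, chunk sizes, and question are placeholders, and MemoryVectorStore plus RetrievalQAChain stand in for the index and query engine; in the financial assistant you could swap in the SingleStore vector store from the next section.

import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { OpenAI } from "langchain/llms/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { RetrievalQAChain } from "langchain/chains";

export const run = async () => {
  // Step 1: load documents (placeholder path, point it at your own reports)
  const docs = await new TextLoader("./data/annual_report.txt").load();

  // Step 2: parse documents into text chunks (nodes)
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 100,
  });
  const chunks = await splitter.splitDocuments(docs);

  // Step 3: select the embedding model and the LLM
  const embeddings = new OpenAIEmbeddings();
  const llm = new OpenAI({ temperature: 0 });

  // Step 4: create the index, retriever, and query engine
  const vectorStore = await MemoryVectorStore.fromDocuments(chunks, embeddings);
  const chain = RetrievalQAChain.fromLLM(llm, vectorStore.asRetriever());

  const answer = await chain.call({
    query: "What were the main revenue drivers last year?",
  });
  console.log(answer.text);
};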

RAG Filtering

RAG filtering refers to the process of managing and delivering relevant information to users based on their specific needs. It involves the removal of redundant or unwanted information from an information stream, ensuring that users are exposed only to the information that is most relevant to them.

Metadata filtering example with LangChain and SingleStore

How to perform a base similarity search using the SingleStoreVectorStore:

import { SingleStoreVectorStore } from "langchain/vectorstores/singlestore";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

export const run = async () => {
  // Embed three sample texts and store them, with their metadata, in SingleStore
  const vectorStore = await SingleStoreVectorStore.fromTexts(
    ["Hello world", "Bye bye", "hello nice world"],
    [{ id: 2 }, { id: 1 }, { id: 3 }],
    new OpenAIEmbeddings(),
    {
      connectionOptions: {
        host: process.env.SINGLESTORE_HOST,
        port: Number(process.env.SINGLESTORE_PORT),
        user: process.env.SINGLESTORE_USERNAME,
        password: process.env.SINGLESTORE_PASSWORD,
        database: process.env.SINGLESTORE_DATABASE,
      },
    }
  );

  // Return the single document closest to the query
  const resultOne = await vectorStore.similaritySearch("hello world", 1);
  console.log(resultOne);

  // Close the connection
  await vectorStore.end();
};

If you need to filter results on specific metadata fields, you can pass a filter parameter to narrow the search down to documents that match all of the fields specified in the filter object:

import { SingleStoreVectorStore } from "langchain/vectorstores/singlestore";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

export const run = async () => {
  // Each text is stored together with metadata we can later filter on
  const vectorStore = await SingleStoreVectorStore.fromTexts(
    ["Good afternoon", "Bye bye", "Boa tarde!", "Até logo!"],
    [
      { id: 1, language: "English" },
      { id: 2, language: "English" },
      { id: 3, language: "Portuguese" },
      { id: 4, language: "Portuguese" },
    ],
    new OpenAIEmbeddings(),
    {
      connectionOptions: {
        host: process.env.SINGLESTORE_HOST,
        port: Number(process.env.SINGLESTORE_PORT),
        user: process.env.SINGLESTORE_USERNAME,
        password: process.env.SINGLESTORE_PASSWORD,
        database: process.env.SINGLESTORE_DATABASE,
      },
      distanceMetric: "EUCLIDEAN_DISTANCE",
    }
  );

  // Only documents whose metadata matches the filter are considered
  const resultOne = await vectorStore.similaritySearch("greetings", 1, {
    language: "Portuguese",
  });
  console.log(resultOne);

  await vectorStore.end();
};

RAG Fusion

RAG Fusion, as proposed by Adrian Raudaschl, aims to enhance the quality of both retrieval and large language model (LLM) summaries by combining the power of retrieval-augmented generation with reciprocal-rank fusion and generated queries.

Wait, what?!

Think of RAG fusion as a bridge between what users explicitly ask and what they intend to ask.

It creates variations of the question, performs a vector search for EACH variation, and then evaluates the results and returns the most relevant answer.

This approach leverages the power of generative AI and vector search to produce direct answers based on trusted data, ultimately aiming to provide richer, more context-aware outputs from large language models.

Here’s an overview of how RAG Fusion works (a small sketch of the reranking step follows the list):

  1. User Query and Generated Variations: The process begins with a user query, which is then used to generate variations of the user query.
  2. Vector Search Against Each Query: Each generated query variation is used to run a vector search, retrieving relevant information from the knowledge base.
  3. Merge and Rerank Results Using Reciprocal Rank Fusion Algorithm: The results obtained from the vector search are merged and reranked using a reciprocal rank fusion algorithm. This process aims to prioritize and present the most relevant and contextually rich information.
  4. Generate Summary Based on Top Reranked Results: Based on the top reranked results, a summary is generated, providing a comprehensive and context-aware output.
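To make the reranking step concrete, here is a small TypeScript sketch of reciprocal rank fusion in the style of the earlier examples. The query variations and document IDs are made-up placeholders; the only real algorithm here is the scoring formula, where each document accumulates 1 / (k + rank) from every ranked list it appears in.

// Reciprocal Rank Fusion: merge several ranked result lists into one.
// Each list is an array of document IDs ordered from most to least relevant.
const reciprocalRankFusion = (rankedLists: string[][], k = 60): string[] => {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      // A document gains 1 / (k + rank) from every list it shows up in, so
      // items that rank well across many query variations float to the top
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
};

// Placeholder results of a vector search run for each generated query variation
const resultsPerQuery = [
  ["doc-3", "doc-1", "doc-7"], // "What drove revenue growth this quarter?"
  ["doc-1", "doc-3", "doc-9"], // "Why did sales increase in Q3?"
  ["doc-1", "doc-4", "doc-3"], // "Quarterly revenue performance explained"
];

// The top reranked documents are then passed to the LLM to generate the summary
console.log(reciprocalRankFusion(resultsPerQuery)); // ["doc-1", "doc-3", ...]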

To try RAG Fusion, install the dependencies:

pip -q install langchain huggingface_hub openai tiktoken pypdf
pip -q install google-generativeai chromadb unstructured
pip -q install sentence_transformers
pip -q install -U FlagEmbedding

The rest of the example code can be found here.

The original code for RAG Fusion is here.

CoLLM: RAG for Collaborative Filtering

CoLLM (Collaborative Language Model) is an innovative methodology that combines fine-tuning techniques and retrieval-augmented generation (RAG) concepts to enhance recommendation systems.

It injects collaborative information into large language models (LLMs) without requiring full retraining.

By doing so, CoLLM aims to enable LLMs to exploit collaborative data, leading to better, more personalized recommendations.

This approach seeks to bridge the gap between semantic text information and collaborative data, ultimately enhancing the capabilities of LLM-based recommendation systems.

CoLLM arXiv paper

Self-RAG

Self-RAG is a new framework that trains an arbitrary LM to learn to retrieve, generate, and critique, enhancing the factuality and quality of its generations without hurting the versatility of LLMs.

GitHub

fastRAG (by Intel Labs)

fastRAG is a research framework designed to facilitate the building of retrieval augmented generative pipelines. Its main goal is to make retrieval augmented generation as efficient as possible through the use of state-of-the-art and efficient retrieval and generative models.

GitHub | Example Notebooks

RAG’s Chain-of-Note (CoN)

Chain-of-Note (CoN) is an approach for enhancing robustness in retrieval-augmented language models.

Additional resources for processing 10-K financial reports:
