Drop of RAG with LLM: will it make Search better?

IvL
6 min read · Apr 25, 2024


1. Introduction: should RAG solve all our problems?

Recently, I watched “The Inventor: Out for Blood in Silicon Valley.” I liked it so much that one question kept nagging me: why were experts unable to spot the low quality of the devices for so long?

RAG (Retrieval-Augmented Generation) with LLMs is now a similar “hot topic,” but the difference is that I have fairly broad experience in Search (Information Retrieval, a branch of computer science). So I decided to test most of the RAG applications available on the market and check how good they really are. Or is this just one more faked blood test?

I have seen many articles on testing the Search experience over PDF, HTML, or MS Word documents, so let’s try Table Search instead.

Most of our important data lives in tables or JSON: price lists, client feedback, sales activity, contractor details, etc. It is easy to search by ID or by a specific date.

But what if you need to run a logical search across your records, including free text: “Give me my best contractor who can paint the walls in the new office of 4,000 sq ft near Denver, CO?” Such routine requests usually take a lot of time, and the results are far from ideal. Let’s try the RAG providers and see how they can help us.
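To see what a system actually has to do with such a request, here is roughly the structured filter hiding behind that sentence. A minimal pandas sketch with invented contractor data; the column names (skills, city, max_project_sqft, rating) are my assumptions for illustration, not any provider’s schema:

```python
import pandas as pd

# Hypothetical contractor table; data and column names are illustrative only.
contractors = pd.DataFrame([
    {"name": "Acme Painting", "skills": "painting, drywall", "city": "Denver, CO",
     "max_project_sqft": 6000, "rating": 4.8},
    {"name": "Best Walls LLC", "skills": "painting", "city": "Boulder, CO",
     "max_project_sqft": 3000, "rating": 4.9},
])

# "best contractor who can paint the walls in a new 4,000 sq ft office near Denver, CO"
# rewritten as an explicit filter + sort:
result = (
    contractors[
        contractors["skills"].str.contains("paint", case=False)
        & contractors["city"].str.contains("Denver", case=False)
        & (contractors["max_project_sqft"] >= 4000)
    ]
    .sort_values("rating", ascending=False)
    .head(1)
)
print(result)
```

A RAG system has to translate the free-text question into something equivalent to this filter on its own; that translation step is exactly what the tests below probe.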

2. Approach: what kind of tests will we use?

To test the new systems, I will use the following three test datasets, ranging from simple to maximally complex. Thanks to Kaggle (https://www.kaggle.com/datasets), we have example CSV tables and JSON files about familiar companies: Disney movies, Coursera courses, and a Y-Combinator list of startups.

Test descriptions (a quick sanity-check sketch follows the list):

1. Disney movies (CSV, 0.03MB) — just 600 records, very simple: name, date, and total gross.

  • Simple query to test: “movies with animals as main characters”
  • Complex query to test: “movies with animals as main characters before 01/1990”

2. Coursera (CSV, 2MB) — 10x more records with more details, 6,000 records: title, rating, description, teachers.

  • Simple query to test: “payment improvement in healthcare”
  • Complex query to test: “java with rating above 4.5 and review num > 5k”

3. Y-Combinator Startups (JSON, 6MB) — 10,000 records with many details: name, industry, description.

  • Simple query to test: “heart decease”
  • Complex query to test: “for runners founded after 01/2020 with team > 2”
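Before handing these files to the RAG tools, it is worth seeing what an explicit, hand-written filter looks like for the numeric half of a complex query. A minimal sketch assuming pandas, file names like coursera_courses.csv, and column names such as title, rating, and num_reviews (all assumptions about the Kaggle exports, not the exact schema):

```python
import pandas as pd

# File names are assumptions; adjust to the actual Kaggle downloads.
disney   = pd.read_csv("disney_movies.csv")      # ~600 rows
coursera = pd.read_csv("coursera_courses.csv")   # ~6,000 rows
startups = pd.read_json("yc_startups.json")      # ~10,000 rows

# The numeric half of the complex Coursera query
# ("java with rating above 4.5 and review num > 5k"),
# assuming columns named 'title', 'rating', and 'num_reviews':
java_hits = coursera[
    coursera["title"].str.contains("java", case=False, na=False)
    & (coursera["rating"] > 4.5)
    & (coursera["num_reviews"] > 5000)
]
print(len(java_hits))
```

The text half of each query (“movies with animals as main characters”, “heart decease”) is exactly what a filter like this cannot express, and that is the part the RAG tools are being graded on.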

3. Companies offering RAG with LLM

There are many RAG startups, but most are at a very early stage and offer only a sales demo, or fail to load CSV or JSON at all.

So here are the finalists — the best RAG tools where you can actually upload your file and start an intelligent search :)

  1. AI Search — https://www.table-search.com/ — because it showed the best results in RAG Search.
  2. Perplexity — https://www.perplexity.ai/ — because, they say, the company is valued at over $1 billion.
  3. Algolia — https://www.algolia.com/ — because it is advertised everywhere, so I decided to try it.
  4. AskCSV — https://askcsv.com/ — because it promises exactly what we need and is simple to test.

(New!) Gemini 1.5 results are covered separately — “Google Gemini 1.5 Test … Success or Failure?”

4. RAG Final Results

To make a long story short, here are the results of all 20 tests.

Testing screenshots and notes on the user experience are below.

Your feedback and comments are more than welcome.
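For readers who want the bottom line first, here is a quick tally derived from the per-test scores listed below (each dataset has a 1-point simple test and a 3-point complex test, so 12 points is the maximum). A minimal Python sketch with the scores hard-coded from the write-up:

```python
# Per-test scores copied from the write-up below, as (simple, complex) pairs
# for Test #1 (Disney), Test #2 (Coursera), and Test #3 (Y-Combinator).
scores = {
    "AI Table-Search": [(1, 3), (1, 3), (1, 2)],
    "Perplexity":      [(1, 3), (0.15, 0), (0, 0)],
    "Algolia":         [(1, 3), (0, 0), (0, 0)],
    "AskCSV":          [(0, 3), (0, 0), (0, 0)],  # Test #3 scored 0: JSON file not readable
}

for provider, tests in scores.items():
    total = sum(simple + complex_score for simple, complex_score in tests)
    print(f"{provider}: {total} / 12")
```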

Test #1: Disney movies search

Search test across Disney movies (CSV, 0.03MB), 600 records.

AI Table-Search — https://www.table-search.com

Simple test: Success, 1 point: “movies with animals as main characters”

Complex test: Success, 3 points: “movies with animals as main characters before 01/1990”

Perplexity — https://www.perplexity.ai

Simple test: Success, 1 point: “movies with animals as main characters”

Complex test: Success, 3 points: “movies with animals as main characters before 01/1990”

Algolia — https://www.algolia.com/

Simple test: Success, 1 point: “movies with animals as main characters”

Complex test: Success, 3 points: “movies with animals as main characters before 01/1990”

AskCSV — https://askcsv.com/

Simple test: Failed, 0 points: “movies with animals as main characters”

Complex test: Success, 3 points: “movies with animals as main characters before 01/1990”

Test #2: Coursera search

Search test across Coursera courses (CSV, 2MB), 6,000 records.

AI Table-Search — https://www.table-search.com

Simple test: Success, 1 point: “payment improvement in healthcare course”

Complex test: Success, 3 points: “java with rating above 4.5 and review num > 5k”

Perplexity — https://www.perplexity.ai

Simple test: Partial success, 0.15 points: “payment improvement in healthcare course”

Complex test: Failed, 0 points: “java with rating above 4.5 and review num > 5k”

Algolia — https://www.algolia.com/

Simple test: Failed, 0 points: “payment improvement in healthcare course”

Complex test: Failed, 0 points: “java with rating above 4.5 and review num > 5k”

AskCSV — https://askcsv.com/

Simple test: Failed, 0 points: “payment improvement in healthcare course”

Complex test: Failed, 0 points: “java with rating above 4.5 and review num > 5k”

Test #3: Y-Combinator IT Startups search

Search test across Y-Combinator Startups (JSON, 6MB), 10,000 records.

AI Table-Search — https://www.table-search.com

Simple test: Success, 1 point: “heart decease”

Complex test: Success, 2 points: “for sport runners founded after 01/2020 with team > 2”

Perplexity — https://www.perplexity.ai

Simple test: Failed, 0 points: “heart decease”

Complex test: Failed, 0 points: “for sport runners founded after 01/2020 with team > 2”

Algolia — https://www.algolia.com/

Simple test: Failed, 0 points: “heart decease”

Complex test: Failed, 0 points: “for sport runners founded after 01/2020 with team > 2”

AskCSV — https://askcsv.com/

Simple test: N/A, 0 points — cannot read the JSON format: “heart decease”

Complex test: N/A, 0 points — cannot read the JSON format: “for sport runners founded after 01/2020 with team > 2”
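Since AskCSV’s failure here is purely about the file format, one possible workaround for CSV-only tools is to flatten the JSON before uploading. A minimal sketch, assuming the dataset is a JSON array of startup records in a file named yc_startups.json (the file name and structure are my assumptions about the Kaggle export):

```python
import json

import pandas as pd

# Flatten the JSON array of startup records into a CSV that CSV-only tools can ingest;
# nested fields become dotted column names.
with open("yc_startups.json") as f:
    records = json.load(f)

flat = pd.json_normalize(records)
flat.to_csv("yc_startups_flat.csv", index=False)
```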
