1. Introduction: can RAG solve all our problems?
Recently, I watched “The Inventor: Out for Blood in Silicon Valley.” It struck me so much that I kept wondering: why were experts unable to spot the low quality of the devices for so long?
RAG (Retrieval-Augmented Generation) with LLMs is a similarly “hot topic” today, but with one difference: I have fairly broad experience in Search (Information Retrieval, a branch of computer science). So I decided to test most of the RAG applications available on the market and check how good they really are. Or is this just one more fake blood test?
I have seen many articles testing the search experience over PDF, HTML, or MS Word documents, so let’s try Table Search instead.
Most of our important data lives in tables or JSON: price lists, client feedback, sales activity, contractor details, and so on. Searching by ID or a specific date is easy.
But what if you need to run a logical search over your records, including free text: “Give me my best contractor who can paint the walls in the new office of 4000ft near Denver, CO”? Such routine requests usually take a lot of time, and the results are far from ideal. Let’s try the RAG providers and see how they can help us.
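To see why such a request is hard, note that it mixes structured constraints (office size, location) with free-text matching (“paint the walls”). A minimal sketch of that decomposition, using hypothetical contractor records and field names invented for illustration:

```python
# Hypothetical contractor records; field names are assumptions for this sketch.
contractors = [
    {"name": "Acme Painting",  "skills": "interior wall painting", "max_sqft": 5000, "city": "Denver"},
    {"name": "BuildRight LLC", "skills": "roofing and framing",    "max_sqft": 8000, "city": "Denver"},
    {"name": "ColorPro",       "skills": "wall painting, drywall", "max_sqft": 3000, "city": "Boulder"},
]

def search(records, text_terms, min_sqft, city):
    """Structured filters (area, city) combined with a naive keyword match
    on the free-text skills field -- the two halves a RAG system must infer
    from one plain-English request."""
    return [
        r for r in records
        if r["max_sqft"] >= min_sqft
        and r["city"] == city
        and all(t in r["skills"] for t in text_terms)
    ]

print(search(contractors, ["paint", "wall"], 4000, "Denver"))
```

A real system would replace the keyword match with semantic retrieval, but the structured half (size, location) still has to be extracted and applied exactly — which is precisely what the tests below probe.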
2. Approach: which tests will we use?
To test the new systems, I will use the following three test data sets, from simple to maximally complex. Thanks to Kaggle (https://www.kaggle.com/datasets), we have CSV tables and JSON files from our favorite companies: Disney movies, Coursera courses, and a Y-Combinator list of startups.
Test descriptions:
1. Disney movies (CSV, 0.03MB) — only 600 records, very simple: just name, date, and total gross.
- Simple Query to test: “movies with animals as main characters”
- Complex Query to test: “movies with animals as main characters before 01/1990”
2. Coursera (CSV, 2MB) — 10× more records (6,000) with more detail: title, rating, description, teachers
- Simple Query to test: “payment improvement in healthcare”
- Complex Query to test: “java with rating above 4.5 and review num > 5k”
3. Y-Combinator Startups (JSON, 6MB) — 10,000 records with many details: name, industry, description
- Simple Query to test: “heart decease”
- Complex Query to test: “for sport runners founded after 01/2020 with team > 2”
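The complex queries are hard precisely because they bury numeric conditions inside plain English. As a rough illustration (hypothetical records, invented for this sketch), here is what the Coursera complex query means once translated into explicit filters:

```python
# Hypothetical Coursera-style records; values are made up for illustration.
courses = [
    {"title": "Java Programming",     "rating": 4.6, "reviews": 12000},
    {"title": "Java for Beginners",   "rating": 4.2, "reviews": 9000},
    {"title": "Python Data Analysis", "rating": 4.8, "reviews": 20000},
]

# "java with rating above 4.5 and review num > 5k" as explicit conditions:
hits = [
    c for c in courses
    if "java" in c["title"].lower()   # topic match
    and c["rating"] > 4.5             # "rating above 4.5"
    and c["reviews"] > 5000           # "review num > 5k"
]
print([c["title"] for c in hits])  # -> ["Java Programming"]
```

A RAG system that only does semantic similarity will happily return highly rated Java courses with 4.2 stars; passing the complex tests requires actually extracting and enforcing these thresholds.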
3. Companies that offer RAG with LLM
There are many RAG startups, but most are at a very early stage and offer only a sales demo, or fail to load CSV or JSON at all.
So here are the finalists — the best RAG products where you can upload your file and start an intelligent search :)
- AI Search — https://www.table-search.com/ — because it showed the best results in RAG Search.
- Perplexity — https://www.perplexity.ai/ — because the company is reportedly valued at over $1 billion
- Algolia — https://www.algolia.com/ — because they are advertised everywhere, so I decided to give them a try
- AskCSV — https://askcsv.com/ — because it promises exactly what we need and is simple to test
(New!) Gemini 1.5 results separately — “Google Gemini 1.5 Test … Success or Failure?”
4. RAG Final Results
To make a long story short — here are the results of all 20 tests (simple tests are scored out of 1 point, complex tests out of 3).
Testing screenshots and user experience below.
Your feedback and comments are more than welcome.
Test #1 : Disney movies search
Search test across Disney movies (CSV, 0.03MB), 600 records
AI Table-Search: https://www.table-search.com
Simple test: Success 1 point: “movies with animals as main characters”
Complex test: Success 3 points: “movies with animals as main characters before 01/1990”
Perplexity — https://www.perplexity.ai
Simple test: Success 1 point: “movies with animals as main characters”
Complex test: Success 3 points: “movies with animals as main characters before 01/1990”
Algolia — https://www.algolia.com/
Simple test: Success 1 point: “movies with animals as main characters”
Complex test: Success 3 points: “movies with animals as main characters before 01/1990”
AskCSV — https://askcsv.com/
Simple test: Failed, 0 points: “movies with animals as main characters”
Complex test: Success 3 points: “movies with animals as main characters before 01/1990”
Test #2 : Coursera search
Search test across Coursera (CSV, 2MB), 6,000 records
AI Table-Search: https://www.table-search.com
Simple test: Success 1 point: “payment improvement in healthcare course”
Complex test: Success 3 points: “java with rating above 4.5 and review num > 5k”
Perplexity — https://www.perplexity.ai
Simple test: Partial success, 0.15 points: “payment improvement in healthcare course”
Complex test: Failed, 0 points: “java with rating above 4.5 and review num > 5k”
Algolia — https://www.algolia.com/
Simple test: Failed, 0 points: “payment improvement in healthcare course”
Complex test: Failed, 0 points: “java with rating above 4.5 and review num > 5k”
AskCSV — https://askcsv.com/
Simple test: Failed, 0 points: “payment improvement in healthcare course”
Complex test: Failed, 0 points: “java with rating above 4.5 and review num > 5k”
Test #3 : Y-Combinator IT Startups search
Search test across Y-Combinator Startups (JSON, 6MB) — 10,000 records
AI Table-Search: https://www.table-search.com
Simple test: Success 1 point: “heart decease”
Complex test: Partial success, 2 points: “for sport runners founded after 01/2020 with team > 2”
Perplexity — https://www.perplexity.ai
Simple test: Failed, 0 points: “heart decease”
Complex test: Failed, 0 points: “for sport runners founded after 01/2020 with team > 2”
Algolia — https://www.algolia.com/
Simple test: Failed, 0 points: “heart decease”
Complex test: Failed, 0 points: “for sport runners founded after 01/2020 with team > 2”
AskCSV — https://askcsv.com/
Simple test: Failed, 0 points (N/A — cannot read JSON format): “heart decease”
Complex test: Failed, 0 points (N/A — cannot read JSON format): “for sport runners founded after 01/2020 with team > 2”