Three Open-Source RAG Tools You Need to Know About

Verba, Unstructured, and Neum

Adam Hughes
Programmer’s Journey

The new kids on the block: Verba, Unstructured, and Neum

It’s Nov. 2023, and every company wants “chat my data,” and they want it yesterday. But they’re encountering a couple of huge non-starters:

  1. The majority of their data is sensitive and can’t leave their datacenter
  2. There are surprisingly few “get all my data into an LLM” enterprise solutions (Microsoft Fabric may soon change that)

Also, consider that you can throw a rock in any direction and hit a “Build a RAG App in 5 minutes with LangChain” article, which goes something like:

  1. pip install langchain
  2. Enter your OpenAI key here
  3. Vectorize a single plain text document
  4. $$$ profit
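
For a sense of how little those steps actually cover, here is a rough, dependency-free sketch of the "single plain text document" flow. It is not LangChain's API: the bag-of-words `embed` function is a toy stand-in I'm assuming in place of a real embedding model, and `build_index`, `retrieve`, and `build_prompt` are hypothetical helper names. The prompt it builds would still need to be sent to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep only alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): raw term counts.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(document: str, chunk_size: int = 40) -> list[tuple[str, Counter]]:
    # Split the document into fixed-size word chunks and vectorize each one.
    words = document.split()
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 1) -> list[str]:
    # Return the k chunks most similar to the question.
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(index: list[tuple[str, Counter]], question: str) -> str:
    # Stuff the retrieved chunks into a prompt for whatever LLM you call next.
    context = "\n".join(retrieve(index, question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    doc = "Invoices are due within thirty days. Refunds require a signed approval form."
    index = build_index(doc, chunk_size=6)
    print(build_prompt(index, "How do refunds get approval?"))
```

Note what is missing: no file parsing, no multiple sources, no re-ingestion when the document changes.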

Take this all together, and you’ve got just about every business rolling its own (i.e. crappy) RAG. And as someone who has spent the last quarter sliding down the Dunning-Kruger curve, I can promise you that it takes more than 5 minutes. Here are a few humdrum problems those articles conveniently overlook:

  1. An API for parsing various doc types (e.g., PowerPoint, HTML, images)
  2. ETL of dozens of heterogeneous document sources into RAG
  3. Batch ingestion, versioning…

Adam Hughes
Programmer’s Journey

Software Developer, Scientist, Muay Thai, hackDontSlack