Programmer’s Journey

Anything cool related to programming goes here — mostly for Adam’s personal use


Three Open-Source RAG Tools You Need to Know About

9 min read · Dec 1, 2023


The new kids on the block: Verba, Unstructured, and Neum

It’s Nov. 2023, and every company wants “chat my data,” and they want it yesterday. But they’re encountering a couple of huge non-starters:

  1. The majority of their data is sensitive and can’t leave their datacenter
  2. There are surprisingly few “get all my data into an LLM” enterprise solutions (Microsoft Fabric may soon change that)

Also, consider that you can throw a rock in any direction and hit a “Build a RAG App in 5 minutes with LangChain” article, which goes something like:

  1. pip install langchain
  2. Enter your OpenAI key here
  3. Vectorize a single plain text document
  4. $$$ profit
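
For reference, the core of that recipe can be sketched with nothing but the standard library. This is a toy under stated assumptions, not LangChain's API: `embed` uses raw word counts as a stand-in for a real embedding model, and `chunk`, `retrieve`, and `build_prompt` are hypothetical helper names chosen for illustration. The call to an actual LLM is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    # Stand-in for an embedding model (assumption): plain term counts.
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text, size=40):
    # Split a document into fixed-size word windows.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(question, chunks, k=1):
    # Rank chunks by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context):
    # Assemble the text that would be sent to the LLM.
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

Retrieval over one clean text file really is this small, which is exactly why those articles fit in five minutes.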

Take this all together, and you’ve got just about every business rolling their own (i.e., crappy) RAG. And as someone who has spent the last quarter sliding down the Dunning-Kruger curve, I can promise you that it’s taken more than 5 minutes. Here are a few humdrum problems those articles conveniently overlook:

  1. An API for parsing various doc types (e.g., PowerPoint, HTML, images)
  2. ETL of dozens of heterogeneous document sources into RAG
  3. Batch ingestion, versioning…
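
To make the first item concrete, here is a rough sketch of what a parsing layer might look like: one entry point that dispatches on file extension. It only covers HTML and plain text using the standard library; PowerPoint and images would need third-party parsers or OCR and are deliberately left unregistered. The names `PARSERS`, `parse_document`, `parse_html`, and `parse_text` are hypothetical and do not come from any of the tools discussed here.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def parse_html(raw):
    extractor = _TextExtractor()
    extractor.feed(raw)
    extractor.close()
    return " ".join(extractor.parts)


def parse_text(raw):
    return raw.strip()


# Registry of extension -> parser; new formats plug in here.
PARSERS = {
    ".html": parse_html,
    ".htm": parse_html,
    ".txt": parse_text,
    ".md": parse_text,
}


def parse_document(name, raw):
    # Pick a parser from the file extension and return a uniform record.
    suffix = Path(name).suffix.lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"no parser registered for {suffix!r}")
    return {"source": name, "text": parser(raw)}
```

Returning the same record shape for every format keeps the downstream chunking and embedding code unaware of where the text came from. Failing loudly on an unknown extension is a design choice; silently skipping files is the other option, and it tends to hide gaps in coverage.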


Published in Programmer’s Journey



Written by Adam Hughes

Software Developer, Scientist, Muay Thai, hackDontSlack
