Three Open-Source RAG Tools You Need to Know About
Verba, Unstructured, and Neum
It’s Nov. 2023, and every company wants “chat my data,” and they want it yesterday. But they’re encountering a couple of huge non-starters:
- The majority of their data is sensitive and can’t leave their datacenter
- There are surprisingly few “get all my data into an LLM” enterprise solutions (Microsoft Fabric may soon change that)
Also, consider that you can throw a rock in any direction and hit a “Build a RAG App in 5 minutes with LangChain” article, which goes something like:
- pip install langchain
- Enter your OpenAI key here
- Vectorize a single plain text document
- $$$ profit
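For a sense of how little those tutorials actually cover, here is a toy sketch of the same recipe using only the Python standard library. It is not LangChain's API: plain word counts stand in for OpenAI embeddings, and the helper names (`chunk`, `vectorize`, `retrieve`) and the sample document are made up for illustration.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into chunks of at most `size` words (crude, ignores sentences)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts, a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


# A single plain text document (hypothetical content).
document = (
    "Refunds are processed within five business days. "
    "Support is available by email on weekdays. "
    "Enterprise plans include single sign-on and audit logs."
)
chunks = chunk(document, size=8)
top = retrieve("How long do refunds take?", chunks)
print(top[0])
```

The retrieved chunk would then be pasted into the LLM prompt as context. Everything here assumes one clean text file that fits in memory, which is exactly the assumption that breaks down in a real business.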
Take this all together, and you’ve got just about every business rolling their own (i.e., crappy) RAG. And as someone who has spent the last quarter sliding down the Dunning-Kruger curve, I can promise you that it’s taken more than 5 minutes. Here are a few humdrum problems those articles conveniently overlook:
- An API for parsing various doc types (e.g., PowerPoint, HTML, images)
- ETL of dozens of heterogeneous document sources into RAG
- Batch ingestion, versioning…