Using LanceDB with S3 as your Vector Database
Creating a simple RAG application using LanceDB, Crawl4AI, AWS S3 and LlamaIndex
What is Lance and LanceDB?
Lance is a high-performance, versatile columnar data format optimized for fast querying, training, and data versioning in machine learning applications.
LanceDB can serve as both a vector store and a traditional database for multi-modal data. It stores and manages vectors for semantic search and retrieval, and it keeps the actual data and metadata alongside the embeddings, so a single store covers both storage and retrieval.
I came to Lance while looking into options for running a vector database on a flat file format with S3 as the storage layer. Lance works with AWS S3, with any POSIX file system, and with Azure Blob Storage and Google Cloud Storage.
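To make the storage point concrete, here is a minimal sketch of pointing LanceDB at either an S3 prefix or a local directory. The helper names (lancedb_uri, open_demo_table), the table name "demo", the "source" column, and the two-dimensional vectors are all illustrative assumptions rather than code from this article. The sketch also assumes the lancedb package is installed and, for S3, that AWS credentials are available in the environment.

```python
from typing import Optional


def lancedb_uri(bucket: Optional[str], prefix: str = "lancedb") -> str:
    """Return an s3:// URI when a bucket is given, else a local directory path.

    Hypothetical helper: the bucket and prefix values are placeholders.
    """
    prefix = prefix.strip("/")
    if bucket:
        return f"s3://{bucket}/{prefix}"
    return f"./{prefix}"


def open_demo_table(uri: str):
    """Connect to LanceDB at `uri` and create a tiny table.

    Requires `pip install lancedb`. For an s3:// URI, this assumes AWS
    credentials can be picked up from the environment.
    """
    import lancedb  # imported lazily so lancedb_uri works without the package

    db = lancedb.connect(uri)
    # Each row keeps the embedding next to the text and metadata it describes.
    rows = [
        {"vector": [0.1, 0.2], "text": "hello lance", "source": "example-a"},
        {"vector": [0.9, 0.8], "text": "hello s3", "source": "example-b"},
    ]
    return db.create_table("demo", data=rows, mode="overwrite")
```

Calling open_demo_table(lancedb_uri(None)) should create the table under a local ./lancedb directory, and passing a bucket name switches the same code to S3. From there, something like table.search([0.1, 0.2]).limit(1).to_list() runs a nearest-neighbor query against the stored vectors.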
What are we building?
In this article, we’ll compose a simple RAG application that uses:
- Crawl4AI and OpenAI to intelligently parse and summarize portions of the…