Experiments with Retrieval Augmented Generation (RAG)
I’ve recently come across the concept of Retrieval Augmented Generation (RAG), and I find it quite intriguing.
With AI becoming more accessible, programmers can now build AI solutions without deep specialized knowledge. One limitation, however, is that pre-trained models are frozen at training time: they only know the data they were trained on and are not updated with real-time information, so their answers may not account for the most current data. One way to address this is to augment our AI queries with relevant context, ensuring the model has up-to-date data to work with. This requires a retrieval component that fetches pertinent data before querying an AI service such as OpenAI.
To efficiently search and retrieve relevant data, we use embeddings: vector representations of text that capture its meaning. These embeddings are stored in a database that supports vector operations. Vector support is crucial because it enables efficient similarity searches, letting us identify and retrieve the most relevant pieces of data. In a way, this is akin to building a miniature search engine grounded in vector representations.
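To make this concrete, here is a toy illustration of how similarity between embeddings is measured. The 3-dimensional vectors are hand-made for the example; real embeddings produced by a model have hundreds or thousands of dimensions, but the cosine-similarity idea is the same:

```python
import math

def cosine_similarity(a, b):
    # Similarity of two embedding vectors: close to 1.0 means
    # the vectors point the same way, i.e. similar meanings.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hand-made for illustration).
storm = [0.9, 0.1, 0.0]
typhoon = [0.8, 0.2, 0.0]
recipe = [0.0, 0.1, 0.9]

print(cosine_similarity(storm, typhoon))  # near 1: related meanings
print(cosine_similarity(storm, recipe))   # near 0: unrelated
```

A similarity search is then just "find the stored vectors with the highest score against the query vector," which is exactly the operation a vector database optimizes.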
I wanted to learn how to implement RAG, so I decided to use PAG-ASA’s weather bulletins as a source of information to experiment with. The idea was to generate embeddings for the weather bulletins, save them to a vector database, and then augment user queries to OpenAI with this data. For the vector database, I opted for LanceDB, a lightweight, embedded database that runs on my laptop without a convoluted setup. While other vector databases are available, I chose LanceDB because it’s to vector databases what sqlite3 is to relational ones.
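To show what the vector database contributes, here is a minimal in-memory stand-in for the core operation: store text alongside its embedding, then return the rows nearest to a query vector. This is an illustration of the idea only; the class and method names are my own, not LanceDB’s API, and LanceDB adds persistence and indexing on top of this:

```python
import math

class TinyVectorStore:
    """Hypothetical toy stand-in for a vector database:
    keeps (vector, text) rows and returns the closest matches."""

    def __init__(self):
        self.rows = []

    def add(self, vector, text):
        self.rows.append((vector, text))

    def search(self, query, limit=3):
        # Rank stored rows by cosine similarity to the query vector.
        def score(row):
            vec, _ = row
            dot = sum(a * b for a, b in zip(query, vec))
            norms = (math.sqrt(sum(a * a for a in query))
                     * math.sqrt(sum(b * b for b in vec)))
            return dot / norms
        ranked = sorted(self.rows, key=score, reverse=True)
        return [text for _, text in ranked[:limit]]

store = TinyVectorStore()
store.add([0.9, 0.1], "Typhoon signal raised over Luzon and Visayas")
store.add([0.1, 0.9], "Office memo unrelated to weather")
top = store.search([0.8, 0.2], limit=1)
print(top)  # the weather bulletin, not the memo
```

The real pipeline replaces the hand-made vectors with embeddings generated from each bulletin, and the brute-force scan with LanceDB’s indexed search.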
The first hurdle I encountered was that PAG-ASA doesn’t offer a public API for fetching information, so I had to manually download their bulletins, which are in PDF format. Fortunately, PDFs can be easily parsed into text, and I proceeded to do so. After converting the bulletins into a usable format, I integrated the data with OpenAI: my system first determines whether a user query needs supporting data from the vector database, and if so, retrieves the relevant records from LanceDB and uses them to augment the query sent to OpenAI. The results from this experiment have been quite promising.
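The augmentation step itself is essentially prompt construction: prepend the retrieved bulletin text to the user’s question before sending it to OpenAI. A minimal sketch of that step, with a made-up bulletin line and a function name of my own choosing (the original script’s wording may differ):

```python
def build_augmented_prompt(question, retrieved_chunks):
    """Combine retrieved bulletin excerpts with the user's question
    so the model answers from current data, not just its training set."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical chunk a similarity search might return for this question.
chunks = ["Tropical Cyclone Wind Signal No. 3 is in effect over northern Luzon."]
prompt = build_augmented_prompt("Which areas are under a wind signal?", chunks)
print(prompt)
```

The resulting string is what gets sent as the user message to the chat completion API, so the model’s answer is grounded in the retrieved bulletin rather than its training data.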
Below is the architecture of the system that I came up with:
Architecture using OpenAI and LanceDB for processing PDF documents.
Here’s a demo of the script in action:
In my demo, I ran into OpenAI’s token limit. While the code still needs work to be more robust, this example demonstrates how RAG ties OpenAI and LanceDB together.