Building A Simple RAG System With FastAPI (1)
I am bored, and it is 23:00 local time in France. I just decided to build a simple RAG system with FastAPI, and I am writing this blog post at the same time.
First, the design. Retrieval Augmented Generation (RAG) is a nice way to ground the responses of an LLM and thus reduce hallucinations. It is the basis of the so-called "chat with X" applications (X being any sort of file: PDF, DOCX, videos, etc.). It is the approach I used when I created Discute.
Here is the basic design. The user sends a question or request. The request goes through a system that transforms it into a form suited to the queryable representation of the knowledge source (embeddings, relational DB, knowledge graph, etc.). The information relevant to the user's request is then routed to the LLM, which uses in-context learning to craft a response that is sent back to the user.
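To make the flow above concrete, here is a minimal sketch in plain Python. It is not the code of this project: every name in it (`DOCUMENTS`, `transform_query`, `retrieve`, `build_prompt`, `answer`) is made up for illustration. The retrieval step is a toy word-overlap score that stands in for whatever queryable representation you actually use, and the LLM is just a callable you pass in.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical toy knowledge source: a few text chunks kept in memory.
DOCUMENTS = [
    "FastAPI is a Python web framework for building APIs.",
    "Retrieval Augmented Generation grounds LLM answers in retrieved text.",
    "Embeddings map text to vectors so similar texts end up close together.",
]


def transform_query(text: str) -> set[str]:
    """Turn text into a form the knowledge source can be queried with.

    Assumption: a set of lowercase words. With a vector store, an
    embedding of the question would play this role instead.
    """
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query_terms: set[str], documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents sharing at least one word with the query."""
    scored = [(len(query_terms & transform_query(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved chunks in the prompt (in-context learning)."""
    context_block = "\n".join(f"- {chunk}" for chunk in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Full flow: transform the request, retrieve, then ask the LLM."""
    context = retrieve(transform_query(question), DOCUMENTS)
    return llm(build_prompt(question, context))


if __name__ == "__main__":
    # Stand-in for a real model call: it simply echoes the prompt it received.
    print(answer("What is FastAPI?", llm=lambda prompt: prompt))
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular model provider, and the same `answer` function could later be called from a FastAPI route.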
There are several ways to query a queryable representation of a knowledge source. If your knowledge source is a bunch of text files, for example, you can query it using traditional keyword matching, and this approach can yield good…