Building A Simple RAG System With FastAPI (1)

FS Ndzomga
Thoughts on Machine Learning
4 min read · Nov 19, 2023

--

I am bored; it is 23:00 local time in France. I just decided to build a simple RAG system with FastAPI, and I am writing this blog post at the same time.

First, the design. Retrieval-Augmented Generation (RAG) is a nice way to ground the responses of an LLM and thus reduce hallucinations. It is the basis of the so-called "chat with X" applications (X being any sort of file: PDF, DOCX, video, etc.). It is the approach I used when I created Discute.

Here is the basic design. The user sends a question or request. The request goes through a component that transforms it into a form suited to the queryable representation of the knowledge source (embeddings, relational DB, knowledge graph, etc.). The information relevant to the request is then routed to the LLM, which uses in-context learning to craft a response that is sent back to the user.
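To make the flow concrete, here is a minimal sketch of that design in plain Python. It is not the code from this series: the RagPipeline class and its three pluggable callables (transform, retrieve, llm) are hypothetical names I am using for illustration, and the LLM is a stub so the sketch runs without any external service. The FastAPI wiring is left out on purpose.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RagPipeline:
    # All three callables are hypothetical placeholders, not a library API.
    transform: Callable[[str], str]       # user request -> query for the knowledge source
    retrieve: Callable[[str], List[str]]  # query -> passages relevant to the request
    llm: Callable[[str], str]             # prompt -> generated response

    def build_prompt(self, question: str, passages: List[str]) -> str:
        # In-context learning: the retrieved passages are placed in the prompt
        # so the model can ground its answer in them.
        context = "\n".join(f"- {p}" for p in passages)
        return (
            "Answer the question using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {question}\n"
        )

    def answer(self, question: str) -> str:
        query = self.transform(question)      # 1. adapt the request
        passages = self.retrieve(query)       # 2. fetch relevant information
        prompt = self.build_prompt(question, passages)
        return self.llm(prompt)               # 3. let the LLM craft the response


# Toy stand-ins: a lowercasing transform, a fixed passage, and an "LLM"
# that simply echoes the prompt it receives.
pipeline = RagPipeline(
    transform=str.lower,
    retrieve=lambda query: ["RAG grounds an LLM with retrieved text."],
    llm=lambda prompt: prompt,
)
print(pipeline.answer("What is RAG?"))
```

Keeping the three steps as separate callables means the retrieval backend (embeddings, SQL, a knowledge graph) can be swapped without touching the rest of the flow.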

There are several ways to query a queryable representation of a knowledge source. If your knowledge source is a bunch of text files, for example, you can query it using traditional keyword matching, and this approach can yield good…
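As an illustration of keyword matching over text, here is a small sketch that scores each document by how often the query's words appear in it. The function names (tokenize, keyword_search) and the scoring rule are my own assumptions for this example, not necessarily what the rest of this series uses; a real system would likely add stop-word removal or a ranking function such as BM25.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase the text and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(query: str, documents: List[str], top_k: int = 3) -> List[Tuple[int, str]]:
    """Return up to top_k (score, document) pairs, best match first.

    The score is the number of times any query term occurs in the document.
    Documents that share no term with the query are dropped.
    """
    query_terms = set(tokenize(query))
    scored = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, doc))
    # The sort is stable, so documents with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


documents = [
    "FastAPI is a web framework for Python.",
    "RAG grounds an LLM with retrieved text.",
    "Paris is in France.",
]
for score, doc in keyword_search("RAG LLM", documents):
    print(score, doc)
```

The passages returned by a function like this are what would be handed to the LLM as context.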



Engineer passionate about data science, startups, product management, philosophy and French literature. Built lycee.ai, discute.co and rimbaud.ai