Building A Simple RAG System With FastAPI (1)

Published in Thoughts on Machine Learning

4 min read · Nov 19, 2023

I am bored, and it is 23:00 local time in France. I just decided to build a simple RAG system with FastAPI, and I am writing this blog post at the same time.

First, the design. Retrieval Augmented Generation (RAG) is a nice way to ground the responses of an LLM and thus reduce hallucinations. It is the basis of the so-called "chat with X" products (X being any sort of file: PDF, DOCX, videos, etc.). It is the approach I used when I created Discute.

Discute - Chat with your knowledge base

Discute allows you to chat with your knowledge base. By tapping into your knowledge base, discute acts as a virtual…

www.discute.co

Here is the basic design. The user sends a question or request. The request goes through a system that transforms it into a form suited to the query-able representation of the knowledge source (embeddings, relational DB, knowledge graph, etc.). The information relevant to the user's request is then routed to the LLM, which uses in-context learning to craft a response that is sent back to the user.
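The flow described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not the code behind Discute: the names `rewrite_query`, `build_prompt`, `answer`, `fake_retrieve` and `fake_llm` are made up for this example, and both the retriever and the LLM are stubs so the snippet runs without any API key.

```python
from typing import Callable, List

# Hypothetical type aliases: any function with these shapes can be plugged in.
Retriever = Callable[[str], List[str]]  # query -> relevant passages
LLM = Callable[[str], str]              # prompt -> completion


def rewrite_query(question: str) -> str:
    """Transform the user request into a form suited to the knowledge source.

    Here we only normalize case and whitespace. A real system might instead
    compute an embedding or generate a SQL or graph query.
    """
    return " ".join(question.lower().split())


def build_prompt(question: str, passages: List[str]) -> str:
    """Place the retrieved passages in the prompt (in-context learning)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question: str, retrieve: Retriever, llm: LLM) -> str:
    """Run the whole flow: transform, retrieve, build the prompt, generate."""
    query = rewrite_query(question)
    passages = retrieve(query)
    prompt = build_prompt(question, passages)
    return llm(prompt)


# Stand-ins (assumptions) so that the sketch is runnable end to end.
def fake_retrieve(query: str) -> List[str]:
    return ["FastAPI is a Python web framework."]


def fake_llm(prompt: str) -> str:
    return f"(stub answer based on a {len(prompt)}-character prompt)"


print(answer("What is FastAPI?", fake_retrieve, fake_llm))
```

Passing the retriever and the LLM as plain callables keeps the sketch independent of any particular vector store or model provider; a function like `answer` could then be called from a web route handler.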

There are several ways to query a query-able representation of a knowledge source. If your knowledge source is a bunch of text files, for example, you can query it using traditional keyword matching, and this approach can yield good…
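To make the keyword-matching idea concrete, here is a minimal sketch using only the standard library. It is an assumption-laden toy, not the method used later in this series: the names `tokenize` and `keyword_search`, the sample documents, and the choice to score by raw term counts are all invented for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(
    query: str, documents: Dict[str, str], top_k: int = 3
) -> List[Tuple[str, int]]:
    """Rank documents by how often the query's words occur in them."""
    query_terms = set(tokenize(query))
    scores = []
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # keep only documents that match at least one word
            scores.append((name, score))
    # Highest score first; ties are broken by name so the order is stable.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_k]


# Hypothetical sample "text files", keyed by file name.
docs = {
    "fastapi.txt": "FastAPI is a web framework for building APIs with Python.",
    "rag.txt": "Retrieval augmented generation grounds an LLM in retrieved text.",
    "cooking.txt": "Boil the pasta for ten minutes.",
}

print(keyword_search("python web framework", docs))
# prints [('fastapi.txt', 3)]
```

Raw counts favor long documents and common words, so a real system would more likely use a weighted scheme such as TF-IDF or BM25; the sketch only shows the basic shape of the lookup.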

Written by FS Ndzomga