Build a 100% Free, Traceable, and Secure RAG Chatbot Using Reranker and GPT-4o

Stan
6 min read · Jun 1, 2024

Hong T., Jing W., Jing Z., and Luchao J.

This research is partially funded by the OpenAI Research Program. Photo from unsplash.com.

Having a completely free, low-latency, hallucination-free chatbot is the holy grail for large language model applications. This open-source RAG Chatbot is one step closer to that goal.

What is a reranker and why do we use it?

A reranker uses a language model to score document chunks against the user's question and sort them, so that only the most relevant chunks are used to answer. It is free (we use an open-source cross-encoder as the reranker) and high-performing.

There are many flavors of rerankers. A cross-encoder is a BERT-based model that reads a query and a document together and outputs a relevance score for the pair; the top-scoring documents are then sent to an LLM to produce high-quality answers. It balances speed with accuracy, and the reranking step exists to improve the quality of the search results. Here, we use a cross-encoder from Hugging Face.
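To make the reranking step concrete, here is a minimal sketch rather than the article's actual implementation. It assumes the sentence-transformers library and the public checkpoint cross-encoder/ms-marco-MiniLM-L-6-v2; the article does not say which cross-encoder it uses. The names rerank and cross_encoder_scorer are hypothetical helpers introduced only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes a query and candidate chunks and returns one score per chunk.
ScoreFn = Callable[[str, Sequence[str]], List[float]]


def rerank(
    query: str,
    chunks: Sequence[str],
    score_fn: ScoreFn,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (chunk, score) pairs, highest score first."""
    if not chunks:
        return []
    scores = score_fn(query, chunks)
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


def cross_encoder_scorer(
    # Assumed checkpoint; swap in whichever cross-encoder you actually use.
    model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2",
) -> ScoreFn:
    """Build a scorer backed by a Hugging Face cross-encoder."""
    # Imported lazily so rerank() can be used without the library installed.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)

    def score(query: str, chunks: Sequence[str]) -> List[float]:
        # The cross-encoder scores each (query, chunk) pair jointly.
        pairs = [(query, chunk) for chunk in chunks]
        return [float(s) for s in model.predict(pairs)]

    return score
```

In a RAG pipeline, the chunks would typically come from a first-pass retriever, and the reranked top results would be pasted into the LLM prompt, for example rerank(question, retrieved_chunks, cross_encoder_scorer(), top_k=3). Keeping the scorer as a parameter makes it easy to swap in a different reranker later.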

Limitations of Embeddings

Embeddings are designed to capture semantic information but often struggle to distinguish nuanced differences between similar phrases, like “I love apples” versus “I used to love apples,” due to a lack of contrastive information. Embeddings are constrained by dimensionality, typically fixed at…


Stan

A director data scientist working in a tech start-up who is passionate about making a positive impact on people around him