How I’ve Optimized Document Interactions with Open WebUI and RAG: A Comprehensive Guide

Kelvin Campelo
8 min read · May 5, 2024

--

In this article, I’ll share how I’ve enhanced my experience using my own private version of ChatGPT to ask about documents. If you don’t have your own personal private ChatGPT, you might find this article I wrote useful.

In a few words, Open WebUI is a versatile and intuitive user interface that acts as a gateway to a personalized private ChatGPT experience. It is rich in resources, offering users the flexibility to choose from a range of language models. With its user-friendly design, Open WebUI allows users to customize their interface according to their preferences, ensuring a unique and private interaction with advanced conversational AI. This platform stands out as a customizable solution for those seeking to harness the power of language models in a way that aligns with their specific needs and desires.

One of my favorite and most heavily used features of Open WebUI is the ability to run queries with documents, websites, or even YouTube videos added as context to the chat. This is made possible by RAG support, and if you are unfamiliar with this concept, I’ll provide a brief explanation below.

Enabling Generative AIs to Consult Documents on the Fly:

Imagine you have a friendly robot that knows a lot of things but sometimes lacks information about events that occurred after it was built. For example, if you ask the robot about a new game that everyone is playing, it might not know anything about it, because it learned everything it knows before that game was created.

Now, think of the robot having access to a magical library it can consult whenever it needs to answer something unfamiliar. When you ask a question, it goes to the library, retrieves the latest information, and uses it to provide you with an accurate and current response. This process is called Retrieval Augmented Generation, or RAG for short.

RAG is like a superpower for the robot, eliminating the need to guess, make things up, or hallucinate when faced with unfamiliar queries. Instead, it can consult the documents in the library and offer truthful and enlightening answers.

So, when you talk with a chatbot like ChatGPT that features RAG, it can draw upon a broader context, allowing for more precise answers. This is particularly useful when the chatbot is asked to explain complex topics or provide insights based on specific documents.

Now that we know about the RAG concept and understand its role in enabling more accurate responses by providing additional context, let’s explore how this is achieved in Open WebUI.
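To make the idea concrete, here is a minimal sketch of the retrieve-then-prompt flow described above. It is for illustration only: real systems like Open WebUI use vector embeddings for similarity (an assumption on my part), while this toy version simply counts shared words.

```python
import re

def score(query: str, chunk: str) -> int:
    """Toy similarity: count words shared between the query and a chunk."""
    words = lambda s: set(re.findall(r"\w+", s.lower()))
    return len(words(query) & words(chunk))

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]

def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Stitch the retrieved chunks and the user query into one prompt."""
    context = "\n".join(context_chunks)
    return f"<context>\n{context}\n</context>\nQuery: {query}"

# Hypothetical rulebook chunks, plus one irrelevant distractor.
chunks = [
    "Carcassonne is a tile-placement game for 2 to 5 players.",
    "Each completed city scores 2 points per tile.",
    "The weather today is sunny.",
]
query = "How do players score points?"
prompt = build_prompt(query, retrieve(query, chunks))
```

Only the relevant pieces end up in the prompt; the distractor about the weather is left out, which is exactly what keeps the context small and on-topic.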

Uploading and using external files as context for scoped chat answers

The first and most straightforward method is to click the + (upload) button located to the left of the message input field. This action allows you to choose files to be used as a context source for the chat. For example, I’ve uploaded a PDF rulebook from the board game Carcassonne and asked the chat how to play it.

Open WebUI Interface

The answer I received is quite good; it provides a decent overview of the game rules. However, it does not offer enough information to calculate the scoring point values, for example. This means we are not fully prepared to start playing Carcassonne. At least I didn’t feel that confident.

This made me consider how I might adjust the Open WebUI settings to enhance the chat’s ability to deliver more useful answers when querying a document. I found the “Document Settings” on the Documents page and started to explore potential improvements.

Two parameters caught my attention: the Top K value in the Query Params and the RAG Template.

Guide: How I’ve improved document-related answers in Open WebUI

So let’s start by looking into what the Top K value means for the query parameters. Imagine you’re about to draw a picture with colored pencils, and you have a box full of options, but you’re limited to choosing only 4 pencils. For the chat, the Top K value represents the number of document pieces that will be used to “draw” the answer. This limitation exists because of the model’s context size, which usually allows only a limited amount of input data. Instead of sending the entire rulebook to the chat, this clever approach selects the most relevant document pieces based on the question and stitches them together with the message sent to the chat.
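As a back-of-the-envelope illustration of why Top K is bounded by context size, the sketch below estimates how many chunks fit in a context window. The window size, chunk size, and reserved budget are assumed example numbers, not Open WebUI defaults.

```python
def max_top_k(context_window_tokens: int, chunk_tokens: int, reserved_tokens: int) -> int:
    """How many retrieved chunks fit after reserving room for the
    template, the question, and the model's answer."""
    return max(0, (context_window_tokens - reserved_tokens) // chunk_tokens)

# e.g. an 8192-token window, ~400-token chunks, 1000 tokens reserved:
print(max_top_k(8192, 400, 1000))  # prints 17
```

In other words, a Top K of 10 comfortably fits this hypothetical budget, while cranking it much higher would eventually start crowding out the question and answer.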

Open WebUI Documents Settings default values

I immediately increased the Top K value to 10, allowing the chat to receive more pieces of the rulebook. But I couldn’t resist the urge to also improve the RAG template, and it seemed only logical to use AI for this. After some googling, I found a nice prompt on the LangSmith Hub for improving existing generative AI prompts and gave it a try.

The default RAG Template provided by Open WebUI is as follows:

Use the following context as your learned knowledge, enclosed within <context></context> XML tags.
<context>
[context]
</context>
When answering the user:
- If you don't know the answer, simply state that you don't know.
- If you're unsure, seek clarification.
- Avoid mentioning that the information was sourced from the context.
- Respond in accordance with the language of the user's question.
Given the context information, address the query.
Query: [query]
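To see how such a template is used, here is a small sketch of filling the [context] and [query] placeholders. The exact substitution Open WebUI performs internally is an assumption, and the template string below is a condensed version of the one above.

```python
# Condensed version of the default RAG template shown above.
DEFAULT_TEMPLATE = (
    "Use the following context as your learned knowledge, "
    "enclosed within <context></context> XML tags.\n"
    "<context>\n[context]\n</context>\n"
    "Given the context information, address the query.\n"
    "Query: [query]"
)

def fill_template(template: str, context: str, query: str) -> str:
    """Substitute the retrieved chunks and the user's question into the template."""
    return template.replace("[context]", context).replace("[query]", query)

filled = fill_template(
    DEFAULT_TEMPLATE,
    "Roads score 1 point per tile.",  # a hypothetical retrieved chunk
    "How are roads scored?",
)
```

The model never sees the placeholders, only the finished prompt with the chunks and question stitched in.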

Using the LangSmith Playground with that prompt, I played around with different models and settings, arriving at this enhanced RAG template:

**Generate Response to User Query**
**Step 1: Parse Context Information**
Extract and utilize relevant knowledge from the provided context within `<context></context>` XML tags.
**Step 2: Analyze User Query**
Carefully read and comprehend the user's query, pinpointing the key concepts, entities, and intent behind the question.
**Step 3: Determine Response**
If the answer to the user's query can be directly inferred from the context information, provide a concise and accurate response in the same language as the user's query.
**Step 4: Handle Uncertainty**
If the answer is not clear, ask the user for clarification to ensure an accurate response.
**Step 5: Avoid Context Attribution**
When formulating your response, do not indicate that the information was derived from the context.
**Step 6: Respond in User's Language**
Maintain consistency by ensuring the response is in the same language as the user's query.
**Step 7: Provide Response**
Generate a clear, concise, and informative response to the user's query, adhering to the guidelines outlined above.
User Query: [query]
<context>
[context]
</context>

I was surprised by the improvement. The new template provides clear steps and detailed instructions, addressing the gaps I felt were present in the default version. My Document Settings now look like this:

Open WebUI Documents Settings enhanced values

You can simply copy and paste the RAG template into the field and set the Top K value to whatever you want; I recommend starting with 10 and adjusting based on your own experience. Now it’s time to test the chat again with the same question about how to play Carcassonne:

The new response was significantly better, including more detailed instructions on gameplay, scoring (with point values), and basic rules. While there are certainly more rules to learn for Carcassonne, this improvement undeniably demonstrates a substantial enhancement in document querying.

How this feature can transform your day-to-day

The applications for this feature extend far beyond learning how to play Carcassonne. For example:

  1. Students can summarize lengthy textbooks to focus on key concepts.
  2. Researchers can quickly extract relevant data from scientific papers.
  3. Legal professionals can analyze case law and statutes for pertinent information.
  4. Business analysts can distill insights from market research reports.
  5. Writers and journalists can fact-check articles and gather background information.

For students and professionals alike, this feature can serve as a significant productivity booster, providing factual answers from large documents, summarizing articles, and more. Of course, it’s always important to critically evaluate the information provided by large language models (LLMs), as they can also make mistakes ¯\_(ツ)_/¯.

This functionality is not restricted to documents; Open WebUI also enables direct queries about web pages by starting a message with # followed by the URL, like #www.something.com. This feature also works with YouTube videos that include English subtitles. For the purposes of this article, I’ll reference a Wikipedia article and ask about it.
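For a rough idea of what happens behind the scenes when a web page becomes chat context, the sketch below extracts plain text from HTML using Python’s standard library. Open WebUI’s actual scraping pipeline is an assumption here; this only illustrates the general idea of discarding markup and keeping readable text.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip = False  # True while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

# A hypothetical snippet of a fetched page (no network call here).
html = (
    "<html><head><style>p { color: red }</style></head>"
    "<body><h1>Madonna</h1><p>Her largest concert was in Rio.</p></body></html>"
)
parser = TextExtractor()
parser.feed(html)
text = " ".join(parser.parts)
```

The resulting text, not the raw HTML, is what gets chunked and retrieved as context, just like an uploaded document.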

I tested this with a Wikipedia article about Madonna, asking about her largest concert ever. With the web page as context, the chat provided an up-to-date answer, informing me that Madonna had just performed her largest concert of all time at Copacabana Beach in Rio de Janeiro on May 4, 2024, the day before this article was written. This example opens up unlimited opportunities to boost your day-to-day productivity using generative AI with RAG support.

In conclusion, the enhancements I’ve made to the RAG settings of my Open WebUI instance have noticeably improved the quality of the chat answers. By tweaking the Top K value and refining the RAG template with the help of other LLMs, I’ve been able to increase the potential of document- and web-based queries. This not only enriches my own experience but also opens up many possibilities for leveraging AI in professional and personal contexts. As we continue to push the boundaries of what AI can achieve, it’s clear that the integration of RAG into conversational interfaces like Open WebUI represents a significant step forward in the realm of intelligent assistants.

I hope you enjoyed this article as much as I did writing it.
