Creating a Web App for File Interactions Using RAG: A Developer's Guide

Abhijeet Pandey
Walmart Global Tech Blog
10 min read · Sep 6, 2024
Image generated using Stable Diffusion 2

Introduction

Large language models (LLMs) are AI systems trained on vast datasets of text and code. They can produce text, translate languages, create diverse content, and provide informative responses to questions.

One of the biggest limitations of large language models (LLMs) is their reliance on outdated training data. Since LLMs are trained on datasets that may not be continuously updated, they lack information about recent events, trends, and advancements. This means they cannot provide accurate or relevant answers about the latest developments in various fields, making their responses less useful for real-time applications.

Additionally, LLMs can sometimes produce “hallucinations” — generating information that is incorrect, nonsensical, or entirely fabricated. This occurs because the models generate responses based on patterns in the data they were trained on rather than having a true understanding of the content. As a result, they might confidently provide wrong information or create plausible-sounding but false statements, which can be misleading and problematic in critical applications.

To mitigate these limitations, retrieval-augmented generation (RAG) techniques are used. By combining the language-understanding capabilities of LLMs with up-to-date, relevant data retrieved at query time, we can reduce hallucinations and improve accuracy.

There are several ways to apply retrieval-based techniques. In this article, we’ll build a Flask web app using LangChain where users can upload their files and, using the retrieval techniques described above together with an LLM, interact with their documents. This type of application can be used in any scenario that requires Q&A interactions with documents.

Prerequisites & Setup

  • LLM: Basic understanding of large language models. In this article, we will use OpenAI’s model API, so you will need to generate an OpenAI API key; you can also use LLaMA or any other LLM.
  • Vector Embeddings: Vector embeddings, in the context of large language models (LLMs), are numerical representations of text data. They capture the semantic meaning of words, phrases, sentences, or entire documents in a continuous vector space. Various models are available to convert a textual document into vectors; in this article we will use the General Text Embeddings (gte-large) model, which generates 1024-dimensional vector embeddings (a minimal sketch follows this list).
  • Vector Database: A vector database stores data as vectors, allowing for fast and accurate similarity searches. In this article, we will use FAISS to store vector embeddings of the documents, although other options like Chroma, Weaviate, Milvus, and Pinecone are available.
  • LangChain: LangChain is an open-source framework that simplifies creating applications using large language models (LLMs).
  • Flask: Flask is a web framework that provides libraries for building lightweight web applications in Python.
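As a quick illustration of the vector-embeddings bullet above, here is a minimal sketch of generating gte-large embeddings directly with sentence-transformers (the example sentences are made up):

from sentence_transformers import SentenceTransformer

# Load the gte-large model (weights are downloaded on first use)
model = SentenceTransformer("thenlper/gte-large")

sentences = [
    "LLMs can hallucinate facts.",
    "Retrieval grounds model answers in real documents.",
]

# encode() returns one 1024-dimensional vector per sentence
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 1024)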

Let’s start by installing all the required libraries

pip install openai==1.35.14
pip install -U sentence-transformers==2.6.1
pip install langchain==0.1.16
pip install flask==3.0.3

Under the hood

The concept involves providing the LLM with relevant content retrieved from the document, along with the user’s question. This approach ensures that the LLM generates responses based on the specific and accurate information contained in the document, thereby minimizing the risk of hallucinations or fabricating information. This method enhances the accuracy and trustworthiness of the responses, as the LLM relies on verified information rather than generating text solely based on its training data.

When the user uploads documents, we will first read all the files, split them into chunks, and convert these chunks into vector embeddings, which are then stored in a vector database. When the user asks a question related to the documents, we will perform a similarity search to extract only the relevant text. This relevant text is then provided to the LLM along with the user’s query, and the LLM generates a response for the user (see the figure below).

Chat with documents architecture diagram
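To make this flow concrete before we build the full app, here is a minimal sketch of the retrieve-then-generate step; vectorstore and llm are placeholders for the FAISS index and OpenAI client we create in the sections below:

# Sketch only: `vectorstore` and `llm` are built later in this article
def answer(question, vectorstore, llm):
    # 1. Retrieve the chunks most similar to the question
    docs = vectorstore.similarity_search(question, k=6)
    context = "\n\n".join(d.page_content for d in docs)
    # 2. Ground the LLM in the retrieved context
    prompt = (
        "Use the following context to answer the question.\n"
        f"{context}\nQuestion: {question}"
    )
    # 3. Generate the final answer
    return llm.invoke(prompt)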

Let’s start building our application

  1. Set up the Flask app structure:
flask_app/
├── static/
│   ├── index.css
│   ├── chat.css
│   └── chat.js
├── templates/
│   ├── index.html
│   └── chat.html
├── app.py
├── upload2vectorDb.py
└── talk2vectorDb.py

2. Create the home page for our application

templates/index.html:

This HTML page features a form where users can enter a unique username and select multiple files in various formats for upload. The username will be used to identify all files associated with a specific user. This will help us cache information at the user level, allowing users to interact with all their uploaded historical documents.

<!DOCTYPE html>
<html>
<head>
    <title>Upload Files</title>
    <link rel="stylesheet" type="text/css" href="../static/index.css">
</head>
<body>
    <div class="App">
        <form id="upload-form" method="post" enctype="multipart/form-data" action="{{ url_for('upload_file') }}">
            <div class="form-container">
                <div class="form-group">
                    <label for="username">Type a unique username:</label>
                    <input type="text" id="username" name="username" required><br>
                </div>
                <div class="form-group">
                    <label for="file">Upload all your files:</label>
                    <input type="file" name="file" id="file" multiple required><br>
                </div>
                <button type="submit" id="submitButton">Upload Files</button>
            </div>
        </form>
    </div>
</body>
</html>

static/index.css:

This adds some styling to the index.html page.

html {
    text-align: center;
    background-color: #1d3469;
    color: white;
}

.App-header {
    background-color: #1d3469;
    min-height: 100vh;
    display: flex;
    flex-direction: column;
    align-items: center;
    justify-content: center;
    font-size: calc(10px + 2vmin);
    color: white;
}

form {
    display: flex;
    flex-direction: column;
    align-items: flex-start;
    max-width: 1200px;
    margin: 0 auto;
    padding: 20px;
    background-color: #3d5c95;
    border-radius: 8px;
    box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
    font-size: 20px;
}

.form-group {
    display: flex;
    align-items: center;
    margin-bottom: 10px;
}

.form-container {
    display: flex;
    flex-direction: column;
    width: 800px; /* adjust the width as needed */
}

#submitButton {
    margin-top: 10px;
    padding: 8px 16px;
    background-color: #898f9c;
    color: white;
    border: none;
    border-radius: 4px;
    cursor: pointer;
}

#submitButton:hover {
    background-color: #bdc5d1;
}

#submitButton:active {
    background-color: #97a1af;
}

3. Setting up the helper functions

upload2vectorDb.py:

This helper function is designed to read various file types, including csv, doc, html, pdf, py, and ipynb. As mentioned earlier, we first read the files and then split them into smaller chunks using LangChain's RecursiveCharacterTextSplitter. Each chunk is set to a size of 1,000 characters, with an overlap of 200 characters between chunks. These hyperparameters can be adjusted based on the use case and the size of the uploaded documents. Once the documents are divided into chunks, we use the gte-large text embedding model to convert these chunks into 1024-dimensional vector embeddings, which are then saved in FAISS.
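Before the full listing, here is a small self-contained sketch of how RecursiveCharacterTextSplitter behaves; the tiny chunk_size and chunk_overlap values are chosen only to make the overlap visible:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Toy values: for real documents we use chunk_size=1000, chunk_overlap=200 as below
splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
text = (
    "RAG splits long documents into overlapping chunks so that "
    "each chunk fits the embedding model and retains local context."
)
for chunk in splitter.split_text(text):
    print(repr(chunk))

The full upload2vectorDb.py follows.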

import os
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain_community.document_loaders import (
    CSVLoader,
    NotebookLoader,
    PyPDFLoader,
    PythonLoader,
    TextLoader,
    UnstructuredEPubLoader,
    UnstructuredFileLoader,
    UnstructuredHTMLLoader,
    UnstructuredMarkdownLoader,
    UnstructuredODTLoader,
    UnstructuredPowerPointLoader,
    UnstructuredWordDocumentLoader,
)

FILE_LOADER_MAPPING = {
    ".csv": (CSVLoader, {"encoding": "utf-8"}),
    ".doc": (UnstructuredWordDocumentLoader, {}),
    ".docx": (UnstructuredWordDocumentLoader, {}),
    ".epub": (UnstructuredEPubLoader, {}),
    ".html": (UnstructuredHTMLLoader, {}),
    ".md": (UnstructuredMarkdownLoader, {}),
    ".odt": (UnstructuredODTLoader, {}),
    ".pdf": (PyPDFLoader, {}),
    ".ppt": (UnstructuredPowerPointLoader, {}),
    ".pptx": (UnstructuredPowerPointLoader, {}),
    ".txt": (TextLoader, {"encoding": "utf8"}),
    ".ipynb": (NotebookLoader, {}),
    ".py": (PythonLoader, {}),
}


### define page content output formatter
def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)


# Function to convert text to embeddings
def text_to_embedding(text, model):
    embedding = model.encode(text, convert_to_tensor=False)
    return embedding.tolist()


def read_file_and_save_to_faiss(directory_name):
    documents = []

    for file in os.listdir(directory_name):
        file_path = os.path.join(directory_name, file)
        file_type = os.path.splitext(file)[1]
        if file_type == "":
            continue
        if file_type in FILE_LOADER_MAPPING:
            loader_class, loader_args = FILE_LOADER_MAPPING[file_type]
            loader = loader_class(file_path, **loader_args)
        else:
            loader = UnstructuredFileLoader(file_path)
        documents.extend(loader.load())

    print("files reading done")

    ### creating chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=200, add_start_index=True
    )
    all_splits = text_splitter.split_documents(documents)
    print("length of splits: ", len(all_splits))

    ### prepare huggingface embedding model
    model_name = "thenlper/gte-large"
    model_kwargs = {"device": "cpu"}
    encode_kwargs = {"normalize_embeddings": False}
    embeddings_huggingFace = HuggingFaceEmbeddings(
        model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
    )

    ### create vectorstore database and convert text into embeddings and save them
    vectorstore = FAISS.from_documents(
        documents=all_splits, embedding=embeddings_huggingFace
    )
    print("vector db created for uploaded documents")

    return vectorstore
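A hypothetical usage of this helper (the folder name is just an example); a FAISS index can also be persisted with save_local if you want to reuse it across restarts:

from upload2vectorDb import read_file_and_save_to_faiss

# Example folder containing a user's uploaded files
vectorstore = read_file_and_save_to_faiss("./uploads/alice")

# Optional: persist the index to disk for later reuse
vectorstore.save_local("faiss_index_alice")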

talk2vectorDb.py:

In this function, we utilize LangChain to retrieve documents similar to the user’s question. Using these documents, we create a prompt that is then passed to the OpenAI model, and the model’s response is returned to the calling function.

import os
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAI
from langchain.vectorstores import FAISS

### replace this with your open ai key
api_key = "..<your open ai key>"

### initialising the llm
llm = OpenAI(openai_api_key=api_key)


### define page content output formatter
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def retrieve_answers_from_faiss(vectorstore, question):

    ### create retriever from the database and define its parameters
    retriever = vectorstore.as_retriever(
        search_type="similarity", search_kwargs={"k": 6}
    )

    ### create prompt template
    template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Keep the answer as concise as possible.
{context}
Question: {question}
Helpful Answer:"""

    custom_rag_prompt = PromptTemplate.from_template(template)
    print("prompt created")

    ### create chain of events:
    ### retrieve using the question, build the prompt, call the llm, parse the output
    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | custom_rag_prompt
        | llm
        | StrOutputParser()
    )

    ### calling the chain using the invoke method
    return rag_chain.invoke(question)
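Putting the two helpers together, a hypothetical end-to-end call (the folder and question are examples) looks like this:

from upload2vectorDb import read_file_and_save_to_faiss
from talk2vectorDb import retrieve_answers_from_faiss

# Build the index from a user's uploads, then query it
vectorstore = read_file_and_save_to_faiss("./uploads/alice")
print(retrieve_answers_from_faiss(vectorstore, "What is the closing balance?"))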

4. Creating chat interface

templates/chat.html:

This file contains HTML and JavaScript code to send and receive messages from the backend server. It also helps maintain and display the history of user messages.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Chat Page</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='chat.css') }}">
</head>
<body>
    <div id="chat-header">
        <h1>Chat with {{ username }}</h1>
    </div>
    <div id="chat-container">
        {% for message in chat_history %}
            {% if message.startswith('You:') %}
                <div class="message user-message">{{ message }}</div>
            {% else %}
                <div class="message app-message">{{ message }}</div>
            {% endif %}
        {% endfor %}
    </div>
    <div id="chat-input-container">
        <form id="chat-form">
            <input type="text" id="chat-input" placeholder="Type a message" required>
            <button type="submit">Send</button>
        </form>
    </div>
    <!-- Load the chat logic; without this include the form does nothing -->
    <script src="{{ url_for('static', filename='chat.js') }}"></script>
</body>
</html>

static/chat.css:

Adding styling elements to give this chat app the look and feel of a standard chatting application.

body {
    font-family: Arial, sans-serif;
    display: flex;
    flex-direction: column;
    height: 100vh;
    margin: 0;
}

#chat-header {
    background-color: #075E54;
    color: white;
    padding: 10px;
    text-align: center;
    font-size: 24px;
    flex-shrink: 0;
}

#chat-container {
    background-color: #E5DDD5;
    padding: 10px;
    flex-grow: 1;
    overflow-y: auto;
    display: flex;
    flex-direction: column;
}

.message {
    border-radius: 8px;
    padding: 10px;
    margin: 5px 0;
    max-width: 70%;
    display: inline-block;
    position: relative;
    word-wrap: break-word;
}

.user-message {
    background-color: #DCF8C6;
    align-self: flex-end;
    text-align: right;
}

.app-message {
    background-color: #FFF;
    align-self: flex-start;
}

.loading-message {
    background-color: #FFC107;
    align-self: center;
    text-align: center;
}

#chat-input-container {
    display: flex;
    align-items: center;
    background-color: #F1F1F1;
    padding: 10px;
    flex-shrink: 0;
}

#chat-form {
    display: flex;
    flex-grow: 1;
}

#chat-input {
    flex-grow: 1;
    border: none;
    padding: 10px;
    border-radius: 20px;
    margin-right: 10px;
    font-size: 16px;
}

#chat-input:focus {
    outline: none;
}

button {
    background-color: #25D366;
    color: white;
    border: none;
    border-radius: 50%;
    padding: 10px;
    font-size: 16px;
    cursor: pointer;
    display: flex;
    align-items: center;
    justify-content: center;
}

button:focus {
    outline: none;
}

#loading-message {
    color: #FF5733;
    padding-left: 10px;
}

static/chat.js:

const chatForm = document.getElementById('chat-form');
const chatInput = document.getElementById('chat-input');
const chatContainer = document.getElementById('chat-container');

chatForm.addEventListener('submit', function(event) {
    event.preventDefault();
    const message = chatInput.value;
    chatInput.value = '';

    // Add user's message to chat
    const userMessage = document.createElement('div');
    userMessage.classList.add('message', 'user-message');
    userMessage.textContent = 'You: ' + message;
    chatContainer.appendChild(userMessage);
    chatContainer.scrollTop = chatContainer.scrollHeight;

    // Show loading message
    const loadingMessage = document.createElement('div');
    loadingMessage.classList.add('message', 'loading-message');
    loadingMessage.textContent = 'Loading...';
    chatContainer.appendChild(loadingMessage);
    chatContainer.scrollTop = chatContainer.scrollHeight;

    // Static JS files are not rendered by Jinja, so use the route path
    // directly rather than a url_for() template expression
    fetch('/send_message', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/x-www-form-urlencoded',
        },
        body: new URLSearchParams({
            'message': message
        })
    }).then(response => response.json()).then(data => {
        // Re-render the full history returned by the server
        // (clear the container first to avoid duplicating messages)
        chatContainer.innerHTML = '';
        data.chat_history.forEach(msg => {
            const newMessage = document.createElement('div');
            newMessage.classList.add('message');
            if (msg.startsWith('You:')) {
                newMessage.classList.add('user-message');
            } else {
                newMessage.classList.add('app-message');
            }
            newMessage.textContent = msg;
            chatContainer.appendChild(newMessage);
        });
        chatContainer.scrollTop = chatContainer.scrollHeight;
    });
});

5. Flask Backend

app.py:

This is the main backend function for Flask that integrates all components. It starts with the index.html page, and when the user uploads a document, it redirects them to chat.html, where they can interact with the document.

import os

from flask import Flask, jsonify, redirect, render_template, request, session, url_for
from werkzeug.utils import secure_filename

## contains code to interact with vector DB
import talk2vectorDb

## contains code to upload files in vector DB
import upload2vectorDb

vectorstore = None
app = Flask(__name__)

# Set the secret key, which is needed for the session to work
app.secret_key = "..< choose any session key >.."


@app.route("/")
def index():
    return render_template("index.html")


### this function saves the uploaded files in the user's upload folder
@app.route("/upload_file", methods=["POST"])
def upload_file():

    username = request.form["username"]
    files = request.files.getlist("file")

    directory_name = "../" + username
    # Set a session variable
    session["username"] = username

    ## check if a folder for the user exists; if not, create a new folder
    if not os.path.exists(directory_name):
        os.makedirs(directory_name)

    for file in files:
        if file:
            filename = secure_filename(file.filename)
            file.save(os.path.join(directory_name, filename))
            print(file, "is saved in ", directory_name)

    ### calling function to create embeddings and save them to the vector db
    global vectorstore
    vectorstore = upload2vectorDb.read_file_and_save_to_faiss(directory_name)

    return redirect(url_for("chat"))


### this function renders the chat.html page
@app.route("/chat")
def chat():
    username = session.get("username", "Guest")
    chat_history = session.get("chat_history", [])
    return render_template("chat.html", username=username, chat_history=chat_history)


### this function provides an interactive interface for the user to query the vector DB
@app.route("/send_message", methods=["POST"])
def send_message():
    message = request.form["message"]
    chat_history = session.get("chat_history", [])
    chat_history.append(f"You: {message}")
    print("send message to backend:", message)

    try:
        server_response = talk2vectorDb.retrieve_answers_from_faiss(
            vectorstore, message
        )
    except Exception:
        server_response = "*********file not uploaded properly********"

    chat_history.append(server_response)
    session["chat_history"] = chat_history
    return jsonify({"chat_history": chat_history})


if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=8080)

6. Run the Flask server by executing:

python app.py

After executing this command, open your web browser and go to http://127.0.0.1:8080/. Select a unique username, upload your files, and wait a few seconds; you will see a chat interface like the one below, where you can interact with your files.

The landing page of our created app where users can select a username and upload their files.
Example output of our created app, when an account statement file is uploaded by the user.

Conclusion

This is a simple app that showcases the importance of the RAG pipeline. This article should also help you seamlessly integrate the various components involved, such as LangChain, an LLM, a vector DB, and Flask. You can develop many such solutions using LLMs. Happy coding!

Written by

  1. Abhijeet Pandey — Data Scientist — Walmart Global Tech
  2. Himanshu Pant — Senior Data Scientist — Walmart Global Tech

References

  1. https://python.langchain.com/v0.2/docs/tutorials/rag/
  2. https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.faiss.FAISS.html
  3. https://huggingface.co/thenlper/gte-large
  4. https://python-adv-web-apps.readthedocs.io/en/latest/flask.html

