Quick and Dirty: Building a Private RAG Conversational Agent with LM Studio, Chroma DB, and LangChain

Ghulam Rasool
Feb 9, 2024 · 3 min read


In an age where time is of the essence and data privacy is paramount (I work in healthcare), here is my attempt to cut through the technical complexity and try a quick and dirty solution. This ‘Quick and Dirty’ guide is the first in a series dedicated to rapid tech deployment, focusing on building a conversational agent that keeps data private. Using LM Studio, Chroma DB, and LangChain, I developed a RAG conversational chatbot that runs on my local machine, behind my institution's firewall. The code is available in the GitHub repo.

The Inspiration Behind the Project

The inspiration for this project stemmed from a pressing need in hospital settings: safeguarding patient information without giving up the advances that conversational AI can offer. With regulations tightening around data privacy, the challenge was to create a solution that would run entirely behind the firewall, on my local machine.

Large Language Models (LLMs) from LM Studio
I am using LM Studio to serve the model, with Mistral 7B as the current choice. LM Studio keeps the setup flexible: the agent's underlying LLM can be swapped out at any time via the GUI.
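
LM Studio exposes whichever model is loaded through an OpenAI-compatible local server. Here is a minimal sanity check that the server is up; this sketch assumes the server was started from the GUI on LM Studio's default port 1234, matching the code later in this post.

from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; no real key is needed
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # this field is currently ignored by LM Studio
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)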

Vector Database, Embedding, and RAG
For the Retrieval-Augmented Generation (RAG) components, I chose Chroma DB (to store the embeddings), LangChain, and a Hugging Face Sentence-Transformers embedding model. These are easy to use and work well together.
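
To make the embedding step concrete, here is a small sketch (the query string is just an illustration): the model maps each piece of text to a fixed-length vector, and Chroma DB stores and searches those vectors.

from langchain_community.embeddings import HuggingFaceEmbeddings

# Map a sample query to its embedding vector
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vec = emb.embed_query("treatment options for brain gliomas")
print(len(vec))  # all-MiniLM-L6-v2 produces 384-dimensional vectors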

Setting Up the Environment
The first step is to download and install LM Studio. Next, we create a conda environment and install the dependencies.
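
The environment name and exact package list in this sketch are assumptions on my part; adjust them to your setup.

# Create and activate an isolated environment
conda create -n local-rag python=3.10
conda activate local-rag

# Install the dependencies used in the scripts below
pip install langchain langchain-community chromadb sentence-transformers pypdf openai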

Here is the code for creating and persisting the vector database with LangChain and Chroma DB.

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader

# Load the source PDF(s)
loaders = [PyPDFLoader('./pdfs/brain-gliomas-patient.pdf')]

docs = []
for loader in loaders:
    docs.extend(loader.load())

# Split the text into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(docs)

# Embed the chunks on the CPU with a small Sentence-Transformers model
embedding_function = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2", model_kwargs={'device': 'cpu'})

# Build the Chroma vector store and persist it to disk
vectorstore = Chroma.from_documents(docs, embedding_function, persist_directory="./chroma_db_nccn")

print(vectorstore._collection.count())  # number of chunks stored
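
Before wiring up the chatbot, it is worth sanity-checking retrieval. Continuing from the script above (the query here is just an illustration):

# Retrieve the two chunks most similar to a sample query
query = "What are the treatment options for gliomas?"
for result in vectorstore.similarity_search(query, k=2):
    print(result.page_content[:200])
    print("-" * 40)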

Here is the code for the RAG-based conversational agent.

from openai import OpenAI
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
import json  # used by the optional history dump below


# Point to the local LM Studio server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

# Load the persisted vector store with the same embedding model used to build it
embedding_function = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_db = Chroma(persist_directory="./chroma_db_nccn", embedding_function=embedding_function)


history = [
    {"role": "system", "content": "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful."},
    {"role": "user", "content": "Hello, introduce yourself to someone opening this program for the first time. Be concise."},
]

while True:
    completion = client.chat.completions.create(
        model="local-model",  # this field is currently unused
        messages=history,
        temperature=0.7,
        stream=True,
    )

    # Stream the assistant's reply to the terminal as it is generated
    new_message = {"role": "assistant", "content": ""}
    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content

    history.append(new_message)

    # Uncomment to see the chat history
    # gray_color = "\033[90m"
    # reset_color = "\033[0m"
    # print(f"{gray_color}\n{'-'*20} History dump {'-'*20}\n")
    # print(json.dumps(history, indent=2))
    # print(f"\n{'-'*55}\n{reset_color}")

    print()
    next_input = input("> ")

    # Retrieve the two most relevant chunks and prepend them to the user's question
    search_results = vector_db.similarity_search(next_input, k=2)
    some_context = ""
    for result in search_results:
        some_context += result.page_content + "\n\n"
    history.append({"role": "user", "content": some_context + next_input})
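
One design note on the loop above: the retrieved chunks are prepended to each user turn rather than baked into the system prompt, so every question carries its own context. Because the history, including the injected context, grows with each turn and is never truncated, a very long session could eventually exceed the model's context window; trimming older turns would be a natural next improvement.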

The development of this private conversational agent marks a significant step forward in balancing the need for advanced technological solutions with the imperative of data privacy in healthcare. As we look to the future, the project opens up avenues for further enhancements, including integrating more advanced LLMs and expanding the database to encompass a broader range of medical knowledge.

I welcome your thoughts, contributions, and suggestions for improving this project. Together, we can continue to refine and expand its capabilities, ensuring that privacy and progress go hand in hand.

Code is available here: https://github.com/grasool/Local-RAG-Chatbot

References

1. LangChain documentation: https://python.langchain.com/docs/get_started/introduction

2. Chat with PDF using AutoGen|LangChain&FREE local LLM with LMStudio by DataInsightEdge: https://www.youtube.com/watch?v=VtHlFjyp2KI

3. LangChain short courses on DeepLearning.ai: https://www.deeplearning.ai/short-courses/langchain-for-llm-application-development/ and https://www.deeplearning.ai/short-courses/langchain-chat-with-your-data/

4. LM Studio: https://lmstudio.ai/

5. Chroma DB: https://www.trychroma.com/

6. Hugging Face Sentence Transformers: https://huggingface.co/docs/hub/en/sentence-transformers
