How to Train Proprietary Data using RAG+LLM with my Slack-Bot-Rag

Diana Chung
7 min read · Aug 18, 2023


Hey! I’m Diana. I’m a Solutions Architect Intern at AWS (Toronto). Here is how I created Slack-Bot-Rag, a Slack bot that lets customers and enterprises put their proprietary data to work through on-demand retrieval, integrating Retrieval Augmented Generation (RAG) with Large Language Models (LLMs) in the Slack UI.

GenAI Challenges for Large Enterprises

GenAI is booming and we can all feel it. It is no surprise that large enterprises want to incorporate its capabilities with their internal documents, but public LLMs such as ChatGPT come with their own challenges.

  • Current public LLMs are trained on a large corpus of data, but their data sources are limited and outdated
  • A lot of the information that enterprises store is also private and scattered across different sources (SharePoint, Confluence, S3 buckets, etc.)
  • Because LLMs are not trained on domain-specific data, they are more likely to hallucinate when asked questions outside of the data they were trained on
  • LLMs have limits on the maximum input prompt length, and long prompts incur high costs and can hurt accuracy

While addressing these technical problems, I also wanted to see if my Slack-Bot-Rag has the capability to solve any business problems along the way.

The first is: how can I speed up the process of our employees finding information?

Throughout my internship I’ve had the opportunity to interact with professionals in roles like Solutions Architects (SAs), Technical Program Managers (TPMs), and Software Development Engineers (SDEs) across this exceptionally dynamic industry. For these experts, internet searches alone are rarely fast or comprehensive enough. The established norm is to dig through both internal and external corporate resources, such as the AWS Intranet or AWS Blogs, to do thorough research. But sifting through an extensive array of documents takes a toll on their time, particularly when they need to grasp the content well enough to communicate intricate details to clients who lean on them as dependable sources.

So let’s be smart about this. Instead of going to multiple different places to find information, how can I bring it all into one place that everyone can turn to?

Slack is a very common, popular communication tool within AWS/Amazon and other large corporations.

Maybe I can integrate ChatGPT-like capabilities, trained on AWS materials, into the Slack platform instead?

We all know AI can read, comprehend, and generate an answer far faster than humans can. Once we have all the information sourced in one place, the next question is…

How can I query all this information by using natural language?

Lastly, LLMs charge you based on how many tokens they ingest. Instead of loading them with thousands of corporate documents, how can I make the process more cost-friendly?

USING RAG+LLM as the Solution

Retrieval Augmented Generation (RAG) is a technique to retrieve data from outside a foundation model to augment the prompts by injecting the relevant retrieved data into the context.

Amazon Kendra

RAG retrieves the most relevant information from the enterprise knowledge base based on the user’s request, and bundles it, along with the request itself, into a prompt. This context is then sent to the Large Language Model (LLM), which reads the prompt and generates a response.

Content retrieval is the critical step in designing effective RAG: it ensures that the LLM receives the most relevant and concise context from enterprise content to generate accurate responses.
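To make the “bundling” step concrete, here is a minimal sketch of how retrieved passages and a user question might be combined into one prompt. The function and names are illustrative, not the project’s actual code (the real prompt template appears later in this post):

# Illustrative only: how retrieved passages and the user's request are
# bundled into a single prompt before being sent to the LLM
def build_rag_prompt(question: str, passages: list[str]) -> str:
    context = "\n\n".join(passages)  # excerpts returned by the retriever
    return (
        f"{context}\n\n"
        f"Instruction: Based on the above documents, provide a detailed "
        f"answer for: {question}"
    )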

I used Kendra for Retrieval Augmented Generation because (see the sketch after this list):

  • Kendra automatically handles word embeddings, document chunking, and other lower-level complexities typically required for RAG implementations.
  • Kendra has pre-built connectors to S3, SharePoint, Confluence, and websites.
  • Kendra supports unstructured data such as HTML, Word, PowerPoint, PDF, Excel, and plain text files.
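As a quick illustration of what Kendra gives you out of the box, here is a minimal sketch of querying an index directly with boto3. The index ID and region are placeholders, and the project itself goes through Langchain instead (shown later):

import boto3

# Placeholder region and index ID; a real index must already have data ingested
kendra = boto3.client("kendra", region_name="us-east-1")

response = kendra.retrieve(
    IndexId="<YOUR-KENDRA-INDEX-ID>",
    QueryText="How do I set up a FIFO queue?",
)

# Each result item is an excerpt of a relevant document from the ingested data
for item in response["ResultItems"]:
    print(item["DocumentTitle"], "->", item["Content"][:200])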

Using Langchain for RAG+LLM

I used Langchain, an orchestration tool for prompts, to tie my LLM (OpenAI) and RAG retriever (Kendra) together.

Chains allow us to combine multiple components into a single, coherent application.

Let’s look at what happens when I ask the question “Did I ask about Kendra?” to my Slack-Bot-Rag.

All the green text in the screenshot is the prompt. The prompt reads in the chat history (from DynamoDB), instructs the LLM on how it should generate an answer, and chains in Kendra’s related passages.
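For the chat-history piece, Langchain ships a DynamoDB-backed message store. Here is a minimal sketch, assuming a DynamoDB table whose partition key is "SessionId"; the table and session names are hypothetical:

from langchain.memory import DynamoDBChatMessageHistory

# Hypothetical table/session names; the table's partition key must be "SessionId"
history = DynamoDBChatMessageHistory(
    table_name="SlackBotMessageHistory",
    session_id="slack-channel-C0123456789",
)

history.add_user_message("Did I ask about Kendra?")
history.add_ai_message("Yes, you asked about Kendra earlier in this conversation.")

print(history.messages)  # ordered HumanMessage/AIMessage objects for the prompt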

Slack-Bot-Rag

My Slack-Bot-Rag has customized information within the constraints of the knowledge contained in the application-specific documents. It can understand, retrieve, and respond to questions related to that specific domain, and it is more cost-efficient.

DEMO

This demo compares the current ChatGPT model with my Slack-Bot-Rag, which has been trained with private, proprietary data, and shows that my Slack-Bot-Rag can understand, retrieve, and answer questions that ChatGPT cannot.

ChatGPT

ChatGPT is unable to answer the questions because its training data is outdated and it has no permission to view internal proprietary documents/data.

Slack-Bot-RAG

Slack-Bot-RAG is able to answer questions about enterprise- and industry-specific information, as it has been trained on those specific data sources.
Another example of my Slack-Bot-Rag answering a question that ChatGPT couldn’t

The Architecture

  1. The user makes a request to the Slack bot.
  2. API Gateway invokes the Reader Lambda, which pushes the question onto DynamoDB (to store message history) and an SQS FIFO queue (from which the Writer Lambda picks it up; sketched below).
  3. The Writer Lambda issues a search query to the Amazon Kendra index based on the user request.
  4. The index returns search results with excerpts of relevant documents from the ingested enterprise data.
  5. The Writer Lambda sends the user request, the data retrieved from the Kendra index, and the chat history as the context in the LLM prompt.
  6. The LLM response is sent back to Slack.
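To ground steps 1–2, here is a minimal sketch of what the Reader Lambda could look like. The environment variable names, table schema, and Slack payload fields are assumptions, not the repo’s exact code:

import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")
sqs = boto3.client("sqs")

def handler(event, context):
    body = json.loads(event["body"])      # Slack event delivered via API Gateway
    question = body["event"]["text"]
    channel = body["event"]["channel"]

    # Store the question so the Writer Lambda can rebuild chat history later
    table = dynamodb.Table(os.environ["MESSAGE_TABLE_NAME"])
    table.put_item(Item={"SessionId": channel, "question": question})

    # Enqueue the question for the Writer Lambda; FIFO queues require a
    # MessageGroupId (content-based deduplication is assumed to be enabled)
    sqs.send_message(
        QueueUrl=os.environ["QUEUE_URL"],
        MessageBody=json.dumps({"channel": channel, "question": question}),
        MessageGroupId=channel,
    )

    # Acknowledge quickly so Slack does not retry the event
    return {"statusCode": 200}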

Key Code Walk-Through

Let’s deep-dive into the code: https://github.com/dianachung00/langchain-aws-template/tree/kendra-integ-for-production

The Chain.py file explains the detailed mechanics behind how Kendra (RAG) and OpenAI (LLM) interact to generate an answer.

Let’s import LangChain and OpenAI for LLM and insert the prompt.

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory, DynamoDBChatMessageHistory
from langchain.prompts import PromptTemplate
from langchain.retrievers import AmazonKendraRetriever

# <Insert Prompt>
prompt_template = """
The following is a friendly conversation between a human and an AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it
does not know.
{context}
Instruction: Based on the above documents, provide a detailed answer for {question}. Answer "don't know"
if not present in the document.
Solution:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

condense_qa_template = """
Given the following conversation and a follow up question, rephrase the
follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
standalone_question_prompt = PromptTemplate.from_template(condense_qa_template)

And now we need the retriever! I copied my Kendra Index ID (stored in an environment variable) to configure it.

The ConversationalRetrievalChain builds on RetrievalQA to provide a chat history component.

It combines the user request with the chat history, looks up relevant documents from the retriever, and finally passes those documents and the question to a question-answering chain to return a response.

import os

# Setting Amazon Kendra as the Retriever (RAG). The index ID and OpenAI key
# are read from the environment (the variable names here are assumptions).
kendra_index_id = os.environ["KENDRA_INDEX_ID"]
api_key = os.environ["OPENAI_API_KEY"]

retriever = AmazonKendraRetriever(index_id=kendra_index_id)

llm = ChatOpenAI(temperature=0, openai_api_key=api_key)
conversation = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,
    condense_question_prompt=standalone_question_prompt,
    verbose=True,
    combine_docs_chain_kwargs={"prompt": PROMPT},
)
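With the chain wired up, invoking it looks like this. This is a minimal usage sketch: the question is the demo one from earlier, and in the real bot the chat history would come from DynamoDB rather than an empty list:

# Ask the demo question; in the real bot, chat_history is loaded from DynamoDB
result = conversation({"question": "Did I ask about Kendra?", "chat_history": []})

print(result["answer"])
for doc in result["source_documents"]:  # present because return_source_documents=True
    print(doc.metadata)                 # e.g., the Kendra excerpt's source and title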

Business Outcome

1. Reduced Time & Cost

  • No need to download another 3rd-party app
  • Easily accessible to enterprise employees through Slack
  • RAG method (summarized data) → fewer tokens → cost-effective on the LLM

2. Enterprise-Customized Data

  • Slack-Bot-Rag reduces hallucinated answers
  • Customized information within the constraints of the knowledge of specific domains

3. A New Industry Example of RAG + LLM

The technologies I’ve used for my project (Langchain and Kendra) are all relatively new. There are also very few working industry examples that showcase a RAG implementation with an LLM at the moment. I hope my project can make an impact for individuals or enterprises looking for similar solutions.

CONCLUSION & RESOURCES

I worked on my Slack-Bot-Rag over the course of 12 weeks during my AWS SA Internship. You can view more of my internship experience and final presentation here: https://pitch.com/public/7cc6d755-6949-454e-a5da-9e94b29f11db

You can also clone my repository here: https://github.com/dianachung00/langchain-aws-template/tree/kendra-integ-for-production

Have fun building!

For now, I will be working on improving my chatbot to:

  • Be fully conversational (provide sources for its answers)
  • Support multiple LLMs and source documents so that the user can compare results
  • Ship the CDK with Kendra integration enabled/downloadable
  • Add layers of security, monitoring, and user authentication to be production-ready


Diana Chung

Hey! Currently a SA @ AWS. Recent CS Grad @ University of Waterloo. I love anything technology! https://www.linkedin.com/in/dchung00/