Mastering LangChain RAG: Integrating Chat History (Part 2)

Eric Vaillancourt
17 min read · May 31, 2024

--

All code examples mentioned can be found on our GitHub repository.

Welcome to my in-depth series on LangChain’s RAG (Retrieval-Augmented Generation) technology. Over the course of six articles, we’ll explore how you can leverage RAG to enhance your applications with state-of-the-art natural language processing techniques. Whether you’re a developer, a data scientist, or just an AI enthusiast, this series will equip you with the knowledge to implement and optimize RAG in your projects.

Overview of the Series:

1. Quick Start Guide to LangChain RAG: Jump right in with our first tutorial where we’ll cover the basics of setting up LangChain RAG. This introductory article will help you get your environment ready and run your first RAG-based application.

2. Integrating Chat History (this article): Learn how to incorporate chat history into your RAG model to maintain context and improve interaction quality in chat-like conversations. We will also learn how to save chat history to an SQL database.

3. Implementing Streaming Capabilities: Discover how to implement streaming with RAG to handle real-time data processing efficiently, perfect for applications requiring immediate responses.

4. Returning Sources with Results: This tutorial will teach you how to configure RAG to provide sources along with responses, adding transparency and credibility to the generated outputs.

5. Adding Citations to Your Results: Enhance your application’s trustworthiness by automatically including citations in your results, making them verifiable and more reliable.

6. Putting It All Together: In our final article, we’ll integrate all the components learned in previous tutorials to build a comprehensive RAG application, demonstrating the power and versatility of this technology.

Integrating Chat History

This article is based on a notebook published by LangChain.

In addition, I will include the ability to persist chat messages in an SQL database using SQLAlchemy, ensuring robust and scalable storage of chat history, which was not covered in the original notebook.

Enhancing Dialogue Management in Q&A Applications with Chat History

In many Q&A applications, facilitating a dynamic, back-and-forth conversation between the user and the system is essential. This requires the application to maintain a “memory” of past interactions, allowing it to reference and integrate previous exchanges into its current processing. This article focuses on implementing logic to handle historical messages within a Q&A framework.

Leveraging Past Interactions for Smarter Responses

Our starting point is the Q&A application we built over Lilian Weng’s “LLM Powered Autonomous Agents” blog post in our Quickstart guide. To enhance our app’s conversational abilities, we need to make two critical updates:

  1. Updating the Prompt: Our application’s prompt must be modified to include historical messages as input. This change ensures that the system can access prior interactions and use them to understand and respond to new inquiries more effectively.
  2. Contextualizing Questions: We’ll introduce a new component that processes the latest user question by placing it within the context of the chat history. This step is crucial when the user’s query refers back to earlier messages. For instance, if a user asks, “Can you elaborate on the second point?”, the system must refer to the prior context to provide a meaningful response. Without this historical context, effectively retrieving and generating relevant answers would be challenging.

Setup Environment: Ensure your development environment is prepared with the necessary dependencies.

pip install --upgrade --quiet langchain langchain-community langchainhub langchain-openai langchain-chroma bs4 python-dotenv sqlalchemy

We need to set the environment variable OPENAI_API_KEY for the embeddings model, which can be done directly or loaded from a .env file like so:

from dotenv import load_dotenv
load_dotenv()

You will have to create a file called “.env”. Here is a sample:

OPENAI_API_KEY = "your-key-here"
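
Optionally, you can verify that the key was actually loaded before going any further. This is a small sanity check of my own, not part of the original notebook:

import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

# Fail fast if the key is missing (sanity check, not in the original notebook)
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"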

Contextualizing Questions with Chat History

To enable our application to handle questions that refer to previous interactions, we first establish a process — referred to as a sub-chain — that utilizes both historical messages and the latest user question. This sub-chain is designed to reformulate a question whenever it references past discussions.

Implementing the Sub-Chain

We incorporate a variable named “chat_history” within our prompt structure, which acts as a placeholder for historical messages. By using the “chat_history” input key, we can seamlessly inject a list of previous messages into the prompt. These messages are strategically positioned after the system’s response and before the latest question posed by the user, ensuring that the context is maintained.

Utilizing Helper Functions for Enhanced Integration

To facilitate this integration, we employ a specialized helper function called create_history_aware_retriever. This function manages the case where the chat history is empty: if there is no history, the latest input is passed directly to the retriever; if history is present, it runs a sequence that combines the prompt, a large language model (LLM), and a string output parser (StrOutputParser), followed by a retriever. This sequence ensures that the latest question is contextualized within the accumulated historical data.

The create_history_aware_retriever function is designed to accept keys for both the input and the "chat_history", ensuring compatibility with the output schema of a standard retriever. This approach not only maintains the integrity of the interaction but also enhances the relevance and accuracy of the system’s responses.
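
For intuition, the function behaves roughly like the branch below. This is a sketch modeled on the LangChain implementation, not code you need to write yourself; llm, retriever, and contextualize_q_prompt are the objects we define next.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableBranch

# Sketch of what create_history_aware_retriever builds internally
history_aware_retriever_sketch = RunnableBranch(
    (
        # No chat history: pass the raw input straight to the retriever
        lambda x: not x.get("chat_history", False),
        (lambda x: x["input"]) | retriever,
    ),
    # Otherwise: reformulate the question first, then retrieve
    contextualize_q_prompt | llm | StrOutputParser() | retriever,
)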

Imports:

The code begins by importing necessary classes and functions:

from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

create_history_aware_retriever: A function from the langchain.chains library, used to create a retriever that integrates chat history for context-aware processing.

ChatPromptTemplate, MessagesPlaceholder: These are from langchain_core.prompts. ChatPromptTemplate structures prompts for the language model, and MessagesPlaceholder is used to insert historical messages into prompts.

Setting Up the Prompt: The system prompt is set up to instruct the model on how to process the user’s question in context:

contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""

This prompt tells the model to receive a chat history and a user’s latest question, and then reformulate the question so it can be understood independently of the chat history. The model is explicitly instructed not to answer the question but to reformulate it if necessary.

Creating the Chat Prompt Template:

A prompt template is created to structure the interactions for the model:

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

The template includes a system message with the instructions, a placeholder for the chat history (MessagesPlaceholder), and the latest user question marked by {input}. This arrangement ensures that the context is preserved and utilized effectively.

Creating the History-Aware Retriever:

Finally, the history-aware retriever is set up:

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

This line integrates the large language model (llm), a retriever (retriever), and the defined prompt template (contextualize_q_prompt). This configuration enables the system to handle incoming queries by reformulating them appropriately, ensuring that the reformulated questions are understandable on their own, thereby enhancing the relevance and accuracy of the system’s responses.

Chain with chat history

Constructing the Comprehensive Q&A Chain

With the foundational components in place, we’re now ready to construct our comprehensive Q&A chain, a crucial step in enhancing our application’s capability to handle complex inquiries.

Step 1: Building the Question-Answer Chain

We begin by using the create_stuff_documents_chain function to assemble our question_answer_chain. This chain accepts the retrieved context, the historical chat data, and the user's current query, and combines these elements to formulate responses that are both accurate and contextually relevant.

Step 2: Assembling the RAG Chain

To further refine our application, we create a RAG chain. This setup effectively combines our history-aware retriever with the previously mentioned question-answer chain. The design ensures that each component processes information in sequence, retaining intermediate outputs such as the retrieved contextual data. This retention is crucial as it ensures that no context is lost between steps, allowing for a more coherent and contextually aware response.

The comprehensive design of this chain allows it to handle inputs and produce outputs that include not just the query and its context, but also a well-integrated response, keeping track of the entire conversation history.

By methodically linking these elements, our Q&A chain is equipped to manage complex questions with nuanced understanding derived from ongoing dialogues. This approach not only improves the relevance of the responses but also enhances the continuity of the conversation, making interactions feel more natural and engaging for users.

This Python code snippet demonstrates the setup of a QA chain using libraries designed to facilitate natural language processing tasks with chain-based approaches. Here’s an explanation along with the embedded code snippets:

Imports:

The code begins by importing necessary modules for creating specialized chains for document handling and retrieval.

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

create_retrieval_chain: This function creates a chain that first runs a retriever and then passes the retrieved documents, together with the original input, to a downstream document-processing chain.

create_stuff_documents_chain: This function creates a chain that "stuffs" the retrieved documents into the prompt's {context} variable and sends the result to the language model, the standard approach for question answering over documents.

Setting Up the QA Prompt:

The prompt template for the QA system is defined, specifying how the system should respond to inputs based on retrieved context.

qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

qa_system_prompt: This string sets the instructions for the language model, directing it to use the provided context to answer questions concisely. If the answer is unknown, the model is instructed to state that explicitly.

ChatPromptTemplate.from_messages: This method creates a prompt template for chat interactions, integrating system messages, chat history, and user input, with placeholders allowing dynamic content insertion.
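
To see exactly what the model will receive, you can render the template with sample values. The history messages below are hypothetical, purely for inspection:

from langchain_core.messages import AIMessage, HumanMessage

# Render the template with sample (hypothetical) values to inspect the result
example = qa_prompt.invoke(
    {
        "context": "Task decomposition splits a complex task into smaller steps.",
        "chat_history": [
            HumanMessage(content="What is Task Decomposition?"),
            AIMessage(content="It breaks complex tasks into smaller steps."),
        ],
        "input": "What are common ways of doing it?",
    }
)
print(example.messages)  # [SystemMessage(...), HumanMessage(...), AIMessage(...), HumanMessage(...)]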

Creating the Question-Answer Chain:

The question_answer_chain is created using the create_stuff_documents_chain function, which utilizes the language model (llm) and the defined prompt (qa_prompt).

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

This chain is responsible for processing the input questions along with the context provided and generating answers based on the setup defined in the prompt template.

Assembling the RAG Chain:

Finally, the rag_chain is assembled using the create_retrieval_chain function, combining the history-aware retriever and the question_answer_chain.

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

This chain represents the complete workflow where the history-aware retriever first processes the query to incorporate any relevant historical context, and then the processed query is handled by the question_answer_chain to produce the final answer.

This setup demonstrates a sophisticated implementation for a QA system capable of leveraging both historical context and specific retrieval techniques to enhance answer accuracy and relevance.
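
Concretely, once assembled, the chain takes a dictionary with "input" and "chat_history" keys and returns a dictionary that passes them through and adds the retrieved documents and the generated answer. Here is a sketch of one round trip, based on the create_retrieval_chain output schema:

# Sketch of one rag_chain round trip
response = rag_chain.invoke({"input": "What is Task Decomposition?", "chat_history": []})
response["input"]         # the original question, passed through
response["chat_history"]  # the history that was supplied
response["context"]       # the list of retrieved Documents
response["answer"]        # the generated response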

Asking Questions and Follow-up Questions

The following code demonstrates the use of a RAG chain to handle a sequence of questions with the ability to reference previous interactions. The code simulates a chat interaction where a user asks a question, receives an answer, and then asks a follow-up question that can leverage the context of the initial exchange. Here’s a detailed explanation with the included code snippets:

from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is Task Decomposition?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})
chat_history.extend([HumanMessage(content=question), AIMessage(content=ai_msg_1["answer"])])

second_question = "What are common ways of doing it?"
ai_msg_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

print(ai_msg_2["answer"])

Imports: The code imports the HumanMessage and AIMessage classes from the langchain_core.messages module. These classes are used to create message objects representing human inputs and AI responses in the chat history.

from langchain_core.messages import AIMessage, HumanMessage

Setting Up Chat History: The chat history is initialized as an empty list. This list will store messages exchanged during the session to maintain context.

chat_history = []

First Question and Response: A question is defined, and the RAG chain is invoked with this question and the current (empty) chat history.

question = "What is Task Decomposition?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

The RAG chain processes the question and returns a response based on its configuration, which may involve retrieving relevant information and generating an answer.

The user’s question and the AI-generated answer are then added to the chat history as a HumanMessage and an AIMessage, respectively.

chat_history.extend([HumanMessage(content=question), AIMessage(content=ai_msg_1["answer"])])

Second Question and Response: A follow-up question is asked, leveraging the updated chat history that now contains the context of the first exchange.

second_question = "What are common ways of doing it?"
ai_msg_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

The RAG chain is invoked again, this time with the second question and the updated chat history, allowing it to consider the previous interactions when generating a response.

Output the Answer: Finally, the answer to the second question is printed out, providing the user with the information they requested.

print(ai_msg_2["answer"])

Result:

Task decomposition can be done in several common ways, including using Language Model (LLM) with simple prompting like "Steps for XYZ" or "What are the subgoals for achieving XYZ?", providing task-specific instructions tailored to the specific task at hand, or incorporating human inputs to guide the decomposition process. These methods help in breaking down complex tasks into smaller, more manageable subtasks for efficient execution.

Tying it together

Source: LangChain

So far, we explored how to integrate historical interactions into the application logic. However, we’ve been manually handling the chat history — updating and inserting it for each user interaction. In a robust Q&A application, automating this process is crucial for efficiency and scalability.

To achieve this, two components can be particularly useful:

  • BaseChatMessageHistory: This component is used to store the chat history.
  • RunnableWithMessageHistory: This is a wrapper for an LCEL chain combined with a BaseChatMessageHistory. It automates the process of injecting chat history into inputs and updates it following each invocation.

For those interested in a comprehensive guide on utilizing these components to develop a stateful conversational chain, I recommend visiting the “How to add message history (memory) to LCEL” page.

To illustrate the practical implementation of these concepts, consider the following simplified example where chat histories are managed using a basic dictionary structure:

import bs4
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)


### Construct retriever ###
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()


### Contextualize question ###
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)


### Answer question ###
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)


### Statefully manage chat history ###
store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

Now let’s ask the first question:

conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={
        "configurable": {"session_id": "abc123"}
    },
)["answer"]

Result:

'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. This approach helps agents or models handle difficult tasks by dividing them into more manageable subtasks. It can be achieved through methods like Chain of Thought (CoT) or Tree of Thoughts, which guide the model in thinking step by step or exploring multiple reasoning possibilities at each step.'

Now let’s ask the second question:

conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]

Result:

'Task decomposition can be done in common ways such as using Language Model (LLM) with simple prompting, task-specific instructions, or human inputs. For example, LLM can be guided with prompts like "Steps for XYZ" to break down tasks, or specific instructions like "Write a story outline" can be given for task decomposition. Additionally, human inputs can also be utilized to decompose tasks into smaller, more manageable steps.'

Adding Persistence with SQLAlchemy

To enhance the original code with the capability to persist chat histories in an SQLite database using SQLAlchemy, several key additions and modifications need to be made. Here’s a step-by-step explanation of these changes:

Importing SQLAlchemy and Related Modules

First, import the necessary SQLAlchemy modules to set up the database and ORM (Object Relational Mapping). These modules will allow you to interact with the SQLite database in a Pythonic way.

from sqlalchemy import create_engine, Column, Integer, String, Text, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship, declarative_base
from sqlalchemy.exc import SQLAlchemyError

Defining the Database and ORM Models

Define the SQLite database and the models for storing sessions and messages. The Session class represents a chat session, and the Message class represents individual messages within a session.

# Define the SQLite database
DATABASE_URL = "sqlite:///chat_history.db"
Base = declarative_base()

# Define the Session model
class Session(Base):
    __tablename__ = "sessions"
    id = Column(Integer, primary_key=True)
    session_id = Column(String, unique=True, nullable=False)
    messages = relationship("Message", back_populates="session")

# Define the Message model
class Message(Base):
    __tablename__ = "messages"
    id = Column(Integer, primary_key=True)
    session_id = Column(Integer, ForeignKey("sessions.id"), nullable=False)
    role = Column(String, nullable=False)
    content = Column(Text, nullable=False)
    session = relationship("Session", back_populates="messages")

# Create the database and the tables
engine = create_engine(DATABASE_URL)
Base.metadata.create_all(engine)
SessionLocal = sessionmaker(bind=engine)

Utility Function for Database Session Management

Create a utility function that hands out database sessions. The generator pattern mirrors common dependency-injection usage; note that the callers below also close their session explicitly in a finally block.

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

Saving Messages to the Database

Define a function to save individual messages to the database. This function checks if a session exists; if not, it creates one. Then it saves the message to the corresponding session.

def save_message(session_id: str, role: str, content: str):
    db = next(get_db())
    try:
        # Check if the session already exists
        session = db.query(Session).filter(Session.session_id == session_id).first()
        if not session:
            # Create a new session if it doesn't exist
            session = Session(session_id=session_id)
            db.add(session)
            db.commit()
            db.refresh(session)

        # Add the message to the session
        db.add(Message(session_id=session.id, role=role, content=content))
        db.commit()
    except SQLAlchemyError:
        db.rollback()
    finally:
        db.close()
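
For example, a single exchange could be recorded like this (hypothetical values):

# Record one question/answer pair for session "abc123" (hypothetical values)
save_message("abc123", "human", "What is Task Decomposition?")
save_message("abc123", "ai", "It breaks complex tasks into smaller, manageable steps.")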

Loading Chat History from the Database

Define a function to load chat history from the database. This function retrieves all messages associated with a given session ID and reconstructs the chat history.

def load_session_history(session_id: str) -> BaseChatMessageHistory:
    db = next(get_db())
    chat_history = ChatMessageHistory()
    try:
        # Retrieve the session
        session = db.query(Session).filter(Session.session_id == session_id).first()
        if session:
            # Rebuild the history as message objects (add_message expects
            # BaseMessage instances, not plain dicts)
            for message in session.messages:
                if message.role == "human":
                    chat_history.add_user_message(message.content)
                else:
                    chat_history.add_ai_message(message.content)
    except SQLAlchemyError:
        pass
    finally:
        db.close()

    return chat_history
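
A quick way to verify what was stored (using the hypothetical session id from above):

# Inspect the reconstructed history for a session (hypothetical id)
history = load_session_history("abc123")
for msg in history.messages:
    print(msg.type, ":", msg.content)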

Modifying get_session_history to Use the Database

Update the get_session_history function to retrieve session history from the database instead of only using in-memory storage.

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = load_session_history(session_id)
    return store[session_id]

Ensuring Persistent Storage

Add a function to save all sessions before exiting the application. This function iterates over all sessions in memory and saves their messages to the database.

def save_all_sessions():
    for session_id, chat_history in store.items():
        # chat_history.messages holds BaseMessage objects, so we read
        # .type ("human" or "ai") and .content rather than dict keys
        for message in chat_history.messages:
            save_message(session_id, message.type, message.content)

import atexit
atexit.register(save_all_sessions)

Saving User Questions and AI Answers

Modify the chain invocation function to save both the user question and the AI answer. This ensures that every interaction is recorded.

def invoke_and_save(session_id, input_text):
    # Save the user question with role "human"
    save_message(session_id, "human", input_text)

    # Get the AI response
    result = conversational_rag_chain.invoke(
        {"input": input_text},
        config={"configurable": {"session_id": session_id}}
    )["answer"]

    # Save the AI answer with role "ai"
    save_message(session_id, "ai", result)
    return result

Final Code Implementation

Here is the final code implementation for integrating historical interactions and ensuring persistence with SQLAlchemy:

import bs4
from sqlalchemy import create_engine, Column, Integer, String, Text, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship, declarative_base
from sqlalchemy.exc import SQLAlchemyError
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Define the SQLite database
DATABASE_URL = "sqlite:///chat_history.db"
Base = declarative_base()

class Session(Base):
    __tablename__ = "sessions"
    id = Column(Integer, primary_key=True)
    session_id = Column(String, unique=True, nullable=False)
    messages = relationship("Message", back_populates="session")

class Message(Base):
    __tablename__ = "messages"
    id = Column(Integer, primary_key=True)
    session_id = Column(Integer, ForeignKey("sessions.id"), nullable=False)
    role = Column(String, nullable=False)
    content = Column(Text, nullable=False)
    session = relationship("Session", back_populates="messages")

# Create the database and the tables
engine = create_engine(DATABASE_URL)
Base.metadata.create_all(engine)
SessionLocal = sessionmaker(bind=engine)

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# Function to save a single message
def save_message(session_id: str, role: str, content: str):
    db = next(get_db())
    try:
        session = db.query(Session).filter(Session.session_id == session_id).first()
        if not session:
            session = Session(session_id=session_id)
            db.add(session)
            db.commit()
            db.refresh(session)

        db.add(Message(session_id=session.id, role=role, content=content))
        db.commit()
    except SQLAlchemyError:
        db.rollback()
    finally:
        db.close()

# Function to load chat history
def load_session_history(session_id: str) -> BaseChatMessageHistory:
    db = next(get_db())
    chat_history = ChatMessageHistory()
    try:
        session = db.query(Session).filter(Session.session_id == session_id).first()
        if session:
            # add_message expects BaseMessage objects, not plain dicts
            for message in session.messages:
                if message.role == "human":
                    chat_history.add_user_message(message.content)
                else:
                    chat_history.add_ai_message(message.content)
    except SQLAlchemyError:
        pass
    finally:
        db.close()

    return chat_history

# Modify the get_session_history function to use the database
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = load_session_history(session_id)
    return store[session_id]

# Ensure you save the chat history to the database when needed
def save_all_sessions():
    for session_id, chat_history in store.items():
        # Messages are BaseMessage objects; .type is "human" or "ai"
        for message in chat_history.messages:
            save_message(session_id, message.type, message.content)

# Example of saving all sessions before exiting the application
import atexit
atexit.register(save_all_sessions)

### Construct retriever ###
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

### Contextualize question ###
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

### Answer question ###
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

### Statefully manage chat history ###
store = {}

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

# Invoke the chain and save the messages after invocation
def invoke_and_save(session_id, input_text):
    # Save the user question with role "human"
    save_message(session_id, "human", input_text)

    result = conversational_rag_chain.invoke(
        {"input": input_text},
        config={"configurable": {"session_id": session_id}}
    )["answer"]

    # Save the AI answer with role "ai"
    save_message(session_id, "ai", result)
    return result

Example Usage

Invoke the chain and save the chat history. This example demonstrates how to use the modified function to interact with the chain and persist the conversation.

result = invoke_and_save("abc123", "What is Task Decomposition?")
print(result)
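
Because the history is now persisted in SQLite, a follow-up question in the same session picks up the earlier context, and get_session_history will reload it from the database even after a restart:

# Follow-up in the same session: "it" is resolved from the stored history
followup = invoke_and_save("abc123", "What are common ways of doing it?")
print(followup)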

Conclusion

In this article, we’ve explored how to enhance the functionality of Q&A applications by integrating historical interactions into the application logic, ensuring that each user interaction is both context-aware and precise. We’ve also emphasized the importance of persistence by implementing a system that saves and retrieves chat histories using SQLAlchemy. This ensures that user interactions are not lost between sessions, making the application more robust and reliable.

By automating the management of chat history and ensuring persistent storage, we streamline interactions and significantly boost the application’s ability to engage users in a meaningful dialogue. Persistence not only helps in maintaining continuity in conversations but also provides valuable data for analyzing and improving user interactions over time.

For those looking to delve deeper into the code and perhaps integrate these features into your own projects, all the code snippets and examples discussed in this article will be available on our GitHub repository. This accessibility will allow you to experiment with and tailor the solutions to fit your specific needs.

Thank you for following along, and I look forward to continuing this journey with you in the next parts of our series.

Thank you for reading!

Support my work by buying me a coffee or two!


Eric Vaillancourt

Eric Vaillancourt, an AI enthusiast, began his career in 1989, founded a tech consultancy in 1996, and has led over 1,500 trainings in IT and AI.