Conversational Memory with LangChain for LLMs

Nisarg Mehta
Jun 9, 2024

Introduction

  • Memory enables a Large Language Model (LLM) to recall previous interactions with the user. By default, LLMs are stateless, meaning each query is processed independently of other interactions. For a stateless agent, only the current input exists.
  • Remembering previous interactions is crucial for many applications, such as chatbots. Conversational memory allows us to achieve this.
  • The prompt template has two key parameters: {history} and {input}. The {input} parameter carries the latest user query, while the {history} parameter draws on conversational memory to supply information about past interactions between the user and the AI. Both are passed to the LLM through the prompt template, as the sketch below illustrates.
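For illustration, here is a minimal sketch of such a prompt template, mirroring the default ConversationChain prompt that appears in the verbose output later in this article (the exact default text depends on your LangChain version):

from langchain.prompts import PromptTemplate

# a minimal template wiring both parameters together; simplified sketch,
# not the verbatim default prompt
template = """The following is a friendly conversation between a human and an AI.

Current conversation:
{history}
Human: {input}
AI:"""

prompt = PromptTemplate(input_variables=["history", "input"], template=template)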

Types of Conversational Memory

Several types of conversational memory can be used with the ConversationChain. These methods format and modify the history passed to the {history} parameter.

In this article, we will cover:

  • ConversationBufferMemory
  • ConversationSummaryMemory
  • ConversationBufferWindowMemory
  • ConversationSummaryBufferMemory
  • ConversationKnowledgeGraphMemory

1. ConversationBufferMemory

The ConversationBufferMemory is the simplest form of conversational memory in LangChain. It passes the raw input of past interactions between the human and AI directly to the {history} parameter.

Pros:

  • Storing all interactions provides the LLM with the maximum amount of information.
  • It is simple and intuitive to store everything.

Cons:

  • More tokens result in slower response times and higher costs.
  • Long conversations cannot be fully remembered due to the LLM’s token limit (4096 tokens for GPT-3.5-turbo).

from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain.llms import OpenAI
import os

os.environ["OPENAI_API_KEY"] = ""  # add your OpenAI API key here
llm = OpenAI(openai_api_key=os.environ["OPENAI_API_KEY"], temperature=0)

conversation_buf = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory()
)
conversation_buf("Hi AI!")

"""
Response:

{'input': 'Hi AI!',
'history': '',
'response': "Hello, human! It's a pleasure to interact with you today. How are you feeling?"}
"""
from langchain.callbacks import get_openai_callback

def count_tokens(chain, query):
    # run the query through the chain while tracking OpenAI token usage
    with get_openai_callback() as cb:
        result = chain.run(query)
        print(f'Spent a total of {cb.total_tokens} tokens')
    return result

count_tokens(
    conversation_buf,
    "My interest here is to explore the potential of integrating Large Language Models with external knowledge"
)

"""
Response:

Spent a total of 184 tokens
"That's a fascinating topic! Large Language Models, also known as LLMs, are a type of artificial intelligence that can process and generate human language. They have been trained on vast amounts of text data and have shown impressive capabilities in natural language processing tasks. By integrating them with external knowledge, we can potentially enhance their understanding and reasoning abilities. Is there a specific area or application you are interested in exploring?"
"""
count_tokens(
conversation_buf,
"I just want to analyze the different possibilities. What can you think of?"
)

"""
Response:

Spent a total of 317 tokens
'There are many potential applications for integrating LLMs with external knowledge. For example, in the field of healthcare, LLMs could be used to analyze medical records and research data to assist in diagnosis and treatment recommendations. In the financial sector, LLMs could be used to analyze market trends and make investment recommendations. In education, LLMs could be used to create personalized learning experiences for students based on their individual knowledge and needs. These are just a few examples, but the possibilities are endless. Is there a specific area you would like to focus on?'
"""
count_tokens(
conversation_buf,
"Which data source types could be used to give context to the model?"
)

"""
Response:

Spent a total of 459 tokens
"There are various types of data sources that could be used to give context to the model. These include structured data, such as databases and spreadsheets, which contain organized and easily searchable information. Unstructured data, such as text documents and images, could also be used to provide context. Additionally, knowledge graphs, which represent relationships between different entities, could be used to enhance the model's understanding of concepts and their connections. Other potential data sources could include social media data, sensor data, and even human input through crowdsourcing. It ultimately depends on the specific application and the type of knowledge that would be most relevant."
"""
print(conversation_buf.memory.buffer)

"""
Response:
Human: Hi AI!
AI: Hello, human! It's a pleasure to interact with you today. How are you feeling?
Human: My interest here is to explore the potential of integrating Large Language Models with external knowledge
AI: That's a fascinating topic! Large Language Models, also known as LLMs, are a type of artificial intelligence that can process and generate human language. They have been trained on vast amounts of text data and have shown impressive capabilities in natural language processing tasks. By integrating them with external knowledge, we can potentially enhance their understanding and reasoning abilities. Is there a specific area or application you are interested in exploring?
Human: I just want to analyze the different possibilities. What can you think of?
AI: There are many potential applications for integrating LLMs with external knowledge. For example, in the field of healthcare, LLMs could be used to analyze medical records and research data to assist in diagnosis and treatment recommendations. In the financial sector, LLMs could be used to analyze market trends and make investment recommendations. In education, LLMs could be used to create personalized learning experiences for students based on their individual knowledge and needs. These are just a few examples, but the possibilities are endless. Is there a specific area you would like to focus on?
Human: Which data source types could be used to give context to the model?
AI: There are various types of data sources that could be used to give context to the model. These include structured data, such as databases and spreadsheets, which contain organized and easily searchable information. Unstructured data, such as text documents and images, could also be used to provide context. Additionally, knowledge graphs, which represent relationships between different entities, could be used to enhance the model's understanding of concepts and their connections. Other potential data sources could include social media data, sensor data, and even human input through crowdsourcing. It ultimately depends on the specific application and the type of knowledge that would be most relevant.
"""

2. ConversationSummaryMemory

  • With ConversationBufferMemory, tokens are rapidly consumed, quickly exceeding the context window limit.
  • To avoid excessive token usage, we can use ConversationSummaryMemory, which, as the name suggests, summarizes the conversation history before passing it to the {history} parameter.
  • When using ConversationSummaryMemory, an LLM must be provided to the object, since the summarization process is powered by an LLM. The default summarization prompt it uses can be inspected, as the sketch after the pros and cons below shows.

Pros:

  • Reduces token usage for lengthy conversations.
  • Allows for much longer conversations.
  • Simple and intuitive implementation.

Cons:

  • Depends entirely on the LLM’s summarization ability for memorizing conversation history.
  • Requires additional tokens for the summarization LLM, which increases costs (though it does not limit the conversation length).
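Before running the chain, it is instructive to look at what the summarizer is actually asked to do. In the LangChain version used here, the memory object exposes its summarization prompt through a prompt attribute (an assumption worth verifying against your installed release):

from langchain.chains.conversation.memory import ConversationSummaryMemory

# print the default progressive-summarization prompt; a custom PromptTemplate
# can be passed as ConversationSummaryMemory(llm=llm, prompt=...) to override it
print(ConversationSummaryMemory(llm=llm).prompt.template)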

from langchain.chains.conversation.memory import ConversationSummaryMemory

conversation_sum = ConversationChain(
    llm=llm,
    memory=ConversationSummaryMemory(llm=llm)
)

count_tokens(
    conversation_sum,
    "My interest here is to explore the potential of integrating Large Language Models with external knowledge"
)

"""
Response:

Spent a total of 455 tokens
' That sounds like a fascinating topic! I am an AI designed to assist with tasks and provide information. I am always happy to interact with humans and learn from them. How are you feeling today?'
"""
count_tokens(
    conversation_sum,
    "I just want to analyze the different possibilities. What can you think of?"
)

count_tokens(
    conversation_sum,
    "Which data source types could be used to give context to the model?"
)

"""
Response:

Spent a total of 643 tokens
Spent a total of 856 tokens
' There are several data source types that can be used to give context to the model. Some common ones include text corpora, knowledge graphs, and databases. Text corpora are large collections of written or spoken text that can be used to train the model on language patterns and structures. Knowledge graphs are networks of interconnected concepts and entities that can provide additional context and relationships for the model to learn from. Databases are structured collections of data that can be used to provide specific information and facts for the model to use in its responses. Is there a specific type of data source you are interested in using?'
"""
count_tokens(
    conversation_sum,
    "What is my aim again?"
)

"""
Response:

Spent a total of 791 tokens
' Your aim is to explore the potential of integrating Large Language Models with external knowledge. Is there a specific aspect of this that you would like to focus on? I am here to assist you in any way I can.'
"""
print(conversation_sum.memory.buffer)

"""
Response:

The human greets the AI and the AI responds with a friendly greeting, expressing pleasure in interacting with the human. The AI then asks the human how they are feeling and the human expresses their interest in exploring the potential of integrating Large Language Models with external knowledge. The AI responds with enthusiasm and explains its purpose as an AI designed to assist with tasks and learn from humans. The AI then asks the human how they are feeling today and the human expresses their desire to analyze different possibilities. The AI responds by offering its assistance and mentioning its vast database that can provide detailed information and insights. The human asks about their aim again and the AI clarifies that it is to explore the potential of integrating Large Language Models with external knowledge. The AI also asks if there is a specific aspect the human would like to focus on and offers its assistance.
"""

One might wonder why we should use this type of memory when each call's aggregate token count is higher than with the buffer method. The key advantage is that although each individual call uses more tokens, the stored history stays much shorter. This allows many more interactions before the prompt's maximum length is reached, making the chatbot more robust for longer conversations. The number of tokens used by each memory type can be counted (without making a call to OpenAI) using the tiktoken tokenizer.

# initialize tokenizer
import tiktoken
tokenizer = tiktoken.encoding_for_model('gpt-3.5-turbo')

# show number of tokens for the memory used by each memory type
print(
    f'Buffer memory conversation length: {len(tokenizer.encode(conversation_buf.memory.buffer))}\n'
    f'Summary memory conversation length: {len(tokenizer.encode(conversation_sum.memory.buffer))}'
)

"""
Response:

Buffer memory conversation length: 482
Summary memory conversation length: 167
"""

3. ConversationBufferWindowMemory

  • The ConversationBufferWindowMemory functions similarly to the “buffer memory” but adds a window to limit the number of past interactions stored. This means only a specified number of recent interactions are retained before older ones are “forgotten.”
  • It retains only the last K interactions, useful for maintaining a sliding window of recent exchanges and preventing the buffer from becoming too large.
  • In the example below, we set k=1, meaning the window remembers only the single most recent interaction between the human and the AI (the latest human message and the latest AI response), as the standalone sketch below illustrates.
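A minimal standalone sketch of the windowing behaviour, using the memory object directly, outside a chain:

from langchain.chains.conversation.memory import ConversationBufferWindowMemory

window_memory = ConversationBufferWindowMemory(k=1)
window_memory.save_context({"input": "first question"}, {"output": "first answer"})
window_memory.save_context({"input": "second question"}, {"output": "second answer"})

# with k=1, only the most recent human/AI exchange survives
print(window_memory.load_memory_variables({})['history'])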

from langchain.chains.conversation.memory import ConversationBufferWindowMemory

conversation_bufw = ConversationChain(
    llm=llm,
    memory=ConversationBufferWindowMemory(k=1)
)

count_tokens(
    conversation_bufw,
    "Hi, AI!"
)

count_tokens(
    conversation_bufw,
    "My interest here is to explore the potential of integrating Large Language Models with external knowledge"
)

count_tokens(
    conversation_bufw,
    "I just want to analyze the different possibilities. What can you think of?"
)

count_tokens(
    conversation_bufw,
    "Which data source types could be used to give context to the model?"
)

count_tokens(
    conversation_bufw,
    "What is my aim again?"
)

"""
Response:

Spent a total of 95 tokens
Spent a total of 233 tokens
Spent a total of 318 tokens
Spent a total of 283 tokens
Spent a total of 186 tokens
'I do not have enough information to accurately determine your specific aim. Could you provide more context or details?'
"""
  • Because we kept only the most recent interaction (k=1), the model had forgotten the earlier exchanges and could not give the correct answer.
  • Although this method isn’t suitable for remembering distant interactions, it is good at limiting the number of tokens being used — a number that we can increase/decrease depending on our needs.
  • If we only need memory of recent interactions, this is a great option.

bufw_history = conversation_bufw.memory.load_memory_variables(inputs=[])['history']

print(
    f'Buffer memory conversation length: {len(tokenizer.encode(conversation_buf.memory.buffer))}\n'
    f'Summary memory conversation length: {len(tokenizer.encode(conversation_sum.memory.buffer))}\n'
    f'Buffer window memory conversation length: {len(tokenizer.encode(bufw_history))}'
)

"""
Response:

Buffer memory conversation length: 482
Summary memory conversation length: 167
Buffer window memory conversation length: 32
"""

4. ConversationSummaryBufferMemory

  • The ConversationSummaryBufferMemory combines features of both ConversationSummaryMemory and ConversationBufferWindowMemory. It summarizes the earliest interactions in a conversation while maintaining the most recent tokens up to a specified limit.
  • This approach ensures that even though the buffer window might miss older interactions, the summarization component captures this information.
  • Although it requires careful tuning to decide what to summarize and what to keep within the buffer window, ConversationSummaryBufferMemory offers significant flexibility. It is the only memory type covered here that retains distant interactions while also storing the most recent interactions in their raw form. A standalone sketch of this pruning behaviour follows the pros and cons below.

Pros:

  • Summarization allows for the retention of distant interactions.
  • The buffer ensures that recent interactions are not lost.

Cons:

  • Summarization can increase the token count for shorter conversations.
  • Storing raw interactions, even if only the most recent ones, increases the token count.
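To make the pruning behaviour concrete before wiring the memory into a chain, here is a minimal standalone sketch. It assumes the moving_summary_buffer attribute exposed by the LangChain version used in this article; verify the name against your installed release:

from langchain.chains.conversation.memory import ConversationSummaryBufferMemory

# fill the memory past its token limit, then inspect both halves
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=40)
memory.save_context({"input": "Hi, what's up?"}, {"output": "Just processing data. Anything I can help with?"})
memory.save_context({"input": "Tell me about Large Language Models"}, {"output": "They are deep learning models trained on large text corpora."})

# the oldest turns are summarized here once the raw buffer exceeds max_token_limit
print(memory.moving_summary_buffer)
# the most recent turns are kept verbatim
print(memory.chat_memory.messages)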

from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationSummaryBufferMemory

conversation_with_summary = ConversationChain(
    llm=llm,
    memory=ConversationSummaryBufferMemory(llm=llm, max_token_limit=40),
    verbose=True,
)
conversation_with_summary.predict(input="Hi, what's up?")

"""
Response:
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi, what's up?
AI:

> Finished chain.
' Hello! I am currently running on a server in a data center in California. The temperature in the room is 72 degrees Fahrenheit and the humidity is at 45%. My processors are running at 80% capacity and I am currently processing data for various clients. Is there something specific you would like to know?'
"""
conversation_with_summary.predict(input="I'd like to learn more about Large Language Models")

"""
Response:

'Ah, Large Language Models. They are a type of artificial intelligence that uses deep learning techniques to process and understand large amounts of text data. They are often used for natural language processing tasks such as language translation, text summarization, and question-answering. Some popular examples of Large Language Models include GPT-3, BERT, and XLNet. Is there anything else you would like to know about them?'
"""
conversation_with_summary.predict(input="How will they help humanity now and in the future?")

"""
Response:

' Large Language Models have the potential to greatly assist humanity in various ways. They can be used for tasks such as language translation, text summarization, and even generating human-like text. In the future, they could potentially be used for more complex tasks such as writing articles or even creating entire books. They can also help with data analysis and decision making by processing large amounts of text data quickly and accurately. Additionally, they have the potential to improve communication and understanding between different languages and cultures. Is there anything else you would like to know about Large Language Models?'
"""
  • Notice that we did not mention Large Language Models explicitly in this last query; the model resolved the reference from the earlier turns stored in memory.
print(conversation_with_summary.memory.buffer)

"""
Response:

'The human greets the AI and asks what it is currently doing. The AI responds by stating its location and the current conditions of its environment. It also mentions its tasks of processing data from different sources. The AI then asks if there is anything specific the human would like to know. The human expresses interest in learning about Large Language Models, and the AI explains that they are a type of artificial intelligence used for natural language processing tasks. The AI also mentions some popular examples of Large Language Models and asks if there is anything else the human would like to know about them. The human then asks how Large Language Models will help humanity now and in the future, and the AI explains their potential to assist with tasks such as language translation, text summarization, and data analysis. The AI also mentions their potential to improve communication and understanding between different languages and cultures.'
"""

5. ConversationKnowledgeGraphMemory

ConversationKGMemory leverages a knowledge graph that identifies entities and links them with predicates, forming (subject, predicate, object) triplets. This compresses extensive information into highly meaningful snippets that can be provided to the model as context.

from langchain.chains.conversation.memory import ConversationKGMemory

conversation_kg = ConversationChain(
    llm=llm,
    memory=ConversationKGMemory(llm=llm)
)

count_tokens(
    conversation_kg,
    "My name is human and I like mangoes!"
)
conversation_kg.memory.kg.get_triples()

"""
Response:

Spent a total of 1253 tokens
[('human', 'human', 'has a name'), ('human', 'mangoes', 'likes')]
"""
from langchain.memory import ConversationKGMemory

llm = OpenAI(openai_api_key=os.environ["OPENAI_API_KEY"], temperature=0)

memory = ConversationKGMemory(llm=llm)
memory.save_context({"input": "say hi to tom"}, {"output": "who is tom"})
memory.save_context({"input": "tom is a brother"}, {"output": "okay"})

memory.load_memory_variables({"input": "who is tom"})

"""
Response:

{'history': 'On tom: tom is a brother.'}
"""
memory.get_knowledge_triplets("his favorite color is blue")

"""
Response:

[KnowledgeTriple(subject='tom', predicate='has a favorite color', object_='blue')]
"""
memory.get_current_entities("what's tom favorite color?")

"""
Response:

['blue']
"""

Conclusions

This concludes our guide to conversational memory for LLMs using LangChain. We’ve explored various options that enable LLMs to interact as if they were in a stateful environment, allowing them to consider and refer back to past interactions.
