Exploring LangChain: four powerful memory types for conversational AI

Izabel Barros
Indicium Engineering
6 min read · Jul 1, 2024


In the rapidly evolving world of artificial intelligence, creating more human-like and context-aware interactions has become the gold standard. But how can we create an AI that not only remembers your previous discussions but also understands the context of your ongoing dialogue, making interactions more fluid and natural? LLMs are stateless by nature: each call is processed independently, so the model does not remember the conversation you’ve had so far.

This is where LangChain comes in.

LangChain provides a sophisticated framework for managing conversational context. Whether you’re developing a customer support chatbot, a virtual assistant, or any AI-driven conversational tool, incorporating memory capabilities can significantly improve user experience, making interactions more engaging and effective.

The chat memory feature significantly improves the user experience by providing personalization, continuity, and efficiency. By remembering past interactions, user preferences, and context, chatbots can tailor responses, maintain conversation flow, and anticipate user needs more effectively. This creates a smoother, more engaging experience for the user.

Additionally, chatbots with memory can engage in multi-turn conversations, offer more accurate solutions, and continuously improve through iterative learning. This contributes to building stronger connections between users and chatbots, leading to higher satisfaction and engagement.

In this blog post, we will walk through four LangChain memory types, with code snippets to show them in action. Join us as we explore this feature LangChain provides, which is transforming how machines understand and engage with humans.

ConversationBufferMemory

At its core, ConversationBufferMemory is the most basic memory type that allows AI models to retain a complete running history of interactions. This memory buffer captures and stores the raw dialogue data in its entirety. By maintaining this context, the AI can generate more accurate and relevant responses.

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm_model = "gpt-3.5-turbo"  # any OpenAI chat model name works here
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)

conversation.predict(input="Hello, my name is Izabel!")
conversation.predict(input="Whats the weather today in Florianópolis?")
conversation.predict(input="Whats my name?")

# Human: Hello, my name is Izabel!
# AI: Hello Izabel! It's nice to meet you. My name is OpenAI. How can I assist you today?
# Human: Whats the weather today in Florianópolis?
# AI: According to my sources, the current weather in Florianópolis is mostly cloudy with a high of 25 degrees Celsius and a low of 19 degrees Celsius. There is a chance of scattered thunderstorms throughout the day. Would you like me to provide more detailed information or a forecast for the upcoming days?
# Human: Whats my name?
# AI: Your name is Izabel, as you mentioned earlier.

With ConversationBufferMemory, the AI can provide responses that are contextually relevant, reducing misunderstandings and enhancing the quality of the conversation.

As the conversation grows, the memory requirements increase significantly, leading to higher costs. This is because sending a large number of tokens to the language model (which typically charges based on the number of tokens processed) becomes more expensive.
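A rough way to see this growth is to write to the memory directly and count its tokens, without calling the chain at all. The sketch below is only an illustration: it creates a fresh ConversationBufferMemory, fills it with save_context, and reuses the llm object defined above for get_num_tokens; the exact counts will vary by model.

# Fill a fresh buffer memory by hand instead of calling the chain
buffer_memory = ConversationBufferMemory()
buffer_memory.save_context(
    {"input": "Hello, my name is Izabel!"},
    {"output": "Hello Izabel! How can I assist you today?"}
)
buffer_memory.save_context(
    {"input": "Whats the weather today in Florianópolis?"},
    {"output": "Mostly cloudy, with a high of 25 degrees Celsius."}
)

# The full transcript is kept verbatim and sent to the model on every call
print(buffer_memory.load_memory_variables({})["history"])

# So the number of tokens per request only grows as the conversation does
print(llm.get_num_tokens(buffer_memory.buffer))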

ConversationBufferWindowMemory

ConversationBufferWindowMemory is one option for solving that problem. Unlike ConversationBufferMemory, which stores the entire conversation history, this memory type maintains a dynamic window of recent interactions.

It works by retaining a “window” of recent dialogue exchanges, defined by the “k” parameter. This window shifts with each new interaction, discarding the oldest entries to make room for new ones. This allows the AI to reference recent context while keeping memory usage and processing costs in check.

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory

llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferWindowMemory(k=1)  # keep only the most recent exchange
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)
conversation.predict(input="Hello, my name is Izabel!")
# Human: Hello, my name is Izabel!
# AI: Hello Izabel! It's nice to meet you. My name is OpenAI. How can I assist you today?
conversation.predict(input="Whats the weather today in Florianópolis?")
# Human: Whats the weather today in Florianópolis?
# AI: According to my sources, the current weather in Florianópolis is mostly cloudy with a high of 25 degrees Celsius and a low of 19 degrees Celsius. There is a chance of scattered thunderstorms throughout the day. Would you like me to provide more detailed information or a forecast for the upcoming days?
conversation.predict(input="Whats my name?")
# Human: Whats my name?
# AI: I'm sorry, but I don't have access to that information. Could you please tell me your name?

As the example above shows, the AI only remembers the latest exchange: because the parameter “k” is set to 1, it can no longer access older information.

Limiting the amount of memory reduces the number of tokens sent to the language model, which helps manage and reduce operational costs. It also ensures the system remains efficient and responsive, even as the conversation grows, making it ideal for applications with high user engagement.
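The sliding window is easy to observe by writing to the memory directly. A minimal sketch, using the same save_context and load_memory_variables methods as above: with k=1, only the most recent exchange survives.

window_memory = ConversationBufferWindowMemory(k=1)
window_memory.save_context(
    {"input": "Hello, my name is Izabel!"},
    {"output": "Hello Izabel! How can I assist you today?"}
)
window_memory.save_context(
    {"input": "Whats the weather today in Florianópolis?"},
    {"output": "Mostly cloudy, with a high of 25 degrees Celsius."}
)

# Only the latest exchange remains; the greeting has already been discarded
print(window_memory.load_memory_variables({}))
# {'history': 'Human: Whats the weather today in Florianópolis?\nAI: Mostly cloudy, with a high of 25 degrees Celsius.'}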

ConversationTokenBufferMemory

ConversationTokenBufferMemory operates by maintaining a buffer of recent tokens within a predefined limit, set by the “max_token_limit” parameter. This buffer continuously updates with each new interaction, discarding the oldest tokens to accommodate new ones.

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationTokenBufferMemory

llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
conversation.predict(input="Hello, my name is Izabel!")
# Human: Hello, my name is Izabel!
# AI: Hello Izabel! It's nice to meet you. My name is OpenAI. How can I assist you today?
conversation.predict(input="I'm going to a party tonight")
# Human: I'm going to a party tonight
# AI: That sounds like fun! Do you need any help with party planning or outfit ideas? I can provide suggestions based on your preferences and the latest fashion trends.
conversation.predict(input="Whats my name?")
# Human: Whats my name?
# AI: I'm sorry, I don't have access to that information. Would you like me to call you by a specific name or nickname?

As shown in the example above, the AI cannot recall the requested information because it was discarded in favor of newer interactions, in line with the token limit of 50.

This approach also helps keep costs down while maintaining efficiency and responsiveness in larger, more complex applications.
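The pruning can be observed the same way, without calling the chain. A minimal sketch, reusing the llm object from the snippet above (the memory only uses it to count tokens); exactly where the buffer gets cut depends on the model's tokenizer.

token_memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)
token_memory.save_context(
    {"input": "Hello, my name is Izabel!"},
    {"output": "Hello Izabel! How can I assist you today?"}
)
token_memory.save_context(
    {"input": "I'm going to a party tonight"},
    {"output": "That sounds like fun! Do you need any outfit ideas?"}
)

# Only the most recent ~50 tokens remain; the introduction has been pruned
print(token_memory.load_memory_variables({}))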

ConversationSummaryMemory

The last memory type we will look at is ConversationSummaryMemory. It optimizes memory management by summarizing past interactions. Instead of storing every detail of a conversation, this tool generates summaries of the conversation over time.

These summaries replace the detailed conversation logs, allowing the AI to reference the essential context without processing large volumes of tokens. One downside worth noting is that you cannot control what goes into the summary, since the summarization is performed by the chosen LLM. The snippet below uses the ConversationSummaryBufferMemory variant, which keeps the most recent exchanges verbatim and summarizes older ones once the “max_token_limit” is exceeded.

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory

llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=50)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
conversation.predict(input="Hello, my name is Izabel!")
conversation.predict(input="I'm going to a party tonight")
# System: The human introduces themselves as Izabel and the AI, OpenAI, greets them. OpenAI asks how they can assist Izabel and she mentions that she is going to a party tonight.
# AI: That sounds like fun! Do you need any help with party planning or outfit ideas? I can provide suggestions based on your preferences and the latest fashion trends.
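You can also inspect the generated summary directly. A minimal sketch, again reusing the llm object from above: the summarization itself is an LLM call, so running it requires a valid API key, and the wording of the summary will vary from run to run.

summary_memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=50)
summary_memory.save_context(
    {"input": "Hello, my name is Izabel!"},
    {"output": "Hello Izabel! How can I assist you today?"}
)
summary_memory.save_context(
    {"input": "I'm going to a party tonight"},
    {"output": "That sounds like fun! Do you need any outfit ideas?"}
)

# Once the token limit is exceeded, older exchanges are condensed into a System summary
print(summary_memory.load_memory_variables({}))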

Conclusion

With these four main memory types, and many others, LangChain provides tools to optimize both performance and cost. Each memory type addresses specific challenges, enabling developers to tailor their AI applications to the unique needs of their users.

As you integrate these memory solutions into your projects, you will not only streamline your AI’s performance but also unlock new possibilities for creating more engaging and intelligent conversational experiences.

Acknowledgment

This blog post was inspired by the insights and knowledge gained from the LangChain for LLM Application Development course offered by deeplearning.ai.

This blog post is also accompanied by a GitHub repository, where you can find the scripts showcased here and experiment with them!

https://github.com/belbarros/langchain-memories
