[Course Notes] LangChain for LLM Application Development: Part 3

Chanan
6 min read · Mar 4, 2024


LangChain for LLM Application Development: Memory

LangChain for LLM Application Development by DeepLearning.AI

Table of Contents

  1. Introduction
  2. Models, Prompts, and Parsers
  3. Memory (this part)
  4. Chains
  5. Question and Answer
  6. Evaluation
  7. Agents

Memory

Language models are stateless: each call to the model is independent, so chatbots built on them naturally lack memory of previous conversations, which poses a challenge for maintaining conversational flow. In this section, we’ll delve into the different memory types LangChain provides to address this issue by enabling applications to remember past interactions and incorporate them into ongoing conversations.

Memory Types:

  • ConversationBufferMemory: Stores the full message history and injects it into every prompt.
  • ConversationBufferWindowMemory: Keeps a sliding window of recent interactions, using only the last k exchanges.
  • ConversationTokenBufferMemory: Keeps recent interactions in memory based on token length rather than the number of interactions.
  • ConversationSummaryBufferMemory: Keeps recent exchanges verbatim and uses an LLM to summarize older ones into a running summary.

Additional Memory Types:

  • Vector Data Memory: Stores text in a vector database and retrieves relevant blocks of text as needed.
  • Entity Memories: Utilizes LLMs to remember details about specific entities.

Combining multiple memory types is possible; for instance, using Conversation Memory alongside Entity Memory to recall specific individuals. Additionally, conventional databases like key-value stores or SQL can be used for storing conversational data.
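
As a quick, hedged sketch of entity memory (assuming the classic langchain package used throughout these notes, where ConversationEntityMemory lives in langchain.memory; the Deven/Sam dialogue is purely illustrative):

from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationEntityMemory
llm = ChatOpenAI(temperature=0.0)
# Entity memory uses the LLM itself to extract and update facts
# about named entities mentioned in the conversation.
memory = ConversationEntityMemory(llm=llm)
memory.save_context(
    {"input": "Deven and Sam are working on a hackathon project"},
    {"output": "That sounds exciting! What is the project about?"},
)
# Unlike the buffer memories, loading takes the current input so the
# memory knows which entities are relevant to look up.
memory.load_memory_variables({"input": "Who is Sam?"})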

ConversationBufferMemory

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm_model = "gpt-3.5-turbo"  # assumed here; the course sets the model name earlier
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,  # print the full prompt sent to the model
)
conversation.predict(input="Hi, my name is Chanan")

Output:

> Entering new ConversationChain chain…
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi, my name is Chanan
AI:

> Finished chain.
“Hello Chanan, it’s nice to meet you. My name is AI. How can I assist you today?”

conversation.predict(input="What is 1+1?")

Output:

> Entering new ConversationChain chain…
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Chanan
AI: Hello Chanan, it’s nice to meet you. My name is AI. How can I assist you today?
Human: What is 1+1?
AI:

> Finished chain.
‘The answer to 1+1 is 2.’

conversation.predict(input="What is my name?")

Output:

> Entering new ConversationChain chain…
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Chanan
AI: Hello Chanan, it’s nice to meet you. My name is AI. How can I assist you today?
Human: What is 1+1?
AI: The answer to 1+1 is 2.
Human: What is my name?
AI:

> Finished chain.
‘Your name is Chanan, as you mentioned earlier.’

As demonstrated in the example, ConversationBufferMemory stores the entire conversation without limit, so the prompt grows with every turn. Left unconstrained, this leads to rising token costs and, eventually, context-window overflow or degraded performance.
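
You can also read and write the buffer directly, without running the chain (a minimal sketch reusing the memory object from above):

print(memory.buffer)              # the raw transcript stored so far
memory.load_memory_variables({})  # the dict the chain injects into the prompt
# Contexts can also be added by hand:
memory.save_context({"input": "Hi"}, {"output": "What's up"})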

ConversationBufferWindowMemory

With this memory type, we can cap how much of the conversation is kept. In the example below, we use k=1, which means only the single most recent exchange (one human/AI turn pair) is stored.

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)  # keep only the last exchange
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.load_memory_variables({})

Output:

{‘history’: ‘Human: Not much, just hanging\nAI: Cool’}

llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False,
)
print('Output 1:', conversation.predict(input="Hi, my name is Andrew"))
print('Output 2:', conversation.predict(input="What is 1+1?"))
print('Output 3:', conversation.predict(input="What is my name?"))

Output:

“Output 1: Hello Andrew, it’s nice to meet you. My name is AI. How can I assist you today?”
‘Output 2: The answer to 1+1 is 2.’
“Output 3: I’m sorry, I don’t have access to that information. Could you please tell me your name?”

Because k=1, only the single most recent exchange survives: by the time we ask for the name, the first exchange (the one that contained it) has already been dropped from memory.

ConversationTokenBufferMemory

The ConversationTokenBufferMemory limits the number of tokens saved, making it a more cost-effective option for applications utilizing LLMs, as many pricing models are token-based.

Example: set max_token_limit=50:

from langchain.memory import ConversationTokenBufferMemory

llm = ChatOpenAI(temperature=0.0, model=llm_model)
# The llm is passed in so the memory can count tokens with the model's tokenizer.
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"},
                    {"output": "Charming!"})
memory.load_memory_variables({})

Output:

{‘history’: ‘AI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!’}

ConversationSummaryBufferMemory

from langchain.memory import ConversationSummaryBufferMemory

# Create a long string that will exceed the token limit.
schedule = "There is a meeting at 8am with your product team. \
You will need your PowerPoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because LangChain is such a powerful tool. \
At noon, lunch at the Italian restaurant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"},
                    {"output": f"{schedule}"})
memory.load_memory_variables({})

Output:

{‘history’: “System: The human and AI engage in small talk before discussing the day’s schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments.”}

In the provided example, the memory has a token limit of 100. Rather than simply truncating the excess, ConversationSummaryBufferMemory keeps the most recent messages verbatim and uses the LLM to summarize everything older before storing it in the buffer. This keeps memory usage bounded while preserving the essence of the conversation.

conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)
conversation.predict(input="What would be a good demo to show?")

Output:

> Entering new ConversationChain chain…
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human and AI engage in small talk before discussing the day’s schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments.
Human: What would be a good demo to show?
AI:

> Finished chain.
“Based on the customer’s interest in AI developments, I would suggest showcasing our latest natural language processing capabilities. We could demonstrate how our AI can accurately understand and respond to complex language queries, and even provide personalized recommendations based on the user’s preferences. Additionally, we could highlight our AI’s ability to learn and adapt over time, making it a valuable tool for businesses looking to improve their customer experience.”

memory.load_memory_variables({})

Output:

{‘history’: “System: The human and AI engage in small talk before discussing the day’s schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments. The human asks what would be a good demo to show.\nAI: Based on the customer’s interest in AI developments, I would suggest showcasing our latest natural language processing capabilities. We could demonstrate how our AI can accurately understand and respond to complex language queries, and even provide personalized recommendations based on the user’s preferences. Additionally, we could highlight our AI’s ability to learn and adapt over time, making it a valuable tool for businesses looking to improve their customer experience.”}
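
To see how the memory splits content between the running summary and the verbatim tail, you can inspect its attributes (a hedged sketch; moving_summary_buffer and chat_memory are attributes of the classic ConversationSummaryBufferMemory, and in this example most of the history has already been folded into the summary):

print(memory.moving_summary_buffer)      # LLM-generated summary of pruned messages
for msg in memory.chat_memory.messages:  # recent messages still stored verbatim
    print(type(msg).__name__, ":", msg.content)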

And that wraps up our dive into memory types with LangChain! Next up, we’ll explore Chains and how they tie everything together. Stay tuned! 😄
