Adding Memory & Agent interaction into the “Auto-Analyst”

A technical guide to adding memory and making agents interact with the user & other agents

Arslan Shahid
FireBird Technologies
9 min read · Aug 6, 2024



A few weeks ago, I wrote a blog about creating your own AI analyst, which I called “Auto-Analyst.” That first version (linked below) had no memory: it couldn’t keep track of past interactions or learn from them, whether between the user and an agent or between agents. In this new blog, I’m going to show you how I added memory to the system. I’ll walk you through how I solved this challenge, and you’ll see how you can do the same for your projects.

What is “memory” in the context of AI agents?

Human memory helps us recall past events; for an AI agent, memory means keeping track of previous inputs and responses. A simple but flawed approach is to store past responses in a list and use that list as a retriever for the agent. This method quickly fills the context window with irrelevant information, leading to less accurate responses.
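To make the problem concrete, here is a minimal sketch of that naive list-based memory (illustrative only, not the Auto-Analyst code):

memory = []

def remember(user_input, agent_response):
    # Keep every raw exchange verbatim
    memory.append(f"User: {user_input}\nAgent: {agent_response}")

def build_prompt(new_question):
    # Paste the entire history into every prompt; as the conversation
    # grows, the context window fills with mostly irrelevant detail
    return "\n\n".join(memory + [f"User: {new_question}"])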

Many frameworks offer solutions for managing memory in language models, such as mem0. You might want to explore these options, but if you’re interested in building your own memory system, keep reading.

Memory can be categorized in various ways. For instance, you could have separate memory systems for each user, each agent, or each type of request. This approach helps keep memory organized and relevant to specific contexts or interactions.
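For example, a store keyed by user and agent keeps each context separate. Here is a hedged sketch under my own naming, not the Auto-Analyst implementation:

from collections import defaultdict

class CategorizedMemory:
    """Separate memory lists per (user, agent) pair, capped in length."""
    def __init__(self, max_items=10):
        self.max_items = max_items
        self._store = defaultdict(list)

    def add(self, user_id, agent_name, summary):
        items = self._store[(user_id, agent_name)]
        items.append(summary)
        # Drop the oldest entries so the context stays bounded
        del items[:-self.max_items]

    def recall(self, user_id, agent_name):
        return self._store[(user_id, agent_name)]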

How to design memory for your AI agentic system?

To decide precisely how your memory pipeline should work, you need to consider these factors:

  1. Latency: adding memory means more API calls, which can slow down your system.
  2. Cost: large-scale use cases, like memory for every user in a multi-million-user product, get expensive.
  3. User needs: would memory actually make the UX better? If so, how much should you retain, and what “things” do you need to retain? A personal-journal AI app would probably need long-lived memory; a travel AI app might need memory only while a user is visiting a particular area, etc.

Recap of the Auto-Analyst

To explain the design choices I made, let me start by giving a brief overview of Auto-Analyst, the system for which I developed the memory component. You don’t need to read the previous post to follow along!

Design of the Auto-Analyst

There are six agents in total in the system: a planner agent delegates the user query to the four analyst agents (statistical, data preprocessing, visualization, and machine learning), and a final agent corrects the analyst agents’ mistakes and combines their outputs into one script to be executed.

For the full coding implementation of the agents, read the post below:

Reasons for Adding Memory to the System:

  1. Add interaction: by storing recent agent responses and user inputs, memory lets individual agents interact with each other and with the user.
  2. Prevent common errors: agents often generate code with recurring mistakes (e.g., the preprocessing agent failing to separate numerical and categorical columns). Retaining these common errors and their fixes in memory helps the system avoid repeating them.
  3. Answer follow-up queries: without memory, the agentic system cannot answer follow-up questions.

Want an expert to help you with designing AI agentic systems? Please feel free to reach out for help:
https://form.jotform.com/242161534677459

Memory Architecture

To address the three problems listed above, I decided to use two types of memory:

  1. Short-term Memory: This helps agents interact with each other and handle follow-up questions.
  2. Error-Fix Memory: Since agents often encounter common issues with pandas, statsmodels, and scikit-learn when writing code, I’ve created a history of fixes based on user experiences. This allows me to develop an auto-fix agent to handle these common errors effectively.

Short-term memory

Short-term memory diagram

As shown in the diagram, the agent’s response goes to a ‘summarizer,’ which adds it to a list/retriever containing previous interactions. Meanwhile, the agent gets the latest interactions in the prompt to help it understand the user’s question better.

The advantage of using a summarizer is that it keeps long agent responses, which include both Python code and commentary, manageable. It adds some latency, but it saves on token costs (you can use a cheaper model to summarize) and avoids overflowing the context window.

You can adjust the summary size and decide how many previous interactions to include in the agent's prompt based on your desired user experience.
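As a sketch of that knob (my own helper, not from the original code), you might pick the last k summaries and join them into the context block the agent receives:

def build_history_context(summaries, k=5):
    # Join the k most recent interaction summaries into one prompt block;
    # raise k for more continuity, lower it to save tokens
    return "\n\n".join(summaries[-k:])

# e.g. history = build_history_context(memory_list, k=3)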

# I used dspy to implement this, but you can easily adapt it for LangChain
# or llama-index
import dspy

# Note: before running a DSPy module you need a configured LM,
# e.g. dspy.settings.configure(lm=...)

# This is the dspy signature for the summarizer.
# You define the prompt between """ """ after the class header,
# then you define inputs/outputs and give each a description.
class memory_summarize_agent(dspy.Signature):
    """
    You are an AI agent which helps summarize other agent responses and user input.
    Keep these instructions in mind:

    - Analyze the provided text.
    - Present the extracted details in bullet points:
      - Agent: include the agent name
      - User Goal: the user query/goal summarized, with only important information retained
      - Stack_Used: all Python packages used
      - Actions: what actions the agent took, summarized like "Agent visualized a line chart using plotly"
    """
    agent_response = dspy.InputField(desc="The agent's output, commentary and code")
    user_goal = dspy.InputField(desc="User query or intended goal")
    summary = dspy.OutputField(desc="The summary generated in the format requested")

# In dspy you add signatures to modules to build an LLM program.
# I am using the pre-defined ChainOfThought module in DSPy.
summarizer = dspy.ChainOfThought(memory_summarize_agent)

# Below is the example of giving the summarizer some text to summarize

# suppose we give this query to the ML agent in the auto-analyst system
# The dataset is about layoffs
user_query = "What is happening to layoffs?"

# Suppose this is the response we got from the ML agent or any other agent
# It is quite long and approximately 600 tokens
agent_response = """Agent_Name: sk_learn_agent

The analysis will include:
1. Data cleaning and preprocessing to handle any missing values.
2. Exploratory data analysis (EDA) to visualize the trends in layoffs over time.
3. Grouping the data by relevant categories (like state and industry) to see how layoffs are distributed.
4. Plotting the results to provide a clear visual representation of the findings.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Assuming df_1 is the DataFrame containing the dataset
df = df_1.copy()

# Step 1: Data Cleaning
# Check for missing values
print(df.isnull().sum())

# Drop rows with missing values in critical columns
df.dropna(subset=['Number_of_Workers', 'Received_Date', 'Layoff_Closure'], inplace=True)

# Step 2: Convert 'Received_Date' to datetime
df['Received_Date'] = pd.to_datetime(df['Received_Date'])

# Step 3: Exploratory Data Analysis
# 3.1: Trend of layoffs over time
layoffs_over_time = df.groupby(df['Received_Date'].dt.to_period('M')).size()
layoffs_over_time.plot(kind='line', figsize=(12, 6), title='Monthly Layoffs Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Layoffs')
plt.grid()
plt.show()

# 3.2: Distribution of layoffs by state
plt.figure(figsize=(12, 6))
sns.countplot(data=df, x='State', order=df['State'].value_counts().index)
plt.title('Distribution of Layoffs by State')
plt.xticks(rotation=45)
plt.ylabel('Number of Layoffs')
plt.show()

# 3.3: Distribution of layoffs by industry
plt.figure(figsize=(12, 6))
sns.countplot(data=df, x='Industry', order=df['Industry'].value_counts().index)
plt.title('Distribution of Layoffs by Industry')
plt.xticks(rotation=45)
plt.ylabel('Number of Layoffs')
plt.show()

# Step 4: Summary statistics
summary_stats = df.describe(include='all')
print(summary_stats)
```

This code will help us understand the trends and distributions of layoffs in the dataset, providing insights into how layoffs are changing over time and across different states and industries.
"""

# putting the query + agent_response in the summarizer

response = summarizer(user_goal=user_query, agent_response=agent_response)

# The summarizer's output is shown below

Here is what the above code generates:

Response from the summarizer (about 100 tokens)

After using the summarizer, you can fit roughly 100 previous interactions into the context window of GPT-4o mini. This significantly improves the user experience in applications where users often reference past interactions (like the Auto-Analyst).

These summaries can help understand app-user behavior or could be used to build a longer memory system on a user-by-user basis.

Like the way I explain things? Need someone who can build AI applications for you and also communicate the technicalities to other stake-holders? Request a free introductory call using the link below!
https://form.jotform.com/242161534677459

How the error-fix memory would help

After testing the auto-analyst with my peers and friends, I found that the code generated by the LLM often failed. Although implementing an LLM auto-fix resolved some errors, others persisted. Many users reported that even after fixing problems in the prompt, the system would repeat the same mistakes. To address this, I decided to create a separate error-knowledge base. It’s called “memory” because it can track a particular user’s error history. Since users often have unique data that leads to specific errors, this system will work both as a global repository of code fixes and at the individual user level.

As in the short-term memory example, an LLM “summarizes” each code fix. Below is the prompt for the summarizer agent.


# Using DSPy again
class error_memory_agent(dspy.Signature):
    """
    Prompt for the error_summarize agent:

    Agent Name: error_summarize

    Purpose: To generate a concise summary of an error in Python code and
    provide a clear correction, along with relevant metadata and user query
    information. This summary will help in understanding the error and
    applying the correction.

    Input Data:
    - Incorrect Python Code: a snippet of code that produced an error
    - Meta Data:
      - Agent Name: name of the agent that processed the code
      - Agent Version: version of the agent that processed the code
      - Timestamp: when the error occurred
      - User Query: the query or task that led to the incorrect code execution
    - Human-Defined Correction: the corrected code or solution provided by a human expert

    Processing Instructions:
    - Error Analysis: analyze the incorrect Python code to determine the type of error and its cause.
    - Summary Creation: generate a brief summary of the error, highlighting the key issue in the code, and a short explanation of the correction that resolves it.
    - Output Formatting: format the summary to include:
      - Error Summary: a concise description of the error.
      - Correction: a brief explanation of how to correct the error.
    - Integration: ensure the summary is clear and informative for future reference.

    Example Output:
    - Error Summary: The IndexError occurred because the code attempted to access an element at an index that is out of range for the list.
    - Correction: Ensure the index is within the bounds of the list. For example, use if index < len(my_list): to check the index before accessing the list element.
    """
    incorrect_code = dspy.InputField(desc="Error-causing code")
    error_metadata = dspy.InputField(desc="The description of the error generated, with user/agent information for context")
    correction = dspy.InputField(desc="Correction suggested by AI or done manually by a human")
    summary = dspy.OutputField(desc="The description which must contain information about the error and how to correct it")
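As with the short-term summarizer, you can wrap this signature in a DSPy module and call it whenever a fix is recorded. A minimal sketch with made-up field values:

error_summarizer = dspy.ChainOfThought(error_memory_agent)

fix = error_summarizer(
    incorrect_code="df['Received Date'] = pd.to_datetime(df['Received Date'])",
    error_metadata="KeyError: 'Received Date', raised in sk_learn_agent",
    correction="The column is named 'Received_Date': df['Received_Date'] = pd.to_datetime(df['Received_Date'])",
)
print(fix.summary)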

You can load these summaries into any vector store, like Pinecone, Qdrant, or ChromaDB, and retrieve the relevant context when needed.
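As one hedged example with ChromaDB (the collection name and metadata fields are my own choices), tagging each summary with a user lets the same store serve as both a global repository and a per-user memory:

import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("error_fixes")

# Store an error-fix summary, tagged with the user and agent it came from
collection.add(
    ids=["fix-001"],
    documents=["KeyError on 'Received Date': the column is named 'Received_Date'. Use df['Received_Date']."],
    metadatas=[{"user_id": "user_42", "agent": "sk_learn_agent"}],
)

# Retrieve the most relevant past fixes for a new failing snippet
results = collection.query(
    query_texts=["KeyError when converting a date column to datetime"],
    n_results=3,
    # where={"user_id": "user_42"},  # optionally restrict to one user's history
)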

Conclusion

On the surface, memory might seem like a live retriever that updates based on user or system behavior, but implementing it involves additional steps. This article breaks the concept into categories and highlights the technical considerations you should keep in mind. The Auto-Analyst implementation is meant to share insights from my experience with memory, especially for when your use case is unique and you cannot directly use pre-packaged memory retrievers.

Thank you for reading. Please follow me and FireBird Technologies to stay updated on our internal learnings and projects.


Life has the Markov property: the future is independent of the past, given the present.