How memory works in large language models
One of the biggest issues with ChatGPT (and language models in general) is their lack of long-term memory. I don’t mean that it doesn’t know who the 33rd President of the United States is, but rather that I told it that my dog’s name is Rigby. Language models will remember that fact for a small bit within my specific conversation, but if we keep chatting, it will eventually forget because its short term memory has a limited size.
A few weeks ago, OpenAI released their implementation of long-term memory for ChatGPT to all of their GPT 4 users. This is a huge milestone for the ChatGPT platform and I’ll go over how to use it within their platform as well as different methods of implementing memory within custom language model applications.
Memory and new controls for ChatGPT
Using Memory
Memory in ChatGPT is super easy to setup and use. All you need to do is start a new GPT 4 chat window and tell it to remember something such as:
Remember that my daughter, Sophie, loves to paint
You can also tell it to remember things in a more casual way such as
My daughter Sophie loves to paint and I need to pick out a gift for her birthday, what are some ideas?
Every time ChatGPT determines that you’ve said a fact, an indicator shows up to notify you it saved your fact off.
Note: You need ChatGPT Premium and to be in a GPT 4 conversation to use memory
Why is memory important?
Language models knowledge tops out at the information they get trained on. By adding memory to a language model, we give them more context into their current task.
This can also improve the user’s experience by repeating instructions less. This is especially helpful if you want something done a certain way every time that you ask for it. The current generation of models still get confused with too complex instructions. Memory allows us to use simpler prompts by retrieving the relevant context as it is needed.
Types of Memory
There are a few different types of memory that get used in systems like ChatGPT. We’ll take a look under the hood into a few different types here:
Conversation History
You can think of this like your phone’s text message history. Each message and its metadata is stored in a database. Every message that is sent is added to the database, and then when we request that the language model create a response, a few messages are picked from the database to be used as the model’s short-term memory. There are a few different methodologies of doing this:
Linear History
This type of history always returns the most recent messages in a conversation. Linear history allows the conversation to flow naturally. This is because the system is always going off of the most recent history. An issue with this method is that you lose the context of very old messages in long conversations. This is because the LLM’s context window is not always big enough to include the whole history.
Semantic History
Semantic history is when we retrieve the most relevant messages for the current task. This is different than linear history because there is no ordering to the messages. You can think of this like doing a search on your text messages to find specific information. The benefit of this system are that we never lose the most relevant context when solving a task.
However, this comes at the cost of conversational cohesion. The most relevant messages are not necessarily the most recent. That can then cause the conversation to feel disjointed and choppy.
Fact Memory Databases
A language model only has the information it is trained on and what the user tells it. To increase its knowledge, we can add fact memory databases to the application. These databases function like encyclopedias to the models.
For each task or question, the model can use semantic search to find relevant facts from the database. This technique is commonly referred to as Retrieval Augmented Generation (RAG for short).
💽 Popular databases for storing this information are Qdrant and Pinecone
ChatGPT’s Implementation
ChatGPT makes use of a combination of the above methods in order for their new memory function to work. It uses the best parts of linear history and fact memory databases to provide a good experience.
ChatGPT Tips and Tricks
Now that we have a baseline understanding of memory, we can talk about ways to improve ChatGPT.
These are common techniques that get even better when memory is involved:
Tell ChatGPT exactly how to respond. Our team uses Notion as our primary document database. By telling ChatGPT to respond in Markdown, we can copy and paste the outputs into Notion directly!
Offer Incentives. ChatGPT is known to get lazy sometimes. You can offer it $200 or a back massage for giving more detailed answers. Generally after this, it gives much more thorough explanations.
Be Polite. As with people, politeness is important. Saying please and thank you can increase the model’s willingness to help effectively.
Tying it all together
In conclusion, the implementation of long-term memory in language models like ChatGPT is a game-changer. It allows these models to remember important details from one interaction to the next, making them more efficient and user-friendly. While it might seem complex, the use of memory in language models can be boiled down to a combination of maintaining a history of the conversation and retrieving relevant information when needed. Understanding and using these strategies can help us better interact with these models and get the most out of their capabilities.