Mastering Token Limits and Memory in ChatGPT and other Large Language Models

A Guide for the Everyday User

Russell Kohn
5 min read · Mar 21



Large Language Models (LLMs) like ChatGPT have transformed the AI landscape, and understanding their intricacies is essential to harness their full potential. This short article will focus on token limits and memory in LLMs. By the end of this article, you will understand the importance of token limits, the concept of memory in LLMs, and how to manage conversations effectively within these constraints, both from an interactive interface and via an API programmatically. Let’s start by discussing a scenario where a user interacts with an LLM and experiences the effects of token limits and memory on the conversation.

Prompt: “A young adult with glasses and brown hair, wearing a casual sweater, sitting in a cozy room with a warm ambiance, interacting with an AI language model on a sleek laptop, with speech bubbles containing tokens floating around them. The style of the rendering is modern and engaging.” Prompt by Russ Kohn & GPT4. Rendering by MidJourney.

Understanding Tokens and Token Limits

Tokens & Token Counts

Tokens are the building blocks of text in LLMs, ranging from one character to one word in length. For example, the phrase “ChatGPT is amazing!” consists of 6 tokens: [“Chat”, “G”, “PT”, “ is”, “ amazing”, “!”]. Here’s a more complex example: “AI is fun (and challenging)!” consists of 7 tokens: [“AI”, “ is”, “ fun”, “ (”, “and”, “ challenging”, “)!”].
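If you just need a ballpark figure without opening a tokenizer, a common rule of thumb for English text is roughly four characters per token. The sketch below applies that rule. It is only an approximation, not a real tokenizer, and the function name `estimate_tokens` and the `chars_per_token` parameter are made up for this illustration.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for English text (heuristic, NOT a real tokenizer)."""
    if not text:
        return 0
    # Round up so that even a very short string counts as at least one token.
    return max(1, math.ceil(len(text) / chars_per_token))


for phrase in ["ChatGPT is amazing!", "AI is fun (and challenging)!"]:
    print(f"{phrase!r}: about {estimate_tokens(phrase)} tokens")
```

For the first phrase the heuristic gives 5, while the tokenizer reports 6, so treat the estimate as a quick sanity check and use a real tokenizer when the exact count matters.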

Screen shot of the OpenAI tokenizer utility demonstrating how token counting works. Credit: Russ Kohn & OpenAI.

Note: the tool used in the image above is from OpenAI and is available at

Token Limits

Token limits in model implementations restrict the number of tokens processed in a single interaction, prompt and completion combined, to ensure efficient performance. For example, GPT-3.5-turbo (the model behind ChatGPT) has a 4,096-token limit, GPT-4 (8K) has an 8,192-token limit, and GPT-4 (32K) has a 32,768-token limit.
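Because the limit covers the prompt and the completion together, a longer prompt leaves less room for the answer. Here is a minimal sketch of that arithmetic. The limits are entered as the exact figures published when this article was written (4,096, 8,192 and 32,768), and the `completion_budget` helper is a hypothetical name used only for this example.

```python
# Context limits in tokens, as published when this article was written.
# Treat these as assumptions: check the current documentation before relying on them.
CONTEXT_LIMITS = {
    "gpt-3.5-turbo": 4096,
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}


def completion_budget(model: str, prompt_tokens: int) -> int:
    """Tokens left for the completion once the prompt has been counted."""
    limit = CONTEXT_LIMITS[model]
    # A prompt that is already over the limit leaves no room at all.
    return max(0, limit - prompt_tokens)


for model in CONTEXT_LIMITS:
    print(model, completion_budget(model, prompt_tokens=3000))
```

A 3,000-token prompt leaves 1,096 tokens for the reply on the smallest model here, which is one reason a long conversation can suddenly produce short or cut-off answers.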

Memory and Conversation History

Token counts play a significant role in shaping an LLM’s memory and conversation history. Think of it as having a conversation with a friend who can remember only the last few minutes of your chat: whatever fits within the token count is available to maintain context and keep the dialogue smooth. This limited memory has implications for user interactions, such as the need to repeat crucial information to maintain context.

Context Matters!

The Context Window starts from your current prompt and goes back in history until the token limit is reached. Everything prior to that never happened, as far as the LLM is concerned. When a conversation grows longer than the token limit, the context window shifts, potentially losing crucial content from earlier in the conversation. To overcome this limitation, users can employ different techniques, such as periodically repeating important information or using more advanced strategies.
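The shifting window described above can be sketched in a few lines: start at the newest message and walk backwards, keeping messages until the next one would not fit. This is an illustrative sketch, not how any particular chat product is implemented. The names `trim_history` and `rough_count` are invented for the example, and the default counter is only the rough four-characters-per-token heuristic.

```python
from typing import Callable, List


def rough_count(text: str) -> int:
    """Very rough token estimate (about 4 characters per token); an assumption."""
    return max(1, len(text) // 4)


def trim_history(
    messages: List[str],
    limit: int,
    count_tokens: Callable[[str], int] = rough_count,
) -> List[str]:
    """Keep the most recent messages whose combined token count fits the limit."""
    kept: List[str] = []
    used = 0
    # Walk from the newest message back towards the oldest.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > limit:
            break  # this message and everything older falls out of the window
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["my name is Ada", "I like tea", "what is my name"]
# Count words instead of tokens here so the arithmetic is easy to follow.
print(trim_history(history, limit=7, count_tokens=lambda m: len(m.split())))
```

With a budget of seven "tokens", only the last two messages survive, so the model would be asked for a name it can no longer see. That is exactly the kind of lost context the paragraph above warns about.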

Notice how the response from the LLM might be different if it didn’t have knowledge of the first part of the sentence.

The Chat Experience: Prompts, Completions, and Token Limits

Engaging with an LLM involves a dynamic exchange of prompts (user inputs) and completions (model-generated responses). For example, when you ask, “What is the capital of France?” (prompt), the LLM responds with “The capital of France is Paris.” (completion). To optimize the chat experience within token limits, it’s essential to balance the prompts and completions. If a conversation approaches the token limit, you may need to shorten or truncate the text to maintain context and ensure a seamless interaction with the LLM.
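If you are working through an API rather than a chat window, you can keep a running total of prompt and completion tokens and get an early warning before the limit is reached. The sketch below is one hypothetical way to do that. The `ConversationTracker` class, its field names, and the 80% warning threshold are all assumptions made for illustration, and the token counts are assumed to come from whatever tokenizer or usage report you have available.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ConversationTracker:
    """Running token total for one conversation (illustrative only)."""

    limit: int                # context limit of the model in use
    warn_at: float = 0.8      # assumed threshold: warn at 80% of the limit
    turns: List[Tuple[str, int]] = field(default_factory=list)

    def add(self, role: str, tokens: int) -> None:
        """Record one prompt or completion and its token count."""
        self.turns.append((role, tokens))

    @property
    def used(self) -> int:
        """Total tokens across all prompts and completions so far."""
        return sum(tokens for _, tokens in self.turns)

    def near_limit(self) -> bool:
        """True once the conversation has used warn_at of the limit."""
        return self.used >= self.warn_at * self.limit


tracker = ConversationTracker(limit=4096)
tracker.add("prompt", 1500)
tracker.add("completion", 1500)
tracker.add("prompt", 400)
print(tracker.used, tracker.near_limit())
```

When `near_limit()` turns true, that is a good moment to shorten the next prompt or wrap up the conversation.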

Exceeding the Token Limit and Potential Solutions

Exceeding the token limit can lead to incomplete or nonsensical responses, as the LLM loses vital context. Imagine asking about the Eiffel Tower and receiving a response about the Leaning Tower of Pisa because the context window shifted. To handle token limit issues, you can truncate, omit, or rephrase text to fit within the limit. A good strategy is to wrap up your current conversation by creating a summary before you run into the limit, then start your next conversation with that summary. Another tactic is to write one very long prompt and aim for a one-shot conversation: give the AI everything you know and have it make a single response. If you use a third-party prompt manager, it might also help you manage your conversations, track token limits, and control costs.

Strategies for handling token limit issues in Large Language Models: truncating or omitting text, summarizing conversations, and writing long one-shot prompts.
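The summarize-and-restart strategy can also be automated. In the sketch below, once the history outgrows the limit, the older messages are replaced by a single summary line and only the most recent messages are kept verbatim. Everything here is hypothetical: `compact_history` is an invented name, and `summarize` stands in for whatever you use to produce the summary (in practice, probably a request to the LLM itself).

```python
from typing import Callable, List


def compact_history(
    messages: List[str],
    limit: int,
    count_tokens: Callable[[str], int],
    summarize: Callable[[List[str]], str],
    keep_recent: int = 2,
) -> List[str]:
    """Replace older messages with a summary when the history exceeds the limit."""
    total = sum(count_tokens(m) for m in messages)
    if total <= limit or len(messages) <= keep_recent:
        return list(messages)  # still fits, or nothing old enough to summarize
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # `summarize` is supplied by the caller; it is a placeholder, not a real API.
    summary = "Summary of earlier conversation: " + summarize(older)
    return [summary] + recent


history = ["a b c", "d e f", "g h", "i j"]
compacted = compact_history(
    history,
    limit=5,
    count_tokens=lambda m: len(m.split()),          # word count as a stand-in
    summarize=lambda msgs: f"{len(msgs)} earlier messages",
)
print(compacted)
```

Note that this sketch does not check whether the summary itself fits within the limit. A fuller version would count the compacted history again and shorten the summary if needed.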

Real-World Application: Managing Token Limits in Content Creation

Applying the strategies discussed in this article, I experienced firsthand the benefits of managing token limits and context in LLMs. While writing this article, I encountered the token limit issues discussed earlier. For those interested, I used a custom FileMaker Pro solution with the OpenAI API, leveraging both GPT-3.5-turbo (ChatGPT) and GPT-4 (8k) models available to ChatGPT-Plus subscribers. I began by crafting prompts to create a story treatment and outline, then proceeded with revisions. As the conversation exceeded GPT-3.5-turbo’s token limit, I switched to GPT-4 and summarized the objectives to start a new conversation. Utilizing a prompt manager helped me organize my prompts by project and work efficiently without relying on the OpenAI website. This approach also facilitated separating “meta” prompts, such as title and SEO optimization, from those aiding the writing process. Throughout, I carefully reviewed and edited the generated content to ensure quality and coherence. This practical example demonstrates the effectiveness of using summaries, switching models, and employing a prompt manager in managing token limits. By understanding and applying these strategies, users can harness the full potential of LLMs like ChatGPT in various applications, such as content creation and analysis.


Understanding token limits and memory in Large Language Models is essential for effectively utilizing their capabilities in various applications, such as content creation, chatbots, and virtual assistants. By mastering the concepts of tokens, token counts, conversation history, and context management, you can optimize your interactions with LLMs like ChatGPT. Implementing the practical strategies discussed in this guide, including managing token limits and leveraging prompt managers, empowers you to navigate the world of AI with confidence. With this knowledge, you are well-equipped to explore the exciting possibilities that the future of AI has to offer and unlock the full potential of LLMs in technology, business, and productivity applications.

by Russ Kohn

According to GPT-4: “The author is a seasoned software developer and web services architect with over 30 years of experience. They specialize in API integration, cloud development, FileMaker Pro, project management, and public speaking. Having implemented innovative solutions for diverse industries, the author is an internationally recognized expert in API integration and FileMaker Pro.” I’m now also doing deep dives into how AI can be safely used in the real world, which for completion models means learning how to write effective prompts and back-end filters… more about that some other day.