Context Windows: The Short-term Memory of Large Language Models

Navigating the Basics and Understanding Context Windows for ChatGPT and LLMs

Chandler K
3 min read · Nov 3, 2023
[Image by Author and DALL-E]

In the world of Large Language Models (LLMs), the ability of models to understand and mimic human language is a fundamental goal. Central to this goal is the concept of “context windows.” A context window functions as the model’s lens on the world, defining how much text it can perceive and interpret at once — which is essential for generating coherent responses to a user query.

A common way to picture a context window is to imagine reading a book with a limitation — you can only see a few sentences at a time through a small window. Every time you move forward, the previous sentences are completely forgotten, replaced by new ones. This scenario captures the essence of how context windows operate in Large Language Models, a type of AI specialized in understanding and generating text.

The span of text that an LLM can “see” at any given moment is its context window. The larger this window, the more text the model can consider, leading to a deeper understanding and more coherent responses. For instance, a larger window enables the AI to grasp the broader narrative of a conversation or a story, much like how we humans understand context better when we read a whole paragraph instead of isolated sentences. In the realm of LLMs, text is broken down into units called tokens — word fragments that can be as short as a single character or as long as a whole word.

The number of tokens a model can process at once is defined by the size of its context window. Thus, a larger context window allows more tokens to be considered, providing a richer understanding of the text and enabling more coherent and contextually relevant responses. GPT-4, for example, comes in variants that handle either 8,000 or 32,000 tokens. Some older models could only process 2,000 or 3,000 tokens at once, which was a major limitation. Newer models on the cutting edge can work with up to 100,000 tokens at a time.
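To make the limit concrete, here is a minimal sketch of what happens when a conversation outgrows the window: the oldest messages simply fall out of view. This approximates tokens by whitespace-split words for illustration — real LLMs use subword tokenizers, so actual counts differ.

```python
# Minimal sketch: fitting a conversation into a fixed context window.
# Assumption: tokens are approximated by whitespace-split words; real
# models use subword tokenizers (e.g. BPE), so real counts will differ.

def truncate_to_window(messages, window_size):
    """Keep only the most recent messages whose combined token count
    fits inside the context window; older messages are dropped."""
    kept = []
    used = 0
    for msg in reversed(messages):      # walk from newest to oldest
        tokens = len(msg.split())       # naive token count
        if used + tokens > window_size:
            break                       # older messages fall out of view
        kept.append(msg)
        used += tokens
    return list(reversed(kept))         # restore chronological order

conversation = [
    "Hello, can you help me plan a trip?",
    "Sure! Where would you like to go?",
    "Somewhere warm, maybe Portugal in May.",
]
# With a 14-"token" window, the opening message no longer fits.
print(truncate_to_window(conversation, window_size=14))
```

This is exactly the book-through-a-small-window effect from earlier: the model never “chooses” to forget — text beyond the window is simply invisible to it.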

However, expanding the context window is a double-edged sword. On one side, it enhances the model’s comprehension and ability to generate relevant text. On the other, it significantly increases the computational burden, requiring more processing power and time. This trade-off is a critical consideration in the development of efficient and effective AI systems. Models can also lose track of context as their window grows: with more information to keep in mind, it becomes harder to accurately reference details buried deep in the window.

Context windows are not only relevant to understanding text. They also shape how models interpret other forms of data, be it images, video, or sound. In each case, the window defines the amount of data that can be processed at once, which affects the model’s ability to recognize patterns, make predictions, or generate new data. This assumes the model converts those formats into a token representation — the common approach taken by most multi-modal models today.

Keymate personalizes the context window by letting users add data to a personal knowledge base. Users can populate this knowledge base with information they want the model to interact with, which can then be recalled and used during responses. This feature sharpens the model’s answers, making interactions more relevant and personalized.

Quick interruption: I wrote this article as part of my Dev Rel position at Keymate.AI and wanted to share it here too. If this interests you, I highly recommend checking out their LinkedIn for more articles on Generative AI and ChatGPT topics!

As we delve deeper into the era of AI, understanding how context windows work — and why they matter — is essential to navigating the limitations of current AI systems.


Chandler K

Harvard, UPenn, prev. NASA — writing about AI, game development, and more.