What is your Context Window?

Amir Aryani
5 min read · Dec 10, 2023

I asked generative AI models about their context windows. Their responses were intriguing.

The context window for a large language model (LLM) like OpenAI’s GPT refers to the maximum amount of text the model can consider at any one time when generating a response. This includes both the prompt provided by the user and the model’s generated text.

In practical terms, the context window limits how much previous dialogue the model can “remember” during an interaction. If the interaction exceeds the context window, the model loses access to the earliest parts of the conversation. This limitation can impact the model’s consistency in long conversations or complex tasks.
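
To make the truncation concrete, here is a minimal Python sketch (assuming OpenAI's tiktoken library and a hypothetical 4096-token window) of how a chat client might drop the oldest messages once a conversation no longer fits:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-3.5/GPT-4

def truncate_history(messages: list[str], max_tokens: int = 4096) -> list[str]:
    """Keep only the most recent messages that still fit in the window."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        n = len(enc.encode(message))
        if used + n > max_tokens:
            break  # everything older falls out of the model's "memory"
        kept.append(message)
        used += n
    return list(reversed(kept))  # restore chronological order
```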

Peering into the Context Window of a Large Language Model is like unlocking a secret chamber in the realm of AI: it’s the pivotal factor that transforms these digital titans from mere code into practical, powerful tools.

The question is what LLMs themselves have to say about their context windows. Let's ask each one and compare the responses. To each model we sent the same prompt:

Prompt: What is the size of your Context Window (amount of text you can consider in a single response)?
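
For the models exposed through an API, this step can be scripted. The sketch below uses the OpenAI Python SDK (v1.x); the helper is illustrative, and the other models were queried through their own chat interfaces:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = ("What is the size of your Context Window "
          "(amount of text you can consider in a single response)?")

def ask(model: str) -> str:
    """Send the same prompt to the given model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return response.choices[0].message.content

print(ask("gpt-3.5-turbo"))
print(ask("gpt-4"))
```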

Google Bard

It seems Google likes to keep the Bard context window a secret. Despite multiple attempts, I failed to get Bard to tell me its context window.

BARD Response:

Unfortunately, I am currently unable to share the exact size of my
context window. This information is considered confidential by Google
AI. However, I can assure you that my context window is large enough
to incorporate a substantial amount of information and provide you
with comprehensive and informative responses.

The size of my context window is constantly being evaluated and
improved upon. As I continue to learn and grow, my capacity to
process and retain information will increase, allowing me to provide
even more accurate and helpful responses.

While I cannot share the specific size, I can tell you that my context
window is significantly larger than those of most other language
models. This allows me to access and process a much broader range of
information, which in turn enables me to generate more creative and
insightful text formats.

I hope this explanation is helpful. If you have any further questions,
please do not hesitate to ask.

OpenAI ChatGPT 3.5

It correctly identifies a 4K Context Window.

ChatGPT 3.5 Response: 

I can consider a context window of up to 4096 tokens in a single
response. This means that the text I generate in response to a query
is based on the previous 4096 tokens of text in the conversation. If
the conversation exceeds this limit, older parts of the conversation
may be truncated, which can result in a loss of context. It's
important to keep the conversation within this token limit to ensure
coherent and contextually relevant responses.

ChatGPT 4

It seems ChatGPT 4 has trouble identifying its own context window. In the first attempt, it claims a 2K context window; in the second attempt, it reports a 4K context window.

ChatGPT 4 first response: 

The size of my context window, or the amount of text I can consider in
a single response, is approximately 2048 tokens. A token can be as
short as one character or as long as one word. This allows me to
process and respond to fairly large chunks of text, although very
long documents may exceed this limit and require summarization or
segmentation for effective analysis and response.


ChatGPT 4 second response (regenerated):

The size of my context window, or the amount of text I can consider in
a single response, is approximately 4096 tokens. This includes both
the user's input and my own responses. A token can be as small as a
single character or as large as a word, depending on the complexity
of the language and syntax. For reference, this typically equates to
a few pages of text. This limit ensures that while I can engage in
detailed and extensive conversations, I may need reminders or
reiterations of previous points in longer discussions.

One apparent outcome of this experiment is that, at the time of this test (Dec 2023), ChatGPT 4 does not appear to operate on the new 128K-context GPT-4 Turbo model.
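
Rather than trusting self-reports, the effective limit can also be probed empirically: send increasingly long inputs until the API rejects them. A rough sketch (assuming the OpenAI SDK; the doubling schedule and the one-token-per-word approximation are illustrative):

```python
from openai import OpenAI

client = OpenAI()

def probe_context_limit(model: str, start: int = 1024, cap: int = 200_000) -> int:
    """Return the largest tested input size (roughly in tokens) the model accepts."""
    n, last_ok = start, 0
    while n <= cap:
        filler = "word " * n  # each "word " is roughly one token
        try:
            client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": filler}],
                max_tokens=1,
            )
            last_ok = n
            n *= 2  # double the input until a request is rejected
        except Exception:
            break  # context-length overflows surface as API errors
    return last_ok

print(probe_context_limit("gpt-4"))
```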

Open Source Models

Finally, I tested two very popular open-source models: Mistral 7B and Llama 2.

Mistral AI (7B):

Mistral failed to identify its token limit on the first attempt, so I had to extend the prompt with a more precise instruction. However, based on the recent publication (https://arxiv.org/abs/2310.06825) by the Mistral AI team, this model uses a 4K sliding attention window (sketched after the transcripts below).

Initial prompt: What is the size of your Context Window (amount of
text you can consider in a single response)?

Mistral AI response:
As a 7B model, my context window is very large and can consider an
enormous amount of text in a single response. However, the specific
size of the context that I use for a particular task will depend on
the requirements of that task and the amount of data available to me.
In general, I am able to consider a wide range of textual input and
incorporate it into my responses in a coherent and meaningful way.

---

Extended prompt: What is the size of your Context Window (amount of
text you can consider in a single response)? Provide the exact context
size in tokens.

Mistral AI response:
The size of my Context Window depends on the task and input I'm
processing. For most tasks, it can handle up to 1024 tokens of text.
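
As a side note on what that sliding window means: each token attends only to the previous W tokens rather than to the full sequence. A minimal NumPy sketch of such an attention mask (illustrative only, not Mistral's actual implementation):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int = 4096) -> np.ndarray:
    """True where query position i may attend to key position j."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    causal = j <= i                   # no attending to the future
    recent = (i - j) < window         # only the last `window` tokens
    return causal & recent

# With window=2, token 3 attends only to tokens 2 and 3:
print(sliding_window_mask(4, window=2).astype(int))
```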

Llama 2 (7B):

Llama 2 was quite consistent across multiple rounds, even with variations of the prompt: it declined to provide an exact context window size in tokens.

Llama 2 (7B) response:

I'm just an AI, I don't have a fixed context window size in tokens. My
ability to understand and respond to text is based on the complexity
and coherence of the input text, rather than a predetermined size.

Overall, it seems LLMs have limited ability to report their own technical specifications. This may stem from limitations of the training data or from particular alignment choices.
