AIGuys

Deflating the AI hype and bringing real research and insights on the latest SOTA AI research papers. We at AIGuys believe in quality over quantity and are always looking to create more nuanced and detail-oriented content.

Context Rot: How Increasing Input Tokens Impacts LLM Performance

8 min read · Sep 11, 2025


Large Language Models (LLMs) are typically presumed to process context uniformly — that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.

In this blog, we will evaluate different LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. We will see that these models do not use their context uniformly; instead, their performance becomes increasingly unreliable as input length grows.
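To give a rough idea of how such a length sweep can be set up, here is a minimal Python sketch of a needle-in-a-haystack style test. The `query_model` wrapper, the needle sentence, the filler text, and the string-match grading are all hypothetical placeholders, not the actual benchmark used in the paper; a real evaluation varies the needle, its depth, and the distractor content much more carefully.

```python
# Minimal sketch of a needle-in-a-haystack style length sweep.
# `query_model` is a hypothetical wrapper around whatever LLM API you use;
# the needle, question, and filler text are illustrative placeholders.

import random

NEEDLE = "The best thing to do in San Francisco is to eat a sandwich in Dolores Park."
QUESTION = "What is the best thing to do in San Francisco?"
FILLER_SENTENCE = "The quick brown fox jumps over the lazy dog. "

def build_haystack(num_filler_sentences: int, needle_position: float) -> str:
    """Pad the needle with filler text; needle_position is a 0-1 depth fraction."""
    sentences = [FILLER_SENTENCE] * num_filler_sentences
    insert_at = int(needle_position * len(sentences))
    sentences.insert(insert_at, NEEDLE)
    return "".join(sentences)

def run_length_sweep(query_model, lengths=(100, 1_000, 10_000), trials=5):
    """Measure retrieval accuracy as the surrounding context grows."""
    results = {}
    for n in lengths:
        correct = 0
        for _ in range(trials):
            context = build_haystack(n, needle_position=random.random())
            prompt = f"{context}\n\nQuestion: {QUESTION}\nAnswer:"
            answer = query_model(prompt)          # hypothetical LLM call
            correct += "Dolores Park" in answer   # naive string-match grading
        results[n] = correct / trials
    return results
```

If models truly processed context uniformly, accuracy in a sweep like this would stay flat as the haystack grows; in practice it tends to degrade, which is exactly the effect examined below.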

Table Of Contents

  • How Do Transformers Work?
  • Problems With Benchmarking
  • Needle in a Haystack Extension
  • Haystack Structure
  • Other Experiments
  • Conclusion

How Do Transformers Work?

Transformers have dominated the AI landscape since their introduction in the 2017 paper "Attention Is All You Need". There are many reasons why they work so well, but in simple terms, it comes down to their ability to route information in a weighted manner.

  • Through content-based, dynamic attention weights, each token decides how much information to draw from every other token in the sequence (see the sketch below).
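To make this weighted routing concrete, here is a minimal sketch of single-head scaled dot-product attention in NumPy. The function name, shapes, and toy inputs are illustrative only; production implementations add multiple heads, masking, and learned projection matrices.

```python
# Minimal sketch of single-head scaled dot-product attention: the weights are
# computed from the content of the tokens themselves, so the routing changes
# dynamically with the input.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. Returns (seq_len, d) attention outputs."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # content-based similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: dynamic routing weights
    return weights @ V                               # weighted mix of value vectors

# Toy usage: self-attention over 4 tokens with 8-dimensional embeddings.
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```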

