What If Long Context Is Not a Memory Problem but a Learning One?
Why teaching language models to learn at inference time may matter more than building ever larger context windows
TL;DR
Most approaches to long context in language models focus on expanding attention and memory. This paper takes a different path. It treats long context as a continual learning problem, where the model updates itself while reading. The result is inference whose cost per token stays constant as the context grows, with competitive performance, and a shift in how we think about memory, efficiency, and adaptation in AI systems.
Introduction
Imagine reading a thousand-page novel where your brain doesn’t just store the story but adjusts its understanding on the fly as the narrative unfolds. Most current large language models (LLMs) try to mimic memory by simply expanding attention windows as context length grows. But this strategy quickly becomes impractical: with full attention, every new token attends to all the tokens before it, so the cost per token grows with the context length and the total cost grows quadratically. What if, instead of building bigger windows, we made the model better at learning from the context at runtime?
That is the radical shift proposed in End-to-End Test-Time Training for Long Context, a collaborative project from researchers across Stanford, UC Berkeley, NVIDIA, and other institutions. Instead of designing architectures with ever-wider context windows, this work treats long-context modeling as a continual learning problem. In other words, the model doesn’t just read the context; it learns from it as it goes.
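To make "learning from the context as it goes" concrete, here is a deliberately tiny sketch. It is not the paper's architecture or training recipe: the single weight `w`, the function name `read_and_learn`, the learning rate, and the alternating toy stream are all assumptions made for illustration. It only shows the general idea of predicting the next item, measuring the error, and taking one gradient step before moving on, so that everything the model has "remembered" lives in a fixed-size set of weights rather than in a growing cache.

```python
def read_and_learn(stream, lr=0.1):
    """Read a stream once, predicting each next value and then learning from it.

    Toy illustration only (hypothetical, not the paper's method): the model is
    a single weight `w` that predicts next = w * current. The only state
    carried between steps is `w`, so memory does not grow with stream length.
    """
    w = 0.0          # hypothetical "fast weight", updated while reading
    losses = []
    for current, target in zip(stream, stream[1:]):
        prediction = w * current
        error = prediction - target
        losses.append(error * error)      # squared error before the update
        w -= lr * 2 * error * current     # one gradient step on this example
    return w, losses


# Toy "context": each value is the negative of the previous one,
# so the rule the model has to pick up while reading is w = -1.
stream = [1.0 if i % 2 == 0 else -1.0 for i in range(51)]

w, losses = read_and_learn(stream)
print("learned weight:", w)
print("first loss:", losses[0], "last loss:", losses[-1])
```

On this toy stream the prediction error shrinks as more of the sequence is read, while the amount of state (one number) never changes. A real system would update far more parameters and would have to decide which ones to update and how, which this sketch does not attempt to model.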

