TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Extending Context Length in Large Language Models

9 min read · Oct 15, 2023

Image by the author. (AI generated Llamas)

Context length is the maximum number of tokens a model can take into account at once when generating text. A longer context window lets the model capture long-range dependencies better: it can connect ideas that sit far apart in the text and produce more globally coherent outputs.

During training, the model processes text in fixed-length chunks, or windows. To actually make use of a long context, a model needs to be trained on long texts: the training sequences must include documents, books, articles, and so on that run to thousands of tokens.
The length of the training sequences sets a limit on the usable context length.
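As a rough illustration of what "fixed-length windows" means, here is a minimal sketch that splits a token sequence into consecutive chunks. The helper name `split_into_windows` and the toy token IDs are hypothetical, chosen only for this example; real training pipelines typically also handle padding, packing, and document boundaries, which this sketch ignores.

```python
from typing import List


def split_into_windows(token_ids: List[int], window_size: int) -> List[List[int]]:
    """Split a token sequence into consecutive, non-overlapping windows.

    Hypothetical helper for illustration only. The last window may be
    shorter than window_size if the sequence does not divide evenly.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        token_ids[start:start + window_size]
        for start in range(0, len(token_ids), window_size)
    ]


# Toy example: 10 "tokens" with a window of 4 tokens.
tokens = list(range(10))
windows = split_into_windows(tokens, window_size=4)
print(windows)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

In this simplified setup, no single training example is longer than `window_size` tokens, which is one way to see why the training sequence length bounds the context the model learns to use.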

So, why don’t we train models on longer sequences?

Not so fast.

Increasing context length increases the number of possible token combinations the model must learn to predict accurately.
This enables more robust long-range modeling but also requires more memory and processing power, leading to higher training costs.

Without any optimization, computation scales quadratically with context length, meaning that a 4096-token model will need 64 times more…
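The quadratic relationship can be checked with a few lines of arithmetic. The sketch below assumes standard self-attention, where every token is scored against every other token, so the number of attention scores grows with the square of the context length. The 512-token baseline is an assumption made for this example (the sentence above is cut off before naming one); it is simply a length for which the ratio comes out to 64. The function names are hypothetical.

```python
def attention_score_count(context_length: int) -> int:
    """Number of pairwise attention scores for one head in one layer.

    Assumes plain (unoptimized) self-attention: an n x n score matrix.
    """
    return context_length ** 2


def relative_attention_cost(new_length: int, base_length: int) -> float:
    """How many times more attention scores new_length needs than base_length."""
    return attention_score_count(new_length) / attention_score_count(base_length)


# Assumed baseline of 512 tokens (not stated in the text above).
print(relative_attention_cost(4096, 512))   # 64.0
# Doubling the context from 2048 to 4096 quadruples the attention cost.
print(relative_attention_cost(4096, 2048))  # 4.0
```

This only counts attention scores; other parts of the model scale differently with sequence length, so it is an illustration of the quadratic term rather than a full cost model.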

Published in TDS Archive

Written by Donato Riccio

AI Engineer specialized in Large Language Models.