๐€๐ˆ ๐ฆ๐จ๐ง๐ค๐ฌ.๐ข๐จ

AImonks (https://medium.com/aimonks) is an AI-Educational Publication.

How A Larger Context Window Has Unlocked LLMs' Potential


The latest generative AI LLMs (large language models), such as Anthropic's Claude 2, have brought a string of enhancements to data science and machine learning, resulting in better performance and more precise responses. Accessible via an API or websites such as claude.ai, these models are being showcased as efficient personal assistants that can follow and execute natural language instructions seamlessly.

The role of tokens

The concept of tokens has turned Claude into a frontline player in the generative AI domain. In its latest version, Claude raised its token limit from 9,000 to 100,000, a move with significant implications. Before we can appreciate how that increase benefits Claude, it is important to understand how tokens work.

Contrary to what many think, LLMs don't predict the next word in a sequence. Rather, they predict the next token, which usually spans 3-4 characters. Some tokens represent a whole word and some do not. Most often, 100 tokens correspond to around 75 words.
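To get a feel for how text maps to tokens, here is a minimal sketch using OpenAI's open-source tiktoken tokenizer purely as a stand-in; Claude uses its own tokenizer, so the exact splits and counts will differ, but the ballpark ratio is similar.

```python
# Rough illustration of the tokens-to-words ratio. tiktoken is OpenAI's
# tokenizer, used here only as a stand-in for illustration.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = (
    "Large language models do not predict the next word; "
    "they predict the next token, a chunk of a few characters."
)

tokens = encoding.encode(text)
words = text.split()

print(f"Words:  {len(words)}")
print(f"Tokens: {len(tokens)}")
print(f"Tokens per word: {len(tokens) / len(words):.2f}")
# English prose typically lands near 1.2-1.3 tokens per word,
# i.e. roughly 100 tokens for about 75 words.
```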

When a model runs inference, what it does under the hood is split the input text into tokens and execute a gamut of matrix calculations on them. This mechanism, called self-attention, considers every token in the text to determine how each token influences the rest. Self-attention enables the model to comprehend what the text means and its context, and to frame its response accordingly.
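To make the idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the calculation described above; the dimensions and variable names are illustrative and not taken from any particular model.

```python
# Minimal sketch of scaled dot-product self-attention.
# Every token attends to every other token, which is what lets the model
# weigh the whole context when interpreting each position.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token embeddings; w_*: learned projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # (seq_len, seq_len) token-to-token scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                              # context-aware representation per token

rng = np.random.default_rng(0)
seq_len, d_model = 8, 16
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)       # (8, 16)
```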

The downside of this mechanism is that it is computationally intensive. Mathematically, the compute requirements grow quadratically with the input length. The longer the input text, termed the context window, the more resources it takes to operate the model, during training as well as at inference time.
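A back-of-the-envelope calculation shows how quickly the quadratic term bites; the figures below count only the entries of the token-to-token score matrix and ignore everything else the model computes.

```python
# The attention score matrix has one entry per pair of tokens, so the work
# to fill it grows with the square of the context length.
for context_len in (2_000, 8_000, 100_000):
    pairs = context_len ** 2
    print(f"{context_len:>7,} tokens -> {pairs:>14,} token pairs per attention layer")

#   2,000 tokens ->      4,000,000 token pairs
#   8,000 tokens ->     64,000,000 token pairs  (4x the tokens, 16x the work)
# 100,000 tokens -> 10,000,000,000 token pairs
```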

This technical limitation compelled researchers to restrict the input size fed to the models to a standard range of 2,000 to 8,000 tokens. Constricting the context limited how much LLMs could do in our daily lives.

Unlocking the LLM's potential

Increasing the size of the context window is the magic wand that has made Claude 2 so effective, unlocking its most powerful feature: in-context learning. LLMs have a game-changing capability: they can learn on the fly.

Training LLMs is challenging because you need to hand over your data, which could be detrimental to your privacy. Moreover, more data accumulates every day. If LLMs were not capable of learning on the go, the model would have to be retrained constantly, which would have demolished LLMs as a business case.

Thankfully, LLMs have the unique ability of in-context learning: they can learn without altering the model's weights.

So how does this change the scenario? LLMs can learn to answer a query without any actual training. They simply need to be fed the required data, and they will come up with the answer.

This concept, which involves answering queries based on data the LLM hasn't seen before, is called zero-shot learning. In some cases, the LLM might need to see a few examples of the task in the prompt before it can answer. This is few-shot learning.
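The difference between the two is easiest to see in the prompts themselves; the classification task and reviews below are invented purely for illustration.

```python
# Zero-shot: the model gets only the task description and the new input.
zero_shot = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

# Few-shot: the same task, but with a couple of worked examples in the prompt.
few_shot = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: Absolutely love the screen. Sentiment: positive\n"
    "Review: Stopped working within a week. Sentiment: negative\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)
# In both cases the model's weights never change; the 'learning' happens
# entirely inside the context window.
```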

The LLM's ability to answer more complex queries depends on the size of the context window, which determines how much data it can be fed. Smaller context windows worked fine for simpler tasks, but they simply could not deal with more complex ones.

How a larger context window changes the game

Claude, from version 1.3 onward, can ingest 100,000 tokens, or around 75,000 words, in one go. But simply stating the number gives little idea of how it has transformed the landscape, so let us put it in context.

Such a window could hold any one of the Chronicles of Narnia books, each of which runs to fewer than 75,000 words. It could also hold the combined dialogue of several Hollywood movies.

What this means is that a chatbot gains the power to answer any question based on a given text. For instance, you could share the transcript of a 4-hour podcast with the model and ask it to summarize the podcast in a few sentences or paragraphs, or ask it any question about the text. The chatbot will be able to answer it all. It could even point out exactly when a particular statement was made in the podcast.
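As a rough sketch of what that looks like in practice, the snippet below feeds a long transcript to Claude 2 through the Anthropic Python SDK's completions interface; the transcript file name is hypothetical, and an ANTHROPIC_API_KEY environment variable is assumed.

```python
# Sketch: summarizing a long podcast transcript with Claude 2.
# Assumes the `anthropic` Python package and an ANTHROPIC_API_KEY env var;
# "podcast_transcript.txt" is a hypothetical file standing in for ~4 hours of audio.
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()

with open("podcast_transcript.txt") as f:
    transcript = f.read()

completion = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=500,
    prompt=(
        f"{HUMAN_PROMPT} Here is a podcast transcript:\n\n{transcript}\n\n"
        "Summarize it in three paragraphs, and note when the host "
        f"first mentions pricing.{AI_PROMPT}"
    ),
)
print(completion.completion)
```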

For anyone who regularly goes through reams of data, such a chatbot is the perfect solution. The likes of research scientists and lawyers will be delighted to learn of it.

Closing thoughts

A context window determines how much content from a prompt an AI model will process to answer questions. Under the hood, tokens decide how LLMs split words into workable bits. For instance, the word 'superior' might be split into the tokens 'sup', 'eri' and 'or'. The 100,000-token capability has helped Claude leapfrog the competition.

The significant increase in the size of the context window has accelerated the capabilities of generative AI models like Claude exponentially. Claude, for example, can go through a book and write its review in less than 60 seconds. In comparison, a human reader would need close to five hours just to read a text of 100,000 tokens, plus additional time for thinking about and analyzing the content.
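The five-hour figure follows from the token-to-word ratio mentioned earlier plus an assumed reading speed of roughly 250 words per minute, a common estimate for an adult reader.

```python
# Rough arithmetic behind the five-hour comparison.
tokens = 100_000
words = tokens * 0.75                  # ~75 words per 100 tokens
reading_speed_wpm = 250                # assumed adult reading speed
hours = words / reading_speed_wpm / 60
print(f"{words:,.0f} words -> about {hours:.0f} hours of reading")
# 75,000 words -> about 5 hours
```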

Expanded context windows can help retrieve information from extensive documentation, enabling entrepreneurs and managers to run their businesses efficiently. One can even place content from multiple documents in the prompt and ask questions that require synthesizing all of it.


Written by Dilip Kumar Patairya

I'm a seasoned tech product marketing and sales professional. I also have a keen interest in writing, particularly on tech trends. Contact d.patairya@gmail.com
