Position Interpolation: Extending Context Window Sizes in Large Language Models

Shashank Jain
3 min read · Aug 8, 2023

In this blog post, we will delve into the paper Extending Context Window of Large Language Models via Positional Interpolation, which proposes a novel method to extend the context window sizes of large language models (LLMs) such as LLaMA.

Introduction

The paper introduces a technique called Position Interpolation (PI) that enables LLMs to handle longer context windows without training from scratch. The researchers found that directly fine-tuning an existing LLM on a longer context window is inefficient and slow, because the model must learn to extrapolate to position indices it never saw during pre-training. Instead, they propose down-scaling the position indices so they fall within the original context window's range, using interpolation rather than extrapolation. This lets the model accommodate more input tokens without the catastrophically large attention scores that out-of-range positions can produce.
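The core idea can be sketched in a few lines. The function names below are illustrative, not from the paper's code: we simply multiply each position index by the ratio of the original context length to the new one before computing rotary-style angles, so every index lands back inside the trained range.

```python
import numpy as np

def rope_angles(positions, dim=8, base=10000.0):
    """Rotary-style angles (position * inverse frequency) for each position.

    A simplified sketch of the angle computation used by RoPE; the real
    implementation rotates query/key vectors by these angles.
    """
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape: (num_positions, dim // 2)

def interpolated_positions(seq_len, original_ctx):
    """Down-scale position indices so a longer sequence stays in-range.

    With scale = original_ctx / seq_len, the largest index maps back inside
    [0, original_ctx), so attention only ever sees position values the model
    was trained on -- interpolation, not extrapolation.
    """
    positions = np.arange(seq_len, dtype=np.float64)
    scale = original_ctx / seq_len
    return positions * scale

# Example: a model trained with a 2048-token window handling 4096 tokens.
pos = interpolated_positions(4096, 2048)
angles = rope_angles(pos)
```

Note that the scaled indices are fractional (0, 0.5, 1.0, ...), which is fine for rotary encodings since the angle computation accepts continuous positions.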

Positional Encodings in Transformers

Positional encodings are a crucial component of Transformer models. They give the input tokens a sense of order, allowing the model to know where each token sits in the sequence. Without them, self-attention is permutation-invariant: the model would treat the input as an unordered set of tokens and lose all information about word order.

There are several types of positional encodings used in Transformer models:

  1. Fixed Positional Encodings: These are pre-computed vectors that are added to the input embeddings. The original…
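As a concrete illustration of the fixed variety, here is a minimal sketch of the sinusoidal encodings from the original Transformer paper, where each dimension pair uses sine and cosine at a geometrically spaced frequency:

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Pre-computed sinusoidal positional encodings (Vaswani et al. style).

    Even dimensions use sin, odd dimensions use cos, with wavelengths
    forming a geometric progression from 2*pi up to 10000*2*pi.
    """
    positions = np.arange(seq_len)[:, None]  # shape: (seq_len, 1)
    div = np.exp(-np.log(10000.0) * np.arange(0, d_model, 2) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions * div)
    pe[:, 1::2] = np.cos(positions * div)
    return pe  # added element-wise to the token embeddings

pe = sinusoidal_encoding(128, 16)
```

Because these vectors are deterministic functions of the position, they require no training and can in principle be computed for any sequence length, though models still struggle with lengths far beyond what they saw in training, which is the problem Position Interpolation addresses.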
