What is FIM and why does it matter in LLM-based AI

Eva Thompson
4 min read · Oct 21, 2023

--

When you’re writing in your favorite editor, an AI copilot can instantly guess what comes next and complete it, based on what you’ve already typed into the editor buffer.

This is an everyday experience for AI-assistant users. The large language models behind these tools are trained on massive amounts of data to do one thing exceptionally well: predict the next token based on the N tokens that came before it. The ability to make these predictions with high accuracy is the driving force behind the current boom in AI. It enables chatbots like Bing and ChatGPT, writing assistants like Jasper, and code autocomplete tools like Copilot.

However, this kind of inference requires the model to be given the preceding N tokens first; it conditions only on what comes before the insertion point. But in practical writing or coding, we often have a “hole” between two pieces of existing context.

We want the AI assistant to fill that “hole” with appropriate content.

This technique is called FIM: Fill-in-the-Middle.

How Fill-in-the-Middle works

FIM was first introduced in the paper “Efficient Training of Language Models to Fill in the Middle” from OpenAI.

The problem they identified is that existing model classes are limited when it comes to infilling, where the model is asked to generate text at a specific location within a prompt while conditioning on both a prefix and a suffix. Left-to-right models can only condition on the prefix, so a naive infilling step tends to produce short completions that do not connect to the suffix, which is not good enough for practical use. Yet infilling has wide usage: in a coding assistant, for example, it can be used for docstring generation, import statement generation, or completing a partially written function. So it is worth optimizing for.

The basic idea is simple: cut each training document into three parts, a prefix, a middle, and a suffix. This split is performed before tokenization, while the document is still a sequence of characters, and the cut points are chosen uniformly at random, so the prefix, middle, and suffix each make up 1/3 of the full document in expectation.
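
Here is a minimal sketch of that split in Python (the function name and the two-random-cut-points detail are my own illustration of the idea, not code from the paper):

    import random

    def split_document(doc: str) -> tuple[str, str, str]:
        # Choose two cut points uniformly at random over character positions,
        # so prefix, middle, and suffix each cover 1/3 of the document in expectation.
        a, b = sorted(random.randint(0, len(doc)) for _ in range(2))
        return doc[:a], doc[a:b], doc[b:]

    prefix, middle, suffix = split_document("The quick brown fox jumps over a lazy dog")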

How to train it?

For example, take the sentence:

    The quick brown fox jumps over a lazy dog

Now, given the prefix “The quick brown fox” and the suffix “a lazy dog”, we expect the model to complete the middle content “jumps over”. Here’s the encoding according to FIM, where EOM means End of Middle.
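
Here <PRE> and <SUF> mark the prefix and the suffix; these two sentinel spellings are illustrative assumptions (exact sentinel names vary by implementation), used alongside the <MID> and <EOM> sentinels above. With the parts marked in their natural positions, the document looks like:

    <PRE>The quick brown fox <MID>jumps over<EOM><SUF> a lazy dog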

However, the model shouldn’t see the answer <MID>jumps over<EOM> until it has read the prefix and the suffix, so for the convenience of training we move the middle part to the end:
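
    <PRE>The quick brown fox <SUF> a lazy dog<MID>jumps over<EOM>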

This is the format of each element in the training dataset. It can be read as: the prefix and suffix parts are already known, and the related middle part, i.e. the answer, is the content between <MID> and <EOM>. Training then reduces to ordinary next-token prediction over such reordered sequences.
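
As a rough sketch of how one training element could be assembled (the sentinel spellings and the helper name are illustrative assumptions, not the paper’s actual code):

    PRE, SUF, MID, EOM = "<PRE>", "<SUF>", "<MID>", "<EOM>"

    def make_fim_example(prefix: str, middle: str, suffix: str) -> str:
        # Move the middle to the end: the model first reads the prefix and suffix,
        # then learns to emit the middle, terminated by the <EOM> sentinel.
        return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}{EOM}"

    print(make_fim_example("The quick brown fox ", "jumps over", " a lazy dog"))
    # <PRE>The quick brown fox <SUF> a lazy dog<MID>jumps over<EOM>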

How to run inference?

At inference time, suppose we’re trying to infill a document like the following, where the missing middle is shown as a blank:
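
    The quick brown fox ________ a lazy dog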

we can present it as
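
    <PRE>The quick brown fox <SUF> a lazy dog<MID>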

to the model and sample tokens until it emits the <EOM> token, at which point it has successfully joined the prefix to the suffix.
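
A minimal inference sketch with the Hugging Face transformers library might look like the following; the model name is a placeholder, and the sentinel spellings, as well as the assumption that a FIM-trained checkpoint exposes them as tokens in its vocabulary, are illustrative:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # "my-fim-model" is a placeholder for any checkpoint trained with these sentinels.
    tokenizer = AutoTokenizer.from_pretrained("my-fim-model")
    model = AutoModelForCausalLM.from_pretrained("my-fim-model")

    prompt = "<PRE>The quick brown fox <SUF> a lazy dog<MID>"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate until the model emits the <EOM> sentinel (or a length cap is hit).
    eom_id = tokenizer.convert_tokens_to_ids("<EOM>")
    output = model.generate(**inputs, max_new_tokens=32, eos_token_id=eom_id)

    # Everything generated after the prompt is the infilled middle.
    middle = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print("The quick brown fox " + middle + " a lazy dog")

If generation does not stop at <EOM>, the completion will not connect cleanly to the suffix, which is exactly why the end-of-middle sentinel matters.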

Conclusion

According to the paper’s evaluation, and to its growing use in industry, a FIM model infills more accurately than a non-FIM model.

What is notable is that FIM can be learned essentially for free: even at a 50% FIM rate, the impact on ordinary left-to-right loss is negligible, so FIM adds almost no cost when combined with existing models and training pipelines.


Eva Thompson

Senior Product Marketing Manager at SymeCloud Limited. Passionate about product marketing. She leads marketing strategy for the company's product experiences.