RetNet: Transformer killer is here

Vishal Rajput · Published in AIGuys · 8 min read · Sep 14, 2023

I don’t think there has been a bigger paper in the last few years than “Attention Is All You Need.” Attention-based Transformers have become the backbone of every major AI architecture, thanks to their ability to process everything from sound to images, from text to video. The Transformer has been the king of architectures for years and became even more popular after the release of LLMs. But this architecture has one notable problem: it is quite a memory- and resource-intensive design. In today’s blog, we will look into a new architecture developed by Microsoft to beat Transformers. This paper could be seen as a successor to the Transformer and might hold great promise as we move toward the future.
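
To make the memory problem concrete, here is a minimal sketch (my own illustration, not from the paper): standard attention materializes a score matrix of shape (seq_len × seq_len), so memory grows quadratically with sequence length.

```python
import numpy as np

def attention_scores_size(seq_len, d_model=64):
    """Element count of the Q @ K^T score matrix for a single attention head."""
    q = np.random.randn(seq_len, d_model)
    k = np.random.randn(seq_len, d_model)
    scores = q @ k.T  # shape: (seq_len, seq_len) -- the quadratic part
    return scores.size

print(attention_scores_size(512))   # 262144 elements
print(attention_scores_size(1024))  # 1048576 elements: 4x for 2x the length
```

Doubling the context length quadruples the score matrix, which is why long-context inference with vanilla attention gets expensive so quickly.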

Here’s what we are going to talk about:

  • What does RetNet achieve?
  • Background (Problem with attention mechanism and other research avenues)
  • Understanding recurrent and parallel
  • RetNet architecture

Photo by Jr Korpa on Unsplash

What does RetNet achieve?

Let’s begin directly with the paper’s central claim:

RetNet is a foundational architecture for LLMs, simultaneously achieving training parallelism, low-cost inference, and good performance.
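
The key idea behind this claim is that a retention layer can be written in two mathematically equivalent forms: a parallel one (used for training, like attention) and a recurrent one (used for inference, like an RNN). Here is a simplified, illustrative sketch of that duality — single head, no normalization or gating, with `gamma` as the decay hyperparameter — not the full RetNet layer:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 6, 4, 0.9
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

# Parallel form: (Q K^T ⊙ D) V with a causal decay mask D[i, j] = gamma**(i - j)
D = np.tril(gamma ** (np.arange(n)[:, None] - np.arange(n)[None, :]))
parallel_out = (Q @ K.T * D) @ V

# Recurrent form: carry a (d x d) state and update it token by token
S = np.zeros((d, d))
recurrent_out = np.zeros((n, d))
for t in range(n):
    S = gamma * S + np.outer(K[t], V[t])  # decay old state, add new key-value
    recurrent_out[t] = Q[t] @ S

# Both forms compute the same outputs
assert np.allclose(parallel_out, recurrent_out)
```

Because the recurrent form only keeps a fixed-size state per layer, inference cost per token stays constant regardless of how long the sequence grows — which is exactly the low-cost-inference part of the claim.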
