Attention Networks: A simple way to understand Cross-Attention

4 min read · Jul 18, 2022

In recent years, the transformer model has become one of the main highlights of advances in deep learning and deep neural networks. It is mainly used for advanced applications in natural language processing. Google is using it to enhance its search engine results. OpenAI has used transformers to create its famous GPT-2 and GPT-3 models.

Since its debut in 2017, the transformer architecture has evolved and branched out into many different variants, expanding beyond language tasks into other areas. Transformers have been used for time series forecasting. They are the key innovation behind AlphaFold, DeepMind’s protein structure prediction model. Codex, OpenAI’s source code–generation model, is based on transformers. More recently, transformers have found their way into computer vision, where they are slowly replacing convolutional neural networks (CNNs) in many complicated tasks.

Researchers are still exploring ways to improve transformers and use them in new applications. Here is a brief explainer about what makes transformers exciting and how they work.

In this post, we will look at the Cross-Attention mechanism. To understand it better, you need a good understanding of what Attention Networks or Mechanisms are. I won’t cover the introduction to them here; for that, you can check out my previous…


Written by Geetansh Kalra