Are Tiny Transformers The Future Of Scaling?

Vishal Rajput · Published in AIGuys · Nov 12, 2024

Ever since the release of LLMs, we have been trying to reduce the memory footprint of our models. Over the years, we have seen many innovations, such as different types of quantization, dropout, and more. We have even tried completely changing the model architecture to solve the scaling problems of Transformers. Approaches like Flash Attention, RetNet, and State Space Models show great potential, but somehow the Transformer remains king. Today we are looking at some brand-new research papers to see what is happening in this space. Have we made real improvements or not?

So fasten your seat belt: this blog is going to be a bit advanced and is meant for readers with a sufficient background in AI.

Topics Covered

  • Understanding Attention Memory Requirements
  • Different Memory Reduction Techniques (4 main categories: quantization, knowledge distillation, approximation of the attention mechanism, and new architectures that replace Transformers)
  • What Matters in Transformers? Not All Attention Is Needed
  • PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
  • Conclusion

Understanding Attention Memory Requirements
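Before we get to the papers, it helps to see where the memory actually goes. The back-of-envelope sketch below is mine, not from the article, and the model dimensions (32 layers, 32 heads, head dimension 128, fp16/bf16 elements) are assumptions chosen to look roughly like a 7B-parameter model. It contrasts the two main costs: the attention score matrices, which grow quadratically with sequence length if materialized naively, and the KV cache, which grows linearly but dominates inference memory.

```python
# Back-of-envelope attention memory estimate (illustrative sketch, not from
# the article). The model dimensions below are hypothetical, roughly 7B-scale.

def attention_memory_gib(seq_len, batch=1, n_layers=32, n_heads=32,
                         head_dim=128, bytes_per_elem=2):  # 2 bytes = fp16/bf16
    # Naive attention materializes one (seq_len x seq_len) score matrix
    # per head per layer -> quadratic growth in sequence length.
    scores = batch * n_layers * n_heads * seq_len * seq_len * bytes_per_elem

    # KV cache: keys and values for every past token, per layer and head
    # -> linear growth, but it is what fills GPU memory at inference time.
    kv_cache = 2 * batch * n_layers * n_heads * head_dim * seq_len * bytes_per_elem

    gib = 1024 ** 3
    return scores / gib, kv_cache / gib

for n in (2_048, 8_192, 32_768):
    s, kv = attention_memory_gib(n)
    print(f"seq_len={n:>6}: score matrices ~{s:8.1f} GiB, KV cache ~{kv:5.1f} GiB")
```

Under these assumptions the score matrices jump from roughly 8 GiB at a 2K context to about 2 TiB at 32K, while the KV cache grows from about 1 GiB to 16 GiB. Fused kernels like Flash Attention avoid materializing the full score matrix, which is why the quadratic term is attacked first; quantization shrinks the bytes per element, and token-dropping methods such as PyramidDrop shrink the effective sequence length itself.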
