Are Tiny Transformers The Future Of Scaling?
Ever since the release of LLMs, we have been trying to reduce the memory footprint of our models. Over the years, we have come across many innovations, such as different types of Quantization, Dropout, and more. We have even tried to completely change the model architecture to solve the scaling problems of Transformers. Research like Flash Attention, RetNet, State Space Models, and many others shows great potential, but somehow the Transformer remains king. Today we are looking at some brand-new research papers to see what’s happening in this space. Have we made real improvements or not?
So, fasten your seat belt: this blog is going to be a bit advanced and is meant for readers with a sufficient background in AI.
Topics Covered
- Understanding Attention Memory Requirements
- Different Memory Reduction Techniques (4 main categories: Quantization, Knowledge Distillation, Approximation of the Attention Mechanism, and new methods that replace the Transformer architecture)
- What Matters in Transformers? Not All Attention Is Needed
- PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
- Conclusion