Energy-Based Transformers are Scalable Learners and Thinkers
It has been a long time since Transformers were invented, and still no new major architecture has managed to beat them fair and square. But what if we could take an interesting idea and combine it with the power of Transformers? That is where Energy-Based Transformers come in. Most AI models are analogous to System 1 thinking: fast and intuitive, but incapable of deep, long, nuanced reasoning. But what if we could change that and make the model think deliberately about hard problems? This is the hypothesis we are going to discuss.
Table of Contents
- Why Are Transformers So Strong?
- The Thinking Problem for AI
- The Debate About System 2
- Problems with Reasoning Models
- Key Properties of Energy-based Models
- Energy-Based Transformers (EBT) Intuition
- Conclusion
Why Are Transformers So Strong?
Transformers have dominated the AI landscape since their introduction in the 2017 paper “Attention Is All You Need.” There are many reasons why they work so well, but if I had to explain it in simple terms, it comes down to their ability to route information in a weighted manner.
Dynamic, Content-Based Routing. Unlike fixed architectures, where information flows through predetermined paths, transformers use attention to dynamically route information based on the content of each token.
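To make the routing idea concrete, here is a minimal sketch of single-head scaled dot-product attention in NumPy (my own illustrative code, not from the EBT paper). The softmax weights decide, per token, how much information to pull from every other token, so the routing changes with the content of the input.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: each output row is a
    content-dependent weighted average of the value vectors."""
    d_k = Q.shape[-1]
    # Query-key similarity determines the routing weights.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into a probability distribution per query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Information is routed: values are mixed according to the weights.
    return weights @ V

# Toy usage: 4 tokens with 8-dimensional embeddings (made-up shapes).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention
print(out.shape)  # (4, 8)
```

Notice there are no hard-coded paths here: change the input tokens and the attention weights, and hence the routing, change with them.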

