Anya Trivedi
Scaling with Sparsity
How Switch Transformers Scale to a Trillion Parameters using Sparsely-Activated Expert Models
13 min read · Nov 28, 2021