Running Fast Transformers on CPUs: Intel Approach Achieves Significant Speedups and SOTA Performance
Large transformer language models (LMs) that scale up to billions of parameters have demonstrated state-of-the-art performance across a wide variety of natural language processing (NLP) tasks. The real-world deployment of such models, however, remains limited due to their slow inference speed and heavy compute demands.