Transformers: The bigger, the better?
Google surprises us again with a new record for the number of parameters in a Transformer
Large language models built on Transformers are currently one of the most active fields in Artificial Intelligence, affordable only to a handful of technology companies because of the computing resources required. A few days ago, Google surprised us with a new record for the number of parameters in a language model.
The largest model so far
Since BERT in 2018, several other large language models (all of them variants of the Transformer architecture) have been developed, each continuing to push the state of the art forward. The improvements in these models have come primarily from scaling up the model size in terms of the number of parameters (figure 1). The latest model from Google, called Pathways Language Model (PaLM), outperforms all existing models so far. Specifically, it comprises 540 billion parameters, 10 billion more than the previous largest model, Microsoft and NVIDIA's Megatron-Turing NLG. Both have more than three times as many parameters as the famous GPT-3, which "only" had 175 billion.
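As a quick back-of-the-envelope check of those ratios, here is a minimal sketch using only the parameter counts cited above (in billions); the numbers and model names are just the figures from this article, not an official benchmark:

```python
# Parameter counts (in billions) cited in the article.
param_counts = {
    "GPT-3": 175,
    "Megatron-Turing NLG": 530,
    "PaLM": 540,
}

baseline = param_counts["GPT-3"]
for name, params in param_counts.items():
    # Ratio relative to GPT-3, e.g. PaLM comes out at roughly 3.09x.
    print(f"{name}: {params}B parameters, {params / baseline:.2f}x GPT-3")
```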