Transformers: The bigger, the better?

Google surprises us again with a new record in the number of parameters of a Transformer

Jordi TORRES.AI
TDS Archive
Apr 20, 2022

(image by author)

(Spanish version)

Large language models built on Transformers are currently one of the most active fields in Artificial Intelligence, yet one affordable only to a handful of technology companies because of the computing resources required. A few days ago, Google surprised us with a new record for the number of parameters in a language model.

The largest model so far

Since BERT in 2018, several other large language models (all of them variants of the Transformer architecture) have been developed, and they have continued to push the state of the art forward. The improvements in these models have come primarily from scaling up the model size in terms of the number of parameters (figure 1). This latest model from Google, called Pathways Language Model (PaLM), outperforms all existing ones so far. Specifically, it comprises 540 billion parameters, 10 billion more than the largest model to date, Microsoft and NVIDIA’s Megatron-Turing NLG. Both have more than three times as many parameters as the famous GPT-3, which “only” had 175 billion.
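As a quick sanity check of the comparison above, here is a minimal Python sketch using only the parameter counts cited in the text (figures in billions):

```python
# Parameter counts cited in the article, in billions.
model_params_b = {
    "GPT-3": 175,
    "Megatron-Turing NLG": 530,
    "PaLM": 540,
}

# PaLM's lead over the previous record holder.
print(model_params_b["PaLM"] - model_params_b["Megatron-Turing NLG"])   # 10 (billion)

# How many times larger each record holder is than GPT-3.
print(model_params_b["PaLM"] / model_params_b["GPT-3"])                 # ~3.09
print(model_params_b["Megatron-Turing NLG"] / model_params_b["GPT-3"])  # ~3.03
```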
