What is Google Brain’s new XLNet language model?

Jens Møllerhøj · Published in Voice Tech Podcast · Jun 27, 2019 · 4 min read

Google just released a record-breaking language model called XLNet. But what is a language model, and why is Google investing so much in Natural Language Processing research?

When will computers learn to understand us?

It’s difficult to get computers to do what we want them to do. They have no common sense and therefore take everything literally. Today’s post is about how researchers are using neural networks to help computers understand the world better. Neural networks are computer programs that learn from data; I describe them in more detail in my blog post, “What is a neural network.”

To teach neural networks to understand human language, researchers train them to guess the next word in a sentence, sort of like iPhones and Huey, Dewey and Louie do it:

Donald Duck: Counter Spy (Cheerios premium giveaway, 1947) — © Disney

Such neural networks are called language models, and they are crucial for getting computers to understand humans.
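
To make the guessing idea concrete, here is a toy next-word guesser of my own (far simpler than a real neural network, which learns patterns rather than counting): it just counts which word most often follows each word in a small text:

```python
# A toy next-word guesser: count which word most often follows
# each word, then predict the most frequent follower.
# (A real language model is a neural network, not a counter.)
from collections import Counter, defaultdict

corpus = "I eat spaghetti . I eat turkey . I eat spaghetti with meatballs".split()

followers = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    followers[word][next_word] += 1

def guess_next(word: str) -> str:
    return followers[word].most_common(1)[0][0]

print(guess_next("eat"))  # 'spaghetti' (seen twice, vs. 'turkey' once)
```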

Why are language models so important?

Language models are important for 3 main reasons:

1. Guessing the right word requires a lot of knowledge about the world.

Guessing the next word in a sentence might seem simple to you, but it is actually very difficult for computers. If I say, “I am not following my vegan diet, sometimes I eat a little …” and you guess that the next word is “turkey,” you must know what it means not to be a vegan.

2. There is almost unlimited training data available.

Language models can be trained on any text. We can create training examples simply by hiding the last word of a sentence and asking the model to guess it. Today’s language models learn from collections of text thousands of books long.
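
In code, creating such training examples might look like this sketch (my own toy illustration, not a real training pipeline): each sentence becomes a pair of the words the model sees and the word it must guess:

```python
# Turning raw sentences into training examples by hiding the
# last word: the model sees the context and must guess the target.
sentences = [
    "I am not following my vegan diet , sometimes I eat a little turkey",
    "I eat spaghetti with meatballs",
]

examples = []
for sentence in sentences:
    words = sentence.split()
    context, target = words[:-1], words[-1]  # hide the last word
    examples.append((context, target))

for context, target in examples:
    print(" ".join(context), "->", target)
```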

3. Language models can pass their knowledge on to other neural networks.

Once a language model has learned to guess the next word in a sentence, it has acquired a lot of knowledge about the world. By connecting the language model to another neural network, that knowledge can be shared, so the second network also gains knowledge about the world.

This all might sound complicated, but it’s actually just like two people sharing information.
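
As a rough sketch of how that sharing works in practice (my own example using the open-source Hugging Face transformers library, not code from any of the teams mentioned): you load the pretrained language model’s weights and attach a fresh network for the new task on top:

```python
# A minimal transfer-learning sketch using the Hugging Face
# transformers library (pip install transformers torch).
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a language model that has already "read" a lot of text...
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # pretrained weights carry the world knowledge
    num_labels=2,         # ...and attach a fresh two-class head on top
)

# The new head starts untrained; fine-tuning it on labeled examples
# lets it reuse everything the language model already knows.
inputs = tokenizer("I eat spaghetti with meatballs.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)  # raw scores from the (not yet fine-tuned) head
```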


So what are researchers working on today?

The idea of passing information from one neural network to another has been around for a long time and has been used to understand images for many years.

In January 2018, a language model called “ULMFiT” showed that this technique also works very well for text. Soon after, another language model called “GPT” improved on this idea by using a more advanced kind of neural network called a “Transformer.”

Then a language model called “BERT” did even better by guessing not just the next word in a sentence, but also words in the middle of sentences, such as guessing that X is “spaghetti” in the sentence “I eat X with meatballs.”
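
Here is a small illustration of that middle-word guessing (my own, using the open-source transformers library), asking a pretrained BERT to fill in a hidden word:

```python
# Asking BERT to fill in a hidden word, via the transformers
# library's fill-mask pipeline (pip install transformers torch).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT marks the hidden word with the special [MASK] token.
for guess in fill_mask("I eat [MASK] with meatballs."):
    print(f"{guess['token_str']:>12}  (score: {guess['score']:.3f})")
```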

Recently, the same team that made “GPT” released a second version, “GPT-2,” that is even better, because it has been trained for longer on a lot more data, and because they used higher-quality data.

BERT language model example

And now, Google has set a new record with the new XLNet.

Each new model has read more and more text: the ULMFiT model had read about 1,000 books, while the new GPT-2 model has read about 10,000. It requires a lot of computing power to learn from so many books. It’s estimated that it would cost around $50,000 to rent enough computers to train the GPT-2 model.

So what is XLNet?

XLNet is a general language model just like ULMFiT, BERT, and GPT. It beats their performance by using several neat tricks:

  • BERT used a neural architecture called the “Transformer.” Google has since released an updated version called Transformer-XL, which is better at handling long, complicated sentences, and XLNet builds on it.
  • Rather than just guessing X in “I eat X with meatballs,” XLNet also guesses X in shuffled sentences such as “I X with meatballs eat” and “X meatballs eat I with” (see the sketch after this list).
  • XLNet is “autoregressive,” whereas BERT is an “autoencoder.” Autoregressive models are better at generating new text, while autoencoders are better at reconstructing text they have already seen.
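
Here is a rough sketch of that shuffling trick (my own toy illustration of the idea, not XLNet’s actual code). The words stay in place; what gets shuffled is the order in which they are predicted, so each word must be guessed from a random mix of its neighbors:

```python
# A toy illustration of permutation language modeling:
# the words stay in place, but the prediction ORDER is shuffled,
# so each word must be guessed from a random mix of neighbors.
import random

sentence = ["I", "eat", "spaghetti", "with", "meatballs"]

order = list(range(len(sentence)))
random.shuffle(order)  # e.g. [3, 0, 4, 1, 2]

for step, position in enumerate(order):
    # Words earlier in the shuffled order are visible context;
    # the word at `position` is what the model must predict.
    visible = {p: sentence[p] for p in order[:step]}
    print(f"guess word #{position} ({sentence[position]!r}) given {visible}")
```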

What will happen in the future?

Language models are likely to keep getting better and better. The internet is large enough that there is still room to make computers read even more text. Many researchers are working on this problem, so we will almost certainly see improvements to the models themselves as well.

Even once we get to a point where language models have read the entire internet, they will still lack much of the common sense that we humans take for granted. Humans rely on a lot of context and background knowledge when communicating.

One particularly interesting approach to this problem is the use of “knowledge bases.” Knowledge bases are big collections of facts about the world, such as “A dog is a mammal” and “Mammals are animals.” The Chinese search engine Baidu has released its own language model that outperforms BERT by incorporating such knowledge into the model. Having beaten BERT, they snarkily named their model “Enhanced language RepresentatioN with Informative Entities,” or ERNIE for short.
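
To give a feel for what a knowledge base looks like (a toy illustration of mine, far simpler than anything a real system uses), here are those two facts stored as “is-a” relations, plus a small function that chains them together:

```python
# A toy knowledge base: "is-a" facts plus transitive lookup.
IS_A = {
    "dog": "mammal",      # "A dog is a mammal"
    "mammal": "animal",   # "Mammals are animals"
}

def is_a(thing: str, category: str) -> bool:
    """Follow the is-a chain upward until we hit (or miss) category."""
    while thing in IS_A:
        thing = IS_A[thing]
        if thing == category:
            return True
    return False

print(is_a("dog", "animal"))  # True: dog -> mammal -> animal
```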

Rivalry among research teams might seem petty, but it is also pushing the research community to new levels of performance. Natural Language Processing is moving forward at an impressively fast pace. Make sure to check out our blog frequently to stay up to date with the latest developments.


As Lead Data Scientist at BotXO, I’m in charge of ensuring that our state-of-the-art artificial intelligence lives up to its full potential.