Processing natural language with neural networks is fiendishly hard! Here’s why…

Mark Farragher
Mar 14, 2019

Natural language processing (NLP) is the science of getting a computer to understand written text.

You might be wondering why we don’t yet have software that can read books and comment on them.

It’s because understanding written text is fiendishly difficult!

Here, I’ll show you.

The go-to solution for teaching computers how to read is the Recurrent Neural Network (RNN):

This is a neural network with connections that loop back to the same node.

This setup allows the network to remember the data from the previous iteration and use it to adjust its predictions going forward.
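Here’s a rough sketch of that loop in code (my own illustration, not something from the article): the network’s memory is a hidden state that gets fed back in with every new word.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # One step of a vanilla RNN: the new hidden state depends on the
    # current input x_t AND the previous hidden state h_prev.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Toy dimensions: 8-dimensional word vectors, 16-dimensional memory.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(16, 8))
W_h = rng.normal(size=(16, 16))
b = np.zeros(16)

h = np.zeros(16)                       # empty memory before the sentence starts
for x_t in rng.normal(size=(5, 8)):    # five pretend word vectors
    h = rnn_step(x_t, h, W_x, W_h, b)  # memory is carried from word to word
```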

To read a text, a computer first tokenizes each word into a number, and then feeds those numbers into the network one word at a time.
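A toy tokenizer might look like this (again my own sketch; real systems use much larger vocabularies and sub-word tokens):

```python
sentence = "I went fishing at the river bank"

# Build a toy vocabulary: every distinct word gets its own number.
vocab = {word: idx for idx, word in enumerate(sorted(set(sentence.lower().split())))}

# Tokenize: turn the sentence into a list of numbers...
token_ids = [vocab[word] for word in sentence.lower().split()]
print(token_ids)  # [3, 6, 2, 0, 5, 4, 1]

# ...which an RNN then consumes one number at a time, in order.
```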

This works, but only to a point.

Consider these two sentences:

“I went fishing at the river bank”
“The bank clerk looked at me suspiciously”

In the first sentence, ‘bank’ refers to a side of the river. In the second sentence, it refers to a financial institution.

An RNN would understand the first sentence, because it remembers ‘river’ and can use it to refine the meaning of ‘bank’.

But it would struggle with the second sentence, where the meaning of the word ‘bank’ is determined by the next word: ‘clerk’.

By the time the RNN reaches the word ‘clerk’, it has already processed ‘bank’, and it’s too late to redefine that word.

NLP researchers have tried to fix this sequencing problem by letting the network read in both directions. A network that processes each word in relation to both the previous and the next words in a sentence is called a bidirectional RNN.
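A minimal sketch of that idea (mine, not the article’s): run one RNN left to right, run a second RNN right to left, and combine the two memories for each word.

```python
import numpy as np

def run_rnn(inputs, W_x, W_h):
    # Run a vanilla RNN over a sequence and return the hidden state at every step.
    h, states = np.zeros(W_h.shape[0]), []
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return states

rng = np.random.default_rng(0)
words = rng.normal(size=(7, 8))                    # seven pretend word vectors

# Separate weights for each direction.
W_xf, W_hf = rng.normal(size=(16, 8)), rng.normal(size=(16, 16))
W_xb, W_hb = rng.normal(size=(16, 8)), rng.normal(size=(16, 16))

forward  = run_rnn(words, W_xf, W_hf)              # reads left to right
backward = run_rnn(words[::-1], W_xb, W_hb)[::-1]  # reads right to left

# Each word now carries context from BOTH directions.
combined = [np.concatenate([f, b]) for f, b in zip(forward, backward)]
```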

But even that doesn’t solve the problem. Consider these sentences:

“I spent ages crossing the river before I finally arrived at the bank”
“I spent ages crossing the road before I finally arrived at the bank”

Now the meaning of the word ‘bank’ is determined by another word 7 places earlier in the sentence!

And this touches on a big disadvantage of RNNs: they tend to forget information over time. By the time we reach the word ‘bank’, the neural network will have forgotten all about the river/road.

So RNNs are a bit limited. We need something more powerful.

In 2017, Google published a completely new neural network architecture for reading text called the Transformer.

Instead of looking at a sentence word for word like an RNN does, the Transformer reads an entire sentence in one go. It remembers how all words fit together, and uses that information to refine its understanding of the sentence as a whole.

Let’s say we have the following sentence:

“The animal didn’t cross the street because it was too tired”

Who was tired here? The animal or the street?

The Transformer can compare the word ‘it’ to all other words in the sentence and estimate how strongly each one relates to it:

Here the network has discovered that ‘it’ most likely refers to ‘The’ and ‘animal’.
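The mechanism that does this comparison is called self-attention. Here’s a bare-bones sketch of the idea (my own simplification; a real Transformer also applies learned query, key and value projections):

```python
import numpy as np

def self_attention(X):
    # Every word scores every other word, the scores become weights via softmax,
    # and each word's new representation is a weighted mix of the whole sentence.
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X, weights

words = ["The", "animal", "didn't", "cross", "the", "street",
         "because", "it", "was", "too", "tired"]
X = np.random.default_rng(0).normal(size=(len(words), 16))  # pretend word vectors

new_X, weights = self_attention(X)
# weights[words.index("it")] shows how much 'it' attends to every other word;
# in a trained model, 'The' and 'animal' would get the highest weights.
```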

Neural networks are just stacks of specialized network layers, and the Transformer is no different.

A Transformer is built up of a stack of special encoder layers.

Here’s what a single encoder layer looks like, broken down into its sub-layers:

This simplified encoder is reading the two-word sentence “Thinking Machines”. The self-attention sub-layer is where the magic happens: it compares each word in the sentence to all the other words.
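Roughly, one such layer boils down to this (a sketch under my own simplifications; I’m leaving out the learned attention projections, multiple attention heads, and layer normalization):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def encoder_layer(X, W1, b1, W2, b2):
    # Sub-layer 1: self-attention -- every word looks at every other word.
    attn = softmax(X @ X.T / np.sqrt(X.shape[1])) @ X
    X = X + attn                                    # residual connection
    # Sub-layer 2: a small feed-forward network applied to each word separately.
    ff = np.maximum(0, X @ W1 + b1) @ W2 + b2
    return X + ff                                   # residual connection

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 16))                        # "Thinking Machines" as two pretend vectors
W1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 16)), np.zeros(16)

out = encoder_layer(X, W1, b1, W2, b2)              # a full Transformer stacks many of these
```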

The most powerful NLP network in the world right now is called BERT. Google published it in October 2018, and it outperformed every previous model on a wide range of language-understanding benchmarks. Its largest version uses a stack of 24 encoder layers, with some tweaks.

You can read more about BERT here:

https://github.com/google-research/bert
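If you just want a quick taste of what BERT produces, here’s a minimal sketch using the separate Hugging Face transformers package (my own example; the repo above is Google’s original TensorFlow implementation):

```python
# Assumes `pip install transformers torch` -- not covered in the article.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# BERT reads the whole sentence at once and builds a contextual vector for every
# token, so 'bank' gets a different vector here than it would next to 'clerk'.
inputs = tokenizer("I went fishing at the river bank", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, number_of_tokens, 768)
```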

Did I inspire you to start building NLP apps?

Add a comment, and let me know!
