How the chatbot understands words

Sam Smith
4 min read · May 16, 2017


We are so used to words that we take them for granted. Isn’t it obvious that “cat” is a kind of animal, “bleeding” is bad news, and “diarroea” is just a misspelling of diarrhoea?

Computers don’t know any of this. To a computer words are just sequences of characters, and different words appear entirely unrelated. Or at least, that was the case until the discovery of “word embeddings”. These embeddings are based on a central idea:

In order to understand words, we need to know which words are similar.

To understand this, consider the following two sentences:

“I ate a delicious meal last night”

“I ate a tasty meal last night”

Intuitively, you recognise that “delicious” and “tasty” are similar words. In fact, they are almost interchangeable. This enables you to generalise from previous experience. Once you know what the first sentence means, the fact that delicious and tasty are so similar enables you to guess the meaning of the second sentence. We could take this even further:

“I had a wonderful dinner yesterday”

It doesn’t mean exactly the same thing. But it’s not far off, and although most of the words have changed, the sentences can still be broken down into pairs of similar words/phrases:

  1. “ate” → “had”
  2. “tasty” → “wonderful”
  3. “meal” → “dinner”
  4. “last night” → “yesterday”

If we can teach a machine to understand word similarity, then that machine will be able to infer that the three sentences above all mean essentially the same thing.
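
A quick toy sketch in Python makes the idea concrete. The pairings and similarity scores below are invented purely for illustration; a real system would get them from word embeddings, which we introduce next:

    # Hypothetical similarity scores for the paired words/phrases above
    # (0 = unrelated, 1 = interchangeable). Invented for illustration only.
    pair_scores = {
        ("ate", "had"): 0.8,
        ("tasty", "wonderful"): 0.7,
        ("meal", "dinner"): 0.9,
        ("last night", "yesterday"): 0.85,
    }

    # Averaging the per-pair scores gives a rough sentence-level similarity.
    sentence_similarity = sum(pair_scores.values()) / len(pair_scores)
    print(sentence_similarity)  # 0.8125 -> the sentences mean roughly the same thing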

So what is a word embedding? It is just a map!

Yes really, just a regular boring map, like the ones you used before satnav. But whereas normal maps have cities, towns and villages, this map contains words and simple phrases.

You can see a sketch of such a map above. Every word in the vocabulary is assigned its own location. Intuitively, “delicious”, “tasty” and “yummy” all lie close to each other. “Car” and “van” are also close to each other, but far away from “delicious”. In general, the more similar two words are, the closer they are placed on the map.

Thus, just as Birmingham is about 20 miles from Coventry but over 200 miles from Edinburgh, we can define precisely how similar two words are by calculating the “distance” between their word embeddings.
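
Here is a minimal Python sketch of that idea. The 2D coordinates are invented to mimic the map described above, and the distance is the ordinary straight-line (Euclidean) distance between two points:

    import math

    # Invented 2D coordinates for a handful of words, mimicking the sketch above.
    word_map = {
        "delicious": (0.10, 0.90),
        "tasty":     (0.12, 0.88),
        "yummy":     (0.11, 0.92),
        "car":       (0.70, 0.30),
        "van":       (0.72, 0.33),
    }

    def distance(word_a, word_b):
        """Straight-line distance between two words on the map."""
        (x1, y1), (x2, y2) = word_map[word_a], word_map[word_b]
        return math.hypot(x2 - x1, y2 - y1)

    print(distance("delicious", "tasty"))  # small: similar words
    print(distance("delicious", "car"))    # large: unrelated words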

The location of a city on a map is defined by its latitude and longitude. Similarly, we can define the location of the word “car” on our simple map above by two numbers:

Car = (0.7, 0.3)

In mathematics, this list of numbers is called a “vector”, and consequently word embeddings are often called “word vectors”. However, nothing clever has happened: we simply draw horizontal and vertical lines on the map above and read the numbers off the chart.
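
Once words are vectors, ordinary vector arithmetic applies. In practice, word vectors are usually compared with cosine similarity (which measures the angle between the vectors) rather than straight-line distance, though both capture the same intuition. A minimal sketch, reusing the invented coordinates from above:

    import math

    def cosine_similarity(v1, v2):
        """Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated."""
        dot = sum(a * b for a, b in zip(v1, v2))
        norm1 = math.sqrt(sum(a * a for a in v1))
        norm2 = math.sqrt(sum(a * a for a in v2))
        return dot / (norm1 * norm2)

    car = (0.70, 0.30)
    van = (0.72, 0.33)
    print(cosine_similarity(car, van))  # close to 1: "car" and "van" point the same way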

Now, the screen you are reading this on is two-dimensional, and so our sketch above had just two dimensions. As a result, our “car vector” contained only two numbers: 0.7 and 0.3.

Unsurprisingly, if we define each word by just two numbers, it is difficult to describe all of the complicated relationships between English words. There are simply not enough ways of arranging words on a two-dimensional map to capture their meaning properly. To get around this, we make the map bigger; much bigger! The most commonly used English word vectors were open-sourced by Google, and they describe each word as a single point in a three-hundred-dimensional map. This sounds fancy, but it just means that the vector that describes “car” contains not two, but three hundred numbers.

Car = (0.7, 0.3, 0.1, …, 0.8)

For biological reasons, we find it hard to think in more than three dimensions. However, mathematically nothing spooky happens, so as long as they have enough RAM, computers will happily handle as many dimensions as you can throw at them. It turns out that in three hundred dimensions, we can accurately describe the similarity relationships between tens of thousands of common English words.
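
If you want to experiment with these vectors yourself, one common route (a suggestion for the curious reader, not necessarily what we use in production) is the gensim library, which can load the word2vec vectors Google published. The file name below is the standard GoogleNews download:

    # Requires `pip install gensim` and the (~1.5 GB) GoogleNews vectors file.
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin.gz", binary=True
    )

    car = vectors["car"]  # a numpy array of three hundred numbers
    print(car.shape)      # (300,)
    print(car[:5])        # the first five of its three hundred coordinates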

It sounds far-fetched, but every word that you type to the babylon chatBot is converted into a word vector, a sequence of three hundred numbers which describes a single point in our “semantic map”. When the chatBot sees this word vector, it immediately understands which English words are similar to the word you used, and which English words are not.
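
Continuing the gensim sketch above (this illustrates the general idea, not the babylon chatBot’s actual pipeline), asking for a word’s nearest neighbours on the map recovers exactly this kind of similarity:

    # Which words lie closest to "tasty" in the three-hundred-dimensional map?
    for word, score in vectors.most_similar("tasty", topn=5):
        print(word, round(score, 2))
    # Typically returns words like "delicious" and "flavorful",
    # with scores close to 1.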

In the final section, we will show how the chatbot uses this knowledge to decide how best to respond to simple medical queries.
