How machines understand words

Daniele Nanni · Brass For Brain · Oct 13, 2023

Introduction

Large language models like GPT-4, Claude 2 and Google Bard represent the most groundbreaking development in the realm of Natural Language Processing (or NLP): an interdisciplinary field that combines elements of AI, computer science, and linguistics to enable machines to understand, interpret, generate, and respond to human language in a natural and intuitive way.

NLP involves a set of techniques, algorithms, and methodologies that allow computers to analyse, understand, and derive meaning from human language in a useful and scalable manner.

If you are not familiar with NLP, you may at first think that it’s packed with jargon and techy-talk, but at its heart it’s about something very familiar to all of us: words.

Just as we use words to share stories, express feelings, and communicate ideas, NLP uses them as building blocks, allowing computers to grasp the essence of our conversations, texts, and commands.

So, while the technical side of NLP might seem complex, it’s all anchored in the simple, everyday magic of words we use in our daily lives.

In today’s exploration, I will attempt to dive into the captivating mechanics behind how machines understand our words, at the intersection of language and technology.

A Room Full of Boxes

As part of my course in sociology and applied communication, I studied semiotics, a discipline that involves the study of signs and symbols and how they convey meaning in various forms of communication.

The Semiotic Triangle — Image by the author

As I was studying this fascinating subject and delving into the relationship between symbols, their references and the real-life objects they signify, I came up with a simple mind trick to help me grasp and explain the concept of a semantic field. This term refers to a set of words, grouped by meaning, that relate to a specific object.

The trick was about imagining words as “labels” that we stick to boxes containing ideas, mental models, properties, abstract concepts and much more. These labels, or words, act as a succinct identification to recall the rich content within each box.

Now, let’s try to bring this metaphor to life.

Imagine a box with the label “apple” on it and place it in a three-dimensional room.

Open it up, and inside you might find ideas of juiciness, memories of apple-picking in the autumn, the taste of the apple you ate at dinner yesterday, mental images of apple pies, and perhaps even a recollection of the story of Adam and Eve.

You don’t do all this consciously as you are talking about apples, but if you stop and think about apples in various contexts you might start visualising some of the above.

The contents of this box are vast and varied, encompassing everything that the word ‘apple’ can represent.

These contents can also change throughout history: the items stored in the boxes shift over time, depending on how a word is commonly used around the world.

These boxes not only contain personal associations or memories but also encompass collective memories, beliefs, and connotations that societies and cultures weave into words over generations.

For instance, think of words that might have once had neutral or positive connotations but have evolved to carry negative weight due to historical events or cultural shifts.

Conversely, some words that might have been frowned upon in the past have been reclaimed and given new, empowered meanings by communities.

These shifts are not just individual reinterpretations but a reflection of collective memory and shared understanding.

The dynamic nature of language and its intertwining with society’s collective memory means that the contents of these boxes — the vast reservoirs of associations tied to words — are constantly evolving.

As we use language, we’re not just communicating current thoughts but also tapping into historical and cultural nuances embedded within.

From Boxes to Tokens

Now that we understand the idea of words being like boxes filled with concepts, ideas and collective memories, let’s look more closely at the details of language. In the area of NLP, we often split text into smaller parts called tokens.

These tokens can be full words, like “apple”, or parts of words, especially in languages where words can be split into meaningful pieces.

Imagine our three-dimensional room again, but this time, along with the boxes for whole words, you see smaller boxes, each representing different tokens.

For example, the word “unhappiness” might be split into three boxes: “un-”, “happy”, and “-ness”.

Example of a phrase broken down into multiple tokens. — Image by the author.

Each token has its own set of associations, which may be simpler than those of a full word. Some tokens don’t make sense on their own and are usually coupled with other tokens. This means that a single term may count as two or more tokens when you measure the length of a text.
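If you’d like to see this in action, here’s a minimal sketch in Python using OpenAI’s open-source tiktoken library. Keep in mind that real tokenisers learn their splits from data, so “unhappiness” may not break into exactly “un-”, “happy” and “-ness” in practice:

```python
# Minimal tokenisation sketch using OpenAI's open-source tiktoken library.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-4

for text in ["apple", "unhappiness"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")

# "apple" is likely a single token, while "unhappiness" splits into
# several sub-word pieces, so one term can count as two or more tokens.
```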

In the world of Large Language Models and generative AI, each token, not just whole words, gets a vector representation.

These tokens are the basic input for the AI model. Just like we explained for words, these tokens have their own places in a high-dimensional space, which helps the model understand them.
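As a toy illustration of what “getting a vector representation” means, here is a sketch in which token ids simply index rows of an embedding matrix. The numbers, and the token ids themselves, are made-up placeholders rather than values from any real model:

```python
# Toy sketch: each token id indexes a row of an embedding matrix.
# All values here are random placeholders, not learned embeddings.
import numpy as np

vocab_size, embedding_dim = 50_000, 8   # real models use hundreds of dimensions
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

token_ids = [262, 17180]                # hypothetical ids for two tokens
vectors = embedding_matrix[token_ids]   # shape: (2, 8), one vector per token
print(vectors)
```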

Vector Embeddings

Going back to the original example, close to the ‘apple’ box we have other boxes with labels such as “orange”, “pear”, “fruit” or “pie”.

When you peek inside them, you’d notice that some of the contents are quite similar to what’s inside the “apple” box.

These shared contents, or associations, drag these boxes close to each other in our metaphorical storage space.

The more items two boxes have in common, the closer they will sit to each other.

Now let’s move away from the apple, orange and fruit group of boxes, walk to the other side of the room, and imagine that there’s a box labelled “aeroplane”.

If you explore its contents, you’d find ideas of flight, perhaps memories of vacations or business trips, and concepts related to engineering and aerodynamics. These contents are vastly different from what you found in the “apple” box, so this box sits far away.

So, even though both boxes exist in the same vast storage space (which at this point we may call lexicon), they’re quite distanced from each other, due to the differing contexts of their contents.

Yeah, something like that, I guess — Image generated with Midjourney
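To make the distance between boxes concrete, here is a toy sketch with hand-made three-dimensional vectors for “apple”, “orange” and “aeroplane”. The numbers are invented purely for illustration; cosine similarity tells us how close two boxes sit:

```python
# Toy sketch: hand-made 3-D vectors (invented values) and
# cosine similarity as the "distance between boxes".
import numpy as np

vectors = {
    "apple":     np.array([0.9, 0.8, 0.1]),  # fruit-like, edible, not mechanical
    "orange":    np.array([0.8, 0.9, 0.1]),
    "aeroplane": np.array([0.1, 0.0, 0.9]),
}

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(vectors["apple"], vectors["orange"]))     # close to 1
print(cosine_similarity(vectors["apple"], vectors["aeroplane"]))  # close to 0
```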

Transformers and Vector Embeddings

Now that we’ve familiarised ourselves with semantic fields and tokens, let’s delve a bit deeper into the granularity of vector embeddings.

In the world of generative AI, the positions of boxes within our three-dimensional room aren’t just arbitrarily decided.

They’re determined by vectors, which in the context of Natural Language Processing are mathematical representations of words.

You may be wondering: how is it possible to represent words as numbers?

Let’s use our visualisation super powers again to find out.

Think of these vectors as coordinates in our three-dimensional room full of boxes.

The closer the vectors or coordinates, the closer the relationship between the words or concepts.

If we pick two dimensions for the floor plan of our room, x and y, representing, let’s say, ‘Age’ and ‘Gender’, we should be able to see something like this when looking at it from above:

Image Source: Carnegie Mellon’s School of Computer Science

When we feed sentences or texts into a transformer model (e.g. GPT-4), it doesn’t just see words. It sees input vectors.

These vectors map out relationships between words based on vast amounts of data the model has been trained on.

By going through extensive training sessions, ingesting a huge amount of language data represented as vector embeddings, an AI language model progressively learns which words often appear together, which words share similar contexts, and which words might be synonyms or antonyms.

The main difference from our example with apples and aeroplanes is that while a 3D coordinate system can only represent 3 dimensions, vector embeddings are typically 100–1000 dimensional. This high dimensionality allows word vectors to encode much more semantic information.

For example, in a 100-dimensional embedding space, each dimension can capture some semantic or syntactic property of a word, allowing the model to capture very rich context.

So one dimension could represent gender, another tense, another plurality, and so on.

With hundreds of dimensions, word vectors can encode rich semantic relationships between words.

Words that are semantically similar will have vectors close together in the embedding space, allowing vector arithmetic like “king − man + woman ≈ queen” to work.

In simple terms, this allows us to perform mathematical operations on words by adding and subtracting ‘concepts’.

Image Source: Carnegie Mellon’s School of Computer Science
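If you want to try this yourself, the gensim library ships with access to pre-trained embeddings. This sketch downloads a small 50-dimensional GloVe model the first time it runs (roughly 70 MB), so treat it as an experiment rather than a recipe:

```python
# Sketch using pre-trained GloVe word vectors via gensim.
# Requires: pip install gensim (downloads the model on first run)
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # 50-dimensional word vectors

# king - man + woman ≈ ?
result = model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', <similarity score>)]
```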

However, while word embeddings represent semantic meaning, they don’t provide overall context on their own.

This is why transformer models use something called an “attention mechanism” to identify the context and relationships between words, drawing on the sequential information that comes from placing words one after another in a sentence.

In essence, the combination of input embeddings and attention gives transformer models their ability to understand context and relationships across a whole sequence without processing words one at a time, which makes them both more powerful and more efficient than previous technologies.
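To give a flavour of what attention actually computes, here is a stripped-down sketch of scaled dot-product attention in numpy. Real transformers add learned projection matrices, multiple heads and positional information; this shows only the core operation:

```python
# Stripped-down scaled dot-product attention in numpy.
# Real transformers add learned projections, multiple heads and
# positional encodings; this illustrates only the core idea.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how much each token attends to the others
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # context-aware mixture of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))              # 4 tokens, 8-dimensional embeddings
out = attention(X, X, X)                 # self-attention: Q = K = V = X
print(out.shape)                         # (4, 8): one context vector per token
```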

Closing Thoughts

If you’ve journeyed this far, a huge thank you! Your engagement and curiosity are what make these explorations so rewarding.

I like to think that understanding generative AI in 2023 is like grasping Microsoft Office in 2003, basically a useful skill to add to our ever-evolving digital toolkit.

In the realm of NLP and vector embeddings, it’s essential to recognise that machines perceive phrases as more than just words chained together; they see them as complex structures represented by numbers. These numbers, in the form of vector embeddings, are the mathematical embodiment of the concepts, ideas, objects and properties encoded in language.

This realisation shows the profound transformation taking place in how we interact with language and harness its potential today.

If you enjoyed this article and would like to explore more about how generative AI will shape our lives, I recommend reading my articles about the enterprise of tomorrow and information retrieval in a post-search engine era.

Thank you for reading!

~ Daniele Nanni
