#30DaysOfNLP

NLP-Day 27: How To Visualize Word Embeddings With Tensorboard

A picture is worth more than a thousand word embeddings

Marvin Lanhenke
4 min read · May 3, 2022


Visualizing word embeddings #30DaysOfNLP [Image by Author]

Yesterday and in the days before, we implemented several different neural networks. We covered convolutional and recurrent networks before stepping into the world of Transformers.

But what if we want to gain more insights into the model’s performance? What if we want to visualize word embeddings?

In the following sections, we’re going to learn how to do just that. We will make use of Tensorboard to plot, visualize, and verify word embeddings.

So get ready, and make sure to follow #30DaysOfNLP: How To Visualize Word Embeddings With Tensorboard

Gaining Insights

To me, it’s still magical.

The inner workings of a neural network can be fascinating. They can be mysterious. And they can be frustrating. Especially when nothing is working as expected and we have no idea what went wrong inside this magical yet unforgiving black box of a neural network.

Fortunately for us, we have Tensorboard.

Tensorboard allows us to gain more insights into the model’s performance during training: tracking model metrics, plotting the network’s weight distributions, or visualizing word embeddings.

And on top of that, it’s relatively straightforward to use. We simply connect to a training instance via the browser.

In the next section, we’re going to work through an example: getting Tensorboard up and running and visualizing word embeddings.

So let’s pip install tensorboard and we’re ready to proceed.

Visualizing word embeddings

Plotting and visualizing word embeddings can be especially useful when training our own, domain-specific embeddings. A good visualization enables us to verify the embeddings and check for semantic similarities.

Luckily, with Tensorboard we can convert a word embedding model into a format that can be visualized in a straightforward manner.

We simply have to load the word vectors and the vocabulary. Tensorboard will handle the rest for us, offering four different methods to reduce the dimensionality to two or three dimensions: PCA, t-SNE, UMAP, and a custom, user-defined projection.

Prepare the word embeddings

First of all, we need data to visualize. We need word vectors.

For this example, we can simply rely on the pre-trained Google News Word2Vec model, which can be downloaded here. Next, we import the vectors by making use of the gensim library.
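A minimal sketch of the loading step, assuming the downloaded file is named GoogleNews-vectors-negative300.bin.gz and sits in the working directory; the limit parameter is an optional memory-saving choice, not a requirement:

from gensim.models import KeyedVectors

# load the pre-trained Google News vectors (file name assumed)
# limit=100000 keeps only the most frequent words to save memory
word_vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True, limit=100000
)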

Now, we prepare the data.

We extract the labels, and for each label, we get the associated word vector. To create our final projection data, we iterate over the labels and vectors and store each pair as a tuple inside a list.
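A short sketch of this step, assuming the gensim 4.x API and the word_vectors object from above; the subset size and the name projection_data are illustrative:

# take a manageable subset of the vocabulary as labels
labels = word_vectors.index_to_key[:10000]

# pair each label with its associated word vector in a list of tuples
projection_data = [(label, word_vectors[label]) for label in labels]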

And this is it. We have data to project and visualize.

Create a projection

Now that we have our data, we need to convert it to a format that Tensorboard can turn into a visualization. Thus, we define a function.

After importing the necessary libraries, we specify the name of the metafile and get the number of samples and the vector dimensions.

Next, we create the metafile and the projection matrix, which simply stores the word vector for each sample. For that purpose, we iterate over the projection data, extract the label and the vector, store the vector in the projection matrix, and write the label line by line to the metafile.

Now, we create a tensor based on the word embeddings and save it as a checkpoint file.

Within the last few lines of the function, we set up the projector, add the embedding, and provide the tensor name as well as the metafile’s path.
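A sketch of such a function, following the TensorFlow 2 projector workflow; the file names, the log directory, and the tensor name are assumptions rather than the exact original code:

import os
import numpy as np
import tensorflow as tf
from tensorboard.plugins import projector

def create_projection(projection_data, log_dir="/tmp/"):
    # name of the metafile, number of samples, and vector dimensions
    meta_file = "word_meta.tsv"
    num_samples = len(projection_data)
    vector_dims = len(projection_data[0][1])

    # fill the projection matrix and write one label per line to the metafile
    projection_matrix = np.zeros((num_samples, vector_dims))
    with open(os.path.join(log_dir, meta_file), "w", encoding="utf-8") as f:
        for i, (label, vector) in enumerate(projection_data):
            projection_matrix[i] = vector
            f.write(f"{label}\n")

    # create a tensor based on the word embeddings and save it as a checkpoint
    weights = tf.Variable(projection_matrix)
    checkpoint = tf.train.Checkpoint(embedding=weights)
    checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

    # set up the projector: add the embedding, tensor name, and metafile path
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
    embedding.metadata_path = meta_file
    projector.visualize_embeddings(log_dir, config)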

We can then call that function and create our projection.
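Assuming the names from the sketches above, the call could look like this; the log directory matches the one we point Tensorboard at below:

# writes the checkpoint, metafile, and projector config to /tmp/
create_projection(projection_data, log_dir="/tmp/")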

Open Tensorboard

Simply execute the following statement in your command prompt and Tensorboard should be up and running.

tensorboard --logdir="/tmp/"

Note: Make sure you provide the correct path to your log directory.

Now, we can open our favorite browser, type in localhost:6006 and we should see Tensorboard.

By clicking on the dropdown menu in the upper-right corner, we can select the Projector and inspect our data.

Selecting the projector [Screenshot by Author]

Let’s search for a specific keyword.

The search bar is also located in the upper-right corner. By looking for the word intelligence we get the following output.

Word2Vec visualization with PCA and Tensorboard [Screenshot by Author]

This looks pretty amazing.

We can play around with different search queries and inspect the semantic similarities based on distance, all with just a few mouse clicks.

Conclusion

In this article, we gently introduced Tensorboard by plotting and visualizing word embeddings.

Tensorboard makes it relatively easy to gain insights into the inner workings of a model or to verify word embeddings. However, we merely scratched the surface and there is more Tensorboard functionality to discover.

So take a seat, don’t go anywhere, make sure to follow, and never miss a single day of the ongoing series #30DaysOfNLP.

