Dimensionality Reduction with t-SNE

Dmitry Yemelyanov
Riga Data Science Club
2 min read · Aug 28, 2020

Introduction

How do you imagine four dimensions? My best attempt is to draw a hypercube, but what about five, six, or more? We have to admit that our minds are limited to the 3D world, yet in data science you constantly face data with all sorts of dimensionality.

Tesseract, a four-dimensional hypercube

Dimensionality reduction comes to the rescue! This article is a brief overview of t-SNE (t-distributed stochastic neighbor embedding), a popular algorithm for reducing the dimensionality of your data.

Word embeddings

In this post I will look at dimensionality reduction through the prism of word embeddings. Long story short, these are words encoded as multidimensional vectors. Common embedding sizes range from 50 to 300 dimensions. Sometimes it is important to understand the relationships between particular word vectors, which is nearly impossible in their original multidimensional form. This is where dimensionality reduction helps.
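To make "words encoded as vectors" concrete, here is a minimal sketch of reading vectors from the plain-text format in which pretrained GloVe files are distributed (one word per line, followed by its space-separated components). The helper name `load_glove` and the tiny 4D sample are made up for illustration; real files hold 50 to 300 numbers per word.

```python
import io
import numpy as np

def load_glove(lines, wanted):
    """Parse GloVe-style lines ("word v1 v2 ... vN") into {word: vector}.

    Hypothetical helper for illustration; `lines` can be an open file.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        word = parts[0]
        if word in wanted:
            vectors[word] = np.array([float(x) for x in parts[1:]], dtype=np.float32)
    return vectors

# ASSUMPTION: a made-up 4D sample standing in for a real GloVe file.
sample = io.StringIO(
    "riga 0.1 0.2 0.3 0.4\n"
    "latvia 0.5 0.6 0.7 0.8\n"
    "tree 0.9 1.0 1.1 1.2\n"
)
vectors = load_glove(sample, wanted={"riga", "latvia"})
print(sorted(vectors))        # the words that were kept
print(vectors["riga"].shape)  # one component per embedding dimension
```

With a real file you would pass the open file handle instead of the `io.StringIO` sample.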

If you are new to word embeddings, please take a look at my two-minute explanation of this term by example:

Riga is the capital of Latvia

Riga, Latvia

For demonstration purposes we will use four GloVe word vectors: “Riga”, “Latvia”, “Capital” and “Country”. Looking at the original 300D word vectors gives no clue about the relationships between the words; however, the following short code snippet lets t-SNE reduce the word vectors to two dimensions:

Downsizing 300D GloVe word vectors to 2D
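A minimal sketch of this step, assuming scikit-learn's `TSNE` is used (the linked notebook may differ in its details). The four input vectors here are random stand-ins, not real GloVe vectors, so the resulting layout carries no meaning; swap in the actual 300D vectors to reproduce the idea.

```python
import numpy as np
from sklearn.manifold import TSNE

words = ["Riga", "Latvia", "Capital", "Country"]

# ASSUMPTION: random placeholders for the real 300D GloVe vectors.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(len(words), 300))

# Perplexity must be smaller than the number of samples (only 4 here),
# so the scikit-learn default of 30 would not work.
tsne = TSNE(n_components=2, perplexity=2, init="random", random_state=0)
coords = tsne.fit_transform(vectors)  # shape: (4, 2)

for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```

With so few points, the output depends heavily on `perplexity` and `random_state`, so it is worth trying a few values before reading too much into a single plot.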

This trick allows us to make a scatter plot of the 2D vectors and build a deeper understanding of the relationship between the words: “Riga” is to “Latvia” as “Capital” is to “Country”. In other words, “Riga is the capital of Latvia”.

2D scatter plot of “t-SNEed” word vectors
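A plot like this can be sketched with matplotlib as follows. The coordinates below are made-up placeholders rather than actual t-SNE output, and the output file name is arbitrary; in practice you would plot the 2D array returned by t-SNE.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

words = ["Riga", "Latvia", "Capital", "Country"]

# ASSUMPTION: illustrative 2D coordinates, not real t-SNE results.
coords = np.array([
    [1.0, 3.0],   # Riga
    [1.0, 1.0],   # Latvia
    [4.0, 3.2],   # Capital
    [4.0, 1.2],   # Country
])

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(coords[:, 0], coords[:, 1])

# Label each point with its word, slightly offset from the marker.
for word, (x, y) in zip(words, coords):
    ax.annotate(word, (x, y), textcoords="offset points", xytext=(5, 5))

ax.set_title("Word vectors reduced to 2D")
fig.savefig("tsne_words.png", dpi=150)
```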

Beyond visualizing word embeddings, dimensionality reduction is a handy technique that you will find useful, for example, when dealing with multidimensional feature input.

Code

https://www.kaggle.com/dmitryyemelyanov/word-vector-dimensionality-reduction-with-t-sne

Dmitry Yemelyanov
Riga Data Science Club

Founder at Riga Data Science Club | Machine Learning Consultant