Dimensionality Reduction with t-SNE
Introduction
How do you imagine four dimensions? My best attempt is drawing a hypercube, but what about five, six, or more? We have to admit that our minds are limited to the 3D world; in data science, however, you constantly face data of all sorts of dimensions.
Dimensionality reduction comes to the rescue! This article is a brief overview of t-SNE, a popular algorithm for reducing the dimensionality of your data.
Word embeddings
In this post I will take a look at dimensionality reduction through the prism of word embeddings. Long story short, these are words encoded as multidimensional vectors. Common embedding sizes range from 50 to 300. Sometimes it is important to understand the relationships between particular word vectors, which is nearly impossible in their original multidimensional form. Once again, dimensionality reduction comes to the rescue!
If you are new to word embeddings, please take a look at my two-minute explanation of this term by example:
Riga is the capital of Latvia
For demonstration purposes we will use four GloVe word vectors: “Riga”, “Latvia”, “Capital” and “Country”. Looking at the original 300D word vectors gives no clue about the relationships between the words, but a short code snippet lets t-SNE reduce the word vectors to a size of 2:
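A minimal sketch of that reduction with scikit-learn's t-SNE. Random vectors stand in for the real 300-dimensional GloVe vectors here, so the snippet runs without downloading the GloVe file; in practice you would load the four vectors from a pre-trained GloVe model instead.

```python
import numpy as np
from sklearn.manifold import TSNE

words = ["Riga", "Latvia", "Capital", "Country"]

# Stand-in for real 300D GloVe vectors: random data of the same shape.
rng = np.random.default_rng(42)
vectors = rng.normal(size=(len(words), 300))

# Perplexity must be smaller than the number of samples (here 4).
tsne = TSNE(n_components=2, perplexity=3, init="random", random_state=42)
vectors_2d = tsne.fit_transform(vectors)
print(vectors_2d.shape)  # (4, 2)
```

Note that t-SNE is stochastic: without a fixed `random_state`, each run produces a different 2D layout, though the relative distances between points stay informative.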
This trick allows us to make a scatter plot of the 2D vectors and build a deeper understanding of the relationships between the words: “Riga” is to “Latvia” as “Capital” is to “Country”. In other words, “Riga is the capital of Latvia”.
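The scatter plot itself might be sketched as follows. The 2D coordinates here are illustrative placeholders, not actual t-SNE output; in practice you would pass in the reduced vectors from the previous step.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripting
import matplotlib.pyplot as plt

# Illustrative 2D coordinates; real t-SNE output varies run to run.
points = {
    "Riga": (1.0, 2.0),
    "Latvia": (1.2, 4.1),
    "Capital": (5.0, 2.2),
    "Country": (5.3, 4.0),
}

fig, ax = plt.subplots()
for word, (x, y) in points.items():
    ax.scatter(x, y)              # one dot per word
    ax.annotate(word, (x, y))     # label the dot with the word
fig.savefig("tsne_words.png")
```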
Dimensionality reduction is a handy technique beyond word embedding visualization; you will also find it useful, for example, when dealing with multidimensional feature input.
Code
https://www.kaggle.com/dmitryyemelyanov/word-vector-dimensionality-reduction-with-t-sne