Member-only story
Contour Plots and Word Embedding Visualisation in Python
Contour plots are simple and very useful graphics for word embedding visualization. This end-to-end tutorial uses IMDb data to illustrate coding in Python.
Introduction
Text data vectorization is a necessary step in modern Natural Language Processing (NLP). The underlying concept of word embeddings has been popularised by two Word2Vec models developed by Mikolov et al. (2013a) and Mikolov et al. (2013b). Vectorizing text data in high-dimensional space, we can tune ML models for tasks like machine translation and sentiment classification.
Word embeddings are typically plotted as 3-D graphs, which brings some difficulties in presenting them easily in papers and presentations. Here, the contour plot is a simplification of how to tackle 3-dimensional data visualization, not only in NLP but originally in many other disciplines. Contour plots can be used to present word embeddings (i.e., vector representations of words) in a 2-D dimensional graph. This article provides a step-by-step tutorial for the visualization of word embedding in 2-D space and develops the use cases where contour plots can be used.