UMAP Technique Overview

Christian Burke
2 min read · Mar 19, 2024


A quick case study on my experience with dimension reduction studies.

Browsing through my photo gallery, I found some dimensionality reduction studies from 2021, made while I was working at Refik Anadol Studio. The method we used is UMAP (Uniform Manifold Approximation and Projection), a powerful tool for dimensionality reduction.

Christian Burke showing dimension reduction studies

In this instance, we used it to visualize the complex outputs of an image classification model, condensing them into more manageable representations.

I’ve found this approach particularly insightful for delving deeper into datasets, unraveling their clusters, and sometimes uncovering interrelations between them.

These UMAP visualizations depict images from the Air and Space Museum. Let’s take a closer look at how UMAP works.

The UMAP process explained

The process has two steps:

  • we feed the images through an image classification model;
  • the model transforms each image from a standard PNG into what I refer to as its “computer representation” or “embedding”: a long vector of floating-point numbers.

Effectively, we’ve converted a collection of PNGs into a sizable matrix, primed for various machine-learning applications, including UMAP.

The visualizations shown here are one way to explore that matrix.

Christian Burke showing dimension reduction studies

On the left, there’s a series of points and lines, where each point signifies an image. Each line denotes a connection between two points, indicating their relationship.

On the right, the same data is presented differently: the points are concealed, and the lines are amalgamated through a process called edge-bundling.
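One plausible way to get the points and lines of the left-hand view is a nearest-neighbor graph over the embeddings. The sketch below is an assumption, not the studio's actual method: it links each vector to its k closest vectors with plain NumPy, whereas UMAP itself builds a weighted version of such a graph. The helper name knn_edges and the random data are hypothetical, and edge-bundling is not covered here.

```python
import numpy as np

def knn_edges(vectors: np.ndarray, k: int) -> list[tuple[int, int]]:
    """Return (i, j) pairs linking each row to its k nearest rows (Euclidean)."""
    # Pairwise squared distances via broadcasting: shape (n, n).
    diffs = vectors[:, None, :] - vectors[None, :, :]
    dist2 = (diffs ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist2, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(vectors)) for j in neighbors[i]]

rng = np.random.default_rng(1)
vectors = rng.normal(size=(50, 16))  # hypothetical stand-in for image embeddings
edges = knn_edges(vectors, k=5)
print(len(edges))  # 250
```

Every edge is a candidate line in the drawing: point i connects to point j because their embeddings are close.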

How do we derive meaning from non-empirical outcomes?

One observation from the graphs is that the data falls into distinct clusters even though this is unsupervised learning (that just means we never tell the algorithm what’s what during training).

UMAP tries to preserve the structure of the original data, so similar objects end up close together. Here the clusters form because airplane images look more or less alike depending on the plane model.

Imagine them as a family tree: could the relationships form a graph of aviation innovation or of style developments?

While it’s generally not advised to use UMAP for empirical results (i.e., computing the distance between two points in the projection and drawing conclusions from it), it is still an invaluable technique for exploring data.

Liked this article? Follow me on Twitter, LinkedIn, and Instagram

See me speak about AI at past PyTorch Conferences: 2023 Keynote, 2022 Keynote, Community Voices


Christian Burke, Head of Engineering at Refik Anadol Studio, merges tech, art, & philanthropy to lead projects worldwide.