Mathematical Statistics and Machine Learning for Life Sciences

tSNE vs. UMAP: Global Structure

Why preservation of global structure is important

Nikolay Oskolkov
Towards Data Science
12 min read · Mar 4, 2020

This is the fifteenth article in the column Mathematical Statistics and Machine Learning for Life Sciences, where I try to explain, in a simple way, some mysterious analytical techniques used in Bioinformatics and Computational Biology. Dimension reduction techniques such as tSNE and UMAP are central to many types of data analysis, yet there is surprisingly little understanding of how exactly they work. I started comparing tSNE and UMAP in my previous articles How Exactly UMAP Works, How to Program UMAP from Scratch, and Why UMAP is Superior over tSNE. Today I will share my views on the extent to which tSNE and UMAP can preserve global structure in your data, and why this matters. I will attempt to show the mathematical reasons why UMAP preserves global structure better, using real-world scRNAseq data as well as synthetic data with known ground truth. In particular, I will address the limit of large perplexity (tSNE) / n_neighbors (UMAP), where both algorithms can presumably retain global structure information.

Clustering on UMAP Components

Written by Nikolay Oskolkov

Bioinformatician, Lund University and NBIS SciLifeLab, Sweden
