Mathematical Statistics and Machine Learning for Life Sciences
tSNE vs. UMAP: Global Structure
Why preservation of global structure is important
This is the fifteenth article in the column Mathematical Statistics and Machine Learning for Life Sciences, in which I try to explain, in a simple way, some of the mysterious analytical techniques used in Bioinformatics and Computational Biology. Dimension reduction techniques such as tSNE and UMAP are central to many types of data analysis, yet there is surprisingly little understanding of how exactly they work. I started comparing tSNE and UMAP in my previous articles How Exactly UMAP Works, How to Program UMAP from Scratch, and Why UMAP is Superior over tSNE. Today I will discuss to what extent tSNE and UMAP are capable of preserving the global structure in your data, and why this is important. I will attempt to show the mathematical reasons why UMAP preserves global structure better, using real-world scRNAseq data as well as synthetic data with known ground truth. In particular, I will address the limit of large perplexity (tSNE) and large n_neighbors (UMAP), where both algorithms can presumably retain global structure information.