# Topological Data Analysis w/ Scikit-tda

Continuing from where we left off earlier…

# Lets go through some Theory (of Topology)

# Persistent homology

**Persistent homology** is a method for computing topological features of a space at different spatial resolutions.

**How do we do that?**

We decide on a radius e.g. r = 0.5

And for each point in our dataset, we connect that point with lines to ALL other points which are within this radius from it.

We gradually and arbitrarily increase this radius. So that more and more points get connected to each point. To the extreme situation when the radius is so big that all points get connected to all points and our entire space of data points are covered and connected to each other.

**Why do we do this in the first place?**

Simply because this is the easiest way to connect arbitrary data points in arbitrary dimensional space. And once we do it, in the next step we can proceed with analysing and studying its topology i.e. shape features. Which is the entire purpose of our analysis i.e. to study the shape of our data (Aka Topology of our data).

# What are Barcodes?

If we increase the radius and plot the radius along the horizontal x-axis. And for each value of the radius, you connect the points in your dataset to create a simplicial complex, you will see various features (e.g. holes) appearing and disappearing in your simplicial complex. We plot these features as bar’s (aka barcodes) along the horizontal axis. The short bars correspond to noise and the long bars correspond to features.

It is very difficult to explain on paper. Please see the video below to understand this visually. (trust me you will understand it very easily on seeing the video in just a few minutes, even though you will struggle to explain it to someone else yourself)

The barcode is computable using Linear Algebra in O(N³) complexity. [Remember this. This is the only reason why Topology and Topological Analysis is computationally tractable as it doesn’t take exponential compute)

# Simplicial complex

In mathematics, a **simplicial complex** is a set composed of points, line segments, triangles, and their n-dimensional counterparts

The basic idea — Connect nearby points and create a Simplicial Complex.

# Algebraic topology

Official Definition — Algebraic topologyis the study of intrinsic qualitative aspects of spatial objects (e.g., surfaces, spheres, tori, circles, knots, links, configuration spaces, etc.) that remain invariant under both-directions continuous one-to-one (homeomorphic) transformations.

In layman’s english, we would like to create algebraic representations of shape properties (e.g. holes) which do not change if we stretch or twist the shapes. Without cutting (with a scissor) or pasting (with a glue) the shape in any way.

Algebraic topology offers a mature set of tools for (i) counting and (ii) collating holes. Examples below…

- Euler Characteristic
- Betti Numbers

What are they? lets see below.

# Euler Characteristic (Algebraic Topology)

In algebraic topology and polyhedral combinatorics, the **Euler characteristic** (or **Euler** number, or **Euler**–Poincaré **characteristic**) is a topological invariant, a number that describes a topological space’s shape or structure regardless of the way it is bent.

# Betti Numbers (Algebraic Topology)

**Betti numbers** are **topological** objects which were proved to be invariants by Poincaré, and used by him to extend the polyhedral formula to higher dimensional spaces. Informally, the **Betti number** is the maximum **number** of cuts that can be made without dividing a surface into two separate pieces (Gardner 1984, pp. 9–10).

The first betti number ‘b0’ counts the number of 1 dimensional holes.

The second betti number ‘b1’ counts the number of 2 dimensional holes.

The third betti number ‘b2’ counts the number of 3 dimensional holes.

…

# 3 Properties of Topological Analysis

- Coordinate invariance
- Deformation invariance
- Compressed Representations — of shapes

# Enter Scikit-tda…

We assume you have python/conda installed.

You can now install scikit-tda by typing

pip install scikit-tdaconda install jupyter notebook

if you get an error and ripser fails to ‘build’ and install… don’t sweat… try doing this… (convoluted but works)

conda update -q condaconda install libpython m2w64-toolchain cython numpyecho [build] > C:\Anaconda3\Lib\distutils\distutils.cfgecho compiler = mingw32 >> C:\Anaconda3\Lib\distutils\distutils.cfgpip install ripserpip install scikit-tdaconda install jupyter notebook

# Hello World

import numpy as np

from ripser import ripser

from persim import plot_diagramsdata = np.random.random((100,2))

diagrams = ripser(data)['dgms']

plot_diagrams(diagrams, show=True)

# Now lets do two things…

- compute persistence homology of data sets,
- visualize persistence diagrams

[1]:%load_extautoreload

%autoreload2[2]:fromripserimportripserfrompersimimportplot_diagramsimportmatplotlib.pyplotaspltimportnumpyasnpfromsklearnimportdatasets[3]:data = datasets.make_circles(n_samples=100)[0] + 5 * datasets.make_circles(n_samples=100)[0]

# Default args make it very easy.

Generate diagrams for H0 and H1

Plot both diagrams

[4]:dgms = ripser(data)['dgms']

plot_diagrams(dgms, show=True)

[5]:# Plot each diagram by itself

plot_diagrams(dgms, plot_only=[0], ax=plt.subplot(121))

plot_diagrams(dgms, plot_only=[1], ax=plt.subplot(122))

# Homology over any prime basis

We can compute homology over any p≥2 by supplying the argument `coeff=p`

.

[6]:# Homology over Z/3Z

dgms = ripser(data, coeff=3)['dgms']

plot_diagrams(dgms, plot_only=[1], title="Homology of Z/3Z", show=True)

[7]:# Homology over Z/7Z

dgms = ripser(data, coeff=3)['dgms']

plot_diagrams(dgms, plot_only=[1], title="Homology of Z/7Z", show=True)# Only plot H_1

# Specify which homology classes to compute

We can compute any order of homology, H0,H1,H2,… By default, we only compute H0 and H1. You can specify a larger order by supplying the argument `maxdim=p`

. It practice, anything above H1 is very slow.

[8]:dgms = ripser(data, maxdim=2)['dgms']

plot_diagrams(dgms, show=True)

# Specify maximum radius for Rips filtration

We can restrict the maximum radius of the VR complex by supplying the argument `thresh=r`

. Certain classes will not be born if their birth time is under the threshold, and other classes will have infinite death times if their ordinary death time is above the threshold

[9]:dgms = ripser(data, thresh=0.2)['dgms']

plot_diagrams(dgms, show=True)

[10]:dgms = ripser(data, thresh=1)['dgms']

plot_diagrams(dgms, show=True)

[11]:dgms = ripser(data, thresh=2)['dgms']

plot_diagrams(dgms, show=True)

[12]:dgms = ripser(data, thresh=999)['dgms']

plot_diagrams(dgms, show=True)

# Plotting Options

[13]:# This looks weird, but it's possible

plot_diagrams(dgms, xy_range=[-2,10,-1,20])

# Colormaps

See all available color maps with

importmatplotlibasmpl

print(mpl.styles.available)[14]:plot_diagrams(dgms, colormap='seaborn')

# Plot lifetime of generators

[15]:plot_diagrams(dgms, lifetime=True)

# Source Code

Download ALL the jupyter notebooks in one .zip file from here … http://bit.ly/36JJ20h

The examples above are in the notebook called ‘Basic Usage.ipynb’

# Continue Learning…

# About us

This is our website http://automatski.com