Quantum Topological Data Analysis — The most powerful quantum machine learning algorithm | Part 1

Alexander Del Toro Barba (PhD)
21 min read2 days ago

--

Quantum TDA is a very promising quantum algorithm that resists dequantization. In part 1 we learn how to identify topological invariants in data with the methods from persistent homology.

This article is part of my quantum machine learning series. I‘ll distill research details to accelerate your learning journey. I will update this article regularly with new findings and advancements. Last updated: August 9, 2024.

Quantum Topological Data Analysis (QTDA) represents a unique intersection of quantum computing and topological data analysis (TDA), a field focused on understanding the shape (topology) of data. TDA is particularly useful for extracting robust features from complex data sets. The approach can reveal insights that are not easily obtainable through traditional, geometry-based statistical methods.

TDA relies on concepts like persistent homology to study the properties of data that are invariant under continuous transformations. This means that the core topological characteristics of data, as revealed by TDA, remain consistent even when the data is subjected to smooth, gradual changes. It allows TDA to extract meaningful information about data that is robust to noise and minor variations.

Quantum Topological Data Analysis, or QTDA, leverages quantum computing to enhance or accelerate topological data analysis tasks. This approach can potentially offer significant speedups for certain computations involved in TDA, such as calculating Betti Numbers, which count the number of connected components, holes, and voids in data, or more generally, in computing persistent homology, a technique that analyzes how topological features evolve across different scales.

What is Topology?

Topology is a branch of mathematics that studies the fundamental properties of shapes and spaces that remain unchanged under continuous transformations.

Transformation from cup to donut. Topologically they are the same object (Source)

Imagine you are stretching, bending, or twisting an object like a rubber sheet — topology focuses on the aspects that don’t break or tear during these deformations. This includes properties like connectedness, the number of holes, and how different parts of the space are related to each other. It is called “rubber-sheet geometry” because it is concerned with the essence of shapes rather than their rigid measurements.

Möbius strips, which have only one surface and one edge, are a kind of object studied in topology.

Applications of Topology

For what do we need topology? There are numerous promising use cases, ranging from brain analysis, rare disease, network optimization, high energy physics, and financial markets. Let me explain the potential applicability of topology on an example in financial asset management.

An important task in financial asset management is to predict financial price dynamics (volatility) and phase transitions in the stock markets. A topological approach to data analysis gained interest during the 2010s for predicting fundamental market shifts with mixed results. The promise is that topological methods are less susceptible to slight perturbations in data, and can predict fundamental changes in the structure of financial data more reliably.

Volatility index between 1990 and 2022. Spikes visible during the financial crisi 2008/09 and Covid in 2020

Topological data analysis has been suggested as an alternative to geometric approaches to detect financial market phase shifts in early warning systems. Building up on this work, EPFL scientists, together with local startup L2F, have developed a robust model that can predict when a systemic shift is about to occur, based on methods from a branch of mathematics called topological data analysis. And other research looked at how A persistent-homology-based turbulence index & some applications of TDA on financial markets.

https://arxiv.org/pdf/1703.04385

There are many frameworks that can be used to do TDA in classical data, such as Ripser, Gudhi, or Perseus. I intensively worked with Giotto-tda in the past. It offers a guide on how to run TDA on time series. After applying Taken’s embedding of the time series into a point cloud, they compute persistence diagrams to detect topological invariants that describe the time series and can help to detect critical phase transitions in financial markets.

Delay embedding (Taken’s embedding) of a time series into a pint cloud representation
!pip install --quiet giotto-tda
import numpy as np
import plotly.graph_objects as go
x_nonperiodic = np.linspace(0, 50, 1000)
y_nonperiodic = np.cos(x_nonperiodic) + np.cos(np.pi * x_nonperiodic)

fig = go.Figure(data=go.Scatter(x=x_nonperiodic, y=y_nonperiodic))
fig.update_layout(xaxis_title="Timestamp", yaxis_title="Amplitude")
fig.show()

# Create time delay embedding for this signal and visualise the resulting point cloud
from gtda.time_series import SingleTakensEmbedding
from gtda.plotting import plot_point_cloud
embedding_dimension_nonperiodic = 3
embedding_time_delay_nonperiodic = 16
stride = 3
embedder_nonperiodic = SingleTakensEmbedding(
parameters_type="fixed",
n_jobs=2,
time_delay=embedding_time_delay_nonperiodic,
dimension=embedding_dimension_nonperiodic,
stride=stride,
)
y_nonperiodic_embedded = embedder_nonperiodic.fit_transform(y_nonperiodic)
plot_point_cloud(y_nonperiodic_embedded)
# From time delay embeddings to persistence diagrams
y_periodic_embedded = y_periodic_embedded[None, :, :]
y_nonperiodic_embedded = y_nonperiodic_embedded[None, :, :]
from gtda.homology import VietorisRipsPersistence
# 0 - connected components, 1 - loops, 2 - voids
homology_dimensions = [0, 1, 2]
periodic_persistence = VietorisRipsPersistence(
homology_dimensions=homology_dimensions, n_jobs=6
)
print("Persistence diagram for periodic signal")
periodic_persistence.fit_transform_plot(y_periodic_embedded)
nonperiodic_persistence = VietorisRipsPersistence(
homology_dimensions=homology_dimensions, n_jobs=6
)
print("Persistence diagram for nonperiodic signal")
nonperiodic_persistence.fit_transform_plot(y_nonperiodic_embedded);
Persistent diagram and embedded pointcloud of the time series signal

Topological Data Analysis can be used in conjunction with methods from machine learning in the following way. It is also possible to add methods from tropical geometry.

From Homology to Topological Data Analysis

Invariants are the main object of homology and topological data analysis. Invariants are properties of mathematical objects that remain unchanged under transformations, concretely homeomorphisms, continuous transformations of the space that can be undone, like bending, stretching, or twisting the space, as long as you don’t tear or glue anything.

Objects in Homology (Source)

Homology is a method to determine and count invariants, like number of holes, voids, or connected components in data (=topological spaces). Homology achieves this by using homology groups from abstract algebra. For example, the first homology group of a space describes 1-dimensional holes (loops that cannot be shrunk to a point), the second homology group describes its 2-dimensional holes (voids enclosed by surfaces), etc.

The topological features or invariants of a space, like the number of connected components, holes, and voids in data are captured with algebraic structures called Homology groups — specifically, abelian groups. Each dimension (0th for connected components, 1st for holes, 2nd for voids, etc.) has its own homology group.

The homology of a topological space X is a set of topological invariants of X represented by its homology groups.

Homology is divided into simplicial and persistent homology. Simplicial homology analyzes the topological features (holes, voids) of a single fixed shape represented as a simplicial complex. It provides a snapshot of the shape’s structure at a particular scale. Persistent homology tracks how these topological features evolve across multiple scales or levels of detail. It analyzes a filtration of simplicial complexes (a sequence of nested complexes) to identify features that persist over a range of scales, indicating their significance.

Topological data analysis (TDA) uses Persistent Homology to study the topological features of a shape or data at different spatial resolutions. It captures multi-scale topological information, such as the number of connected components, loops, and voids.

The tools to analyze homology in a general algebraic setting come from Homological algebra. A central concept is that of chain complexes, which can be studied through their homology and cohomology. We will see further below why these are important.

A diagram used in the snake lemma, a basic result in homological algebra (Source)

Elements of Persistent Homology

Now we introduce the elements of homology: simplices, simplicial complexes, boundaries, boundary operators, cylces and chain complexes.

The building block of shapes (points, lines, triangles) is a Simplex. They can be 0-dimensional (points), 1-dimensional (lines, edges), 2-dimensional (triangles), a 3-dimensional simplex is a tetrahedron, and a 4-dimensional simplex is a 5-cell. Data is represented as a collection of simplices.

Low dimensional simplices (Source)

Computing homology requires the formation of a topological space characterized as Simplicial Complexes. This is built up from simpler pieces, the simplices. Abstract simplicial complexes are the combinatorial description of the geometric notion of a simplicial complex used in TDA. Often just called ‘complex’, they describe a family of sets that is closed under taking subsets, that is, every subset of a set in the family is also in the family. For example, in a 2-dimensional simplicial complex (2-simplex), the sets in the family are triangles (sets of size 3), their edges (sets of size 2), and their vertices (sets of size 1).

Given a metric space d : M × M and a distance threshold ϵ, there are different abstract complexes that can be constructed by forming a simplex (a generalization of a triangle to higher dimensions) for every finite set of points whose pairwise distances are all less than or equal to the threshold.

The most often used complexes are the Degree-Rips bifiltration, Vietoris–Rips complex, and Čech complex. At a given distance threshold (or scale parameter), the abstract simplicial complex includes simplices (points, edges, triangles, etc.) formed by points within that distance of each other. As we increase a threshold parameter, we create a filtration. The simplices are organized in chain complexes, which track how they are connected.

Example of a simplicial 3-complex (Source)

Information about holes, loops, voids and other invariants within the chain complexes are revealed using Boundaries of simplices. A “hole” refers to a cycle (a closed loop) that is not the boundary of any filled-in region. This means that holes are cycles that are not boundaries themselves (non-boundary cycles). Like a hollow donut; the inner ring is a cycle that represents a hole because it doesn’t enclose any area within the donut.

Boundary Operators are the tool to calculate homology and work by associating each object with its boundary. Given a simplicial complex, the boundary operator helps compute the boundary of each simplex. Boundary operators are crucial for constructing Homology groups.

Boundary operator (Source)
Examples of (q, h)-boundary operators where we denote [v 0 , . . . , v p ] = v 0···p (Source). The boundary operators are labeled as ∂₃,₂​ and ∂₂,₂​. Upper: ∂₃,₂​ indicates the boundary operator acting from 3-dimensional simplices to 2-dimensional simplices. Lower: The 2-simplex (triangle) [v₀,v₁,v₂] is being mapped to its 0-dimensional vertices {v₀,v₁, v₀,v₂}. Therefore, the appropriate boundary operator should be labeled ∂₂,₀​, to indicate the mapping from 2-simplices (triangles) to 0-simplices (vertices).

Example: The upper picture, labeled with ∂₃,₂​, shows the boundary operation taking a 3-simplex (a tetrahedron) and mapping it to its four 2-dimensional boundary components, which are the triangular faces (triangles). To continue the chain complex using boundary operators, you can then map the 2-simplex [v₀,v₁,v₂] (triangles) to its 1-dimensional faces [v₀,v₁], [v₁,v₂], and [v₂,v₀] (edges) based on the definition of the boundary operator in simplicial homology, which maps an n-simplex to its (n−1)-dimensional boundary simplices.

The final stage are Chain complexes. These are a sequence of abelian groups or modules, A₀, A₁, A₂, A₃, A₄, … connected by homomorphisms — which are the boundary operators (or differentials) dₙ : Aₙ → Aₙ-₁, such that the composition of any two consecutive maps is the zero map.

Illustration of boundary operators ∂, chain, cycle, and boundary groups along with their images under the boundary operators. The black dot is the empty set ∅

How can we detect holes or voids?

In homology theory we have three categories help identify and classify topological features within a space (chains, cycles and boundaries):

  • Chains that are not cycles indicate open or incomplete structures. A chain that has a non-zero boundary. This means it does not form a closed structure. For example, a sequence of edges that forms an open path or a collection of triangles that does not enclose a volume.
  • Cycles that are boundaries indicate trivial features that do not represent holes. In graph theory, a cycle is a path where the initial and final vertices are the same, and no other vertex is repeated. In topology, a cycle is a a simplicial chain with 0 boundary: A chain is called a cycle when its boundary is zero. A cycle is a chain whose boundary is zero, it forms a closed structure. However, if this cycle is also a boundary of a higher-dimensional chain, it represents a trivial feature in the space. For example, the boundary of a 2-simplex (triangle) is a 1-cycle consisting of its three edges. Since this 1-cycle is the boundary of the 2-simplex, it does not represent a significant hole. Such cycles that are boundaries do not correspond to “holes” but rather to filled-in regions.
  • Cycles that are not boundaries indicate significant topological features such as holes and voids, which are the primary interest in homology theory. These are captured by the homology groups, and the rank of these groups (Betti numbers) quantifies the number of such features. These are cycles that form closed structures with no boundary but are not themselves the boundary of any higher-dimensional chain. These represent significant topological features such as holes or voids. For example, a 1-cycle that forms a loop in a space where there is no 2-simplex filling it in represents a 1-dimensional hole (like the hole in a doughnut). Similarly, a 2-cycle forming a closed surface that is not the boundary of a 3-simplex represents a 2-dimensional void.

There are different types of cycles: A 1-cycle in a simplicial complex is a closed loop, that is a sequence of edges that form a loop with no beginning or end. A 2-cycle is a closed surface, e.g. a collection of triangles forming a surface without boundary.

Left: The boundary of a polygonal curve is a linear combination of its nodes; in this case, some linear combination of A₁ through A₆. Assuming the segments all are oriented left-to-right (in increasing order from Aₖ to Aₖ₊₁), the boundary is A₆ − A₁. Right: A closed polygonal curve, assuming consistent orientation, has null boundary. This chain is a cycle because it has a closed structure. (Source)

The existence of cycles can indicate the presence of holes or voids in the data — but only if it’s not also a boundary. For example, a 1-cycle (a closed loop of edges) suggests the presence of a 1-dimensional hole, such as the hole in the center of a loop or a ring. A 2-cycle (a closed surface of triangles) suggests the presence of a 2-dimensional void, such as the inside of a hollow sphere.

And now the connection to the boundary operator that results in zero: For a chain that is a cycle but not a boundary, the boundary operator does result in zero. Cycle: A chain c is a cycle if the boundary operator applied to it results in zero, i.e., ∂c=0. This means it forms a closed structure with no boundary. Boundary: A cycle c is a boundary if there exists a higher-dimensional chain b such that c=∂b. This means c is the boundary of some higher-dimensional chain.

  • Chain that is a cycle: By definition, for any cycle c, ∂c=0. This means it is a closed structure.
  • Chain that is a cycle and also a boundary: There exists a higher-dimensional chain b such that c=∂b. In this case, c is both a cycle and a boundary, meaning it does not represent a topological feature.
  • Chain that is a cycle but not a boundary: For such a cycle c, ∂c=0, but there does not exist any higher-dimensional chain b such that c=∂b. This means c is a significant topological feature, like a hole or void.

If we want to known if a chain is also a cycle, we take the boundary operator ∂ₚ​ that maps a p-simplex to its (p−1)-dimensional faces. A cycle is a chain c where applying the boundary operator results in zero. For a p-dimensional chain c, if ∂ₚ​( c ) = 0, then c is a cycle. This means that the simplices in the chain fit together in a way that their boundaries cancel out. Now you know why a chain is called a cycle when its boundary is zero.

Manual calculation of cycles that are a chain, such that applying the boundary operator results in zero. This is because the simplices in the chain fit together in a way that their boundaries cancel out.

For example in the image above, if I have a 2-simplex (triangle) [v₀,v₁,v₂] and take the boundary which is the sum of its three 1-dimensional edges [v₀,v₁], [v₁,v₂], and [v₂,v₀], I get the result zero: ∂([v₀,v₁,v₂]) = ​[v₁,v₂] — [v₀,v₂] + [v₀,v₁] = 0. This resulting sum of edges forms a closed loop, which means it is a 1-cycle. But: Whether this 1-cycle is a boundary of a higher-dimensional simplex or not is the key question to determine holes or voids.

  • The boundary of a 2-simplex (triangle) naturally forms a closed loop.
  • However, this 1-cycle is not necessarily indicating a hole by itself because it is the boundary of the 2-simplex we started with. It is a cycle, but it is also a boundary of a higher-dimensional simplex.
  • For a 1-cycle to indicate a hole, it must NOT be the boundary of any 2-simplex within the simplicial complex. If we have a 1-cycle in a simplicial complex and there is no 2-simplex whose boundary is this 1-cycle, then this 1-cycle represents a 1-dimensional hole. An example is the hole in a doughnut.
In step 1, you detect cycles from chains (using boundary operator). In step 2, you compute quotient groups H₀ and H₁ (Source: Course on Algebraic Topology and my own notes)

A simplicial chain is a formal sum of simplices (such as points, edges, triangles, etc.) in a simplicial complex, typically with coefficients in a given field (often ℤ₂​, ℤ, or ℝ). For example, a 1-dimensional chain might be a sum of edges, and a 2-dimensional chain might be a sum of triangles.

So, a cycle that is a simplicial chain with a zero boundary means it is a collection of simplices that form a closed structure, without any boundaries left over. This is important in homology theory, where cycles are used to identify topological features of spaces, such as holes or voids, because a hole is a region that is not filled in. In graphs, a hole can be thought of as a cycle that cannot be continuously shrunk down to a single point without breaking the graph. But let us go into detail of this process.

Homology Groups to determine holes and voids

To mathematically determine if a cycle represents a hole, we need to work within the framework of homology groups.

  1. Identify Cycles: Start by identifying all pp-cycles in your simplicial complex. These are chains cc such that ∂ₚ ​(c) = 0. This set of p-cycles forms a group called the cycle group, denoted ℤₚ​.
  2. Identify Boundaries: Next, identify all pp-boundaries. These are p-chains that are the boundary of (p+1)-chains, i.e., chains b such that b=∂ₚ​₊₁(d) for some (p+1)-chain d. This set of p-boundaries forms a group called the boundary group, denoted Bₚ​​.
  3. Compute the Homology Group: The p-th homology group Hₚ​ is defined as the quotient group Zₚ/Bₚ. This group captures the p-cycles that are not boundaries. In other words, Hₚ consists of equivalence classes of p-cycles where two cycles are considered equivalent if they differ by a boundary.
  4. Interpret the Homology Group: The elements of the homology group Hₚ​ represent the p-dimensional holes in the simplicial complex. The rank of the homology group, called the p-th Betti number βₚ​, indicates the number of independent p-dimensional holes.

Let us take the example from above, expand it and look for a 1-dimensional hole:

  • Identify 1-Cycles: Suppose you have a simplicial complex that includes a loop formed by edges [v₀,v₁​], [v₁​,v₂​], [v₂​,v₃​], and [v₃​,v₀​]
  • Form the Cycle Group Z₁: The 1-cycle c=[v₀,v₁​]+[v₁​,v₂​]+[v₂​,v₃​]+[v₃​,v₀​] satisfies ∂₁(c)=0
  • Identify Boundaries: If there are no 2-simplices (triangles) in the simplicial complex that have this 1-cycle as their boundary, then this cycle is not a boundary.
  • Compute the Homology Group H₁: In this case, H₁​ will include the equivalence class of the 1-cycle c. Since c is not a boundary, it represents a 1-dimensional hole.
  • Interpret the Result: The rank of H₁​ (the Betti number β₁​) tells you the number of independent 1-dimensional holes. If H₁ has rank 1, then there is one 1-dimensional hole.

So, in order to determine if a cycle represents a hole we have

  1. Calculate the cycle group Zₚ​ (all p-cycles).
  2. Calculate the boundary group Bₚ​ (all p-boundaries).
  3. Compute the homology group Hₚ=Zₚ/Bₚ
  4. If a cycle is in Zₚ​ but not in Bₚ​, it represents a hole.
Sequence: the image of one morphism equals the kernel of the next im(𝑓ₖ) = ker(𝑓ₖ₊₁)

Connection between Linear Algebra and Topology

This image above illustrates the concepts of the kernel and image of a linear map L:V→W in linear algebra. These concepts are related to the framework of homology groups in algebraic topology.

Linear Algebra Context:

  • Kernel: The kernel (or null space) of a linear map L:V→W is the set of all vectors in V that map to the zero vector in W. Mathematically, Ker(L)={v∈V | L(v)=0}. The kernel measures the elements in the domain that get “collapsed” to zero in the codomain.
  • Image: The image (or range) of a linear map L is the set of all vectors in W that are the output of L for some input vector in V. Mathematically, Im(L)={L(v) | v∈V}. The image measures how much of the codomain is “covered” by the linear transformation.

Homology Groups Context: In the context of homology groups, we deal with chains, cycles, and boundaries within simplicial complexes:

  • Chain Complex: A sequence of abelian groups (or modules) .. → Cₙ₊₁ → Cₙ → Cₙ₋₁ → .. connected by boundary operators ∂ₙ₊₁ and ∂ₙ etc.
  • Cycles and Boundaries: A p-chain is a cycle if it is in the kernel of the boundary operator: Zₚ=Ker(∂ₚ). A p-chain is a boundary if it is in the image of the boundary operator from the next dimension: Bₚ=Im(∂ₚ₊₁)
  • Homology Groups: The pp-th homology group is defined as the quotient group Hₚ=Zₚ/Bₚ. This means Hₚ​ consists of p-cycles modulo p-boundaries. In other words, it measures the cycles that are not boundaries.

Connection:

  • The kernel in linear algebra is analogous to the cycle group in homology theory, representing elements that map to zero (i.e., closed structures). The kernel of the boundary operator ∂ₚ​ (i.e., Ker(∂ₚ)) corresponds to the p-cycles Zₚ​. The kernel of the boundary operator ∂ₚ​, Ker(∂ₚ), consists of all p-chains that are mapped to zero by ∂ₚ. These p-chains are called p-cycles because they form closed structures without any boundary. Therefore, the kernel helps identify whether a chain is a cycle.
  • The image in linear algebra is analogous to the boundary group in homology theory, representing elements that are the result of mapping higher-dimensional elements (i.e., trivial features). The image of the boundary operator ∂ₚ₊₁ (i.e., Im(∂ₚ₊₁​)) corresponds to the p-boundaries Bₚ. The image of the boundary operator ∂ₚ₊₁​, Im(∂ₚ₊₁), consists of all p-chains that are the boundary of (p+1)-chains. These p-chains are called p-boundaries because they can be “filled in” by higher-dimensional simplices. Therefore, the image helps identify whether a cycle is a boundary of a higher-dimensional simplex.

The kernel helps to identify whether a chain is a cycle, and the image helps me to identify whether a cycle is a boundary of a higher-dimensional simplex.

  • To identify if a chain is a cycle: Check if it is in the kernel of the boundary operator. If ∂ₚ( c ) =0, then c is a cycle.
  • To identify if a cycle is a boundary: Check if it is in the image of the boundary operator from the next dimension. If c = ∂ₚ₊₁(b) for some (p+1)-chain b, then c is a boundary.
  • To identify significant topological features (holes or voids) in data: The homology group Hₚ=Ker(∂ₚ)/Im(∂ₚ₊₁) measures the difference between these cycles and boundaries, identifying significant topological features (holes or voids) in the data. The p-th homology group Hₚ​ is the quotient of the cycle group by the boundary group:

This measures the cycles that are not boundaries, representing the true topological features (holes or voids) in the space. This framework allows us to systematically study and quantify the topological features of a space, such as the number of connected components, holes, and higher-dimensional voids.

Why Abelian Group?

The Chain Complex is defined as a sequence of abelian groups .. → Cₙ₊₁ → Cₙ → Cₙ₋₁ → .. connected by boundary operators ∂ₙ₊₁ and ∂ₙ etc. But why Abelian? An Abelian group is a group with operation that is commutative.

  • Commutativity ensures that addition of chains is order-independent, which is important for defining cycles and boundaries consistently.
  • Associativity guarantees that the way chains are grouped and combined does not affect the outcome, simplifying complex calculations.
  • Identity and Inverses allow for the definition of null chains and the ability to “cancel out” chains, which is important for defining homology.

The requirement for chain complexes to consist of abelian groups ensures that the addition of chains is well-defined, the boundary operators act linearly, and the foundational theorems of homology theory apply. This structure simplifies algebraic manipulation of chains, cycles, boundaries, which makes the study of topological spaces more tractable and robust.

Consider a simplicial complex where chains are formal sums of simplices. If c₁ = [v₀,v₁​]+[v₁​,v₂​] and c₂ = [v₂​,v₃​]+[v₃​,v₀​], we can add these chains to get c₁ + c₂ = [v₀,v₁​]+[v₁​,v₂​]+[v₂​,v₃​]+[v₃​,v₀​]. The boundary operator ∂ can then be applied to this sum, and due to the linearity guaranteed by the abelian structure, ∂(c₁+c₂)=∂(c₁)+∂(c₂).

Filtration and Inclusion Maps for TDA

An Exact Sequence is a sequence of morphisms between objects (groups, rings, modules, and, more generally, objects of an abelian category) such that the image of one morphism equals the kernel of the next.

A a sequence in the context of group theory

A sequence of groups and group homomorphisms (illustration above) is called exact if the image of each homomorphism is equal to the kernel of the next: im(𝑓ₖ)=ker(𝑓ₖ₊₁). The sequence of groups and homomorphisms may be either finite or infinite. Every exact sequence is a chain complex.

In order to keep track of the invariants, we record “birth” and “death” of features (connected components, loops, voids, etc.) during across the filtration with so-called topological barcodes and persistence diagrams.

A filtration on a data set (Source)

Filtrations are fundamental in persistent homology used to capture the topological features of a space at different scales. A filtration is a nested sequence of simplicial complexes: By gradually increasing the distance threshold ϵ, we create a sequence of simplicial (or nested) complexes, where each complex is included in the next, and whereyou connect points that are within ϵ distance of each other.

This sequence, or filtration, captures topology at various scales. It tracks how topological features evolve. As the distance threshold increases, more simplices are added, capturing the changing topology of the data at different scales. And for each simplicial complex in the filtration, you compute its homology groups (which are important to derive Betti numbers). It will tell you the topological features (connected components, loops, etc.) at that stage of the filtration, as they evolve over the filtration.

Illustration of data from a point cloud data and the corresponding barcode plots of persistent features
  • 0-Simplices: Every point in X is always a 0-simplex. For ϵ=0, each data point in X is treated as a 0-simplex. There are no edges or higher-dimensional simplices since the distance between distinct points is greater than 0.
  • 1-Simplices (Edges): For every pair of points p and q in X, if the distance between p and q is less than or equal to ϵ, you add an edge between p and q, making it a 1-simplex.
  • Higher-Dimensional Simplices: A set of k+1 points forms a k-simplex if and only if every pair of points within the set is within distance ϵ of each other. For instance, three points form a triangle (2-simplex) if every pair among the three points is connected by an edge of length ≤ ϵ. Include a k-simplex if and only if all its faces are already in the complex. In the case of Vietoris-Rips, this means including a k-simplex if all the pairwise distances between its vertices are within ϵ.
A filtration based on the Vietoris-Rips complex (Source)

Inclusion maps lead from each simplicial complex to the next. This sequence of simplicial complexes, with inclusion maps, is called a filtration. With increasing size of distance d, we are dealing with a sequence of simplicial complexes, each a sub-complex of the next. That is, a simplicial complex constructed from data for some small distance is a subset of the simplicial complex constructed for a larger distance.

Barcodes and Persistence Diagrams represent the information after computing homology groups at each stage of the filtration. It gives a visual representation of how long each topological feature (loop or void) persists as the scale changes. The longer it persists, more likely it is an invariant. Each topological feature appears and disappears as you progress through the filtration. This can be represented as a barcode or persistence diagram.

Barcode and Persistence Diagram (Source)

Matrix Reduction Techniques can be applied to reduce matrices to a simpler form (Smith normal form) and extract homological information.

In summary, TDA constructs a filtration of simplicial complexes, where the simplicial complex evolves as a parameter (often a distance or density threshold) changes. The output of persistent homology is a barcode or persistence diagram, which provides multi-scale summary of the homological features. It essentially captures when each topological feature appears and disappears as you vary the resolution or threshold.

Disclaimer: The views and opinions expressed in this article are my own and do not necessarily reflect those of my employer.

👉 Follow me: LinkedIn, Google Scholar and www.deltorobarba.com

--

--