What is Dequantization in Quantum Machine Learning?

Alexander Del Toro Barba (PhD)
34 min read · Jun 26, 2024


AI and Quantum Computing are among the hottest topics in tech. However, some popular quantum machine learning algorithms have proven to be dequantizable, blurring the line between quantum and classical computing. In this article I will explain in detail how a quantum recommender engine was developed, and how it was later dequantized.

This article is part of my quantum machine learning series. I'll distill research details to accelerate your learning journey. I will update this article regularly with new findings and advancements. Last updated: July 12, 2024.

“AI is eating the world” has emerged as a new mantra, inspired by Marc Andreessen’s famous 2011 Wall Street Journal essay “Why Software Is Eating the World”. Machine learning has proven to be a transformative technology across a wide range of applications, from healthcare and finance to retail and manufacturing. AI has reshaped entire industries and our daily lives.

Now, Quantum Computing is becoming a new, significant technology on the horizon. One of its promising applications is the ability to enhance classical machine learning. Approaches include Quantum Annealers and the Quantum Approximate Optimization Algorithm, designed for optimization problems; Variational Quantum Eigensolvers, used in chemistry and ML; Quantum Support Vector Machines, the quantum version of kernel methods; and Quantum Topological Data Analysis for data with topological features.

Quantum Neural Network architecture (Source)

Both AI and quantum computing have the potential to bring about substantial changes individually. But what possibilities might arise from combining these two technologies?

As it turns out, combining machine learning with quantum computing is proving to be more complex and challenging than initially anticipated. Quantum machine learning suffers from barren plateaus in the loss landscape, insufficient backpropagation scaling, the hardness of achieving exponential speedups, and difficulties in benchmarking, and many models even remain untrainable. Additionally, methods like classical shadows enable learning essential properties of quantum states using classical machine learning techniques. On the other hand, quantum machine learning can require exponentially fewer samples for certain learning tasks on quantum data, has the potential to reduce energy consumption, and, at least in principle, promises to accelerate large-scale classical machine learning.

While many potential use cases for quantum machine learning have been proposed, their practical realization hinges on the development of scalable, fault-tolerant quantum computers capable of executing algorithms reliably. Quantum error correction techniques are expected to play a crucial role in achieving this goal, but their scalability has yet to be demonstrated.

Ongoing research is refining our understanding to identify where quantum machine learning can truly add value and where it may not be as effective. Dequantization is one such area of study.

What is Dequantization?

In quantum machine learning, dequantization is the process of translating quantum algorithms that are designed for learning on classical data into classical machine learning algorithms with a similar performance.

“Quantum computing applications that are realizable with zero qubits!” Scott Aaronson pitching Ewin Tang’s work

This is possible because researchers sometimes do not fully account for the computational costs of data preprocessing when they present claims about (exponential) speedups of quantum machine learning algorithms compared to the best known classical methods. Dequantization methods exploit special properties within the input data, and employ stochastic (or randomized) techniques in the algorithmic design.

Dequantization gained traction in 2018, when Ewin Tang investigated the algorithm that Kerenidis and Prakash had developed for Quantum Recommendation Systems. The researchers promised an exponential speedup over known classical algorithms, under certain assumptions. Ewin Tang was able to dequantize the quantum machine learning algorithm by applying a classical sampling technique, which enabled a significant speedup over previous classical algorithms. This narrowed the gap to the best known quantum machine learning algorithm to a polynomial difference, which means the task can be run efficiently on a classical device with only a small overhead.

Ewin Tang on quantum singular value transformation (QSVT)

Since then, work has continued to dequantize other QML algorithms, such as principal component analysis and supervised clustering, low-rank stochastic regression, and sublinear classical algorithms for solving low-rank linear systems. This line of work demonstrated that classical approaches can often achieve similar results to quantum algorithms under certain conditions.

Classical Recommender Engine Algorithm

Let us first describe how classical recommendation engines work mathematically, then how the quantum recommender engine works and in the final step how the dequantized version works. Two important concepts in recommender engines are the low rank assumption and the relevance of matrix reconstruction to generate recommendations.

The low-rank assumption, on which recommendation systems are built, states that user preferences can be categorized into a limited number of types, allowing us to infer individual preferences based on the preferences of similar users. This assumption is grounded in the idea that most people’s preferences are not unique but rather fall into one of a few categories.

This means that an m × n preference matrix P with m users and n products can be approximated by a smaller, low-rank matrix P* that groups the users by preferences. The rank k is small and can be thought of as a constant, independent of the number of users m or the number of products n. This rank is asymptotically much smaller than the size of the matrix.

The idea behind low rank approximation is that it uncovers some sort of archetypical user types to which each user belongs. In this sense, each individual user is some noisy version of the archetypical user it belongs to.

The low-rank assumption in recommendation systems implies that users can be categorized into a few types who share preferences: they agree with these types on the high-value elements. This is crucial for generating meaningful recommendations, because if users within a type disagreed on the highly rated items (and agreed only on many small-value elements), the low-rank nature of the matrix would not guarantee the quality of recommendations, even though the matrix is low-rank.

Kerenidis and Prakash, who developed a quantum machine learning algorithm for a recommender engine, conceptually built on the following classical recommendation algorithm. You start with the original preference matrix A, which represents user preferences or interactions:

Illustration of a user-item preference matrix A

Step 1: Generate a subsample matrix

You generate a subsample matrix Ã from the original matrix A. The sampled matrix Ã is close to A on the known entries (user preferences) in the set Ω and sets the unknown entries outside Ω to zero, for all i = 1,…,m and j = 1,…,n. If A is large and sparse, this step helps to reduce the computational cost and to focus on the known interactions only. The sampling is assumed to follow a uniform distribution, so that each entry has equal probability p of being sampled. Sampled entries are rescaled to Ã_ij = A_ij / p.

Subsample Matrix
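
To make this step concrete, here is a minimal NumPy sketch (my own illustration, with a made-up toy matrix A and sampling probability p, not the authors’ code) of keeping each entry with probability p and rescaling it by 1/p:

# Step 1 (sketch): uniform subsampling of a preference matrix
import numpy as np

rng = np.random.default_rng(seed=42)

# Toy 0-1 preference matrix A (m users x n items); the values are made up.
A = rng.integers(0, 2, size=(6, 8)).astype(float)

p = 0.5  # sampling probability for each entry

# Keep each entry with probability p and rescale it by 1/p, otherwise set it to 0.
mask = rng.random(A.shape) < p
A_tilde = np.where(mask, A / p, 0.0)

print("A:\n", A)
print("Subsampled and rescaled A~:\n", A_tilde)
# In expectation E[A~] = A, since every kept entry is scaled up by 1/p.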

Step 2: Perform Singular Value Decomposition

In the next step the recommender engine performs Singular Value Decomposition (SVD) on the subsample matrix Ã. SVD decomposes the matrix Ã into its constituent parts Ã = UΣV^⊤, where U is an m×m orthogonal matrix (left singular vectors), Σ is an m×n diagonal matrix of singular values, and V is an n×n orthogonal matrix (right singular vectors). This allows the next step to extract the top-k singular vectors (columns of U and V) associated with the largest singular values of Ã. For a matrix A ∈ ℝ^{m×n} the following expression is the Singular Value Decomposition of A:

Singular Value Decomposition (SVD)
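
As a quick illustration (a sketch with a random stand-in for Ã, assuming NumPy), the decomposition and its reconstruction can be checked in a few lines:

# Step 2 (sketch): singular value decomposition of the subsampled matrix
import numpy as np

A_tilde = np.random.default_rng(0).random((6, 8))  # stand-in for the subsampled matrix

# U is m x m, s holds the singular values, Vt is n x n
U, s, Vt = np.linalg.svd(A_tilde, full_matrices=True)

# Rebuild the m x n diagonal matrix Sigma and check that A~ = U Sigma V^T
Sigma = np.zeros(A_tilde.shape)
np.fill_diagonal(Sigma, s)
print(np.allclose(A_tilde, U @ Sigma @ Vt))  # True up to numerical error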

Step 3: Obtain top-k singular values for Low-rank approximation

In the third step the top-k singular vectors and values obtained from Ã are used to construct a low-rank approximation Â of the original preference matrix A, where U_k, Σ_k, and V_k consist of the first k columns of U, Σ, and V. This approximation Â minimizes a distance, such as the Frobenius norm ||A − Â||_F, to ensure that it is mathematically close.

Â is a rank-k matrix minimizing the Frobenius norm distance from A, meaning it is the best rank-k approximation of A. The low-rank approximation Â can be written in truncated SVD form U_k Σ_k V_k^⊤ or in summation form Σ_i σ_i u_i v_i^⊤, where k ≤ min(m,n). This means that Â is constructed by summing only the first k singular values and their corresponding singular vectors:

Low rank approximation
  • U_k is the m×k matrix consisting of the first k columns of U.
  • V_k is the n×k matrix consisting of the first k columns of V.
  • Σ_k is the k×k diagonal matrix with the first k singular values σ_1, σ_2, …, σ_k:

Truncated SVD captures the rank-k approximation by considering only the k most significant singular values and their corresponding singular vectors. In the summation form of Â, each term σ_i u_i v_i^⊤ represents a rank-1 matrix:

Summation form

The truncated SVD form in detail is as follows, where the σ_i are the singular values of A, the u_i are the columns of U (the left singular vectors), and the v_i are the columns of V (the right singular vectors).

The Frobenius norm is a measure of the total size of a matrix, and using it in this context quantifies how close the approximation is to the original matrix. The Frobenius norm is defined as follows:

Frobenius norm

The Frobenius norm takes the absolute value of each element a_ij of the matrix, squares it, sums the squares, and then takes the square root of that sum. This provides a measure of the magnitude of the matrix in a way that extends the Euclidean norm for vectors to matrices.

In the context of Singular Value Decomposition (SVD) it can be written as:

Frobenius norm
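
Tying Steps 2 and 3 together, here is a short NumPy sketch (toy matrix, arbitrary k, my own illustration) that builds the rank-k approximation from the top-k singular triplets, evaluates the Frobenius-norm error, and checks that the entry-wise Frobenius norm equals the square root of the sum of squared singular values:

# Step 3 (sketch): rank-k approximation and Frobenius norm check
import numpy as np

A_tilde = np.random.default_rng(2).random((6, 8))   # stand-in for the subsampled matrix
k = 2

U, s, Vt = np.linalg.svd(A_tilde, full_matrices=False)

# Keep only the first k singular values and vectors: A_hat = U_k Sigma_k V_k^T
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm: entry-wise definition vs. sum of squared singular values
fro_entries = np.sqrt(np.sum(A_tilde ** 2))
fro_singular = np.sqrt(np.sum(s ** 2))

print("rank of A_hat        :", np.linalg.matrix_rank(A_hat))            # k
print("||A~ - A_hat||_F     :", np.linalg.norm(A_tilde - A_hat, "fro"))
print("Frobenius norms match:", np.isclose(fro_entries, fro_singular))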

The matrix Â is the reconstructed or approximated version of A, and it captures the essential structure of A using only its dominant modes of variation, as identified by the top-k singular vectors. The reconstruction algorithm computes the projection of A onto the space spanned by its top-k singular vectors. All known algorithms for matrix reconstruction require time polynomial in the matrix dimensions.

The classical recommendation system works in two steps: First, a low-rank approximation is computed. This takes 𝒪(poly(mn)) runtime. Second, a recommendation is delivered online from an intermediate result every time a user interacts with the system, which takes 𝒪(nk) time in the classical case. A classical algorithm could output the entire user vector of the low-rank approximation, which takes 𝒪(n), but it is not really necessary to know the exact preference for every item. As mentioned before, it suffices to sample from the items with high preference values.

Quantum Recommendation Algorithm

Modern recommender systems often use matrix completion techniques, which are effective but require linear time. The quantum recommender engine from Kerenidis and Prakash explores stricter input assumptions to achieve a runtime sublinear in both the number of users m and the number of items n, which marked a significant speedup. Kerenidis and Prakash achieved an exponential speedup over the best known classical algorithms, addressing the concern Scott Aaronson had described a year earlier about the potential limitations of developing practically relevant instances of a machine learning problem with a superpolynomial runtime separation.

On a high level, the quantum algorithm takes samples from a low-rank approximation of the input matrix. A good recommendation for user i is calculated by sampling from the i-th row of a low-rank approximation of the subsampled data Ã, instead of performing low-rank completion. This leads to sublinear-time sampling for good recommendations.

The difference is that low-rank approximation deals with approximating an original matrix with a lower-rank matrix, whereas low-rank completion is about inferring the missing entries of a matrix, given that the completed matrix should have low rank. Low-rank completion is significantly more computationally intensive than low-rank approximation, especially for large-scale problems with many missing entries.

In principle, it is not really necessary to compute the entire low-rank matrix Â derived from the projection onto the top-k row singular vectors. It is sufficient to be able to sample from this matrix, since it should be close enough to the original preference matrix A, provided A has a good rank-k approximation and the distance ||A − Â||_F is minimized. If the sub-samples Ã are uniformly distributed, then projecting onto the top-k singular vectors of Ã is an approximate reconstruction of the preference matrix A.

Given that A is a 0–1 matrix, sampling from Â amounts to finding, with high probability, a 1-element in matrix A. In this sense, the task of delivering good recommendations is mathematically reduced to being able to efficiently sample from the low-rank matrix Â.

The probability to recommend one item Â_ij to a specific user i is defined as: |Â_ij|² / || Â_i ||², where we took samples Â_i from the i-th row of the matrix Â. The row Â_i is the projection of the row i in the subsampled matrix à onto the top-k row singular vectors of Ã.
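
This sampling distribution over items for a fixed user i can be written down directly. Here is an illustrative NumPy sketch (toy numbers, my own illustration) of p(j | i) = |Â_ij|² / ||Â_i||²:

# Sampling an item j for user i with probability |A_hat[i, j]|^2 / ||A_hat[i]||^2
import numpy as np

rng = np.random.default_rng(3)
A_hat = rng.random((6, 8))    # stand-in for the rank-k approximation
i = 0                         # user index

row = A_hat[i]
probs = row ** 2 / np.sum(row ** 2)   # squared entries normalized by the squared row norm

recommended_item = rng.choice(len(row), p=probs)
print("probabilities:", np.round(probs, 3))
print("recommended item for user", i, ":", recommended_item)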

Here is where the quantum computer comes in: the quantum algorithm samples a high-value element from the row Â_i in time polylog(mn). It does not output the entire row Â_i, because this would take time linear in the dimension n and is also not necessary.

By giving the system a vector, a matrix, and a threshold parameter, it generates a quantum state that corresponds to the projection of the vector onto the space spanned by the row singular vectors of the matrix whose corresponding singular values are greater than the threshold. An item is then sampled by measuring the quantum state in the computational basis.

Error Bounds on Recommendation Quality

Let us show that providing good recommendations reduces to being able to sample from a matrix Â that is a good approximation to the original preference matrix A in Frobenius norm distance.

If Â is an approximation of the preference matrix A such that ||A − Â||_F ≤ ɛ||A||_F, then the probability of receiving a bad recommendation from a sample according to Â is as follows, where (ɛ/(1−ɛ))² denotes the upper bound on the probability that a given recommendation is bad:

Probability of bad recommendations

This shows that sampling from a matrix that is close to the matrix A under the Frobenius norm leads to good recommendations for most users with high probability. This bound can be proved using the triangle inequality. It guarantees that the probability of a bad recommendation for an average user is small. It also puts the emphasis on users that like many products over those that like almost nothing, which leads to sampling with guaranteed good performance.

If we extend the scenario and want to guarantee a good recommendation for most users, where every user has the same importance, then the closeness guarantee in the Frobenius norm is not sufficient. This is because if most users don’t like any item and a few like all items, the approximation matrix Â can be close to the preference matrix A under the Frobenius norm while still producing good recommendations for very few users and bad recommendations for everyone else who exhibits little interaction.

We need to extend the statement to ensure good recommendations in this scenario. Let A be an m×n matrix and S a subset of rows of size |S| ≥ (1−ζ)m, for ζ > 0, such that for all i ∈ S the following holds for some γ > 0:

If we now take our matrix Â, which is an approximation of the preference matrix A such that ||A − Â||_F ≤ ɛ||A||_F, then there exists a subset S′ of S of size at least (1 − δ − ζ)m (for δ > 0), such that on average over the users in S′ the probability that a sample from the row Â_i is a bad recommendation is:

You can see how they added the term (1/√(1+γ) − ɛ/√δ)²(1 − δ − ζ) in the denominator and also expanded the bound in the numerator. This shows that the error can be kept under control: allowing for reasonable variations in user preferences does not significantly increase the error of the recommendation system. Kerenidis and Prakash demonstrated that a 10% deviation from the average preference only results in a maximum error increase of 50%. Furthermore, recommending multiple products instead of just one can further improve the quality of recommendations, as long as at least one suggestion is accurate, a common practice in recommenders.

Quantum Recommendations Algorithm

The main quantum primitive required for the recommendation system is a quantum projection algorithm that runs in time polylogarithmic in the matrix dimensions. For a vector x ∈ ℝ^{n} and a matrix A ∈ ℝ^{m×n}, the algorithm has quantum access that lets it efficiently create a quantum state |x⟩ corresponding to the vector x and quantum states |A_i⟩ corresponding to each row A_i of the matrix A, in time polylog(mn). The time to store a new entry (i, j, A_ij) is polylog(mn), and the time to write down entries is logarithmic.

Kerenidis and Prakash’s quantum recommendation algorithm

The quantum recommender engine from Kerenidis and Prakash performs an efficient quantum projection through the following steps:

  • Load data into qRAM and apply amplitude encoding to prepare the quantum state ∣R⟩
  • Embed the quantum state ∣R⟩ into a larger unitary matrix U using block encoding, so that the matrix can be processed with quantum algorithms
  • Matrix exponentiation to construct the unitary operator
  • Quantum phase estimation for singular values and vectors
  • Remove low-quality recommendations with threshold
  • Construct preference vectors via projection onto the spanned subspace
  • Recommendations measurement using Swap test

During data loading, Kerenidis and Prakash state that the size of the data structure is 𝒪(w · log²(mn)), and the time to store a new entry (i, j, A_ij) is 𝒪(log²(mn)). The quantum algorithm has quantum access to the data and can perform the quantum mapping Ũ : |i⟩|0⟩ → |i⟩|A_i⟩ for i ∈ [m] for the matrix rows stored in memory, and Ṽ : |0⟩|j⟩ → |Ã⟩|j⟩ for j ∈ [n], where Ã ∈ ℝ^{m} has entries Ã_i = ||A_i||, in time polylog(mn).

The matrix reconstruction algorithm takes as input a subsample Ã of some matrix A. This subsample can be obtained in different ways, such as sampling rows and/or columns of the matrix according to some distribution, or subsampling the matrix by sampling each element with some probability, a method developed by Achlioptas and McSherry in 2001. The latter approach is what Kerenidis and Prakash used in their quantum algorithm.

In this method, each element of the matrix A of size m × n is sampled with probability p and rescaled to construct the random matrix Ã, in which each element equals Ã_ij = A_ij / p with probability p and 0 otherwise.

The reconstruction algorithm computes the projection of the input à onto its k-top singular vectors. The result of this projection is the approximated low-rank matrix Â. It can be shown that the approximation error ||A — Ã_k|| is close to ||A — A_k||. Projecting onto top k singular vectors of the sub-sampled matrix à thus suffices to reconstruct a matrix approximating A.

This means that the task of providing good recommendations for a user i reduces to being able to sample from the i-th row of the matrix Ã_{σ, κ}: sample from the projection of the i-th row of Ã onto the space spanned by all row singular vectors with singular values higher than σ, and possibly some additional row singular vectors with singular values in the interval [(1−κ)σ, σ).

Step 1: Amplitude encoding and qRAM

qRAM is employed to load the user-item rating matrix into quantum states efficiently. This step prepares the quantum representation of the matrix and allows the quantum computer to access and manipulate large amounts of data in superposition. Kerenidis and Prakash encoded the classical data by utilizing qRAM to prepare the quantum states and implement unitary operations that implicitly contain the information about the matrix.

Concretely, Kerenidis and Prakash encode the user-item rating matrix R (or a user preference vector) into the amplitudes of a quantum state ∣R⟩ that represents the matrix. Amplitude encoding is a method to represent a vector or matrix within a quantum state: each entry of the vector or matrix is encoded in an amplitude of the quantum state. Given a vector x⃗ with entries x_i, amplitude encoding represents this vector as a quantum state ∣ψ⟩ such that:

Amplitude Encoding

Once the initial state is prepared using amplitude encoding, qRAM is used to load the additional data associated with each index into the quantum state. qRAM allows efficient access to and loading of data into the quantum state without collapsing the superposition. It maps each index state ∣i⟩ to a state ∣i⟩∣X_i⟩, where ∣X_i⟩ represents the data stored at index i. qRAM locates one memory cell in 𝒪(log n) time, so accessing n memory cells one by one takes 𝒪(n log n) time, while loading n data items in superposition can be done in 𝒪(log n) time. The storage of data in qRAM can be summarized using this equation, where ∣X_i⟩ could represent the quantum state encoding the rating value X_i:

qRAM

For each row of the matrix, which we view as a vector in ℝ^{n}, we store an array of 2n values as a full binary tree with n leaves. The leaves hold the individual amplitudes of the vector, and each internal node holds the sum of the squares of the amplitudes of the leaves rooted at this node; each entry added to the tree requires updating 𝒪(log n) nodes of the tree:

Vector state preparation illustrated for 4-dimensional state |φ⟩. For this vector φ, the root of this tree is 1.0 which is the 𝓁²-norm squared of the whole vector. The children of the root show the 𝓁²-norm squared of the part of the vector where the first qubit is 0 (=0.32) and part of the vector where the first qubit is 1 (=0.68).
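
Here is a tiny sketch (my own illustration, not Kerenidis and Prakash’s implementation) of this tree for the 4-dimensional example in the figure: the leaves hold squared amplitudes, every internal node stores the sum of its children, and the root equals the squared 𝓁²-norm:

# Binary tree of squared amplitudes for a 4-dimensional vector (sketch)
import numpy as np

phi = np.array([0.4, 0.4, 0.8, 0.2])        # example vector with ||phi|| = 1

leaves = phi ** 2                            # squared amplitudes: [0.16, 0.16, 0.64, 0.04]
level1 = leaves.reshape(-1, 2).sum(axis=1)   # [0.32, 0.68], matching the figure
root = level1.sum()                          # 1.0, the squared l2-norm of phi

print("leaves :", leaves)
print("level 1:", level1)
print("root   :", root)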

Let us have a closer look at qRAM, or Quantum Random Access Memory. Proposed in 2008, qRAM is a model used in quantum computing that promises to efficiently store and retrieve quantum information from multiple locations simultaneously, whereas in classical computing, accessing and manipulating large data sets can be a bottleneck. In the quantum approach, each entry of the user-item interaction matrix A is loaded into a quantum register, where the data is encoded into a quantum state ∣A⟩ such that each entry A_ij can be queried in superposition.

One possible way to represent qRAM is a bifurcated graph (binary tree) that organizes data hierarchically. This tree structure allows efficient superposition access to multiple data points simultaneously. Each leaf node represents a fundamental data point, such as a specific feature of user preferences, like the rating for a particular genre, actor, or director. The internal nodes combine preferences from multiple features; for example, an internal node could aggregate ratings across all action movies.

Schematics for a quantum random access memory in form of a bifurcation graph

qRAM plays a crucial role in the quantum algorithm, because it is not the singular value estimation or the projections that are computationally so difficult, but rather the fact that these steps require reading the full input. Classical low-rank matrix approximation takes time linear in the input sparsity, or sublinear with some number of passes through the data, whereas the quantum algorithm achieves query complexity polylogarithmic in the input size using qRAM.

This means that a data structure to efficiently prepare quantum states is an essential prerequisite for the overall speedup that Kerenidis and Prakash promise with their approach. With this data structure the query complexity is polylogarithmic in the input size, whereas preparing states corresponding to arbitrary length-n input vectors by merely querying entries in superposition takes Ω(√n) time. Without anticipating too much: Ewin Tang noticed that the data structure used to satisfy the state preparation assumptions can also satisfy 𝓁²-norm sampling assumptions, which allows for an efficient dequantization.

Step 2: Block Encoding to embed quantum states

In this step, Kerenidis and Prakash used block encoding, which allows an efficient representation of matrices within a larger unitary operator. Block encoding is a technique that embeds matrices, representing classical data or non-unitary operators, as sub-blocks within larger unitary matrices.

A matrix R is embedded as R/||R|| into the top-left block of a larger unitary matrix U, where ||R|| is the spectral norm or some appropriate normalization factor. This embedding allows the use of quantum algorithms for linear algebra operations that we need later, such as singular value decomposition (SVD), eigenvalue estimation, and matrix inversion. The * entries represent other parts of the unitary matrix that ensure its unitarity. The block encoding step often involves using additional ancilla qubits to ensure the overall matrix is unitary.

Block Encoding

Block encoding is a crucial first step since quantum operations are represented by unitary matrices. These matrices satisfy U†U=I. Classical matrices are not necessarily unitary, so they cannot be directly used in quantum algorithms for unitary operations. Additionally, quantum states must be normalized, which means that the sum of the squares of their amplitudes must equal 1. The original matrix may not be normalized, which makes it unsuitable for direct representation as a quantum state.
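
To make the idea tangible, here is a small NumPy sketch of one standard block-encoding construction (my own illustration, not necessarily the exact encoding used by Kerenidis and Prakash) that embeds the normalized matrix into the top-left block of a larger unitary:

# Block encoding sketch: embed R / ||R|| as the top-left block of a unitary U
import numpy as np

def psd_sqrt(M):
    # Matrix square root of a positive semidefinite matrix via its eigendecomposition.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

R = np.random.default_rng(4).random((3, 4))   # toy data matrix
A = R / np.linalg.norm(R, 2)                  # rescale so the spectral norm is at most 1

m, n = A.shape
U = np.block([
    [A,                             psd_sqrt(np.eye(m) - A @ A.T)],
    [psd_sqrt(np.eye(n) - A.T @ A), -A.T],
])

print(np.allclose(U @ U.T, np.eye(m + n)))   # True: U is unitary (orthogonal here)
print(np.allclose(U[:m, :n], A))             # its top-left block is R / ||R||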

Step 3: Matrix Exponentiation to construct unitary operators

Once the classical matrix R is block encoded in the unitary matrix U, we can exponentiate it to construct the unitary operator required for quantum phase estimation. This involves constructing a unitary operator whose eigenvalues are related to the singular values of the matrix; QPE is then applied to this unitary operator.

Evolution Operator for Matrix Exponentiation

Concretely, matrix exponentiation refers to using quantum circuits to exponentiate the unitary matrix U. The unitary matrix U is applied repeatedly to simulate the evolution of e^iUt for some time t. This step is essential for phase estimation and extracting eigenvalues / singular values.

Matrix exponentiation then utilizes this unitary matrix to perform necessary operations like quantum phase estimation. This sequence ensures that the original non-unitary matrix R can be effectively used within the constraints and capabilities of quantum computing.

The quantum states are prepared in a superposition, controlled-unitary operations are applied, and an inverse quantum Fourier transform (QFT) is performed to extract the phases, which correspond to the singular values.
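
As a sketch of what matrix exponentiation produces (using NumPy on a toy Hermitian matrix, not the actual block-encoded operator), the unitary e^{iHt} can be built from an eigendecomposition; its eigenphases encode the eigenvalues that QPE later reads out:

# Matrix exponentiation sketch: build the unitary exp(iHt) from a Hermitian matrix H
import numpy as np

M = np.random.default_rng(5).random((4, 4))
H = (M + M.T) / 2                  # toy Hermitian (here real symmetric) matrix
t = 1.3                            # evolution time

# exp(iHt) = V diag(exp(i * lambda * t)) V^dagger, with H = V diag(lambda) V^dagger
eigvals, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(1j * eigvals * t)) @ V.conj().T

print(np.allclose(U @ U.conj().T, np.eye(4)))   # True: U is unitary
# The phases of U's eigenvalues are lambda * t, which quantum phase estimation can estimate.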

Step 4: Singular Value Estimation via Quantum Phase Estimation

In the next step they applied Quantum Phase Estimation to extract eigenvalues and eigenvectors of the matrix, as these are related to the singular values and singular vectors due to the properties of block encoding and QPE. The approximated singular values and corresponding singular vectors can be used to construct the lower-dimensional representation of the user-item matrix.

Singular value estimation is performed on a matrix A for which vector states corresponding to its row vectors can be prepared efficiently. Given a state |x⟩ = Σ_i α_i |v_i⟩ corresponding to an arbitrary vector x ∈ ℝ^{n}, the task is to estimate the singular value corresponding to each singular vector v_i in coherent superposition. Note that we take the basis {v_i} to span the entire space by including singular vectors with singular value 0.

Kerenidis and Prakash formulate a quantum singular value estimation algorithm in the same way as the quantum walk based algorithm by Childs for estimating eigenvalues of a matrix, and they show that given quantum access to the data structure, their algorithm runs in time 𝒪(polylog(mn)/ɛ​).

Quantum Singular Value Estimation by Kerenidis and Prakash

They employ quantum phase estimation, a fundamental and popular algorithm in quantum computing, to find the most important singular values and directions in the input data and then simplify the data by focusing on those key elements. This is done by a method called projection, where the singular values and singular vectors are used to project a quantum state holding a row of the input onto the state holding the corresponding row of a low-rank approximation.

Circuit of quantum phase estimation, which estimates the phase of an eigenvector of a unitary matrix
# Quantum Phase Estimation with Google Cirq

try:
    import cirq
except ImportError:
    print("installing cirq...")
    !pip install cirq --quiet
    import cirq
    print("installed cirq.")

import numpy as np

"""Set up the unitary and number of bits to use in phase estimation."""
# Value of θ which appears in the definition of the unitary U above.
# Try different values.
theta = 0.234

# Define the unitary U.
U = cirq.Z ** (2 * theta)

# Accuracy of the estimate for theta. Try different values.
n_bits = 3


def make_qft_inverse(qubits):
    """Generator for the inverse QFT on a list of qubits."""
    qreg = list(qubits)[::-1]
    while len(qreg) > 0:
        q_head = qreg.pop(0)
        yield cirq.H(q_head)
        for i, qubit in enumerate(qreg):
            yield (cirq.CZ ** (-1 / 2 ** (i + 1)))(qubit, q_head)


"""Build the first part of the circuit for phase estimation."""
# Get qubits for the phase estimation circuit.
qubits = cirq.LineQubit.range(n_bits)
u_bit = cirq.NamedQubit('u')

# Build the first part of the phase estimation circuit.
phase_estimator = cirq.Circuit(cirq.H.on_each(*qubits))

for i, bit in enumerate(qubits):
    phase_estimator.append(cirq.ControlledGate(U).on(bit, u_bit) ** (2 ** (n_bits - i - 1)))

print(phase_estimator)


"""Build the last part of the circuit (inverse QFT) for phase estimation."""
# Do the inverse QFT.
phase_estimator.append(make_qft_inverse(qubits[::-1]))

# Add measurements to the end of the circuit.
phase_estimator.append(cirq.measure(*qubits, key='m'))
print(phase_estimator)


"""Set the input state of the eigenvalue register."""
# Add a gate to change the initial state of the eigenstate register to |1>.
phase_estimator.insert(0, cirq.X(u_bit))

print(phase_estimator)


"""Simulate the circuit and convert from measured bit values to estimated θ values."""
# Simulate the circuit.
sim = cirq.Simulator()
result = sim.run(phase_estimator, repetitions=10)

# Convert from output bitstrings to estimated θ values.
theta_estimates = np.sum(2 ** np.arange(n_bits) * result.measurements['m'], axis=1) / 2**n_bits
print(theta_estimates)

Step 5: Remove low-quality recommendations with threshold σ

Singular values below the threshold σ are filtered out, and only those greater than σ are retained. The threshold parameter σ is crucial for filtering out noise and focusing on high-quality and relevant recommendations from the user-item matrix.

The threshold parameter σ

Kerenidis and Prakash claimed that obtaining samples from these distributions requires time linear in the input dimensions when classical methods are used. They applied their quantum algorithm to make fast recommendations, built on strong assumptions about the input data and a dynamic data structure.

Quantum projection with threshold

Step 6: Construct preference vector with projections

The threshold parameter σ is the minimum singular value used for the projection of a vector onto the space spanned by the row singular vectors of the matrix, which corresponds to the quantum state generated by the quantum algorithm. This serves to deliver good recommendations, for example items for which the user has a preference higher than a threshold, or the hundred products with the highest preference.

The user’s preference vector is projected onto the subspace spanned by the significant singular vectors (those corresponding to singular values above σ). The user preference vector, after being projected onto the singular vectors with significant singular values, is encoded as a quantum state. This state represents the relevant features of the user’s preferences. Similarly, item feature vectors are also encoded as quantum states. To generate recommendations, the quantum state representing the user’s projected preferences is compared with the quantum states representing various items in the next step.

The quantum recommender is given access to a vector state x, a matrix A and parameters σ, κ, and it outputs the following state, which is a projection of x onto the subspace spanned by the union of the row singular vectors whose singular values are bigger than σ and a subset of row singular vectors whose corresponding singular values lie in the interval [(1 − κ)σ, σ).

Quantum state of the projection

When you measure this state, you receive a sample from the rows with probability proportional to its magnitude, that is, its relevance. Measuring this simplified data reveals the important parts with higher probability. This process is much faster than traditional methods.
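
A classical sketch of what this projection-plus-measurement accomplishes (illustrative NumPy with toy values, not the quantum circuit itself): keep only the singular directions whose singular values reach the threshold, project the user’s row onto them, and sample an item in proportion to the squared entries of the projected row:

# Sketch: project a user's row onto singular vectors with singular value >= sigma, then sample
import numpy as np

rng = np.random.default_rng(6)
A_tilde = rng.random((6, 8))      # stand-in for the subsampled preference matrix
sigma_threshold = 1.0             # illustrative threshold
i = 0                             # user index

U, s, Vt = np.linalg.svd(A_tilde, full_matrices=False)
keep = s >= sigma_threshold        # retain only the significant singular directions
V_keep = Vt[keep].T                # right singular vectors above the threshold

x = A_tilde[i]                     # the user's row
x_proj = V_keep @ (V_keep.T @ x)   # projection onto the retained subspace

probs = x_proj ** 2 / np.sum(x_proj ** 2)
item = rng.choice(len(x_proj), p=probs)
print("recommended item:", item)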

Step 7: Swap Test to provide recommendations

After the singular values below the threshold σ are filtered out and the user’s preference vector is projected onto the subspace spanned by the significant singular vectors, the quantum state that represents the relevant features of the user’s preferences is compared with the quantum states that represent different items in order to generate recommendations. This comparison, or similarity measure, is conducted with the swap test.

Circuit implementing the swap test between two states | 𝜙 ⟩ and | 𝜓 ⟩ using two Hadamard gates, one controlled SWAP gate (Fredkin gate) and one measurement gate

The quantum algorithm can access the hierarchical structures of qRAM in superposition, quickly constructing states ∣ψ⟩ and ∣ϕ⟩. The swap test estimates the overlap between quantum states, in this case the user preference vector (projected onto the singular vector subspace) and the item vectors ∣⟨ψ∣ϕ⟩∣². A higher overlap indicates higher similarity, which leads to a recommendation.

Swap test to measure overlap between quantum states

Concretely, qRAM consists of address registers, which are quantum registers that hold the addresses of the data points in the bifurcated graph, and data registers that hold the actual data values (user preferences or item features). The goal is to compare the similarity of a user with an item; the swap test calculates the overlap between the two quantum states.
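
Here is a minimal Cirq sketch of the swap test on two single-qubit states (my own illustration; the actual algorithm applies it to much larger registers). The probability of measuring the ancilla in |0⟩ equals (1 + |⟨ψ|φ⟩|²)/2, so the overlap can be read off from the measurement statistics:

# Swap test sketch with Cirq: estimate |<psi|phi>|^2 from the ancilla statistics
import cirq
import numpy as np

anc, q_psi, q_phi = cirq.LineQubit.range(3)

circuit = cirq.Circuit(
    cirq.ry(0.7).on(q_psi),          # prepare |psi> (rotation angle chosen arbitrarily)
    cirq.ry(1.9).on(q_phi),          # prepare |phi>
    cirq.H(anc),
    cirq.CSWAP(anc, q_psi, q_phi),   # controlled SWAP (Fredkin gate)
    cirq.H(anc),
    cirq.measure(anc, key="m"),
)

result = cirq.Simulator().run(circuit, repetitions=5000)
p0 = 1.0 - result.measurements["m"].mean()   # fraction of ancilla outcomes equal to 0
overlap_sq = 2 * p0 - 1                      # |<psi|phi>|^2 = 2 P(0) - 1
print("estimated |<psi|phi>|^2:", overlap_sq)
print("exact value            :", np.cos((1.9 - 0.7) / 2) ** 2)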

Other Quantum Measurement Methods

Quantum measurement can also be done using a positive operator-valued measure (POVM). This is another method for quantum measurements, which offers a more general framework than projective measurements. It is a way to verify the presence of desired states and to determine values of observables. The probability of receiving a certain measurement outcome α for a given quantum state |φ⟩ is:

Measurement of a quantum state

The term |φ⟩ is the quantum state of the system that is being measured, and M_α is one of the operators that represents a specific measurement outcome (it is referred to as a positive semi-definite operator in the POVM set). The term ⟨φ|M_α|φ⟩ is the expectation value of M_α in the state |φ⟩. It tells us the average value we would expect to get if we repeatedly measured M_α on systems prepared in the state |φ⟩. The expectation value is an extremely important concept in quantum computing. Tr indicates the trace operation, a standard term in linear algebra that sums the diagonal elements (the eigenvalues, counted with multiplicities) of the matrix after the transformation. In total, Tr(M_α|φ⟩⟨φ|) = ⟨φ|M_α|φ⟩ is the probability of the measurement outcome α when the system is in the state |φ⟩. If |x⟩ is measured in the standard basis, then outcome i is observed with probability x_i² / ||x||².

While powerful, POVMs are more complex to implement and may not be as straightforward for calculating overlaps or similarities directly. The swap test, on the other hand, is well-suited for directly comparing quantum states and determining overlaps, making it more practical and efficient for the task of measuring similarities in a quantum recommender system.

Computational Complexities

The computational complexity of recommender engines can be divided into two stages: In the data preprocessing step, data are stored in memory and a low-rank approximation of the preference matrix is calculated and stored, which takes time poly(mn), polynomial in the matrix dimensions, and space 𝒪(nk) to store the output of the top-k row singular vectors. In the machine learning serving step, when a user interacts with the system, the recommendation is delivered in time 𝒪(nk). This includes a projection of the row of the subsample matrix that corresponds to the user onto the top-k row singular vectors of the matrix to output a high value element.

Computational runtime growth with increasing system size n

Kerenidis and Prakash demand for their quantum algorithm that the time needed to write and read data in and out of memory is polylogarithmic in the matrix dimensions, and that the total memory required is linear (up to polylogarithmic terms) in the number of entries of the preference matrix. As it turns out, their quantum recommendation algorithm requires time polylogarithmic in the matrix dimensions mn and only polynomial in the rank k of the matrix: 𝒪(poly(k) polylog(mn)).

In their quantum recommender engine, the data is assumed to be stored classically and the quantum algorithm can create superpositions of rows of the subsample matrix. Some additional information about the matrix is required which leads to a memory that is linear in the number of entries in the subsample matrix (up to poly-logarithmic terms). Furthermore, the time it takes to input the data into a matrix increases much slower than the size of the matrix itself (data entry time is poly-logarithmic in the matrix dimensions), which is very efficient. Quantum superpositions are created in poly-logarithmic time by a quantum algorithm with access to these data.

In summary, given at least (1−ξ)(1−δ−ζ)m users in the subset S′ for which good recommendations are produced with high probability, the runtime of their quantum algorithm is 𝒪(poly(k) polylog(mn)). This was exponentially faster than the classical runtime (prior to dequantization!) if the rank k is a small constant and the matrix dimensions are of the same order.

  • polylog(mn) is the time to sample from a row of the projected matrix. This means that the quantum algorithm requires time polylogarithmic in the matrix dimensions as described by polylog(mn).
  • k is the rank. This means that the quantum algorithm requires only polynomial time in the rank of the matrix as described by poly(k).
  • The sampling is not conducted from the original matrix, as this would slow down the process. It suffices to take samples from an approximate reconstruction of the hidden preference matrix A, whose element A_ij takes values 0 or 1 to indicate whether product j is a good recommendation for user i.
  • The parameter δ represents a bound on the closeness of the original matrix A and its approximation Â in terms of their Frobenius norm. For matrices A and Â, the Frobenius norm measures the square root of the sum of the absolute squares of the elements. If ||A − Â||_F is small, then A and Â are close to each other in terms of their entries.

Dequantization Algorithm

The assumptions that are made for quantum state preparation are the entry point for dequantization. An input vector v from a dataset can be quickly embedded into a corresponding quantum state |v⟩ using quantum encoding methods. One could either prepare a state from an arbitrary input vector, where the computational cost of state preparation is reduced by running the algorithms multiple times, or provide specified input vectors that facilitate quantum state preparation.

Even in the quantum setting it is crucial to have a data structure in place that allows fast preparation of quantum states in order to achieve query complexity polylogarithmic in the input size. When merely querying entries of a vector in quantum superposition, quantum state preparation still takes Ω(√n) time for arbitrary length-n input vectors.

Ewin Tang noticed that the specific data structure required to satisfy the quantum state preparation assumptions can also satisfy 𝓁²-norm sampling assumptions. A classical algorithm that exploits these assumptions can match the performance of the quantum algorithm. This insight was both groundbreaking and straightforward.

What is 𝓁²-norm sampling? Given a vector v, 𝓁²-norm sampling draws index i with probability v_i² / ||v||², so entries with large magnitude are drawn more often. In the approximation setting, suppose a function f(x) is sampled at n discrete points x₁, x₂, …, xₙ; the goal is then to find an approximation f̂ that minimizes the 𝓁²-norm of the error, i.e., such that the 𝓁²-norm of the difference between f and f̂ is as small as possible:

𝓁²-norm sampling

Even though it was already known that 𝓁²-norm sampling is very effective, allowing singular value estimation in time independent of m and n, and that randomized linear algebra is powerful when sampling from projections of a vector onto a subspace, Ewin Tang combined both in a clever way.

The data structure supports storing a vector v ∈ ℝ^{n} with w nonzero entries in 𝒪(w log n) space, reading and updating an entry of v in 𝒪(log n) time, finding ||v||² in 𝒪(1) time, and sampling from 𝔇_v in 𝒪(log n) time.

Binary search tree (BST) data structure for v ∈ ℝ^{n}
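
Here is a compact sketch of such a structure (my own, assuming the vector length is a power of two for simplicity): a complete binary tree over the squared entries that supports 𝒪(log n) updates and 𝒪(log n) samples from 𝔇_v:

# Sketch of the l2-norm sampling data structure: a binary tree over squared entries
import numpy as np

class L2SampleTree:
    def __init__(self, v):
        self.n = len(v)                  # assumed to be a power of two for simplicity
        self.tree = np.zeros(2 * self.n)
        for i, x in enumerate(v):
            self.update(i, x)

    def update(self, i, value):
        # Set v[i] = value, touching O(log n) nodes on the leaf-to-root path.
        pos = self.n + i
        self.tree[pos] = value ** 2
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def squared_norm(self):
        return self.tree[1]              # the root holds ||v||^2, read in O(1)

    def sample(self, rng):
        # Draw index i with probability v[i]^2 / ||v||^2 by walking down the tree.
        r = rng.random() * self.tree[1]
        pos = 1
        while pos < self.n:
            left = 2 * pos
            if r < self.tree[left]:
                pos = left
            else:
                r -= self.tree[left]
                pos = left + 1
        return pos - self.n

rng = np.random.default_rng(7)
tree = L2SampleTree(np.array([0.4, 0.4, 0.8, 0.2]))
samples = [tree.sample(rng) for _ in range(10000)]
print("empirical frequencies:", np.bincount(samples, minlength=4) / 10000)
print("target distribution  :", np.array([0.16, 0.16, 0.64, 0.04]))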

Whereas in the quantum recommender algorithm the swap test is applied to estimate the overlap between a user state and an item state, in the classical analogue 𝓁²-norm sampling is applied. In the dequantized (classical) version we have a hierarchical vector representation, where a bifurcated graph is used to organize user preferences and item features into high-dimensional vectors. The root node represents the initial vector setup, which branches into paths representing different features or dimensions of the vectors. 𝓁²-norm sampling is used on these vectors to approximate inner products, where the sampling probabilities are proportional to the square of the vector components. The inner product between user and movie vectors is approximated by aggregating the 𝓁²-norm sampled values. The hierarchical structure helps to efficiently access and combine vector components.

This means that in the quantum approach, user preferences and item features are encoded into quantum states, the swap test is used to compute the overlap between user and item states, and the resulting overlap values are used to recommend items. In the dequantized approach, user preferences and item features are represented as classical vectors, 𝓁²-norm​ sampling is used to approximate the inner product between user and item vectors, and the resulting approximated inner product values are used to recommend items. The swap test in qRAM and 𝓁²-norm​ sampling in Tang’s dequantization serve similar purposes in their respective quantum and classical frameworks: they both enable efficient estimation of similarities (inner products) between high-dimensional data representations.

The need for superposition over data was effectively replaced by randomized probability distributions and subsamples of larger pieces of data, allowing information to be read out in sublinear rather than linear time. This cannot be circumvented by a different data structure for state preparation without sampling. It leads to Ewin Tang’s advice to always compare QML algorithms with state preparation to classical algorithms with sampling:

“When QML algorithms are compared to classical ML algorithms in the context of finding speedups, any state preparation assumptions in the QML model should be matched with 𝓁²-norm sampling assumptions in the classical ML model.”

Randomized sampling methods create statistical samples of a subset of the data that efficiently approximate large datasets. This significantly reduces the computational resources needed for the recommendation system problem. For example one can estimate eigenvectors not by working with the whole matrix but with compressed summaries or sketches.

If Randomized (Numerical) Linear Algebra is new to you, I recommend this video. It provides an excellent overview and the motivation behind it. Also, study this paper if you want to learn the method in more detail.

Introduction to randomized (numerical) linear algebra

In her dequantization approach, Ewin Tang proposed simple routines to manipulate 𝓁²-norm sampling distributions, which play the role of quantum superpositions in the classical setting. Her approach narrowed the gap between quantum and classical recommenders to only a small polynomial overhead, de facto dequantizing the quantum algorithm.

Step 1: Examine matrix for low-rank and low condition number

In dequantization, the original matrix A, which is a user-item interaction matrix in a recommender system, is examined for certain data properties amenable to dequantization, such as low rank and a low condition number.

Step 2: Apply matrix sampling using random projections

Just like the quantum recommender algorithm, the dequantization method avoids reconstructing the full low-rank approximation of the preference matrix (because this exhibits polynomial complexity) and instead applies matrix sampling. The quantum measurement used to obtain a recommendation sample is replaced with the importance 𝓁²-norm sampling developed by Frieze, Kannan and Vempala, performed on the BST data structure, for finding a low-rank approximation of the preference matrix. It states that an entry x_i can be sampled with probability:

Sampling probability with 𝓁²-norm

The sampling uses a random projection operator S that is applied to the matrix A to perform matrix sketching. This method involves sampling rows and columns of the matrix with probabilities proportional to their squared 𝓁²-norms. The operator S reduces the dimensionality while approximately preserving the geometric distances between points, in the spirit of the Johnson-Lindenstrauss lemma. The result is a subspace embedding into a smaller, projected matrix SA that reduces the computational complexity.
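
An illustrative NumPy sketch of the core idea (my own simplification, not the exact FKV procedure): sample a small number of rows with probability proportional to their squared 𝓁²-norms, rescale them, and work with the resulting small sketch instead of the full matrix:

# Sketch of length-squared (l2-norm) row sampling to build a small sketch of A
import numpy as np

rng = np.random.default_rng(8)
A = rng.random((1000, 50))     # tall toy matrix
r = 20                         # number of sampled rows in the sketch

row_probs = np.sum(A ** 2, axis=1) / np.sum(A ** 2)   # p_i = ||A_i||^2 / ||A||_F^2
rows = rng.choice(A.shape[0], size=r, p=row_probs)

# Rescale each sampled row so that E[S^T S] = A^T A
S = A[rows] / np.sqrt(r * row_probs[rows])[:, None]

# Compare the Gram matrix of A with that of the much smaller sketch S
err = np.linalg.norm(A.T @ A - S.T @ S, "fro") / np.linalg.norm(A.T @ A, "fro")
print("relative Frobenius error of the sketched Gram matrix:", round(err, 3))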

Aside: Johnson-Lindenstrauss dimensionality reduction is a very fast compression technique that improves the time complexity from 𝒪(d) to 𝒪(log n) per vector pair, the space complexity from 𝒪(nd) to 𝒪(n log n), and the communication complexity from 𝒪(d) to 𝒪(log n) per vector.

Step 3: Low-rank approximation​ on subspace embedding

Low-rank approximation is then performed by truncated singular value decomposition on the sampled matrix, where k is the target rank. This retains the most significant singular values and vectors, which preserves the essential structure. Frieze, Kannan and Vempala developed the FKV algorithm to compute rank-k approximations efficiently; Ewin Tang modified it into ModFKV, which finds a rank-k approximation of a matrix with additional error parameters that bring the approximation guarantees closer to those of the quantum algorithm from Kerenidis and Prakash.

ModFKV subsamples the input matrix, computes the large singular vectors and values of the subsample, and outputs an approximation of the original matrix, which is a good description of the singular vectors of the full matrix. It helps in finding low-rank approximations without processing entire high-dimensional data.

ModFKV algorithm for subsampling

Complexity Bounds

Ewin Tang used various inequalities to provide complexity bounds, such as McDiarmid’s inequality to prove concentration, the Bernstein inequality to bound the deviation of random matrices during sketching, and the triangle inequality to bound the quality of the final approximation:

Triangle inequality for quality bounds of approximations (Source page 118)

Using dequantization, Ewin Tang was able to reduce time and space complexity with a low-rank approximation for a matrix A ∈ ℝ^{m×n}:

Computational Complexity of Dequantization (Source)

This means that, given an m × n matrix that allows certain 𝓁²-norm sampling operations, the algorithm outputs an 𝓁²-norm sample from a rank-k approximation of that matrix in time 𝒪(poly(k) log(mn)), which is only polynomially slower than the quantum recommender algorithm. Interestingly, this new classical approach is even exponentially faster than previous classical recommender systems (which run in time linear in m and n), but under strong assumptions about the input data.

If you want to go into more detail, you can watch one of her first talks on how quantum-inspired classical linear algebra algorithms work. It is based on the algorithm for low-rank matrix inversion.

Are now all algorithms dequantizable?

Ewin Tang’s breakthrough results sparked an avalanche of research into dequantization, the process of finding classical approximations for quantum algorithms. For many important techniques, researchers successfully developed efficient classical counterparts with only a minor polynomial overhead in computational resources. Many other quantum machine learning algorithms have since been dequantized, such as principal component analysis and supervised clustering, low-rank stochastic regression, and sublinear classical algorithms for solving low-rank linear systems.

In later work, the quantum-inspired algorithms for solving linear systems were made even faster. For example, if we have SQ access to a vector b and an s-sparse matrix A, there is a classical randomized algorithm that succeeds with probability at least 0.99 (for ɛ ≤ 0.01) in time 𝒪(sκ² log(1/ɛ)).

However, some quantum algorithms appear to resist dequantization. Quantum topological data analysis (QTDA) stands out as a prime example of a field where classical approximations remain elusive. This is because computing topological invariants, such as Betti numbers, is challenging to approximate efficiently using classical algorithms. For example, deciding whether a combinatorial Laplacian has a trivial or non-trivial kernel (i.e., a zero or non-zero Betti number) is QMA1-hard, which suggests that this problem is likely intractable even for quantum computers in the worst case.

Ewin Tang, Quantum Machine Learning Without Any Quantum

Ewin Tang showed that quantum machine learning algorithms can be grouped into two classes. The first, denoted ‘qRAM-based QSVT’, is built upon quantum state preparation assumptions, where the matrices exhibit low rank. These algorithms are amenable to dequantization with sketching methods like 𝓁²-norm sampling. On the other side are quantum machine learning algorithms built upon sparse matrices. These either provide an exponential speedup, or quantum speedups do not exist at all; in computational complexity theory this corresponds to the class of BQP-complete problems.

Quantum computers still exhibit an exponential speedup over all known classical algorithms for sparse, full-rank matrix problems, including the quantum Fourier transform, eigenvector and eigenvalue analysis, linear systems, and others. (Quantum-inspired algorithms in practice)

Key Takeaways

  • Quantum machine learning is still an emerging field, marked by rapid advancements. Much remains to be explored, and unexpected new and groundbreaking research insights continue to surface.
  • Dequantization is possible if the data exhibit certain properties and special algorithmic methods are applied. The class of quantum algorithms that uses state preparation assumptions implicitly requires matrices of low rank. These algorithms must be significantly more efficient than classical algorithms with 𝓁²-norm sampling assumptions; otherwise, fast classical sampling methods are powerful enough to dequantize them.
  • Not all quantum machine learning algorithms are dequantizable. Which data properties make them less amenable to dequantization, and in which settings, remains subject to ongoing research. Algorithms such as quantum topological data analysis (qTDA), where the input data are sparse, seem to stay quantum-safe. This is because this class is BQP-complete, which means they provide an exponential speedup, or exponential quantum speedups don't exist at all.

References:

  1. A quantum-inspired algorithm for recommendation systems from Ewin Tang
  2. Quantum Machine Learning Without Any Quantum from Ewin Tang
  3. Dequantizing algorithms to understand quantum advantage in machine learning from Ewin Tang

Disclaimer: The views and opinions expressed in this article are my own and do not necessarily reflect those of my employer.

👉 Follow me: LinkedIn, Google Scholar and www.deltorobarba.com
