Building the world’s smallest quantum classifier
Written together with Mark Fingerhuth
How can we use quantum computing to solve problems in machine learning? Researchers — most of them physicists — increasingly started asking this question in the past couple of years. Some first answers approached the problem in the way things are typically done in quantum computing: Take a classical machine learning algorithm and find a quantum algorithm that computes the same solution faster. (And, for those not familiar with computational complexity, “faster” in this context does not mean “absolutely faster”, but with better performance for growing problem sizes, such as more data.)
In our paper “Implementing a distance-based classifier with an interference circuit” (Schuld, Fingerhuth and Petruccione 2017), which was awarded a runner-up placing at the IBM Q Best Paper Award, we turned this approach around. Can we come up with a simple machine learning model, a so-called classifier, that can be implemented by a small quantum circuit? In a nutshell, classifiers are models, functions or algorithms that deduce from sample data points {(x, y)} of inputs x and class labels y to guess y for unseen inputs x.
Idea
The quantum classification circuit we came up with is indeed very short: It consists of a single Hadamard gate and two single qubit measurements acting on an ancilla and a designated class qubit. (One of the measurements involves post-selection, but with reasonable success rates in practice.)
Data preparation
For this classifier to do something useful, we need to first encode the data into the amplitudes of a quantum state by a circuit U_data. In fact, one needs to first normalise the data such that it can be represented as a vector on a high-dimensional Bloch sphere. Quantum routines to encode data in amplitudes, so called arbitrary state preparation routines, are known to do this with a runtime that is linear in the data size, and this is arguably the best our algorithm can do in terms of runtime, since the data is the input to the problem.
The classifier
Once the data has been prepared, the single-gate-circuit can implement a very specific classifier which computes a weighted average of the class labels of all training inputs. The weight is higher the closer the input is to the new input we want to classify.
In the simplest version “closeness” is measured by the squared Euclidean distance, but different state preparation circuits U_data can change the closeness measure. This is similar to the kernel trick in classical machine learning.
Implementation
We wanted to run the example on the IBM Quantum Experience, which was at the time restricted to a 5-qubit machine. This turned out to be not as easy as we thought, since even for a 2-d example the state preparation circuit, which in principle consists of a handful of gates, was easily swallowed by noise. What Mark came up with was a hand-crafted circuit for two selected input vectors to do a proof-of-principle experiment.
Even with this carefully crafted algorithm, the circuit theoretically prepared the states with low precision, in addition to the hardware errors. As you can see in the table to the left, the exact values (asterisk) and simulated results with the low-precision state preparation (triangle) where still rather far away from the results generated by the hardware (top number in each cell).
All in all, it seems that even for the world’s smallest classifier we need a little more time until we can do “hardware numerics”. On the upside, things are developing fast, so stay tuned!
If you want to dive a little bit deeper into the technical details, maybe you want to try this example:
Example
Say you have a sample x¹=(0,-1) of class y¹=0 and a sample x²=(-0.789, -0.615) of class y²=1, and you want to know which of the two classes a new data sample x=(-0.948, 0.318) belongs to.
Referring to the paper for details, the state to prepare with U_data is proportional to this:
As you might see, two copies of the new input got encoded into the upper half of the superposition marked by the ancilla being 0. The training data is encoded into the lower half, and the class qubit entangled with each data point is set to the class the data point belongs to. The two qubits in the middle register simply index the entries of the data vectors.
Now, the Hadamard gate interferes amplitudes. In our encoding this means it computes the sum and difference of data points.
The postselective measurement “kills” half of the amplitudes.
The probability of the classifier to predict 1 is defined as the probability to measure the class qubit in state |1⟩, which is proportional to the sum of the two amplitudes. Due to the normalisation of the data, this weighs the training points according to their squared distance, which means that close training points of class 1 will contribute strongly to the probability of measuring |1⟩, while points that are further away can only contribute weakly.