How and when will quantum computers improve machine learning?
The different strategies toward quantum machine learning
There is strong hope (and hype) that quantum computers will help machine learning in many ways. Research in Quantum Machine Learning (QML) is a very active domain, and many small and noisy quantum computers are now available. Different approaches exist, for both the long term and the short term, and we may wonder what their respective hopes and limitations are, both in theory and in practice.
1/ The Long Term Goal: Algorithmic QML
It all started in 2009 with the publication of the “HHL” algorithm, which proved an exponential acceleration for matrix multiplication and inversion (under some conditions on the matrix) and triggered exciting applications in all linear algebra-based science, hence machine learning. Since then, many algorithms have been proposed to speed up tasks such as classification, dimensionality reduction, clustering, recommendation systems, neural networks, kernel methods, SVM, reinforcement learning, and more generally optimization.
These algorithms are what I call Long Term or Algorithmic QML. They are usually carefully detailed, with guarantees that are proven as mathematical theorems. We can (theoretically) know the amount of speedup compared to the classical algorithms they reproduce, which is often polynomial or even exponential with respect to the number of input data points in most cases. They come with precise bounds on the result’s probability, randomness, and accuracy, as usual in computer science research.
While they constitute theoretical proof that a universal and fault-tolerant quantum computer would provide impressive benefits in ML, early warnings showed that some underlying assumptions were very constraining.
These algorithms often require loading the data with a Quantum Random Access Memory, or QRAM, a bottleneck without which exponential speedups are much harder to obtain. Besides, they sometimes need long quantum circuits and many logical qubits (which, due to error correction, are themselves composed of many more physical qubits), and these might not arrive soon.
When exactly? When we reach the universal fault-tolerant quantum computer, predicted by Google for 2029, or by IonQ in only 5 years. More conservative opinions claim this will not happen for 20+ years, and some even say we will never reach that point. The future will tell!
More recently, a mini earthquake amplified by scientific media has cast doubt on the efficiency of Algorithmic QML: the so-called “dequantization” papers introduced classical algorithms, inspired by the quantum ones, that obtain similar exponential speedups, in the field of QML at least. The impact of this impressive result is tempered by the fact that the equivalent speedup only concerns the number of data points and, for now, comes at the cost of a terrible polynomial slowdown with respect to other parameters. This makes these “quantum-inspired” classical algorithms currently unusable in practice.
2/ The Short Term Approach: Variational QML
In the meantime, something very exciting happened: actual quantum computers were built and became accessible. You can play with noisy devices made of 5 to 20 qubits, and soon more. Quite recently, Google ran a quantum circuit on 53 qubits, the first that could not be efficiently simulated by a classical computer.
Researchers have since been looking at new models that these noisy intermediate-scale quantum (NISQ) computers could actually run. They are all based on the same idea of variational quantum circuits (VQC), inspired by classical machine learning.
- One defines a small circuit, the “ansatz”, made of many gates with tunable parameters, such as the angle of a rotation gate.
- Then, measurements of the resulting quantum state are performed and should give the right answers to the desired task (classification, regression). How far these answers are from the expected ones is quantified by a metric called the Objective Function or the Loss. At first, the results are bad because the parameters are almost random.
- Optimization is done on a classical computer to propose a new and hopefully better set of parameters to try. We repeat this loop until the circuit gives good results.
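The loop described in the three steps above can be sketched on a classical simulation of a single qubit. This is only a toy illustration, not a method taken from the literature discussed here: the one-gate ansatz, the squared-error loss, the finite-difference gradient, and the names `run_circuit`, `loss`, and `train` are all assumptions chosen for brevity.

```python
import math


def run_circuit(theta: float) -> float:
    """Toy ansatz: apply RY(theta) to |0> and return P(measuring 1).

    The state after the gate is cos(theta/2)|0> + sin(theta/2)|1>,
    so the probability of reading 1 is sin(theta/2)^2.
    """
    amp1 = math.sin(theta / 2)
    return amp1 ** 2


def loss(theta: float, target: float) -> float:
    """Squared error between the circuit output and the desired value."""
    return (run_circuit(theta) - target) ** 2


def train(theta: float, target: float, lr: float = 0.5,
          steps: int = 200, eps: float = 1e-4) -> float:
    """Classical optimizer: plain gradient descent on the loss.

    The gradient is estimated by central finite differences, which is
    a simplification for this sketch.
    """
    for _ in range(steps):
        grad = (loss(theta + eps, target) - loss(theta - eps, target)) / (2 * eps)
        theta -= lr * grad  # propose a new, hopefully better parameter
    return theta


theta_init = 0.3  # arbitrary starting parameter
theta_trained = train(theta_init, target=0.8)
print("initial loss:", loss(theta_init, 0.8))
print("final loss:  ", loss(theta_trained, 0.8))
```

On real hardware, `run_circuit` would be replaced by repeated executions of the circuit on the device, and its output would be a noisy estimate rather than an exact probability.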
The main difference with algorithmic QML is that the circuit is not implementing a known classical ML algorithm. One simply hopes that the chosen circuit will converge to successfully classify data or predict values. For now, there are several types of circuits in the literature, and we are starting to see interesting patterns in which ones succeed. The problem itself is often encoded in the loss function we try to decrease: we sum the error made compared to the true values or labels, or compared to the quantum states we aim for, or to the energy levels, and so on, depending on the task. Active research tries to understand why some circuits work better than others on certain tasks, and why quantumness would help.
In recent years, researchers have tried to find use cases where Variational QML would succeed at classical problems, or even outperform the classical solutions [21, 22]. Some hope that the variational nature of the training confers some resilience to hardware noise. If this happens to be the case, it would be beneficial not to wait for Error Correction models that require many qubits. One would only need Error Mitigation techniques to post-process the measurements.
On the theoretical side, researchers hope that quantum superposition and entangling quantum gates would project data into a much bigger space (the Hilbert space of n qubits has dimension 2^n) where some classically inaccessible correlations or separations can be found. Said differently, some believe that the quantum model will be more “expressive”.
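To get a feeling for how fast the 2^n dimension grows, here is a small back-of-the-envelope calculation of what a classical computer needs in order to store a full n-qubit state. The function names are made up for this illustration, and the figure of 16 bytes per amplitude is an assumption (one double-precision complex number); actual simulators may use other representations.

```python
def statevector_size(n_qubits: int) -> int:
    """Number of complex amplitudes describing an n-qubit state."""
    return 2 ** n_qubits


def statevector_memory_gib(n_qubits: int, bytes_per_amplitude: int = 16) -> float:
    """Memory needed to store the full state vector, in GiB.

    bytes_per_amplitude=16 assumes one double-precision complex
    number per amplitude (an assumption of this sketch).
    """
    return statevector_size(n_qubits) * bytes_per_amplitude / 2 ** 30


for n in (4, 20, 30, 40):
    print(n, "qubits:", statevector_size(n), "amplitudes,",
          statevector_memory_gib(n), "GiB")
```

Under these assumptions, 4 qubits need only 16 amplitudes, while 30 qubits already need 16 GiB of memory and every additional qubit doubles that amount.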
It is important to notice that research on Variational QML is less focused on proving computational speedups. The main interest is to reach a more expressive or complex state of information processing. The two approaches are related but they represent two different strategies. Unfortunately, less is proven compared to Algorithmic QML, and we are far from understanding the theoretical reasons that would prove the advantage of these quantum computations.
Of course, due to the limitations of current quantum devices, experiments are often made on a small number of qubits (4 qubits in the above graph) or on simulators, which are often ideal (noiseless) and limited to about 30 qubits. It is hard to predict what will happen when the number of qubits grows.
Despite the excitement, VQC also suffers from theoretical obstacles. It has been proven that when the number of qubits or the number of gates becomes too large, the optimization landscape becomes flat, which hinders the ability to optimize the circuit. Many efforts are made to circumvent this issue, called “Barren Plateaus”, by using specific circuits or smart initialization of the parameters.
But Barren Plateaus are not the only caveat. In many optimization methods, one must compute the gradient of a cost function with respect to each parameter. Said differently, we want to know how much the model improves when we modify each parameter. In classical neural networks, computing the gradients is usually done using backpropagation because we analytically understand the operations. With VQC, operations become too complex, and we cannot access intermediate quantum states (without measuring and therefore destroying them).
The current state-of-the-art solution is called the parameter shift rule [27, 28] and requires applying the circuit and measuring its result twice for each parameter. By comparison, in classical deep learning, the network is applied just once forward and once backward to obtain all the thousands or millions of gradients. We could hopefully parallelize the parameter shift rule across many simulators or quantum devices, but this could be limited for a large number of parameters.
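As a minimal sketch of the parameter shift rule, consider a simulated single qubit with two rotation gates, RX(a) followed by RY(b), where we measure the expectation value of Z. For gates of this kind, the derivative with respect to one angle is half the difference between the circuit evaluated with that angle shifted by +π/2 and by −π/2. The circuit, the function names, and the exact (noise-free) simulation are assumptions made for the illustration.

```python
import math
from typing import Callable, List, Tuple


def expectation_z(params: List[float]) -> float:
    """Simulate RX(a) then RY(b) on |0> and return <Z> = P(0) - P(1)."""
    a, b = params
    ca, sa = math.cos(a / 2), math.sin(a / 2)
    cb, sb = math.cos(b / 2), math.sin(b / 2)
    # State after RX(a): (ca, -i*sa). Applying RY(b) gives:
    amp0 = complex(cb * ca, sb * sa)
    amp1 = complex(sb * ca, -cb * sa)
    return abs(amp0) ** 2 - abs(amp1) ** 2


def parameter_shift_gradient(
    f: Callable[[List[float]], float], params: List[float]
) -> Tuple[List[float], int]:
    """Return the gradient of f and the number of circuit evaluations used."""
    grads, evaluations = [], 0
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += math.pi / 2   # shift parameter i forward
        minus[i] -= math.pi / 2  # shift parameter i backward
        grads.append((f(plus) - f(minus)) / 2)
        evaluations += 2         # two circuit runs per parameter
    return grads, evaluations


params = [0.4, 1.1]
grads, evaluations = parameter_shift_gradient(expectation_z, params)
print("gradient:", grads)
print("circuit evaluations:", evaluations)
```

For this toy circuit the expectation value equals cos(a)·cos(b), so the result can be checked by hand. The evaluation count grows as twice the number of parameters, and on a real device each evaluation would itself be an average over many repeated measurements.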
Finally, researchers tend to focus more and more on the importance of loading data into a quantum state, also called the feature map. Without the ideal amplitude encoding obtained with the QRAM, there are doubts that we will be able to load and process high-dimensional classical data with an exponential or high polynomial speedup. Some hope remains for data-independent tasks such as generative models [21, 31] or solving partial differential equations.
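To make the idea of amplitude encoding concrete, the sketch below computes, on a classical computer, the amplitudes that a data vector would be mapped to: the vector is padded to a power-of-two length and normalized, so that d values fit in about log2(d) qubits. The function name `amplitude_encode` and the zero-padding choice are assumptions for this illustration, and the code only describes the target state. Actually preparing that state on a quantum device is the hard part mentioned above.

```python
import math
from typing import List, Tuple


def amplitude_encode(x: List[float]) -> Tuple[List[float], int]:
    """Return (amplitudes, n_qubits) for the amplitude encoding of x.

    The vector is zero-padded to length 2**n_qubits and divided by its
    Euclidean norm so that the squared amplitudes sum to 1.
    """
    if not x:
        raise ValueError("cannot encode an empty vector")
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    # Smallest number of qubits whose state space holds len(x) values.
    n_qubits = max(1, (len(x) - 1).bit_length())
    padded = list(x) + [0.0] * (2 ** n_qubits - len(x))
    return [v / norm for v in padded], n_qubits


amplitudes, n_qubits = amplitude_encode([3.0, 4.0, 0.0, 12.0, 0.0])
print(n_qubits, "qubits for 5 values")
print("amplitudes:", amplitudes)
```

Here five values fit in three qubits, and a vector with a million entries would fit in 20, which is where the hope for an exponential saving comes from.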
Note that the expression “Quantum Neural Networks” has been used to highlight the similarities with classical Neural Network (NN) training. However, they are not equivalent, since VQCs have neither the same hidden-layer architecture nor natural nonlinearities, unless a measurement is performed. There is also no simple rule to convert any NN to a VQC or vice versa. Some now prefer to compare VQC to Kernel Methods.
3/ New Paths
We now have a better understanding of the advantages and weaknesses of the two main strategies towards quantum machine learning. Current research focuses on two aspects:
- Better understanding and harnessing variational quantum circuits, sometimes at the cost of introducing more complex circuits. This includes improving gradient computation and finding ways to apply nonlinearities with adaptive measurements inside the VQC.
- Simplifying long-term algorithms so that they fit in the NISQ era, sometimes at the cost of reducing their proven speedups, and focusing on data-independent problems.
Finally, and most importantly, we need to improve the quantum devices! We all hope for constant incremental improvements or a paradigm shift in the quality of the qubits, their number, and the error correction process, to reach powerful enough machines. Physicists, can you please hurry?
PS: let’s not forget to use all this amazing science to do good things that will benefit everyone.
Jonas Landman is a Ph.D. student at the University of Paris under the supervision of Prof. Iordanis Kerenidis.
He is Technical Advisor at QC Ware and member of QuantX. He has previously studied at Ecole Polytechnique and UC Berkeley.