Quantum Kitchen Sinks: An algorithm for machine learning on near-term quantum computers

by Christopher M. Wilson, Johannes S. Otterbach, Nikolas Tezak, Robert S. Smith, Gavin E. Crooks, Marcus P. da Silva

Over the last several years there has been a lot of excitement around the experimental progress of nascent quantum computers. Larger and larger devices are being fabricated, opening the door to the investigation of what interesting things may be done with these noisy, intermediate-scale quantum (NISQ) devices. Being both relatively small and noisy, these devices cannot benefit from quantum error corrections. In order to tackle interesting problems, algorithms for the NISQ-era are known as hybrid algorithms: algorithms that rely on relatively small quantum circuits running in conjunction with some classical computation.

Our latest result is a new approach to hybrid algorithms for NISQ devices, and one that is applicable to a rather broad class of machine learning problems known as supervised machine learning. This is quite exciting because we can show it works well for real-world datasets even when using very small circuits on as few as 2 or 4 qubits. But before we delve in the details of our result, allow us to give some background on the problem we considered, and the roots to our approach.

Supervised Machine Learning and Classification

The standard example of supervised learning is classification — we want a machine that can distinguish between two classes of things, say a handwritten “3” or “5.” Before we can use the machine, though, we need to train it by providing examples of what we want to classify, along with labels telling it the right answer for the training data. We can then test the machine by feeding it new examples and asking it to classify those images, then seeing how often it gets the answer right.

State-of-the-art classification algorithms, such as deep neural nets, have tens of millions of free parameters that need to be numerically optimized. They are very expensive in terms of the memory space and computational power needed to train them. This is not suitable for NISQ-era quantum computers. We looked for ways to make our machines smaller and more computationally efficient.

We found inspiration in the work of Ali Rahimi and Ben Recht, using an approach called random kitchen sinks. Their key insight was that randomized machines can often learn just as well as optimized machines, while being smaller and requiring much less computing power to train. In fact, recent results indicate that this approach can perform as well as deep neural networks on some tasks.

This insight goes back to the early days of neural networks. The Mark 1 Perceptron was an analog computer in the 1960s built for machine vision problems. Its input was a 20x20 grid of optical sensors. It also had a layer of 512 nonlinear “associator” circuits and 8 output “response” units. The learning was done by adjusting motor-driven potentiometers that connected the associators and response units. This was slow and expensive.

The Mark 1 Perceptron. Photo courtesy of the Cornell University Library.

To make up for the small number of learned connections, they had a random, static network of 16,000 connections between the sensory and associator units. The Perceptron model shows us that complexity and nonlinearity are important, but a large part of this complexity and nonlinearity can be random. This hybrid model of complex randomness and simple learning are what inspire quantum kitchen sinks.

Quantum Kitchen Sinks

Instead of random connections, our algorithm uses random circuits. These circuits have free parameters (rotation angles for quantum states), and we encode data in angles by mixing the inputs with random numbers. Instead of having one big, complex circuit, we can use simple circuits that we repeat many times with different random parameters — but these circuits can have as many qubits as we’d like, and as many free parameters as we allow. We stack the measured bits obtained for each circuit into a large vector and feed this into a very simple classification algorithm on a classical computer.

At this point a subtlety arises: if we are feeding the output of our quantum computer into a ML algorithm on a classical computer, how do we know the quantum computer is doing anything? We address this by focusing only on very simple ML methods on the classical computer (only linear functions), and by comparing the performance between the presence and absence of the quantum computer in the algorithm. In other words, we compare the hybrid algorithm against a linear baseline, and focus on the “lift” in performance that is enabled by the quantum computer.

To test our approach, we started by focusing on a rather simple artificial dataset, where the different classes correspond to nested picture frames. These classes cannot be separated by a straight line, so they are very challenging for the linear ML methods we allow the classical computer to have — the classical algorithm will essentially just guess at random between the two possible labels. However, when we train a classifier with the quantum kitchen sinks algorithm using just two qubits, we are able to get essentially perfect classification of the data. In other words, we can show that even a very small quantum computer can give a very large performance boost — from completely random classification to essentially perfect classification.

A synthetic “picture frames” dataset.

Next we considered a more interesting dataset: handwritten digits from the MNIST database. This dataset is often used as a benchmark for ML algorithms, and consists of grey-scale arrays of 28x28 pixels (a relatively high dimensional dataset, especially compared to the 2-dimensional example above). We chose to focus on the subset of the database that was most difficult for simple ML methods to distinguish correctly — the handwritten digits “3” and “5.” This choice makes it easier show a lift due to the quantum kitchen sinks. If we chose instead the “0” and “1” digits, we’d find that even without a quantum computer the very simple linear classical algorithms have nearly perfect accuracy, so it is quite difficult to show that a quantum algorithm adds to the performance.

A sample MNIST dataset.

While the linear classical algorithms can distinguish “3” and “5” with ~96.8% accuracy, adding quantum kitchen sinks boosts this to ~98.6%, well beyond the statistical uncertainty associated with the estimates. Most notably, this is achieved with only 4 qubits, and a very shallow circuit — something that can easily be done on devices available today — and we have evidence this is close to optimal for the circuit ansatz we consider.

It should be emphasized that these numbers do not come cheaply — random kitchen sinks work best with large training datasets, and all evidence points to a similar behavior for quantum kitchen sinks. Moreover, there is much to be learned about how to select these parameterized circuits, how to choose the random numbers to mix with the input data, and how to optimize the number of qubits to get good performance. The question of whether there is a regime where this approach can outperform the best classical algorithms is certainly an interesting one, and also a question that we feel we are quite far from answering. That being said, we do find that these examples are a strong indication of the promise of NISQ-era devices, and we look forward to seeing how other members of the community may use this approach to solve interesting problems.


Read the full paper: 
Quantum Kitchen Sinks: An algorithm for machine learning on near-term quantum computers. https://arxiv.org/abs/1806.08321