Reasoning under uncertainty with a near-term quantum computer

Matthias Rosenkranz
Cambridge Quantum
Mar 29, 2021 · 8 min read

By Matthias Rosenkranz and Mattia Fiorentini

Reasoning comes naturally to humans.

For example, let’s say you step outside on a warm summer day and slip. The grass is wet, the ground is muddy and you wonder: was it rain or did the neighbour’s sprinkler soak the garden?

Often our decisions are based on such reasoning: if you surmise it has been raining, chances are you will go back inside to find an umbrella in case more rain is on its way.

While reasoning is intuitive for humans, it is extremely difficult for today's classical computers to emulate. Of course, computers can help us with some reasoning tasks: they can support doctors in medical diagnoses or finance professionals in managing investments, for example. However, such reasoning models are demanding: they require time-consuming computations, large amounts of data and, ultimately, a considerable amount of electricity. Often they also cannot offer simple explanations for their answers, and they struggle when asked how confident they are about the possible outcomes. These issues become particularly severe when a decision is the result of multiple yes/no outcomes.

New research from Cambridge Quantum Computing demonstrates that quantum computers can learn to reason. In fact, one could say quantum computers are naturally predisposed to reason under uncertainty — a scenario typically encountered by us humans in our day-to-day lives.

This demonstration is detailed in our scientific paper published on the pre-print repository arXiv: Variational inference with a quantum computer.

We envision that machine learning scientists as well as quantum software and hardware developers will likely be the groups most interested in this new line of research. The first group may find these new tools useful for tackling ambitious problems such as studying reasoning and cause-effect in complex systems. The second group may apply our work to uncover new applications of near-term quantum computers in science, engineering and business — and even opportunities for a quantum advantage.

A model for reasoning

Reasoning means inferring conclusions from premises and available information. You notice that the grass is wet. Intuitively you infer that rain must have fallen or that a sprinkler must have soaked it or both. This inference is based on your premises, i.e., your understanding of “how the world works”.

Premises can also encode expert knowledge. In medical diagnosis, inference is based on an expert knowledge of the relationship between risk factors, diseases and symptoms. Such relationships can be represented by a graph.

A directed graph of four nodes called Cloudy, Sprinkler, Rain and “Grass wet”. Cloudy connects to Sprinkler and Rain; Sprinkler and Rain connect to “Grass wet”.
Textbook Bayesian network

Let us model our understanding of the wet grass situation from above as a graph. The nodes in this graph are the variables of the system, namely, “Grass wet”, “Sprinkler”, “Rain” and “Cloudy”. Each variable is either observed (such as the state of the grass) or unobserved, i.e., hidden (such as the state of the sprinkler). Links represent an associative relationship between connected variables (e.g., the grass being wet depends on rain or no rain). The links between, e.g., “Cloudy”, “Sprinkler” and “Rain” encode our idea that a cloudy sky influences the likelihood of rain and the sprinkler activity. The absence of a direct link between “Cloudy” and “Grass wet” means we do not think a cloudy sky directly moistens the grass (though the influence can be indirect, e.g., through rain). If we assign conditional probabilities to all nodes, we call the graph a Bayesian network. In fact, our graph above is the textbook example of a Bayesian network.

With a fully specified Bayesian network, we can ask questions such as, “What is the likelihood that the grass is wet given that it rained and the sprinkler was off?” Inference is the inverse of this question. It asks questions such as, “What is the likeliest state of the Sprinkler, Rain and Cloudy that explains my observation of wet grass?” One potential answer to such an inferential question could be “The sprinkler was off, the sky was cloudy and it had rained.”
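To make this concrete, here is a minimal sketch in plain Python of both kinds of question: the forward query and brute-force inference by enumeration. The conditional probabilities are the usual textbook values for this network, chosen purely for illustration (in our experiments each instance is assigned random values instead).

```python
from itertools import product

# Conditional probability tables for the textbook "sprinkler" network.
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}                # P(sprinkler on | cloudy)
P_RAIN = {True: 0.8, False: 0.2}                     # P(rain | cloudy)
P_WET = {(True, True): 0.99, (True, False): 0.90,    # P(grass wet | sprinkler, rain)
         (False, True): 0.90, (False, False): 0.00}

def joint(c, s, r, w):
    """Joint probability from the network's factorisation P(C)P(S|C)P(R|C)P(W|S,R)."""
    p = P_CLOUDY if c else 1 - P_CLOUDY
    p *= P_SPRINKLER[c] if s else 1 - P_SPRINKLER[c]
    p *= P_RAIN[c] if r else 1 - P_RAIN[c]
    p *= P_WET[(s, r)] if w else 1 - P_WET[(s, r)]
    return p

# Forward query: likelihood that the grass is wet given rain and sprinkler off.
p_wet_given_rain_no_sprinkler = P_WET[(False, True)]

# Inference: posterior over (cloudy, sprinkler, rain) given wet grass,
# by brute-force enumeration (only feasible for tiny networks).
unnorm = {(c, s, r): joint(c, s, r, True)
          for c, s, r in product([True, False], repeat=3)}
z = sum(unnorm.values())
posterior = {state: p / z for state, p in unnorm.items()}
best = max(posterior, key=posterior.get)  # likeliest explanation of wet grass
```

Brute-force enumeration scales exponentially with the number of unobserved variables, which is exactly why approximate inference methods are needed for larger systems.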

In the real world, reasoning is uncertain

Reasoning is uncertain because information is often incomplete, corrupted or because our understanding of the system is incomplete or even incorrect. Probabilistic inference is a principled way of incorporating uncertainty in computer-aided reasoning. Reasoning under uncertainty means reasoning about many potential configurations of the unobserved variables given available information.

Bayesian networks have become invaluable tools to solve this task. Sometimes it suffices to find the single most likely “explanation” of the observed data. In many other cases, it is equally important to quantify the uncertainty of a certain solution. Another task for reasoning under uncertainty could be predicting future outcomes and your confidence in these predictions based on past observations. In these cases, we need to assess the probabilities of many potential outcomes of unobserved data. These probabilities are called the posterior probability distribution. Estimating the posterior distribution is essential for reasoning under uncertainty.

Exact probabilistic inference in complex systems is all but impossible using traditional statistical methods. Probabilistic machine learning and artificial intelligence offer tools to help us scale to larger systems.

Modern variational inference is such a toolbox. By approximating the posterior distribution and using stochastic optimization and other tricks, this toolbox has scaled inference to massive datasets. Yet even these methods can only go so far, because the approximations to the posterior distribution may simply not be expressive enough for a given task. This has motivated classical machine learning researchers to find more expressive approximate posterior distributions, e.g., normalizing flows. In addition, standard variational inference may fail when unobserved variables are discrete (such as sprinkler on/off or rain/no rain).
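As a toy illustration of the variational idea (and emphatically not the paper's method), the following snippet fits a parameterised distribution q to a fixed four-state target "posterior" p by gradient descent on the Kullback-Leibler divergence. The target values are invented for the example, and the system is small enough that the gradient can be computed exactly; real variational inference replaces this with stochastic estimates.

```python
import math

p = [0.05, 0.52, 0.08, 0.35]    # hypothetical target posterior (made up)
theta = [0.0, 0.0, 0.0, 0.0]    # logits parameterising q via a softmax

def softmax(t):
    m = max(t)
    e = [math.exp(x - m) for x in t]
    s = sum(e)
    return [x / s for x in e]

def kl(q, p):
    """Kullback-Leibler divergence KL(q || p)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

lr = 0.5
for _ in range(1000):
    q = softmax(theta)
    d = kl(q, p)
    # Exact gradient of KL(q || p) w.r.t. softmax logits:
    # dKL/dtheta_i = q_i * (log(q_i / p_i) - KL)
    grad = [qi * (math.log(qi / pi) - d) for qi, pi in zip(q, p)]
    theta = [t - lr * g for t, g in zip(theta, grad)]

q = softmax(theta)  # after training, q closely matches p
```

The minimiser is q = p exactly; in realistic problems the chosen family for q is rarely expressive enough to achieve this, which is the limitation discussed above.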

Quantum computers are natural probabilistic machines

Outputs from a quantum computer appear random. However, we can program quantum computers to output random sequences with certain patterns. These patterns are discrete and can become so complex that classical computers cannot compute them in reasonable time. This is why quantum computers are natural tools for probabilistic machine learning tasks such as reasoning under uncertainty. Doubly so if we need to reason about a growing number of discrete variables.

Can we just run standard variational inference on quantum computers? Unfortunately, it is not that simple. These classical methods rely on knowing the equations governing the random patterns of the unknown variables at least approximately. It is not possible to write those down for the more complex patterns quantum computers can produce.

New algorithms for variational inference with quantum computers

We have released two new algorithms that enable efficient variational inference with quantum computers. These variational quantum algorithms overcome the problems mentioned above. They directly optimise the output patterns of the quantum computer until they closely match the patterns that would be generated by the true posterior distribution of the unobserved variables. Effectively, the quantum computer outputs potential solutions to inferential questions such as, “What is the state of the unobserved variables given the observed data?” These outputs can then be used in downstream tasks, such as finding the likeliest explanation of the available data or predicting future outcomes and their confidence.
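To give a flavour of the probabilistic quantum model involved, here is a classical statevector sketch of a tiny "Born machine": a parameterised two-qubit circuit whose measurement statistics define a distribution over bitstrings. The circuit structure (two RY rotations followed by a CNOT) is an illustrative choice, not the ansatz from the paper, and on real hardware the probabilities would be estimated from measurement shots rather than computed exactly.

```python
import numpy as np

def ry(theta):
    """Single-qubit Y-rotation gate."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# CNOT in the basis |00>, |01>, |10>, |11> (first qubit is the control).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def born_probs(t0, t1):
    """Distribution over bitstrings 00..11 from the circuit RY(t0) ⊗ RY(t1), CNOT."""
    state = np.zeros(4); state[0] = 1.0        # start in |00>
    state = np.kron(ry(t0), ry(t1)) @ state    # parameterised rotations
    state = CNOT @ state                       # entangling gate
    return np.abs(state) ** 2                  # Born rule

# For t0 = pi/2, t1 = 0 the circuit prepares a Bell state: only 00 and 11 occur.
probs = born_probs(np.pi / 2, 0.0)
samples = np.random.default_rng(0).choice(4, size=1000, p=probs)
```

Variational training then adjusts the angles until the sampled bitstrings, interpreted as configurations of the unobserved variables, match the target posterior.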

Implementation on quantum devices

We did not stop at theoretical results. To show the effectiveness of our methods we also implemented three proofs of principle on simulators and on an IBM Q quantum computer.

1. Textbook “cloudy-sprinkler-rain” Bayesian network

First, we employed both our algorithms for inference on random instances of the textbook “cloudy-sprinkler-rain” Bayesian network from above. Each instance assigns random conditional probabilities to all nodes. In essence, both methods worked “out-of-the-box” without much fine-tuning.

2. Hidden Markov models

Histogram of the true posterior and Born machine posterior truncated after the 10 most probable latent configurations. The modes of the two distributions match
Truncated histogram of the posterior distribution for a hidden Markov model

Next, we inferred market regime switches in a hidden Markov model of simulated financial time series. This task assumes that an observed time series of market returns (shown in green after “Data” in the figure) is influenced by a latent variable describing the market regime (its time series is indicated by the black lines after “Latent”). For example, the market regime could be a bull or a bear market. The returns might show higher levels of volatility in a bull market than in a bear market. The histogram above compares the approximate posterior (blue bars), which is the output of our method, with the true posterior distribution (magenta bars). For this comparison, we chose a small system size so that the true posterior is still computable. Overall, our method showed good agreement with the exact results.
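For readers who want a concrete picture, here is a generic two-regime hidden Markov model in plain Python. The transition matrix, volatilities and return series below are invented for illustration and differ from the model in the paper; the posterior over latent paths is computed by brute-force enumeration, which is only possible for very short series.

```python
import math
from itertools import product

TRANS = [[0.9, 0.1], [0.2, 0.8]]   # regime transition probabilities
PI0 = [0.5, 0.5]                   # initial regime distribution
SIGMA = [0.01, 0.05]               # low- vs high-volatility regime

def gauss(x, sigma):
    """Gaussian density with mean zero, used as the return distribution."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def path_prob(path, returns):
    """Unnormalised joint probability of a latent regime path and observed returns."""
    p = PI0[path[0]] * gauss(returns[0], SIGMA[path[0]])
    for t in range(1, len(path)):
        p *= TRANS[path[t - 1]][path[t]] * gauss(returns[t], SIGMA[path[t]])
    return p

returns = [0.002, -0.001, 0.04, -0.06, 0.03]   # toy observed return series
paths = list(product([0, 1], repeat=len(returns)))
weights = [path_prob(pth, returns) for pth in paths]
z = sum(weights)
posterior = {pth: w / z for pth, w in zip(paths, weights)}
best_path = max(posterior, key=posterior.get)  # likeliest regime sequence
```

With these numbers the likeliest path switches from the calm regime to the volatile one as the returns grow larger; the number of latent configurations doubles with every time step, which is what makes the posterior hard to handle at scale.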

3. Medical diagnosis in the “lung cancer” Bayesian network

Finally, we explored a textbook example for inferring likely sets of diseases in patients given partially available information about symptoms and risk factors. For this example, we used 3 observed symptoms and inferred the configuration of 5 unobserved diseases and risk factors. The histogram below compares the true (magenta bars) and approximate posteriors of our algorithm executed on a quantum computing simulator (blue bars) and a real IBM Q quantum computer (grey bars). This shows that our methods work on real quantum hardware. Noise on the device leads to slower convergence compared to the simulation.

Histogram of the true, simulated Born machine and hardware Born machine posterior distributions. It shows very good agreement between simulated and true posteriors, while the hardware result is not fully converged yet. An inset shows the linear topology of the ibmq_rome quantum device used in the experiment.
Histogram of the posterior distribution for a medical diagnosis task

What kind of reasoning will quantum computers be most useful for?

The truth is we do not have a definite answer yet. The fields of variational inference, probabilistic machine learning and quantum computing have just started to merge. What we can say is this: sampling from complex distributions is the most promising way towards a quantum advantage with today’s noisy quantum devices. Our inference methods are based on sampling and can incorporate domain expertise in a principled way. We envision a combination with other machine learning tasks in generative modelling or natural language processing for a meaningful impact.

As quantum hardware matures, we hope to answer this open question more definitively in the years ahead.

Epilogue — Technical details

We encourage you to read the details of our methods in our research article. Briefly, we use two recent machine learning techniques: adversarial training and the kernelized Stein discrepancy.

Illustration of the adversarial method. First, a classical probabilistic classifier is optimised on samples from the prior and Born machine. Then its output is used to optimise the Born machine by bringing its distribution closer to the true posterior. This process repeats until convergence.
Adversarial method

The adversarial method optimises a classical probabilistic classifier and a probabilistic quantum model, called a Born machine, in tandem. The Born machine serves as an expressive family of approximate posterior distributions. It outputs bitstrings which represent configurations of the unobserved variables. The training step optimises the parameters of the classifier and the Born machine until the Born machine's output closely matches samples from the true posterior distribution.
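The key classical ingredient here is the density-ratio trick: a probabilistic classifier trained to distinguish samples from two distributions recovers their log-probability ratio in its logit. The following NumPy sketch demonstrates this on two biased coins, a deliberately simple stand-in for the two sample sources; the probabilities 0.8 and 0.3 and the training hyperparameters are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
xa = (rng.random(5000) < 0.8).astype(float)   # samples from distribution A (label 1)
xb = (rng.random(5000) < 0.3).astype(float)   # samples from distribution B (label 0)
x = np.concatenate([xa, xb])
y = np.concatenate([np.ones(5000), np.zeros(5000)])

# Logistic regression with one binary feature: logit(x) = w*x + b,
# trained by full-batch gradient descent on the cross-entropy loss.
w, b, lr = 0.0, 0.0, 2.0
for _ in range(2000):
    pred = 1 / (1 + np.exp(-(w * x + b)))
    w -= lr * np.mean((pred - y) * x)
    b -= lr * np.mean(pred - y)

# With equal class priors, the optimal logit for an input x equals
# log(P_A(x) / P_B(x)); for x = 1 that is log(0.8 / 0.3).
logit_x1 = w + b
```

In the adversarial algorithm this ratio estimate is what steers the Born machine's distribution towards the true posterior.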

Illustration of the kernelized Stein discrepancy method. A kernel map is designed such that samples from the true posterior evaluate to zero in the kernel space. This allows measuring and optimising the distance between the Born machine samples and true posterior samples in a kernel space.
Kernelized Stein discrepancy method

The second method minimizes the kernelized Stein discrepancy between the Born machine and the true posterior without adversarial training. It turns out that we can estimate this discrepancy using only samples from the Born machine. This is possible by designing a map from samples into a reproducing kernel Hilbert space. In this kernel space, expectations with respect to the true posterior vanish, as indicated in the illustration above. A minimal kernelized Stein discrepancy means the outputs of the Born machine mimic samples from the true posterior.
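Schematically, and suppressing the details of the discrete Stein operator, the kernelized Stein discrepancy between the Born machine distribution q and the posterior p takes the following form (this is the generic definition; see the paper for the exact construction used):

```latex
% Generic form of the kernelized Stein discrepancy. The Stein kernel
% \kappa_p is built from a base kernel and the unnormalised posterior p,
% and is designed so that expectations under p vanish.
\mathrm{KSD}(q \,\|\, p)
  = \mathbb{E}_{x,\, x' \sim q}\bigl[\kappa_p(x, x')\bigr],
\qquad \text{with} \qquad
\mathbb{E}_{x' \sim p}\bigl[\kappa_p(x, x')\bigr] = 0 \;\; \text{for all } x .
```

Because the expectation runs only over samples from q, the discrepancy can be estimated from Born machine outputs alone, and it vanishes when q matches p.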

The adversarial method belongs to a class of methods optimising an f-divergence (here, the Kullback-Leibler divergence), while the Stein method belongs to a class optimising a dual norm, or integral probability metric.
