Unpacking the Quantum Supremacy Benchmark with Python

M. Sohaib Alam
10 min readNov 11, 2019

By M. Sohaib Alam and Will Zeng

We recently had an exciting moment in the history of (quantum) computing. A team of collaborators led by Google published a result in Nature showing a benchmark against which their 53-qubit quantum processor significantly outperformed Summit — the world’s largest supercomputer. While valuable business applications are still some future milestones away, we like Scott Aaronson’s analogy:

“…much like the Wright Flyer in 1903, or Enrico Fermi’s nuclear chain reaction in 1942, [quantum supremacy] only needs to prove a point.”

The original publication is commendably readable and comes with 66 pages of detailed supplementary information.

In this blog post we’ll explore, through some short snippets of Python, the key aspects of their supremacy benchmark. We’ll show you how to run your own simulations of cross-entropy benchmark fidelity to measure if a data set of bitstrings was generated by a working quantum computer.

The Key Benchmark Idea: We have very good complexity-theoretic evidence to believe that bitstring outputs generated from random quantum programs are exponentially difficult for classical computers to reproduce.[1]

What is a quantum program? (quick review)

We can recommend a few different introductions to quantum programming depending on your background:

For computer scientists: A Practical Quantum Instruction Set Architecture

For programmers: Forest SDK Docs

For physicists: John Preskill’s notes

A quantum memory is made up of n qubits. The state of this memory is not described by a single bitstring — like it is on normal computers, nor is it described by a probability distribution over bitstrings like it is in probabilistic computing. Instead, it is described by a vector of complex numbers called amplitudes ⍺, one for each of the 2^n possible bitstrings.

where the notation |x> is the physicist way of writing a complex vector x. When a quantum memory is measured the outcome is a single bitstring with probability given by

Quantum programs operate on quantum memory and are described by unitary matrices. The operation of programs takes one state of quantum memory (a complex vector) to another state of quantum memory (another complex vector) by matrix multiplication. Thus for a quantum computer that starts with all qubits set to zero, the output of a quantum program U is described by

where the “cross-O” notation denotes a tensor product of |0> states. Unitary matrices are matrices of complex numbers whose conjugate transpose is its inverse, i.e.

This rule is needed to make sure that quantum program transformations obey the conservation of probability. You can think of unitary operations as norm-preserving transformations that act like rotations in the space of quantum states.

With that brief introduction behind us, we’ll shift our attention to writing code. Before we do, let’s import the following python libraries.

Now, let’s look at a quick code example that illustrates the basic concepts of quantum programs just introduced.

What is a random quantum program?

This definition of quantum programs gives us a natural way to think about random quantum programs. In theory, a random quantum program is a randomly selected unitary matrix. In particular it is a Haar-random unitary matrix. However, quantum programs on real quantum processors have some restrictions. Quantum processors are typically calibrated with a specific set of operations — such as one and two-qubit gates that are then composed into a quantum program. There are also often restrictions on which gates can be applied to which qubits. While we ignore such details in our simulations here, the Nature paper describes how the Google group builds up a random unitary out of patterns of randomly selected gates that obey the gate set and connectivity restrictions of their processor. For the purposes of our simulation, we can generate a random unitary matrix as follows:

What are the special quantum statistics?

Given some bitstring, a quantum program will specify the probability of sampling this bitstring. We can then ask what the probability of the probability of sampling this bitstring is, if we draw a quantum program at random. Because the double probability can be hard to read, you can think of this as: “How likely is it that any particular bitstring will occur with a particular probability?” This meta-probability, the probability distribution over the probability of drawing some given bitstring, is specified by the Porter-Thomas distribution, f(p) = N*exp(-N*p), where N=2^n_qubits. Let’s see how we can generate this distribution from random quantum circuits.

The function above picks some arbitrary bitstring, then generates a sample of probabilities for that bitstring resulting from drawing random quantum programs. We can use this function to generate such a sample for a specified number of qubits, and a number of random quantum programs over those qubits. We can then plot the resulting distribution of those probabilities, and compare it with the theoretical calculation, as shown below.

Running the code above, you should obtain a figure like the following.

Notice the vertical line at the value 1/dim, where dim = 2^n_qubits is the dimensionality of the n-qubit space. This line corresponds to the meta-distribution that we would obtain if the output bitstrings were not sampled from a quantum computer, but instead were sampled from a uniform distribution. The uniform distribution would mean that all bitstrings occur with probability 1/dim and so the histogram in that case just looks like a Dirac delta function.

Intuitively, what the figure above is saying is that if you fix some bitstring, e.g. 1011001…, and execute a random quantum program, it is much more likely that the resultant probability of sampling this bitstring is smaller than that given by the uniform distribution.

If instead of fixing a bitstring in advance, as we did above, we sample this bitstring according to the randomly sampled unitary itself, we generate an entirely different distribution from the Porter-Thomas distribution. This is the more relevant meta-probability distribution for the supremacy experiment, and can be generated by using the following code

to generate a similar plot,

to find a distribution that looks like the following.

How can we benchmark these statistics?

Having formed an intuition for random quantum programs, we now dive straight into the calculation of the linear cross-entropy benchmark fidelity (or F_XEB for short) used in the actual quantum supremacy experiment. This is the key statistic that the Google team measured to show supremacy. Essentially, it involves estimating an overlap between the experiment’s output distribution, and the ideal output distribution. More precisely, we calculate the average of the ideal probabilities of bitstrings generated from the experimental output, <P(x_i)>.

The diagram above sketches the work-flow of this calculation. First, we sample a random unitary. A black box algorithm/device takes the unitary as input, and generates samples. This black box may be a noisy version of the ideal quantum program, or some classical algorithm that generates samples of bit-strings. Using a (simulated) ideal quantum program that represents the sampled unitary, we calculate ideal probabilities of the generated samples. It is these ideal probabilities that we average over. In the case of the supremacy experiment, the black box denotes the (noisy) quantum processor, and the ideal program represents classical simulations estimating, for the ones they could, the ideal probabilities for the experiment’s random quantum circuits.

The precise quantity we seek to calculate is actually normalized to produce values between 0 and 1, and is given by D*<P(x_i)>-1, where D=2^n is the dimension of the n-qubit space, and <P(x_i)> is the average ideal probability of sampling bitstring x_i, as the code below shows.

Using the function defined above, we can compute F_XEB in the case where the black box is the ideal quantum program itself. For a sufficiently large number of qubits, this value should become very nearly equal to 1. For a smaller number of qubits, it is instead approximately equal to 2D/(D+1)-1. Note that this value asymptotes to 1 as D becomes very large (see [2] for more details).

In the above, we used values for trials and n_samples such that the output of the calculated F_XEB has reasonably low variance. We find that the choices given in the Nature paper yield slightly higher variance, but are also quicker to calculate through simulation.

On the other hand, we can also compute the same quantity for a black box that just uniformly samples over all bit-strings. We’ll find that F_XEB for such a sampler is roughly equal to 0.

In the above, we used a uniform distribution sampler (F_XEB~0) to compare against the ideal quantum computer (F_XEB~1). The essential argument is that getting a high F_XEB score achieves quantum supremacy. To support this statement, we would also need to show that any efficiently classically generated distribution scores low on F_XEB. This is a subtle question which is addressed in recent literature[3], but illustrates the core idea of the supremacy experiment: if the quantum device is generating samples that yield F_XEB close to 1, then the device is a good one. If it yields F_XEB close to 0, then it is no more powerful than a classical computer.

As the quality of the quantum processor decreases — often due to noise and error in the processor — we would expect that the F_XEB fidelity would trend towards zero. Errors are causing the processor to behave more classically as it loses quantum coherence. We’ll investigate this further in the next section.

Notice that we only used 4 qubits in the simulated experiments above. As the number of qubits grows, the simulation quickly becomes intractable for classical computers, such as your laptop. To approximate the ideal probabilities for large numbers of qubits (e.g. 40+ qubits), Google had to use state-of-the-art simulation techniques on Google data centers and Summit, the world’s fastest supercomputer. The quantum sampling was performed on a 53-qubit quantum computer named Sycamore. The quantum processor hardware is very cool, but beyond the scope of this post. We encourage you to read the Nature paper to learn more about it.

What is the effect of noise?

Real life quantum processors are noisy and have a non-zero probability of carrying out operations other than the ones intended. This can alter the statistics of the supremacy benchmark. A noise model relevant for the quantum programs executed on Sycamore is the depolarizing model. In this model, the state produced from an execution of the intended quantum program could be thought of as a statistical ensemble of the ideal quantum state (resulting from the application of the ideal quantum program) and the totally mixed state, which represents the uniform distribution over all bitstrings.

Equivalently, we may imagine this scenario as a noisy channel in which the intended quantum program is carried out with some probability, and with remaining probability, one of the Pauli operations on this space are instead carried out[4]. This situation is simulated by the function below.

which we can call to estimate the F_XEB in the presence of some noise

If p is the probability of no (depolarizing) error, then theoretically, the value of F_XEB should equal p*(D-1)/(D+1) where D=2^n is the dimensionality of the n-qubit space (see [2] for more details). This quantity reduces to just p as D becomes very large. Essentially, this means that F_XEB decreases as the noise grows, and becomes zero in the limit where noise occurs in the system with unit probability.

Next Steps

With the experimental demonstration of this supremacy benchmark, we look forward to an exciting future for quantum computing. The Nature paper shows that this benchmark is useful not just for performance testing, but also for calibration of quantum hardware. We hope that these small examples show that the key underlying statistics can be expressed simply.


The code used in this blog post has been aggregated into a repository at: https://github.com/msohaibalam/xeb.

Unitary Fund

Interested in software for quantum computing? Unitary Fund supports open source projects in quantum programming with grant funding. Apply now at https://unitary.fund/.


We wish to thank Adam Bouland, Matt Harrigan and Yousef Hindy for valuable suggestions and useful feedback.

[1] We have complexity-theoretic evidence that exactly sampling from the output of random quantum programs is classically hard, and remains hard in the presence of a small amount of noise (see https://arxiv.org/abs/1803.04402). We also have some evidence that getting a low but non-zero F_XEB score is classically hard (see https://arxiv.org/abs/1910.12085 and https://arxiv.org/abs/1612.05903).

[2] The derivation of this expression is outlined in a small appendix, available on the Github repo for this blog.

[3] See https://arxiv.org/abs/1608.00263, https://arxiv.org/abs/1803.04402, and https://arxiv.org/abs/1910.12085.

[4] Note that more realistically, one would model each separate gate operation as a depolarizing channel, however since our simulation doesn’t decompose into gates, we use this approach to show the general idea.