Comparing Statistical Models for Randomized Benchmarking in Qiskit

A Hardware Demo Based on Qiskit Experiments and PyMC3

Published in Qiskit, 10 min read, Jan 25, 2022

Randomized Benchmarking (RB) is an efficient, elegant and widely used method for estimating quantum computers’ gate fidelity. RB helps users find the optimal pulse sequences when designing gates, and can be repeated for systematic calibrations. The two most-used protocols are standard RB, which estimates the average error rate of a set of quantum gate operations, and interleaved RB, which estimates the average error rate for individual quantum gates.

RB is made possible by the weird reversibility of quantum computing processes. We run sequences of random Clifford gates, each followed by the inverse of the whole sequence. Because of noise, the longer the sequence, the less likely the qubits are to return to their initial state. Bayesian methods, which start from a prior understanding of the system and update it as data arrive, may therefore be better suited to the intrinsically probabilistic nature of RB and to the complexity of hardware noise than the current frequentist models, which use no prior.

When I learned in February 2021 that IBM Research Haifa mathematician Shelly Garion was recruiting Qiskit advocates for a mentorship project on Bayesian models for RB, I was immediately enthusiastic. I applied and was registered as a mentee for “Bayesian Techniques for Randomized Benchmarking.” I was in charge of designing and testing a hierarchical Bayesian model. My fellow mentee, Shesha Raghunathan, a senior engineer at IBM Bangalore, was of great help with the programming side of the project. He improved the interleaved RB program in the Qiskit noise module and made it workable, which allowed this protocol to be investigated.

This project was initiated in the 2021 spring session of the Qiskit Advocates Mentorship Program and continued during the fall session. During the same period, the RB frequentist model was significantly adapted from the original Qiskit Ignis version, which is now deprecated. In the new module, Qiskit Experiments, the main improvement for RB is a multi-curve approach to the fitting process. This alleviates the issue of large errors when taking ratios, which was present in interleaved RB.

Bayesian methods often narrowed the error bounds of the RB estimates. Furthermore, they provided a credible lower limit for gate error, which is a valuable feature inherent to the Bayesian approach.

Working on this topic was for me a continuing challenge and opportunity to develop my knowledge, especially as the Qiskit frequentist version was constantly evolving. Of course, I would never have been able to progress without the constant dialogue with my mentor.

This project is part of the Qiskit Advocate Mentorship Program Fall 21 cohort. If you are interested in joining the Qiskit Advocate program, please check out the application guide. You can also fill out this form to get an email update when the new application round opens later this year.

Overview of the demonstration

We first created a level playing field for a hardware comparison of frequentist and Bayesian statistical models. The models can be adapted to a variety of protocols, e.g., one- or two-qubit standard and interleaved RB.

Below, I demonstrate both models, which are valid for one- or two-qubit interleaved RB, with Python code snippets.

We start from data obtained for a CNOT gate on cloud-accessible IBM Quantum hardware using Qiskit Experiments.

We begin by reconstructing the frequentist model and applying it to the experimental counts. A prior check showed that this reconstructed model produced parameter estimates identical to those of the original Qiskit data analysis program.

We then construct a Bayesian hierarchical model. A Markov chain Monte Carlo (MCMC) algorithm, the No-U-Turn Sampler, is used to sample the posterior (that is, the prior distribution updated by the data), with the experimental counts as observed values.

Finally, the estimates of the error per gate and of its bounds are compared between the two approaches.

The tying function

Both the frequentist and Bayesian models are based on a nonlinear function called the tying function, which describes decreasing exponential curves. This fit function has four parameters in one- or two-qubit RB:

(“Reference” and “standard” RB are the same)

where:

GSP is the ground state population, the proportion of counts with qubits in the ‘0’ (for one-qubit RB) or ‘00’ (for two-qubit RB) state observed at measurement for each sequence.
m is the number of Clifford gates in the random sequence
the depolarizing parameter (which describes noise) for standard RB and the additional parameter required for interleaved RB are both probabilities, so each lies between 0 and 1

the parameter b is the horizontal asymptote of the decreasing exponential curve
the parameter a is such that a + b is the GSP value at m = 0

The scale is a constant depending on the number of qubits N composing the gate:

The error per Clifford EPC is the estimated error of the interleaved gate and can be calculated as follows:

And the statistical error on EPC is:

We’ll then be using the EPC to calculate the gate fidelity and to compare the Bayesian versus the frequentist approaches.
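For reference, the relations described in this section can be written out as follows. This is a sketch based on the definitions above and on the usual interleaved RB formulation; the symbols α (depolarizing parameter of the reference curve), α_c (additional parameter of the interleaved curve) and σ (standard error) are assumed names.

```latex
\begin{aligned}
\mathrm{GSP}_{\text{reference}}(m)   &= a\,\alpha^{m} + b \\
\mathrm{GSP}_{\text{interleaved}}(m) &= a\,(\alpha\,\alpha_c)^{m} + b \\
\text{scale} &= \frac{2^{N}-1}{2^{N}} \\
\mathrm{EPC} &= \text{scale}\,(1-\alpha_c) \\
\sigma_{\mathrm{EPC}} &= \text{scale}\;\sigma_{\alpha_c}
\end{aligned}
```

With N = 2 the scale is 3/4, and both curves share a and b, which is what ties the two fits together.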

The Qiskit experiment

We performed a two-qubit interleaved RB experiment on ibmq_manila, which is one of the IBM Quantum Falcon processors.

Experiment setting

We used the following code for the settings of the experiment demo:

CNOT [0,1] is subject to experimentation.
We tested ten values of m, stored in the vector lengths. The number of samples, num_samples, for each sequence length is 20. Since each reference circuit also has its interleaved counterpart, the experiment produced 400 circuits, run as four successive jobs of 100 circuits. One hundred circuits per job is the maximum allowed at one time under IBM Quantum fair-share queuing, and Qiskit Experiments manages this cap automatically.
We did not give a seed and used the default random number generator.
The additional information includes the data required for calculations and demo plots: system name, gate name, number of qubits, scale, protocol type, and number of copies.
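The job count quoted above follows from simple arithmetic. Here is a minimal sketch of it in plain Python; the variable names other than num_samples are illustrative and not taken from the experiment code.

```python
import math

# Settings quoted in the text; names other than num_samples are illustrative.
n_lengths = 10               # number of Clifford lengths m tested
num_samples = 20             # random sequences per length
max_circuits_per_job = 100   # cap per job mentioned in the text

# Each reference circuit has an interleaved counterpart, hence the factor 2.
total_circuits = 2 * n_lengths * num_samples
n_jobs = math.ceil(total_circuits / max_circuits_per_job)

print(total_circuits, n_jobs)  # 400 4
```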

RB curves

As explained in this tutorial, the Qiskit Experiments module makes it possible to display the RB curves for reference (“standard”) and interleaved random circuits:

The x-coordinate is the Clifford length m
The y-coordinate P(0) is the GSP

Clifford lengths and count data

For the demo, the experimental results are arranged into two numpy arrays, x and counts, which are passed as arguments to the fitter in the frequentist and Bayesian models.

The first row of the x array is the Clifford lengths vector written twice in a row, once for the reference sequences and once for the interleaved ones. The second and third rows are Boolean dummy variables:

the second row is 1 for the original sequence, 0 otherwise
the third row is 1 for the interleaved sequence, 0 otherwise

This is intended to save computation resources through numpy broadcasting.
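To make the layout of x concrete, here is a small sketch of how such an array could be built. The Clifford lengths are hypothetical placeholders (ten values chosen only for illustration), not the ones used on ibmq_manila.

```python
import numpy as np

# Hypothetical Clifford lengths: ten values, for illustration only.
lengths = np.arange(1, 200, 20)   # 1, 21, ..., 181
n = lengths.size

x = np.vstack([
    np.concatenate([lengths, lengths]),         # row 0: m, reference then interleaved
    np.concatenate([np.ones(n), np.zeros(n)]),  # row 1: 1 for reference sequences
    np.concatenate([np.zeros(n), np.ones(n)]),  # row 2: 1 for interleaved sequences
])

print(x.shape)  # (3, 20)
```

Because every column carries its own pair of dummy values, one vectorized expression can evaluate both curves without branching.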

Each row of the counts array corresponds to one sample. The values are the numbers of observed ‘00’ bit strings at measurement in this case of two-qubit RB:

Digging into the frequentist model

Function to optimize

Let’s first import curve_fit from the open-source SciPy optimize subpackage. Then we implement the tying function:
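A minimal sketch of such a tying function is given here, assuming the x layout described earlier (lengths in row 0, reference and interleaved dummies in rows 1 and 2). The name tying_function and the argument order (a, alpha, alpha_c, b) are my assumptions, not necessarily those of the original code.

```python
import numpy as np

def tying_function(x, a, alpha, alpha_c, b):
    """GSP for reference (x[1] == 1) and interleaved (x[2] == 1) sequences."""
    m, is_ref, is_int = x
    reference = a * alpha**m + b                # standard RB decay
    interleaved = a * (alpha * alpha_c)**m + b  # decay with the interleaved gate
    return is_ref * reference + is_int * interleaved

# Tiny example: m = 0 and m = 10 for each type of sequence.
x_demo = np.array([[0, 10, 0, 10],
                   [1, 1, 0, 0],
                   [0, 0, 1, 1]], dtype=float)
gsp = tying_function(x_demo, a=0.75, alpha=0.98, alpha_c=0.99, b=0.25)
```

At m = 0 both curves start at a + b, and the interleaved curve then decays faster than the reference one.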

Curve fit

Let’s now run the least-squares fitting process:

Note that the sigma argument is the standard error of the mean (SEM) of the counts.

The bounds enclose the ideal two-qubit values of .25 for b (the asymptote) and .75 for a. The depolarizing parameters are bounded between .9 and 1.

The optimal values for the parameters are returned in the popt array. The estimated covariance of popt is returned in the pcov 2-D array. The perr array contains the standard errors on the parameters, which are the square roots of the diagonal elements of pcov.

Finally, EPC and the error on EPC are calculated.
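The whole frequentist step can be sketched end to end on synthetic data. Everything numerical here is an assumption made for illustration: the "true" parameters, the number of shots, the Clifford lengths, the starting point p0, the exact bounds for a and b, and the use of absolute_sigma=True. Only the overall procedure (SEM as sigma, depolarizing parameters bounded in [.9, 1], perr from the diagonal of pcov, EPC from the third parameter) follows the description above.

```python
import numpy as np
from scipy.optimize import curve_fit

def tying_function(x, a, alpha, alpha_c, b):
    m, is_ref, is_int = x
    return is_ref * (a * alpha**m + b) + is_int * (a * (alpha * alpha_c)**m + b)

rng = np.random.default_rng(0)
lengths = np.arange(1, 200, 20)                 # hypothetical lengths
n = lengths.size
x = np.vstack([np.tile(lengths, 2),
               np.repeat([1.0, 0.0], n),
               np.repeat([0.0, 1.0], n)])

# Synthetic counts standing in for hardware data (assumed values).
true = dict(a=0.72, alpha=0.975, alpha_c=0.990, b=0.26)
shots, num_samples = 1024, 20
counts = rng.binomial(shots, tying_function(x, **true), size=(num_samples, 2 * n))

y = counts / shots
y_mean = y.mean(axis=0)
sem = y.std(axis=0, ddof=1) / np.sqrt(num_samples)   # standard error of the mean

popt, pcov = curve_fit(
    tying_function, x, y_mean,
    sigma=sem, absolute_sigma=True,                  # assumption, see lead-in
    p0=[0.75, 0.97, 0.99, 0.25],
    bounds=([0.5, 0.9, 0.9, 0.0], [1.0, 1.0, 1.0, 0.5]),  # hypothetical bounds
)
perr = np.sqrt(np.diag(pcov))

N = 2                                # two-qubit gate
scale = (2**N - 1) / 2**N
epc = scale * (1 - popt[2])          # popt[2] is alpha_c in this argument order
epc_err = scale * perr[2]
print(f"EPC = {epc:.4f} +/- {epc_err:.4f}")
```

On real data, counts would come from the experiment rather than from rng.binomial, and the rest of the sketch would stay the same.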

Constructing the Bayesian model

Let’s import the open-source package PyMC3 and name our hierarchical model:

Next we implement priors for the tying parameters, now promoted to the level of hyperparameters:

The priors are bounded uniform distributions, gathered in the vector π. The bounds are placed around the test values, for which we chose the popt values.

EPC is deterministically inferred from the hyperparameter π[2].

We now implement the tying function with code similar to the frequentist version:

The GSP values serve as the mu parameter of a set of Beta distributions, which in turn are the priors of the θ distributions.

Since we use mu and sigma to parameterize the Beta distributions, we now need priors for their sigma parameters:

These priors are bounded Gamma distributions, kept sufficiently vague, whose parameters alpha and beta were tuned heuristically and vary with N.
An alternate solution, bounded uniforms, was also tested, but abandoned due to occasional MCMC convergence issues.
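The mu and sigma parameterization of a Beta distribution mentioned above is a moment-matching reparameterization of the usual shape parameters. Here is a sketch of the conversion using SciPy rather than PyMC3; the helper name beta_from_mu_sigma and the example values are hypothetical.

```python
import numpy as np
from scipy import stats

def beta_from_mu_sigma(mu, sigma):
    """Convert mean and standard deviation to Beta shape parameters."""
    kappa = mu * (1 - mu) / sigma**2 - 1   # requires sigma**2 < mu * (1 - mu)
    return mu * kappa, (1 - mu) * kappa

mu, sigma = 0.8, 0.02                      # illustrative GSP mean and spread
shape_a, shape_b = beta_from_mu_sigma(mu, sigma)
dist = stats.beta(shape_a, shape_b)
print(dist.mean(), dist.std())
```

The constraint in the comment is one reason the sigma priors need an upper bound: a sigma that is too large for a given mu has no valid Beta distribution.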

We can now define a series of θ distributions which will be the priors for the final fit:

The model will ultimately adjust the hyperparameters to accommodate a binomial log-likelihood:

As defined in the PyMC3 documentation, this consists of “the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p”.

Here:

n is the number of shots
‘yes’ is the observation of a “0” bit string for one-qubit RB or a “00” bit string for two-qubit RB
the prior for p is the θ distribution corresponding to an m value, whether for the reference or interleaved sample.
the values observed are those of the counts array.
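To illustrate what the binomial log-likelihood measures, here is a sketch that evaluates it with SciPy for fixed success probabilities. The shots value, the θ values and the counts are all made up for the example, and only three sequence lengths are shown.

```python
import numpy as np
from scipy import stats

shots = 1024                           # n: number of shots (assumed value)
theta = np.array([0.95, 0.80, 0.60])   # hypothetical success probabilities
counts = np.array([[970, 815, 610],    # made-up counts, one row per sample
                   [975, 822, 621]])

# Sum of log-probabilities of observing these counts given theta;
# theta broadcasts across the rows (samples) of counts.
log_lik = stats.binom.logpmf(counts, shots, theta).sum()
print(log_lik)
```

During sampling, values of θ that make the observed counts more probable get a higher log-likelihood, which is how the data pull the posterior away from the prior.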

After importing the open-source package ArviZ, we obtain a graph of the model:

Now we sample with the appropriate arguments. A plot of the trace is then obtained:

In the current demo, it takes just under a minute and a half on my laptop.

Now let’s take a closer look at the inferred distributions. Let’s start with the four tying hyperparameters:

Here are the posteriors of the bounded Gamma distributions:

And finally the θ distributions:

Some of them are clearly asymmetric (see the pair θ 4 and θ 14, for example), whereas the frequentist model assumes symmetric distributions.

ArviZ now provides a nice summary of the hyperparameters and EPC statistics, including the lower and upper bounds of the highest density interval:
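For readers unfamiliar with the highest density interval (HDI), here is a plain numpy sketch of how it can be computed from posterior draws in the unimodal case. This is not the ArviZ implementation. The samples are synthetic stand-ins for EPC draws, and the 94% level is an assumption (it matches the ArviZ default).

```python
import numpy as np

def hdi(samples, prob=0.94):
    """Narrowest interval containing a fraction `prob` of the samples."""
    s = np.sort(np.asarray(samples))
    n_in = int(np.ceil(prob * s.size))             # points inside the interval
    widths = s[n_in - 1:] - s[:s.size - n_in + 1]  # width of each candidate
    i = np.argmin(widths)
    return s[i], s[i + n_in - 1]

rng = np.random.default_rng(1)
epc_samples = rng.beta(5, 995, size=4000)   # synthetic, skewed "posterior"
low, high = hdi(epc_samples)

# The upper EPC bound translates into a credible lower limit on fidelity.
min_fidelity = 1 - high
print(low, high, min_fidelity)
```

Unlike a symmetric error bar, this interval follows the shape of the distribution, which matters when the posterior is skewed.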

Bayesian vs. frequentist

The last step in the analysis of the results is the comparison of the two models. We import the open-source matplotlib.pyplot package for this purpose:

As the graph shows, the Bayesian and frequentist estimates of EPC are very similar for this gate. However, the statistical error on EPC is more than twice as large for the frequentist model.

For EPC, the Bayesian credible interval has an upper bound of 0.007, which corresponds to a gate fidelity of at least 0.993.

In this experiment, the Bayesian approach therefore has the advantage of tightening the bounds of the estimates. Judged from the lower bound of its credible interval, the gate fidelity appears to be better than the prediction of the frequentist model.

We tested this protocol on sixteen CNOT gates from four IBM Quantum systems. The gate fidelity range reported in the calibration data was 98% to 99.5%. The credible lower limit of fidelity reported by the Bayesian approach was significantly higher (p<0.01):

This paves the way for recommending the inclusion of a credible Bayesian interval in RB reports, especially since this analysis does not require any additional quantum computing resources.

In conclusion

I hope you enjoyed this journey into a world where Bayes’ theorem and the time-dependent Schrödinger equation go hand in hand.

You can test this code using your own data if you want. It is suitable as is for one- and two-qubit interleaved RB and can be easily adapted to the simpler standard RB protocol.

I don’t have a crystal ball to predict the future place of the Bayesian approach in RB protocols for quantum computing. However, this work shows that these methods are feasible from data collected on current quantum hardware and at an affordable classical computing price thanks to PyMC3.

More generally, these investigations help demonstrate the usefulness of the interleaved RB protocol, which has regained strength thanks to the multi-curve option of Qiskit Experiments. The Bayesian approach also served as an independent audit of the frequentist model, which passed with honors.

I acknowledge the use of IBM Quantum services for this demonstration. The views expressed are those of the author, and do not reflect the official policy or position of IBM or the IBM Quantum team.