Digital zero noise extrapolation for quantum error mitigation

The paper introduces and improves upon the technique of zero-noise extrapolation (ZNE) for mitigating errors in noisy quantum computations without additional quantum resources, presenting new methods such as unitary folding and parameterized noise scaling, as well as an adaptive protocol using a statistical inference framework.

16 min readJun 21, 2023

Link to the paper : [https://arxiv.org/abs/2005.10921]

Introduction

It’s an increasingly popular technique for mitigating errors in noisy quantum computers without using additional quantum resources. As quantum hardware is available in the NISQ era, it is inevitable that the programmer must deal with errors. However, fault-tolerant quantum hardware has the potential to arbitrarily reduce logical error. But it’s yet to be demonstrated. So a clever workaround is to use error-mitigation techniques, that give practical benefits.

There are many examples of error-mitigating techniques, like

Probabilistic Error Cancellation: https://arxiv.org/abs/1612.02058
Randomized Coupling: https://arxiv.org/abs/1512.01098
Pauli Frame randomization: https://www.nature.com/articles/nature03350
Dynamical Decoupling: https://arxiv.org/abs/quant-ph/0511121
Quantum Optical control: https://arxiv.org/abs/2001.04060

In this article, we will focus solely on zero-noise extrapolation.

What is Zero-Noise extrapolation?

In ZNE, a quantum program is altered to run at different effective levels of processor noise. The result is then extrapolated to an estimated value at a noiseless level. Formally, one can treat the noise as a parameter of the quantum system with a dimensionless scale factor λ. Imagine noise as a dial, so this method allows us to move this dial to a value of 0, hence removing the noise.

So for λ=0, the noise is removed, and for λ=1 the true noise level of the hardware is matched. For a given quantum program, we measure the expectation value, E(λ). So, for different values of λ, suppose E(1) represents the expectation value at the natural level of hardware, and E(0) denotes the noiseless observable.

Keep in mind that this E(0) is not something that we can measure directly, we estimate it via extrapolation techniques, hence the name Zero-Noise Extrapolation.

The Implementation

Since we are treating the noise as a parameter λ, we need a direct or indirect way to scale our system to different values of noise, and even take it to a value greater than 1(add more noise than there actually is). Hence, two main steps are required:

Noise Scaling: Measure E(λ) at m different values of λ>1.
Extrapolation: Infer E(0) from the m expectation values measured in the previous steps.

The figure below shows an example of the noise curve given by scaling the depolarizing noise for a randomized bench marking circuit.

An example of the change of an expectation value, E(**λ),** with the underlying scaling λ of the depolarizing noise level. Here the simulated base noise is 5% (the green line). ZNE increases that noise and back extrapolates to the **λ=0,** expectation value. In this example, an accurate extrapolation should be non-linear and take advantage of known asymptotic behavior.

Noise Scaling Methods

Now, we need a method that somehow scales up the noise. Since we want to implement Zero-Noise extrapolation, we need to have the result of the E(λ) for values with λ>1, so that we can extrapolate it back to zero and get our error free result.

This illustration below shows how Zero-noise extrapolation generally works out.

This chart illustrates the basic principles of ZNE noise amplification, a method of error mitigation in quantum systems. With the ZNE technique, we amplify the noise in our system to different levels, and evaluate the noise at each level. Then, we combine the data from our evaluations with some extrapolation method that allows us to extrapolate back to the zero-noise limit. (image taken from IBM Quantum Blog)

Link to the paper : [1612.02058] Error mitigation for short-depth quantum circuits (arxiv.org)

This work (mentioned above) proposes a time-scaling approach where the control pulses for each gate are re-calibrated to execute the same unitary evolution, only this time it’s applied over a longer period of time, which in turn scales up the noise. Although it has some disadvantages:

It require programmer access to low-level physical-control parameters, which is not available on all quantum hardware.
Control pulses must be re-calibrated for each time duration and error-scaling. This process is resource intensive.

So, given these disadvantages, we have to look at some other method. Using an alternative approach that only require the access up to gate-level of the system. Instead of increasing the time duration of each gate, why don’t we just increase the number of gates, this will in turn increase the depth of the circuit. We are de-optimizing the circuit to increase the effect of noise and decoherence.

As you see the title of the paper “Digital Zero Noise Extrapolation”, the term Digital is used when the noise-scaling techniques work at the gate level or the instruction set layer, not at the quantum hardware level. This is their biggest advantage.

Unitary Folding

A very common way of increasing the depth of the circuit is by unitary folding. In this method a unitary circuit ( or gate) U is replaced by:

In an ideal circuit, U †U = I (identity) , so effectively this whole operation has no effect on the circuit other than increasing the depth. However, on a real quantum computer, increasing the value of n, we see a clear effect as increment of noise.

There are two methods by which you can achieve this:

Circuit Folding

Here we start with an assumption that the circuit is composed of d unitary layers.

d represent the depth of the circuit and each L_i either represent a single layer of operations or just a single gate.

The folding routine is applied to the entire circuit. This scales the effective depth by odd integers. The general circuit folding replacement rule is:

Since we have an odd number as depth, the new depth becomes

The new depth of the circuit that scales with the value of n.

So, the total layers of the new circuit is now

In order to have more fine-grained resolution on the scaling factor, a final folding is applied to a subset of the circuit corresponding to its last s layers. That’s where this 2s come from, since it’s applied twice.

So, we can apply our scaling from d → λd, where

which if you simplify by keeping **d → λd**, in mind, you get d → d + 2k , where k is some positive integer. It’s simply increasing the depth of the circuit and it remains an odd integer.

Conversely for every real λ we have:

Just rearranged the above equation.

From this we can determine the closes integer k, to the real quantity of above equation.
Perform an integer division of k by d . The quotient corresponds to n, while the remainder to s.

The quotient corresponds to n and the remainder to s.

Apply n integer folding and a final partial folding and you get your circuit.

The value of n

The value of s

These are determined by the scale factorλ.

This circuit folding method is just repeatedly driving the Hamiltonian of the qubits forward and backward in time, such that the ideal unitary part of the dynamics is not changed while the non-unitary effect of the noise gets amplified.

Code Implementation:

We will be using Pennylane for this, since they have a built in circuit_folding function that we can use.

Make a Quantum Circuit:

x = np.arange(6) # the parameter value for our circuit gates

@qml.qnode(dev)
def circuit(x):
    qml.RX(x[0], wires=0)
    qml.RY(x[1], wires=1)
    qml.RZ(x[2], wires=2)
    qml.CNOT(wires=(0,1))
    qml.CNOT(wires=(1,2))
    qml.RX(x[3], wires=0)
    qml.RY(x[4], wires=1)
    qml.RZ(x[5], wires=2)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1) @ qml.PauliZ(2)) # the final measurement

How does it look?

This is for the scale_factor=1 it doesn’t affect our circuit. Now setting the scale_factor = 2 results in the partially folded circuit

with

since the circuit is composed of d=8 gates.

folded = qml.transforms.fold_global(circuit, 2)
print(qml.draw(folded)(x))

Setting the scale_factor=3 results in the following circuit.

folded = qml.transforms.fold_global(circuit, 3)
print(qml.draw(folded)(x))

Gate (or Layer) folding

Now, instead of folding the entire circuit (i.e global folding of circuit) , one could fold a subset of gates in place. Taking in the same assumption as before of

where we can assume that each unitary operator L, represent a single gate applied to one or two qubits, or layer of several gates.

Following the same discussion as the circuit folding technique, if we have a circuit of depth d, it is scaled by an odd integer (1+2n). Similarly we can add a partial folding to refine the result.

Now to get to this partial folding, we define an arbitrary subset S of full set of indices {1,2…,d}, such that its number of elements is a given integer s = |S|. This setting give us the following gate folding rule:

Depending on how we chose our S, different noise channels will be added at different positions along the circuit and so we can have different results.

Different Methods for implementing gate folding.

The rest follows similar to circuit folding.

The Good

Advantage of using unitary folding approach is that the noise is scaled using a high level of abstraction from the physical hardware. You don’t have to have the low level access to pulse shaping and detailed physical knowledge of quantum processor physics.

The Bad

Unitary folding may fail to amplify systematic and coherent errors, since applying the inverse of a gate will usually undo such errors instead of increasing them. Also it doesn’t work for the state preparation and measurement (SPAM) noise, since it is independent of circuit depth.

Code Implementation

Let’s code a full ZNE implementation and compare the results:

# install these dependencies
%pip install pennylane
%pip install pennylane-qiskit
%pip install "mitiq>=0.11" # this helps in error mitigation

# running on google colab

We first run the circuit on a noisy device, and artificially add PhaseDamping noise

import pennylane as qml

n_wires = 4

# Describe noise
noise_gate = qml.PhaseDamping
noise_strength = 0.1

# Load devices
dev_ideal = qml.device("default.mixed", wires=n_wires)  # the ideal device with no noise
dev_noisy = qml.transforms.insert(noise_gate, noise_strength)(dev_ideal) # the noisy device with noise

In the above, we load a noise-free device dev_ideal and a noisy device dev_noisy, which is constructed from the qml.transforms.insert transform. This transform works by intercepting each circuit executed on the device and adding the PhaseDamping noise channel directly after every gate in the circuit.

Next, we define our circuit , let’s fix a circuit that applies a unitary U followed by its inverse U†, with U given by the SimplifiedTwoDesign template. We also fix a measurement of the PauliZ observable on our first qubit.

Importantly, such a circuit performs an identity transformation U†U|ψ⟩=|ψ⟩ to any input state |ψ⟩ and we can show that the expected value of an ideal circuit execution with an input state |0⟩ is

from pennylane import numpy as np

np.random.seed(1967)

# Select template to use within circuit and generate parameters
n_layers = 1
template = qml.SimplifiedTwoDesign
weights_shape = template.shape(n_layers, n_wires)
w1, w2 = [2 * np.pi * np.random.random(s) for s in weights_shape]


def circuit(w1, w2):
    template(w1, w2, wires=range(n_wires))
    qml.adjoint(template)(w1, w2, wires=range(n_wires))
    return qml.expval(qml.PauliZ(0))


ideal_qnode = qml.QNode(circuit, dev_ideal)
noisy_qnode = qml.QNode(circuit, dev_noisy)

Let’s visualize the circuit:

print(qml.draw(ideal_qnode, expansion_strategy="device")(w1, w2))

As expected, executing the circuit on an ideal noise-free device gives a result of 1.

ideal_qnode(w1, w2)

tensor(1., requires_grad=True)

On the other hand, we obtain a noisy result when running on dev_noisy:

noisy_qnode(w1, w2)

tensor(0.71729164, requires_grad=True)

Time for Error Mitigation

from mitiq.zne.scaling import fold_global
from mitiq.zne.inference import RichardsonFactory
from pennylane.transforms import mitigate_with_zne

extrapolate = RichardsonFactory.extrapolate
scale_factors = [1, 2, 3]

mitigated_qnode = mitigate_with_zne(scale_factors, fold_global, extrapolate)(
    noisy_qnode
)
mitigated_qnode(w1, w2)

0.898519654741083

It is that easy to set up an Error Mitigation routing, using mitiq . The result is closer to the ideal noise-free value of 1

Let’s understand what’s happening under the hood.

We’ll make a copy of our circuit first.

with qml.tape.QuantumTape() as circuit:
    template(w1, w2, wires=range(n_wires))
    qml.adjoint(template)(w1, w2, wires=range(n_wires)

Let’s see how folding works for some typical scale factors.

scale_factors = [1, 2, 3]
folded_circuits = [fold_global(circuit, scale_factor=s) for s in scale_factors]

for s, c in zip(scale_factors, folded_circuits):
    print(f"Globally-folded circuit with a scale factor of {s}:")
    print(qml.drawer.tape_text(c, decimals=2, max_length=80))

Global scaled circuit for three different values of scale, 1,2 and 3.

When the scale factor is s=1, the resulting circuit is

When s=3, the resulting circuit is

In other words, the whole circuit is folded once when s = 3. Generally, whenever s is an odd integer, we fold (s-1)/2 times.

The s=2 setting is a bit more subtle. Now we apply folding only to the second half of the circuit, which is in our case given by U†. The resulting partially-folded circuit is

Let’s execute all these three circuits in the noise free device, if the circuits are equivalent, they all should give 1 as output.

def executor(circuits, dev=dev_noisy):
    # Support both a single circuit and multiple circuit execution
    circuits = [circuits] if isinstance(circuits, qml.tape.QuantumTape) else circuits

    circuits_with_meas = []

    # Loop through circuits and add on measurement
    for c in circuits:
        with qml.tape.QuantumTape() as circuit_with_meas:
            for o in c.operations:
                qml.apply(o)
            qml.expval(qml.PauliZ(0))
        circuits_with_meas.append(circuit_with_meas)

    return qml.execute(circuits_with_meas, dev, gradient_fn=None)

Let’s execute

executor(folded_circuits, dev=dev_ideal)

By construction, these circuits are equivalent to the original and have the same output value of 1. On the other hand, each circuit has a different depth.

If we expect each gate in a circuit to contribute an amount of noise when running on NISQ hardware, we should see the result of the executed circuit degrade with increased depth.

Let’s execute on the noisy device.

executor(folded_circuits, dev=dev_noisy)

Although this degradation may seem undesirable, it is part of the standard recipe for ZNE error mitigation: we have a family of equivalent circuits that experience a varying amount of noise when executed on hardware, and we are able to control the amount of noise by varying the folding scale factor s which determines the circuit depth.

Performing extrapolation is a well-studied numeric method in mathematics, and Mitiq provides access to some of the core approaches. Here we use the Richardson extrapolation(explained further in the paper) method with the objective of finding a curve y=f(x) with some fixed (x,y) values given by the scale factors and corresponding circuit execution results, respectively. This can be performed using:

# Evaluate noise-scaled expectation values
noisy_expectation_values = executor(folded_circuits, dev=dev_noisy)

# Initialize extrapolation method
fac = RichardsonFactory(scale_factors)

# Load data into extrapolation factory
for x, y in zip(scale_factors, noisy_expectation_values):
    fac.push({"scale_factor": x}, y)

# Run extrapolation
zero_noise = fac.reduce()

print(f"ZNE result: {zero_noise}")

ZNE result: 0.8985196547410811

Looking at the plot:

_ = fac.plot_fit()

Since we are using the Richardson extrapolation method, the fitted function (dashed line) corresponds to a polynomial interpolation of the measured data (blue points).

The zero-noise limit corresponds to the value of the extrapolation function evaluated at x=0, which is 0.8985196547410811.

Parameter Noise Scaling

This topic is focused on “ whether we can exploit specific structure of particular noise model”. Let’s consider a pulse-area error model, which describes what happens when the physical pulses that generate a particular gate in a quantum processor are slightly mis-calibrated.

This noise could be due to fluctuations in control electronics, or uncertainty about underlying physical parameters (such as qubit frequencies) in available hardware.

In order to apply ZNE in this setting, we need a method for scaling this particular kind of noise source. Instead of changing the structure of the quantum circuit, as in the unitary folding method, we directly inject classical noise into the control parameters. This artificial noise can increase the native noise of the hardware to larger levels, such that ZNE becomes applicable. We call this approach parameter noise scaling.

Extrapolation

Zero Noise Extrapolation as Statistical Inference

We already discussed about various methods to scale the noise. Now let’s discuss how to extrapolate the result to the zero-noise limit.

We assume that the output of the quantum computation is a single expectation value E(λ) , where λ is the noise-scale factor. As we did in the code implementation above, where our measurement was just the Expectation of Z observable.

In a real situation with N samples, only a statistical estimation of the expectation value is actually possible:

where δ is a random variable, withe mean = 0 and variance

In other words, we can sample a real prediction y from the probability distribution:

where N(x,y) is a generic distribution (like Gaussian).

Given set of m scaling parameters s λ = {λ1, λ2, . . . λm}, with λj ≥ 1, and the corresponding results

the ZNE problem is to build a good estimator E(0) for E(λ = 0), such that its bias

and its variance

are both reasonably small.

This is the algorithm for generic non-adaptive extrapolation.

From physical considerations, it is reasonable to have a model for E(λ), e.g., we can assume a linear, a polynomial or an exponential dependence with respect to λ.

Polynomial Extrapolation

The polynomial extrapolation method is based on the following polynomial model of degree d:

This essentially corresponds to a Taylor series approximation and is physically justified in the weak noise regime.

Linear Extrapolation

Linear extrapolation is perhaps the simplest method and is a particular case of polynomial extrapolation. It corresponds to the model:

Richardson Extrapolation

Richardson’s extrapolation is also a particular case of polynomial extrapolation where d = m − 1, i.e., the order is maximized given the number of data points:

Poly-Exponential Extrapolation

The poly-exponential ansatz of degree d is:

Exponential Extrapolation

Exponential extrapolation is a particular case of the more general poly-exponential method. It corresponds to the model:

Adaptive Zero Noise Extrapolation

In order to reduce the computational overhead, we can choose the scale factors and the number of samples in an adaptive way. As shown in the figure below.

Differently from the non-adaptive case, in this adaptive procedure the measured scale factors λ are not monotonically increasing. Indeed in the adaptive step, λ_next can take any value (above or equal to 1). In particular, λ_next could also be equal to a previous scale factor λ_j , for some j. In this case, the additional measurement samples N_next will improve the statistical estimation of E(λj ).

Code Implementation

Continuing from the previous code, we now apply the Richardson Extrapolation technique.

from mitiq.zne.scaling import fold_gates_at_random as folding

extrapolate = RichardsonFactory.extrapolate


@mitigate_with_zne(scale_factors, folding, extrapolate, reps_per_factor=100)
@qml.qnode(dev_noisy)
def mitigated_qnode(w1, w2):
    template(w1, w2, wires=range(n_wires))
    qml.adjoint(template)(w1, w2, wires=range(n_wires))
    return qml.expval(qml.PauliZ(0))


mitigated_qnode(w1, w2)

0.954380673804607

We see the result is now even closer to our ideal value of 1 .

Whenever the folding function is stochastic, there will not be a unique folded circuit corresponding to a given scale factor. For example, the following three distinct circuits are all folded with a scale factor of s=1.1:

for _ in range(3):
    print(qml.drawer.tape_text(folding(circuit, scale_factor=1.1), decimals=2, max_length=80))

To accommodate for this randomness, we can perform multiple repetitions of random folding for a fixed s and average over the execution results to generate the value f(s) used for extrapolation. As shown above, the number of repetitions is controlled by setting the optional reps_per_factor argument.

Now using the Adaptive method, specifically the Adaptive Exponential

from mitiq.zne import execute_with_zne
from mitiq.zne.inference import AdaExpFactory

factory = AdaExpFactory(steps=20)

execute_with_zne(circuit, executor, factory=factory, scale_noise=fold_global)

0.9589759363296225

You can see the result is even better than the Richardson extrapolation.

Testing it on a Hamiltonian

Let’s work with the Hamiltonian of H2 molecule and increase the bond length between the atoms. Let’s look for the ground state energy.

from qiskit.test.mock import FakeLima
from qiskit.providers.aer.noise import NoiseModel

backend = FakeLima()
noise_model = NoiseModel.from_backend(backend)

We can then set up our ideal device and the noisy simulator of ibmq_lima.

n_wires = 4

dev_ideal = qml.device("default.qubit", wires=n_wires)
dev_noisy = qml.device(
    "qiskit.aer",
    wires=n_wires,
    noise_model=noise_model,
    optimization_level=0,
    shots=10000,
)

Note the use of the optimization_level=0 argument when loading the noisy device. This prevents the qiskit.aer transpiler from performing a pre-execution circuit optimization.

To simplify this demo, we will load pre-trained parameters for our variational circuit. The weights can be downloaded from here.

# create a folder by the name error_mitigation
# and paste the file downloaded from the link 
# https://pennylane.ai/_downloads/bc0731caabc1d9ec29826e00faab5139/params.npy
# in the folder
params = np.load("error_mitigation/params.npy")

We are now ready to set up the variational circuit and run on the ideal and noisy devices.

from pennylane import qchem

# Describe quantum chemistry problem
symbols = ["H", "H"]
distances = np.arange(0.5, 3.0, 0.25)

ideal_energies = []
noisy_energies = []

for r, phi in zip(distances, params):
    # Assume atoms lie on the Z axis
    coordinates = np.array([0.0, 0.0, 0.0, 0.0, 0.0, r])

    # Load qubit Hamiltonian
    H, _ = qchem.molecular_hamiltonian(symbols, coordinates)

    # Define ansatz circuit
    def qchem_circuit(phi):
        qml.PauliX(wires=0)
        qml.PauliX(wires=1)
        qml.DoubleExcitation(phi, wires=range(n_wires))
        return qml.expval(H)

    ideal_energy = qml.QNode(qchem_circuit, dev_ideal)
    noisy_energy = qml.QNode(qchem_circuit, dev_noisy)

    ideal_energies.append(ideal_energy(phi))
    noisy_energies.append(noisy_energy(phi))

An error-mitigated version of the potential energy surface can also be calculated using the following:

mitig_energies = []

for r, phi in zip(distances, params):
    # Assume atoms lie on the Z axis
    coordinates = np.array([0.0, 0.0, 0.0, 0.0, 0.0, r])

    # Load qubit Hamiltonian
    H, _ = qchem.molecular_hamiltonian(symbols, coordinates)

    # Define ansatz circuit
    with qml.tape.QuantumTape() as circuit:
        qml.PauliX(wires=0)
        qml.PauliX(wires=1)
        qml.DoubleExcitation(phi, wires=range(n_wires))

    # Define custom executor that expands Hamiltonian measurement
    # into a linear combination of tensor products of Pauli
    # operators.
    def executor(circuit):

        # Add Hamiltonian measurement to circuit
        with qml.tape.QuantumTape() as circuit_with_meas:
            for o in circuit.operations:
                qml.apply(o)
            qml.expval(H)

        # Expand Hamiltonian measurement into tensor product of
        # of Pauli operators. We get a list of circuits to execute
        # and a postprocessing function to combine the results into
        # a single number.
        circuits, postproc = qml.transforms.hamiltonian_expand(
            circuit_with_meas, group=False
        )
        circuits_executed = qml.execute(circuits, dev_noisy, gradient_fn=None)
        return postproc(circuits_executed)

    mitig_energy = execute_with_zne(circuit, executor, scale_noise=fold_global)
    mitig_energies.append(mitig_energy)

Finally, we can plot the three surfaces and compare:

import matplotlib.pyplot as plt

plt.plot(ideal_energies, label="ideal")
plt.plot(noisy_energies, label="noisy")
plt.plot(mitig_energies, label="mitigated")
plt.xlabel("Bond length (Bohr)")
plt.ylabel("Total energy (Hartree)")
plt.legend()
plt.show()

Comparison of ideal, noisy and mitigated ground state energy for a H2 molecule. Error mitigation has allowed us to more closely approximate the ideal noise-free curve!

Conclusion

We make zero-noise extrapolation digital, developing the unitary folding framework to run error mitigation with instruction set level access. We then demonstrate improved performance through a set of non-adaptive and adaptive extrapolation methods. We emphasize that zero-noise extrapolation is in general an inference problem with many avenues for further optimization.

In this article, we also learned how to set-up error mitigation using Mitiq as a backend.

This work is a first step towards viewing zero-noise extrapolation as an inference problem and has opportunities for extension. Priors or constraints from observable, noise or circuit structure could be included. Data could be gathered from similar executions over time so that inference includes a historical database of previous computations. Error-mitigation is likely to remain a critical toolkit for the NISQ-era quantum programmer. Improving and bench marking these techniques will likewise remain an important task.

Digital zero noise extrapolation for quantum error mitigation

Introduction

What is Zero-Noise extrapolation?

The Implementation

Noise Scaling Methods

Unitary Folding

Circuit Folding

Code Implementation:

Gate (or Layer) folding

The Good

The Bad

Code Implementation

Time for Error Mitigation

Parameter Noise Scaling

Extrapolation

Zero Noise Extrapolation as Statistical Inference

Polynomial Extrapolation

Linear Extrapolation

Richardson Extrapolation

Poly-Exponential Extrapolation

Exponential Extrapolation

Adaptive Zero Noise Extrapolation

Code Implementation

Testing it on a Hamiltonian

Conclusion

Written by Monit Sharma