What would you do with 1000 QPUs?

Making multi-device computation easy with PennyLane

Xanadu
XanaduAI
8 min read · Feb 18, 2020


By Nathan Killoran, Tom Bromley, and Josh Izaac

This year will be a big one for the quantum computing industry. In addition to the existing quantum computers from IBM and Rigetti that are currently accessible over the cloud, a number of new hardware providers look poised to take their hardware public in the coming months. Tech giants like Microsoft and Amazon are building up new quantum cloud platforms, with hardware from multiple partners being made available. The established offerings will continue to grow as well, as IBM and Rigetti add more quantum processing units (QPUs) to their platforms.

At Xanadu, we are always thinking ahead to what might come next. One thing we have been thinking about a lot recently is: what will happen when dozens, hundreds, or even thousands of QPUs are available? How different does that environment look from the current status quo? Although it’s early days, this is not such a crazy thing to think about. This blog post is an early foray into this largely uncharted territory.

These questions are also inspired by recent trends in the realm of classical computing. Initially, CPUs were used as singular general-purpose processing units: they could tackle pretty much any computing task you threw at them. Many performance gains over the years were achieved by simply increasing the speed of CPUs. Only as that approach started to hit fundamental limits did multi-core CPU architectures really take off. GPUs are perhaps the most extreme example: rather than a single all-powerful core, GPU architectures are dominated by thousands of special-purpose processing cores.

It might be easier to reach the 1000 QPU threshold before we reach the 1000 qubit threshold.

On the quantum side, continuing to design newer and better quantum computers, as must happen on the path to universal fault-tolerant devices, is a challenging task. Moreover, the resources that go into developing quantum computers are heavily front-loaded: months or years of R&D, including numerous fabrication runs to test and validate new ideas.

Making 1000 QPUs means duplicating existing technologies 1000 times; on the other hand, making 1000 qubits might require inventing completely new technologies. Once quantum computer designs improve enough to show convincing quantum advantage — at first on a very narrow set of tasks — it may end up being viable to mass produce the same 100-qubit chip design. Could the collective performance of such a “multi-core” QPU setup be beneficial in the near term? If one QPU shows some small quantum advantage, can we gain a larger advantage by combining many QPUs together?

Multi-QPU computation: now available

In the latest release of our quantum computing software library PennyLane, we’ve introduced a number of new features to make computations involving multiple QPUs more seamless and accessible to the user (check out the release notes here). Our primary goal with PennyLane is to enable discovery: by putting cutting-edge features and ideas into the software, users will be empowered to quickly prototype new ideas and develop new algorithms, which will move the whole community forward.

We will present a number of simple ideas and use-cases below where many QPUs could provide a benefit over a single QPU, even for today’s small and noisy devices! But before we can start thinking about how thousands of QPUs might operate together, we first need to build up our intuition by establishing some basic ingredients.

Building blocks

Our first step is to encapsulate what a single QPU is doing: executing a quantum circuit. This quantum circuit takes in some classical inputs (this might be input data 𝑥 corresponding to a particular problem instance, or free parameters θ which are used in a variational circuit). In the circuit, a sequence of gates is applied. Finally, a measurement is performed (this could be a sample or an expectation value), and the QPU returns some classical information. In PennyLane, we call this abstraction a quantum node (QNode).

A QNode encapsulates the basics of a quantum computation: state preparation (which may depend on some input data), a transformation of this state (containing many parameterized gates), and a final measurement. A classical computer sees it simply as a callable function.

From the perspective of a classical computer, a QNode is just a callable function — call it f(x) — which takes in classical inputs and returns classical outputs. The finer details of how to evaluate this function are left to a specialized device: a QPU.
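To make this concrete, here is a minimal toy sketch of the QNode abstraction in plain Python. The function below is a stand-in, not a real PennyLane QNode: it exploits the fact that measuring the Pauli-Z expectation value after an RX(x) rotation on |0⟩ gives exactly cos(x), so ordinary NumPy can play the role of the quantum device.

```python
import numpy as np

# Toy stand-in for a QNode: rotating |0> by RX(x) and measuring
# the Pauli-Z expectation value yields cos(x), so a plain NumPy
# function can mimic the quantum device for illustration.
def qnode(x):
    """A QNode is just a callable: classical input in, classical output out."""
    return np.cos(x)

print(qnode(0.0))      # expectation value +1 for the untouched |0> state
print(qnode(np.pi))    # expectation value -1 after a full bit flip
```

From the classical computer's point of view, nothing here reveals whether `qnode` ran on a simulator, a QPU, or a closed-form formula; only the input-output behaviour matters.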

Once we have our basic abstraction for a quantum computation (the QNode), we can start thinking about how to compose them, potentially leveraging multiple QPUs. At a high level, we can consider two basic notions, similar to the classical case: sequential versus parallel computation.

Sequential

In this approach, a single function is evaluated and the output of that function is used as the input for a subsequent step. The second step can’t run until the result from the first step has been determined. This allows us to build up more complex higher-order computations out of simpler building blocks. This is the essence of most computer programs.

Sequential quantum computations: a quantum circuit is executed with an input 𝑥, and the final output y is used as input for another quantum circuit.

In the quantum case, we can replace each individual step in the sequence with a QNode, taking some classical inputs and producing some classical outputs. Why might we want to do this? At the moment, this looks like an open question, at the forefront of research. One possible motivation is the limited depth of near-term QPUs: we might ideally like to perform a long calculation on one QPU, but suffer from decoherence. What we can do instead is to break down a calculation into multiple steps with a shorter — but still meaningful — depth¹.
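A sequential composition of QNodes can be sketched as ordinary function composition. The two functions below are hypothetical toy stand-ins for separate shallow circuits (not PennyLane code); the point is only the data dependency between the steps.

```python
import numpy as np

# Two toy stand-ins for separate shallow quantum circuits.
def qnode_a(x):
    return np.cos(x)          # e.g. <Z> measured after RX(x)

def qnode_b(y):
    return np.sin(y) ** 2     # e.g. a different measured quantity

def pipeline(x):
    # Sequential composition: step 2 cannot start until step 1's
    # classical result is available.
    y = qnode_a(x)            # run the first (shallow) circuit
    return qnode_b(y)         # feed its classical output into the next one
```

Each stage stays within the depth budget of a near-term device, while the classical hand-off between stages builds up the longer overall computation.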

Parallel

Parallel quantum computations: the same circuit is evaluated on different QPUs, for many different inputs or parameters, and the results are aggregated together. In a more general case, the circuits may also be different in each branch.

Another basic computing mechanism is parallelism, in which many calculations are carried out concurrently. Importantly, in a parallel computation, each branch can execute independently; there is no need to communicate or wait. Afterwards, the results of these parallel segments may be aggregated, e.g., by taking their average or a linear combination. Unlike sequential computations, where one step has to finish before another can start, parallelism can provide speed-ups in real-world execution times by running many computations at the same time.
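A minimal sketch of this pattern, again using a toy NumPy function in place of a real QNode: each branch is independent, so all evaluations can be dispatched at once (for example, one per QPU) and aggregated afterwards. Here `ThreadPoolExecutor` stands in for the dispatch layer.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a QNode: <Z> after RX(theta) on |0> is cos(theta).
def qnode(theta):
    return np.cos(theta)

# Independent branches: the same circuit evaluated for different inputs.
thetas = [0.0, np.pi / 3, np.pi / 2, np.pi]

# Dispatch all evaluations concurrently, then aggregate the results.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(qnode, thetas))

average = sum(results) / len(results)   # simple aggregation step
```

Because no branch waits on another, the wall-clock time is set by the slowest single evaluation rather than by the sum of all of them.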

Execution of algorithms can be made faster, and, for the same real-world time, more accurate estimates can be obtained.

The notion of parallel quantum computations isn’t really prevalent today² — largely due to the fact that there are only a handful of publicly accessible QPUs in the world. At the same time, there are already a number of workflows where parallel quantum computation could easily be used to provide real-time speed or accuracy advantages for the end-user. Here are a few ideas:

Ideas for how to leverage parallelism over multiple QPUs.

Inspired by these ideas, we’ve posted two fully coded multi-QPU demos at pennylane.ai/qml. Specifically, we have new tutorials for parallelizing quantum chemistry calculations and training ensemble models across two different QPUs.

Even with today’s hardware and algorithms, there is already a clear benefit to using multiple QPUs in parallel: the overall execution of algorithms can be made faster, and, for the same real-world time, more accurate estimates can be obtained.

Computational Graphs

Once we have recognized the basic computational building blocks, we can start composing them to create more complex multi-step computations that are neither purely sequential nor purely parallel. This leads to the notion of a computational graph, showing the (in)dependencies of various steps of a larger computation. This viewpoint is commonly encountered in deep learning software such as TensorFlow.

Every node in the computational graph represents a distinct computational step, taking its inputs and mapping them to some outputs. The function that a given node carries out may be executed on either a classical or a quantum computer. Classical information flows between each step of the computation³.

Here is an example of a more complex hybrid computational graph, showing a Variational Quantum Eigensolver (VQE) algorithm being run in parallel on two QPUs:

Quantum nodes in a hybrid computational graph. This example uses PennyLane’s QChem module to perform a VQE computation. Expectation values of independent measurement operators can be computed in parallel using different QPUs.
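The structure of that VQE computation can be sketched in a few lines: the energy is a weighted sum of independent expectation values, so each term can be measured on a different QPU and combined classically. The coefficients and expectation functions below are hypothetical toy values, not output from PennyLane's QChem module; in a real VQE each function would execute a circuit.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Hamiltonian H = sum_i c_i * O_i: each expectation value <O_i> is
# independent, so each term can be measured on a separate QPU.
# Toy coefficients and expectation functions for illustration only.
coeffs = [0.5, -0.2, 0.3]
expval_fns = [
    lambda theta: np.cos(theta),       # <O_0>, e.g. measured on QPU 0
    lambda theta: np.sin(theta),       # <O_1>, e.g. measured on QPU 1
    lambda theta: np.cos(2 * theta),   # <O_2>, e.g. measured on QPU 2
]

def energy(theta):
    # Dispatch all expectation values concurrently, then take the
    # classical linear combination to obtain the energy estimate.
    with ThreadPoolExecutor(max_workers=len(expval_fns)) as pool:
        expvals = list(pool.map(lambda f: f(theta), expval_fns))
    return sum(c * e for c, e in zip(coeffs, expvals))
```

The classical aggregation node at the end of the graph depends on every quantum node, but the quantum nodes do not depend on each other, which is exactly what makes the parallel dispatch legitimate.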

This blog post was an early exploration into the uncharted territory of multi-device quantum computation. It’s important to keep thinking about what’s coming next. But you can actually try these ideas out today!

Xanadu’s software library PennyLane has always supported programs with arbitrary classical-quantum computational graphs, including the parallel and sequential models discussed above. However, programs were ultimately still executed sequentially under the hood. In addition to a number of other new features, the latest release of PennyLane can now truly execute parallel computations asynchronously, meaning you can get your answers much faster.

These new features will open up new opportunities and ideas, and we’ve given a few examples here. Now that the tools are available, we’d love to see what the community can come up with!

Footnotes:

[1] There may only be a benefit to sharing a task between multiple processors if each processor carries out its portion of the computation much faster/better than the others. Otherwise, we can just use the same processor for both steps. The sequential approach is therefore well suited to hybrid classical-quantum computations.

[2] We should not confuse parallel computations with computations which happen in superposition, which is unique to quantum computing. Superposition requires the ability for the QPUs to coherently accept arbitrary input quantum states (rather than starting in a known fixed state). There is also the related subject of distributed quantum computing, in which a network of QPUs collectively process a large entangled state. The requirement to entangle multiple QPUs makes this approach more challenging, but it may be possible for photonic quantum computers.

[3] Though we don’t consider it here, in the most general case, we could allow for quantum data flowing between blocks, though transmitting quantum data between different devices remains technologically challenging.
