Cutting Through the Hype of Quantum Optimization

Published in Qiskit · Sep 29, 2021 · 13 min read

By Robert Davis, Technical Writer, IBM Quantum and Qiskit

“If this is the best of possible worlds, what then are the others?”

― Voltaire, Candide.

Optimization problems are everywhere, and we spend a lot of time solving them intuitively. We encounter them whenever we try to puzzle out the shortest route between after-work errands, when we search for a restaurant to accommodate conflicting tastes and dietary restrictions, when we try to predict which line at the grocery store will move fastest. For more complex optimization problems, we use classical computers to run algorithms that seek out the best possible solutions, but these problems quickly become incredibly difficult and computationally costly. Researchers have long theorized that quantum computers could tackle these problems more quickly and with greater accuracy.

Over the years, these theoretical improvements have generated a lot of excitement in the quantum community. In part, that’s because optimization problems are ubiquitous. Quantitative experts can use them to represent a vast array of important and enormously valuable real-world problems in fields like financial planning, supply chain management, civil engineering, and countless others. It doesn’t take much of a leap to argue that a breakthrough in quantum optimization could change the world — cutting millions of miles from the supply chain, fighting climate change, or even eliminating traffic congestion. Increasingly, however, experts say that claims like these are overblown, especially when it comes to existing quantum optimization algorithms, and that further research is required to unlock the field’s potential.

“Optimization has been hyped by people outside the field, but researchers in the field never had reason to believe that optimization was as likely to demonstrate exponential quantum advantage as, for example, certain areas of applications in quantum chemistry,” said IBM researcher Giacomo Nannicini. That’s because the quantum algorithms we have today only offer modest speed-ups over their classical counterparts. Those modest improvements can be significant when it comes to especially large optimization problems and other specific cases, but in many other cases, you’d hardly notice the difference.

On the other hand, if researchers can find an optimization problem for which a quantum computer can deliver an exponential speed-up over classical techniques, that advance could change the field — and the world — forever. Problems that quantum computers could one day solve exponentially faster than classical computers are generally the most important ones when it comes to finding real-world applications. Quantum computers don’t seem to offer exponential speedups for black box optimization problems — problems where we don’t know anything about the dataset from which we’re trying to find an optimal solution. However, there may be exponential speedups in cases where you know a bit more about the problem’s structure. That’s why it’s still important to pursue optimization to see how and where quantum computers can help — and to advance the field more generally.

But to understand why this is the case for quantum optimization, we first need to understand what optimization problems are, how classical computers handle them, and how quantum computers may be able to improve upon classical optimization techniques.

Mathematical Optimization: Seeking the Best of All Possible Worlds

Example of a Sudoku problem (above) and its solution (below).
Image Credit: By en:User:Cburnett, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=57831971

An optimization problem is any problem where your goal is to find the best of all possible worlds with respect to certain variables and constraints — i.e., the best, or “optimal,” solution from a finite (or countably infinite) set of possible solutions. For example, we can consider the popular puzzle game Sudoku as an optimization problem that requires the player to fill a 9x9 grid with numeric digits such that each row, column, and 3x3 sub-grid contains all digits 1–9. In an unsolved Sudoku puzzle, each empty cell represents a “decision variable” whose value the player must determine to solve the problem, while the rules of the game represent the constraints that the player must satisfy to achieve an optimal solution.
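To make that framing concrete, here is a minimal Python sketch (our own illustration, not a full solver) that treats each cell as a decision variable and encodes the rules of the game as constraints to check:

```python
# Minimal sketch: Sudoku as decision variables plus constraints.
# `grid` is a 9x9 list of lists; each cell is a decision variable
# taking a value in 1..9 (0 marks an empty, still-undecided cell).

def satisfies_constraints(grid):
    """Return True if no filled-in cell violates a Sudoku constraint."""
    def no_repeats(values):
        filled = [v for v in values if v != 0]
        return len(filled) == len(set(filled))

    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in (0, 3, 6) for bc in (0, 3, 6)
    ]
    # Every row, column, and 3x3 sub-grid must be free of duplicates.
    return all(no_repeats(group) for group in rows + cols + boxes)
```

A solver’s job is then to search the space of assignments for one that fills every cell while keeping this check true.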

How does a computer solve a problem like this? Well, there are many different optimization algorithms that model problems in different ways.

“One example you could look at is just a simple gradient descent algorithm,” said IBM researcher and optimization expert Srinivasan Arunachalam. “Gradient descent is the most fundamental algorithm used all over for optimization,” he said. It also happens to be the workhorse of machine learning — but that’s for another story.

In mathematics, a gradient is simply a description of the direction and steepness of a particular line or surface. You can represent just about any optimization problem as a mathematical function, and you can represent any mathematical function graphically as a line or multi-dimensional surface, depending on how many variables the function includes. A function with one variable describes a line, a function with two variables describes the shape of a surface, and so on. Think of those raised relief topographic globes, the kind you might see in a school classroom with bumps and ridges that let you feel the height of the Himalayas and the flatness of the Great Plains with your fingers. These are just 3-dimensional graphs of a function where the two variables are latitude and longitude, and the function’s output is elevation. Graphical representations of functions with three or more variables can be difficult to visualize, but the same ideas apply.

Depicting an optimization problem in this way usually gives you something that looks like that topographic map of some mountainous region, filled with hills and valleys. In optimization problems, the lowest points in those valleys usually represent the best solutions, while the peaks of the hills represent the worst. That’s why it’s called “gradient descent”: your ultimate goal is to work your way down to the lowest point of the map. This can get challenging, however, because there’s usually no way to get a bird’s-eye view of the entire function. In fact, you can’t see anything beyond your immediate surroundings. It’s as if you’re descending a mountain at night, picking your way through the dark, thick forest.

Graphical representation of a gradient function. Image credit: Public Domain.

In classical computing, the only way to get to the lowest point on the map — the optimal solution — is to first look at your surroundings, figure out which direction leads up the local gradient and which leads down, and then take a step in the direction that seems to be going downhill. “A standard gradient descent starts at some point, computes the gradient of the function [at] that point, and then…it decides where I should walk,” said Arunachalam. That decision is governed by a “gradient update rule,” which determines how the algorithm moves with each new computation.

In gradient descent, every step you take requires a whole new round of computation, and you can never be 100% sure that you’re heading toward the lowest point on the entire map. You might just be headed down into some shallow valley far from the true lowest point, a.k.a. the “global minimum.” This is part of the reason that gradient descent is so challenging and computationally costly for classical computers. The computations behind each individual step are fairly resource-intensive on their own, and they become overwhelming in aggregate, since complex optimization problems may require millions of steps to converge on a global minimum.
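To see this loop in code, here is a minimal sketch in Python with NumPy. The toy landscape, step size, and iteration count are our own illustrative choices; note that every step pays for a fresh gradient computation, and the walk can still end in a shallow local valley:

```python
import numpy as np

# A toy two-variable "landscape" with several valleys, so plain
# gradient descent can settle into a local minimum rather than
# the global one -- exactly the failure mode described above.
def f(p):
    x, y = p
    return np.sin(3 * x) + 0.5 * x**2 + np.cos(2 * y) + 0.5 * y**2

def gradient(p, eps=1e-6):
    """Numerical gradient: probe each coordinate direction in turn."""
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (f(p + step) - f(p - step)) / (2 * eps)
    return grad

point = np.array([2.0, 1.5])   # starting position on the landscape
learning_rate = 0.1            # a simple "gradient update rule"
for _ in range(1000):          # each step costs a new gradient
    point = point - learning_rate * gradient(point)

print(point, f(point))         # a minimum -- possibly only a local one
```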

That’s a lot of computation. Let’s take a look at how quantum computers could make this process more efficient.

The Quantum Optimization Boost

For decades, quantum researchers have believed that quantum computers may be able to offer modest improvements over classical methods like the one described above. That isn’t because researchers think quantum computers can replace classical computers entirely, executing the gradient descent algorithm unconditionally faster than classical computers. Rather, it’s because quantum computers can be faster at performing a few specific steps in the algorithmic procedure.

According to IBM’s Giacomo Nannicini, the same reasoning applies to most cases of theoretical quantum advantage. “I would say that the reason we think there is a quantum advantage for optimization is precisely the same as why we’ve been able to find advantage in other fields,” he said. Because quantum computers should be able to outperform classical computers at certain tasks that optimization algorithms require, those quantum routines can be integrated as subroutines in the broader algorithm.

To explain this more clearly, we’re going to do a little bit of math. Don’t worry — we’ll keep it simple. If you’re unfamiliar with linear algebra, all you need to know is that a “vector” is just a list of numbers. Each item or element in that list represents one of the variables in your optimization problem.

Let’s say your optimization problem can be represented as a linear function f, meaning that the function describing it is just the inner product of two vectors — one vector multiplied by another to yield a single number. One of those vectors is fixed but unknown. We’ll call it vector a (i.e. [a₁, a₂, a₃…aₙ]). The other is your input vector, x (i.e. [x₁, x₂, x₃…xₙ]). If both vectors have n elements, then taking the inner product just means multiplying the corresponding elements and adding the results together: a₁x₁ + a₂x₂ + a₃x₃ + … + aₙxₙ.

One of the essential procedures in the gradient descent algorithm is calculating the gradient of function f at some given point x. That’s essential information when you’re deciding where to go next in your gradient descent. If f(x₁, x₂, …, xₙ) = a₁x₁ + a₂x₂ + … + aₙxₙ, then the gradient you’re looking for at any point x is just the unknown vector a. Your job is to solve for a.

This is a fairly simple task in classical computing, and there are many ways to go about it. One approach would be to set your input vector x to a combination of 1s and 0s, such that you can multiply vector x by vector a to get each element of a by itself.

For example, if your vectors only have three elements each, you’ll start by setting x to [1, 0, 0]. When you multiply that by the unknown vector a, you get a₁·1 + a₂·0 + a₃·0. Multiplying the first element in a by 1 keeps it the same, while multiplying the other two elements by 0 removes them from the equation altogether, so this process gives you a₁ + 0 + 0, which is just a₁. You can then do the same for each of the other elements in vector a, setting your input vector x to [0, 1, 0] to determine the value of a₂, and then setting it to [0, 0, 1] to determine the value of a₃.
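As a rough sketch of that classical procedure (the hidden vector and query function below are made-up illustrations), probing with the standard basis vectors recovers a one element per query:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n = 3
a = rng.integers(0, 10, size=n)    # the hidden, fixed vector a

def query(x):
    """One classical query: return the inner product of a and x."""
    return int(np.dot(a, x))

# Probe with [1, 0, 0], [0, 1, 0], [0, 0, 1]: each query isolates
# exactly one element of a, so learning all of a costs n queries.
basis = np.eye(n, dtype=int)
recovered = [query(basis[i]) for i in range(n)]
print(recovered, list(a))          # the two lists match
```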

That may seem easy enough, but it can get messy when you’re dealing with vectors that have dozens or hundreds of elements that must be calculated. In a classical regime like this, you have to make a query for each individual element to figure out the contents of vector a. Ultimately, that isn’t very efficient. Researchers believe that quantum computers can offer a polynomial speedup (and specifically, a quadratic speedup) over classical computers for tasks like these.

“Quantumly, there’s a nice procedure called Bernstein–Vazirani,” Arunachalam said. This famed algorithm allows you to determine vector a (which, in the algorithm’s setting, is a string of 0s and 1s) using just one query, as opposed to the n queries required by classical methods. That’s because Bernstein–Vazirani performs the query quantumly — the query is made on a superposition of all possible inputs x. From there, you apply a quantum Fourier transform (QFT), and just like that, the transform presents you with a, all by itself. If you’re unfamiliar with superposition and the QFT, we encourage you to take a look at the Qiskit textbook.
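For readers who want to try this, here is a minimal Bernstein–Vazirani sketch in Qiskit (assuming a recent Qiskit installation with the qiskit-aer simulator; the hidden bit string is our own example). One oracle query in superposition, followed by the Hadamard transform, reads out a directly:

```python
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

a = "1011"           # the hidden bit string; f(x) = a.x mod 2
n = len(a)

qc = QuantumCircuit(n + 1, n)
qc.x(n)              # put the ancilla qubit in |1>...
qc.h(range(n + 1))   # ...then |->, with the n inputs in superposition

# Oracle for f(x) = a.x mod 2: a CNOT from input qubit i to the
# ancilla wherever the corresponding bit of a is 1.
for i, bit in enumerate(reversed(a)):   # reversed: Qiskit is little-endian
    if bit == "1":
        qc.cx(i, n)

qc.h(range(n))       # the Hadamard transform (the QFT over Z_2^n)
qc.measure(range(n), range(n))

counts = AerSimulator().run(qc, shots=1024).result().get_counts()
print(counts)        # every shot reads out a = '1011' directly
```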

“You can essentially learn the entirety of vector a by just making one quantum query to x, but classically you need n queries,” Arunachalam said. This, in essence, is the fundamental idea suggesting that quantum computers should be able to deliver a quadratic speedup for gradient descent. However, it’s important to note that this is a simplified example, so in the grand scheme of things, this speedup would still be considered relatively small.

“This example is just for a basic linear function, which today’s classical computers can handle easily,” Arunachalam said. “The real quantum advantage comes when you apply these same principles to multidimensional non-linear functions, which are much more common in real-world optimization problems, and much more difficult for classical computers.” In a 2017 paper, Arunachalam and his co-authors were among the first to demonstrate a quadratic speedup using quantum gradient computation for multidimensional functions.

Once you understand this logic, it’s easy to extend it to deliver quantum improvements for all kinds of convex optimization techniques — of which gradient descent is just one example. Convex optimization is a broad category that encompasses gradient descent and many other optimization methods, and it plays an enormous role in the classical machine learning systems that power most of the automated technology we have today. One can also apply this logic to obtain more specialized quadratic speedups in convex optimization techniques like semidefinite programming and the simplex method, which quantitative professionals use for a wide array of industry applications.

A quadratic speedup — and polynomial speedups in general — may not be quite as big as the exponential speedups that have been found in some other areas of quantum research, but they can still be significant. This is especially true in the context of optimization problems, where every little improvement can drive a tremendous amount of real-world value. However, for many quantum researchers, polynomial speedups simply aren’t substantial enough to make them a top research priority.

Not All Speedups Are Created Equal

When we talk about quantum advantage, and specifically “quantum speedups,” we’re really talking about how quantum computers exhibit improved time complexity for certain algorithmic tasks when compared to classical computers. Time complexity is a way of describing how much time (or how many computational steps) it takes to run an algorithm and, crucially, it also describes how that runtime changes as the size or number of inputs gets bigger and bigger.

In a polynomial speedup, the word “polynomial” simply refers to the kind of mathematical function that describes the relationship between two different time complexities. For example, if a classical computer requires n² steps to complete a task on an input of size n, and a quantum computer requires only n steps for the same task — that’s a quadratic speedup. Impressive, right? For n = 10, the quantum computer can do something in 10 steps that would take the classical computer 100 steps. That sounds like a pretty big improvement.

But the truth is that a speedup like that is relatively meager in the grand scheme of things — especially in the short term. An exponential speedup would be much more significant. For example, if a classical computer requires 2ⁿ steps to complete a task on an input of size n, and a quantum computer requires only n steps to do the same thing, that’s huge. For n = 10, the quantum computer can do something in 10 steps that would take the classical computer 1,024 steps — and the difference grows dramatically as the input gets bigger.
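A quick back-of-the-envelope comparison in Python makes the difference vivid (the input sizes are arbitrary):

```python
# Steps required as the input size n grows, assuming the quantum
# computer needs n steps in both scenarios.
for n in (10, 20, 30, 40):
    print(f"n={n:>2}  quantum: {n:>2}  "
          f"quadratic classical: {n**2:>5,}  "
          f"exponential classical: {2**n:>17,}")
# At n=10 the quadratic and exponential costs are comparable;
# by n=40 they are 1,600 steps versus roughly 1.1 trillion.
```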

Another consideration is that there’s a big difference between a theoretical speedup and a practical one. IBM’s Srinivasan Arunachalam pointed out that many theoretical quantum speedups are based on simplified problems that don’t exist in the real world, and that they often come with large pre-factor constants.

“These problems are slightly artificially contrived. They’re not really something a layperson would care about,” he said. To operate in the real world, researchers like Arunachalam must develop generalized quantum algorithms that will work for arbitrary functions — a category that includes practical, real-world problems that can be enormously complex — no small task.

Arunachalam also noted that the field of classical computing is fairly mature when it comes to hardware development, while quantum hardware development is only just getting started. Today’s quantum hardware is generally noisy and error-prone, and the gates that are used to build quantum circuits and algorithms are still slower than classical gates — at least for the moment.

“[With] classical computers, people have been looking at this for decades,” Arunachalam said. “Everything is so optimized that every operation takes like a nanosecond or something like that. Quantum computers, or quantum gates, are still far slower to perform than classical.”

For Arunachalam, proving a theoretical speedup is only half the battle. “Even if we get a theoretical exponential speedup, there are still a lot of practical considerations we need to address before we can implement them,” he said.

Why We Continue to Invest Time in Quantum Optimization

All of this raises an important question. Namely, if there is no known exponential speedup in quantum optimization, and the known benefits are so modest, why do major companies continue to invest time and money into researching it?

Continued research into quantum optimization is important, first off, because we still hope to find examples of optimization problems for which there exists a faster-than-polynomial speedup — specifically a faster-than-quadratic speedup. Quantum optimization research also has an impact outside of quantum computing. Arunachalam described one case in which a team of quantum researchers announced that they had found a way to “beat” classical computers at a certain kind of optimization problem, only for classical researchers to announce that they had come up with a method to beat the quantum team’s new technique. “One very nice thing is the fact that when quantum researchers claim they’ve found a way to beat classical computers, that motivates classical theorists to find better algorithms for important problems,” Arunachalam said. “I think that’s a very fundamental contribution just to the field of science.”

For IBM’s Giacomo Nannicini, it’s important not to dismiss the idea that even a modest speedup in solving optimization problems will be well worth the effort once quantum hardware has matured, given the enormous economic and societal value of mathematical optimization. “There are very many areas of engineering, and business in general, where we routinely solve optimization problems,” he said. “Optimization is so useful and it’s used so pervasively in the industry that if you’re able to obtain any advantage, it will benefit your business.”

Interested in getting started with quantum computing yourself? Start your journey here.
