Visual explanation of Quantum Approximate Optimization Algorithm

--

This is a final project for the IAP class 6.S089 Intro to Quantum Computing. It was a group project with Colin Poler, Consecrata Rozario, and Chirag Falor. You can follow along with the code with this jupyter notebook on Github.

Optimization problems pop up everywhere in business applications, and algorithms to solve them are largely responsible for the improved standard of living in the modern era. Here are a few ways that optimization has impacted your life:

  • Crops are grown faster and more cheaply when resources like irrigation, fertilizer, and farm equipment are applied more efficiently
  • Groceries are made available more cheaply when inventory levels between grocery stores are optimized to maximize availability and minimize cost
  • During the pandemic, PPE distribution to care facilities was optimized to maximize the number of lives saved while minimizing logistics costs

Our optimization problem: In this article, we’re going to consider a particular optimization problem MAX-CUT, and how it can be solved with Quantum Computers. Here’s an example of the MAX-CUT problem: imagine we have a (small) group of people living together in a (small) town, and they’re all worried about contracting COVID-19 from their interactions:

A (small) group of people that are worried about contracting COVID-19, with edges illustrating the interactions between them.

The above diagram might illustrate that people 0,1 share an office, while 1,2,3 go to the same gym and 2,3,4 go to the same grocery store. For whatever reason, the town council has decided upon an unusual system for reducing the spread of COVID-19: assign each person to one of two groups, either quarantining on even days or odd days. For instance, the town might assign person 0 to odd days and person 1 to even days, so that they won’t cross paths in the office and transmit COVID-19.

Furthermore, some interactions might be riskier than others. For instance, perhaps person 4 and person 5 sit really close to each other, so it’s particularly important that we separate those two people. In this case, we would assign the 4–5 edge a larger weight.

So the town now faces an optimization problem: what is the best way to assign people to each group to minimize the remaining transmission risk? (Our fictional town won’t be very social, but that’s OK.) Equivalently, how can we assign the people so as to CUT the maximum total edge weight?

Formally, we assign a variable xᵢ to each person i, where xᵢ=0 means odd days and xᵢ=1 means even days. Then we note that for any particular edge (i, j), we cut the edge (and remove its weight wᵢⱼ of risk) exactly when xᵢ(1−xⱼ) + xⱼ(1−xᵢ) = 1, i.e. when xᵢ ≠ xⱼ. So we want to maximize the below cost function:

C(x) = Σ₍ᵢ,ⱼ₎∈E wᵢⱼ (xᵢ(1−xⱼ) + xⱼ(1−xᵢ))
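To make the cost function concrete, here is a small Python sketch. The edge list and weights below are assumptions for illustration (the real graph is defined by the figure above); in particular, I have guessed a weight of 2 for the extra-risky 4–5 edge.

```python
# Hypothetical edge list for the 6-person town: (person i, person j, weight w_ij).
# The weights are assumptions for illustration; the 4-5 edge gets extra weight
# because those two people sit particularly close together.
EDGES = [(0, 1, 1.0),                            # shared office
         (1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0),  # same gym
         (2, 4, 1.0), (3, 4, 1.0),               # same grocery store
         (4, 5, 2.0)]                            # the extra-risky pair

def cut_cost(x, edges=EDGES):
    """C(x): total weight of edges whose endpoints are in different groups."""
    # Edge (i, j) is cut exactly when x[i] != x[j].
    return sum(w for i, j, w in edges if x[i] != x[j])

print(cut_cost([0, 1, 1, 0, 0, 1]))  # → 6.0
```

With these assumed weights, putting persons 1, 2, and 5 on one schedule cuts a total weight of 6, which no other assignment beats.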

The QAOA algorithm: QAOA is an approximate optimization algorithm (it’s in the name). That means it is not guaranteed to find the best solution, but it will find a solution that is almost as good in a much more reasonable amount of time.

First, we start with n qubits which represent our variables xᵢ. For example, one possible solution would be to assign person 0 to even days and everyone else to odd days, which we would write as |000001〉 (reading the rightmost bit as person 0).

Second, we apply a Hadamard operator to each qubit and get a uniform superposition of all possible solutions. So |000001〉 is just as likely as |011001〉, and more importantly they’re perfectly in-phase.
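In code, the state after the Hadamards is just a flat vector of 2⁶ = 64 equal amplitudes. A pure-Python sketch, indexing basis states by the integer whose bits are the xᵢ:

```python
import math

n = 6       # one qubit per person
N = 2 ** n  # 64 basis states, one per possible assignment

# After a Hadamard on every qubit, each basis state has amplitude 1/sqrt(N),
# all with the same phase (all real and positive here).
state = [1 / math.sqrt(N)] * N

# Every assignment is equally likely: |amplitude|^2 = 1/64.
print(abs(state[0b011001]) ** 2)  # → 0.015625
```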

The amplitudes of all the states after Hadamard initialization. In reality, the states are stacked on top of one another, so I spread them out randomly to visualize them. The grey circle is the “unit” circle, representing solutions that are equally likely but varying in phase. Critically, in this figure, all possible solutions have the same phase.

Third, we apply a phase rotation that is proportional to the cost of the solution (with a constant of proportionality γᵢ), so that the best solutions progress further than any others around the unit circle. It’s kind of like a race track, where the fastest cars get furthest around the track. First, we’ll have to turn our cost function from above into a unitary operator that applies a quantum phase rotation.

First we turn the classical cost function into a Hermitian (but not unitary) matrix for qubits. Then, we transform the Hermitian matrix into a unitary matrix parametrized by γᵢ.
The amplitudes of all the solutions after the unitary cost transform (again, the spread is for visualization only). Most importantly, the best solutions have progressed the furthest around the “track”, and are now on the left of the diagram. (note: although tempting to just grab those solutions now, remember that these are quantum phases that cannot be measured just yet)
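Because the cost function is diagonal in the computational basis, applying the cost unitary (e^(−iγC), up to sign conventions) amounts to multiplying each basis amplitude by a phase proportional to that state’s cost. A pure-Python sketch, using an assumed edge list for illustration:

```python
import cmath, math

# Assumed weighted graph for illustration (the real one is in the figure at the top).
EDGES = [(0, 1, 1.0), (1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0),
         (2, 4, 1.0), (3, 4, 1.0), (4, 5, 2.0)]

def cut_cost(x):
    return sum(w for i, j, w in EDGES if x[i] != x[j])

n, N = 6, 64
gamma = 0.62  # an example value of gamma_0

# Start from the uniform superposition, then rotate each basis state's phase
# by gamma times its cost: better solutions race further around the circle.
bits = lambda idx: [(idx >> q) & 1 for q in range(n)]  # rightmost bit = person 0
state = [1 / math.sqrt(N)] * N
state = [amp * cmath.exp(-1j * gamma * cut_cost(bits(idx)))
         for idx, amp in enumerate(state)]
```

Note that only the phases change; every magnitude (and hence every measurement probability) is still exactly 1/√64 after this step, which is why we can’t just measure now.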

Fourth, we apply a mixing between similar solutions to push the best solutions outwards and make them more likely to be measured at the end. In spirit, we would like to compare similar solutions (|100001〉 is similar to |101001〉 because only one bit changed). The solutions that are better than all their peers (and have progressed further around the track) should move further out, while the solutions that are worse than all their peers should move further in, and the solutions in-between shouldn’t move much one way or the other.

(conceptual illustration) Applying mixing to move the best solutions outwards, while moving the less-good solutions inwards. By projecting nearby solutions around by 90° and doing linear combinations, we can in principle use constructive interference to move the better solutions outwards and destructive interference to move worse solutions inwards. But clearly, this is going to depend on everything lining up just perfectly…

In practice, with the mathematical tools we have available, what we actually do is project the most similar solutions (only one bit changed, e.g. |100001〉 and |101001〉) an extra 90° around the track, and then perform the below linear combination of strength βᵢ to mix each real solution with projections of similar solutions. If everything lines up perfectly, the best solutions move outwards and the worst ones move inwards.

First we construct a Hermitian matrix that mixes solutions; in this case, simply toggling each qubit from one set to another. Then, we transform the Hermitian matrix into a unitary matrix parametrized by βᵢ.
(actual results) After applying the mixing Hamiltonian, indeed some solutions have moved outwards and others have moved inwards. But unfortunately things DIDN’T line up, and the best solutions went nowhere, while the WORST solutions got more likely.
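The mixing unitary factorizes into an independent rotation (e^(−iβX), up to sign conventions) on each qubit, and each of those rotations mixes every amplitude with the amplitude of the solution that differs in just that one bit. A pure-Python sketch:

```python
import math

def apply_mixer(state, n, beta):
    """Apply e^{-i*beta*X} on each of the n qubits of a 2**n statevector."""
    c, s = math.cos(beta), math.sin(beta)
    for q in range(n):
        bit = 1 << q
        for idx in range(len(state)):
            if idx & bit:
                continue  # handle each pair of states once, from its 0-side
            a, b = state[idx], state[idx | bit]  # amplitudes differing in qubit q
            # Single-qubit rotation: mixes a solution with its one-bit neighbour,
            # with the neighbour projected an extra 90 degrees (the -1j factors).
            state[idx] = c * a - 1j * s * b
            state[idx | bit] = -1j * s * a + c * b
    return state

# The mixer is unitary, so total probability stays exactly 1.
state = apply_mixer([1 / 8.0] * 64, 6, 1.23)
print(round(sum(abs(amp) ** 2 for amp in state), 6))  # → 1.0
```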

Fifth, we can repeat the above two steps of cost and mixing a total of p times, using new values of γᵢ and βᵢ. If we perform more iterations, we can get those best solutions to move further and further out. But for simplicity, this example uses p=1.
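Putting the pieces together, one QAOA layer is “phase by cost, then mix”, and p layers just repeat that with fresh γᵢ and βᵢ. A self-contained pure-Python sketch, again using an assumed edge list for illustration:

```python
import cmath, math

# Assumed weighted graph for illustration (the real one is in the figure at the top).
EDGES = [(0, 1, 1.0), (1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0),
         (2, 4, 1.0), (3, 4, 1.0), (4, 5, 2.0)]

def cut_cost(x):
    return sum(w for i, j, w in EDGES if x[i] != x[j])

def apply_mixer(state, n, beta):
    """Apply e^{-i*beta*X} on each qubit of a 2**n statevector."""
    c, s = math.cos(beta), math.sin(beta)
    for q in range(n):
        bit = 1 << q
        for idx in range(len(state)):
            if not idx & bit:
                a, b = state[idx], state[idx | bit]
                state[idx], state[idx | bit] = c * a - 1j * s * b, -1j * s * a + c * b
    return state

def qaoa_state(gammas, betas, n=6):
    """Statevector after p = len(gammas) layers of cost-phase + mixing."""
    N = 2 ** n
    costs = [cut_cost([(idx >> q) & 1 for q in range(n)]) for idx in range(N)]
    state = [1 / math.sqrt(N)] * N  # Hadamard layer
    for gamma, beta in zip(gammas, betas):
        state = [a * cmath.exp(-1j * gamma * c) for a, c in zip(state, costs)]
        state = apply_mixer(state, n, beta)
    return state, costs

def expected_cost(state, costs):
    """<C>: average cut weight if we measured this state many times."""
    return sum(abs(a) ** 2 * c for a, c in zip(state, costs))
```

As a sanity check, p=0 reduces to uniform random guessing, whose expected cut is half the total edge weight (4.0 for this assumed graph, since each edge is cut by exactly half of all assignments).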

Finally, we measure the state of the qubits. The probability of measuring a state is the square of its amplitude’s magnitude (the radius in the diagrams above). So the most likely states to measure are the ones that are furthest out, which are evidently the ones which put all the people in the same group and make zero cuts. So our optimization didn’t do so great…

The algorithm prefers to select the solutions which make ZERO cuts, because we didn’t get lucky. The expected value of the cost function is about 3.7.

In any case, we now have a procedure that runs the above algorithm for particular values of γᵢ and βᵢ and returns the average (expectation) of the cost function over 1024 runs of the quantum computer.
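Averaging over 1024 shots can be simulated classically by sampling basis states with probability |amplitude|². A sketch (the probability and cost tables here are stand-ins, not the real QAOA output):

```python
import random

def sampled_cost(probs, costs, shots=1024, seed=42):
    """Estimate <C> the way a real device would: sample outcomes, then average."""
    rng = random.Random(seed)
    outcomes = rng.choices(range(len(probs)), weights=probs, k=shots)
    return sum(costs[i] for i in outcomes) / shots

# Sanity check: if all the probability sits on one basis state,
# every shot returns that state's cost.
probs = [0.0] * 64
probs[0b011001] = 1.0
costs = [float(i) for i in range(64)]  # stand-in cost table
print(sampled_cost(probs, costs))  # → 25.0  (0b011001 == 25)
```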

The plot twist: To actually get a good optimization, we need to find values of γᵢ and βᵢ that maximize the average (expectation) cost function. So, we used a variant of gradient-descent similar to what machine learning does (in this case, the variant was SPSA, which does a good job dealing with the inherent randomness of the evaluations).
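What makes SPSA attractive here is that it estimates the gradient from only two noisy evaluations per step, no matter how many parameters there are, by perturbing all of them simultaneously with random signs. A bare-bones sketch (the step sizes a and c are assumptions; real SPSA schedules decay them over the iterations):

```python
import random

def spsa_maximize(f, theta, iters=200, a=0.2, c=0.1, seed=0):
    """Minimal SPSA: two evaluations per step, simultaneous random perturbation."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iters):
        delta = [rng.choice([-1.0, 1.0]) for _ in theta]  # random +/-1 directions
        plus = f([t + c * d for t, d in zip(theta, delta)])
        minus = f([t - c * d for t, d in zip(theta, delta)])
        g = (plus - minus) / (2 * c)  # noisy directional gradient estimate
        theta = [t + a * g * d for t, d in zip(theta, delta)]  # ascend
    return theta

# Sanity check on a toy objective with a known maximum at theta = [3.0]:
print(round(spsa_maximize(lambda t: -(t[0] - 3.0) ** 2, [0.0])[0], 3))  # → 3.0
```

In our setting, f would be the sampled 1024-shot expectation of the cost for given (γᵢ, βᵢ), which is exactly the kind of noisy objective SPSA is built for.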

So with 500 iterations of the quantum computer, we ultimately found that for p=1, the best parameters are γ₀=0.62 and β₀=1.23. And it works well!

With the right values, it is clear that the best solutions (bottom left in orange) did in fact move outwards, while the worst solutions (top right in orange) moved inwards.
Distribution of cost functions as output from the above. Indeed, the best possible solution is 6, which happens quite frequently!

Furthermore, because our architecture is quite general, we can actually perform exactly the same analysis for any p. For example, with p=4, we get the following:

The phase plot with p=4 is super messy. But the important thing is that the good solutions move way out to the top left.
The state histogram for p=4. It has gotten pretty spiky with the set of optimal solutions.
The cost histogram of the solution with p=4. Clearly, the algorithm has gotten MUCH better at identifying the best solution with C(x)=6

The solution: In the end, the most likely solution returned was |011001〉, which puts persons 1, 2, and 5 in one group (going out on even days) and persons 0, 3, and 4 in the other. And you can verify for yourself with the graph at the top that you can’t do any better.
