Efficient Algorithm for Computing the Total Order of a Permutation

Published in

MIT 6.s089 — Intro to Quantum Computing

6 min readFeb 5, 2023

It has already been demonstrated many times that quantum computation can solve many tasks more efficiently than classical computation. One of the most known cases is Shor’s algorithm, which, assuming a perfect quantum computer, can be used to factorize a positive integer with a high probability of success within the time complexity of polynomial in input size [2].

At the core of Shor’s algorithm is a subroutine for finding the order of a member x of a set S in a permutation f on S; that is, the least positive integer k such that fᵏ(x) = x. In the original paper, this is used to find the order of b in modulo N, where N is the input and b is a random integer relatively prime to N. This can be done by viewing f(x) ≡ bx (mod N) as a permutation of {m | 0 ≤ m < N, (m,N) = 1}.

In this article, we will consider a question closely related to the order-finding problem, which can be formulated as follow:

The Problem Statement

Given a permutation f of [N], the set of nonnegative integers from 0 to N-1, find the least positive integer k such that fᵏ(x) = x for all x in [N]. (We will call such number k the ‘total order’ of f.)

Original Algorithm

First, let’s talk about the rough idea behind the original subroutine.

The circuit consists of two registers: one for iteration numbers (which will be explained later) and one for holding the value of fʲ(x) for some j. At the start, the first register is initialized to the tensor product of p ≈ 2log N + 1 plus states:

The second register is initialized to an n = log N qubits state representing ∣x⟩.

Now, as f is bijective, the operator U_f which sends the state ∣x⟩ to the state ∣f(x)⟩ for all x ∈ [N] can be shown to be unitary, and therefore is a valid gate for a quantum circuit.

For each i ∈ {1,2,…,p}, we will put in a controlled gate that will apply U_f to the second register 2^(i-1) times if the i-th qubit of the first register is ∣1⟩. (Here, the first qubit corresponds to the unit digit.) For example, if the first register has state ∣110⟩ = ∣6⟩, then since the second and third (counted from the unit digit qubit) are ∣1⟩, the unitaries U_f are applied to the second register 2² + 2¹ = 6 times. Notice that the number of times U_f is applied (2² + 2¹) is exactly the same as the state number. This is especially clear if we look at the binary representation of the state.

The circuit for the order-finding subroutine. Here, he topmost qubit is the last one of. the first register. (From https://en.wikipedia.org/wiki/Shor%27s_algorithm.)

In general, if the first register is ∣m⟩, then U_f will be applied to the second register m times, which will become ∣fᵐ(x)⟩. As the first register starts as a superposition, the result after applying controlled gates will be

Here’s the catch: since fᵏ(x)=x, we also get that fᵇ(x)=fᶜ(x) if and only if b and c are equal modulo k. Thus, the states in the first register which are different in modulo k will not be able to interfere with each other.

What this means is that if we apply the inverse quantum Fourier transform to the first register, then for each i ∈ {0,1,…,k-1}, (1/√2ᵖ)(∣i⟩+∣i+k⟩+∣i+2k⟩+…) will ‘roughly’ be transformed into a linear combination of the state of the form ∣m(2ᵖ/k)⟩ for some integer m ∈ {0,1,…,k-1}. Since the terms given by different i’s don’t interfere with each other, the ‘like’ terms don’t simply cancel each other out, and so each of these states can still be measured with significant probability. Finally, repeated measurements on the first register will give different states that, with high probability, have 2ᵖ/k as the greatest common divisor.*

Adaptation

Now, back to our problem. A naive way to imitate this process is by simply making N copies of the second register, each with different starting states. However, this will require a large number of qubits, making the algorithm inefficient. So can we do this better?

The trick is to consider the superposition of those states instead — that way, we don’t need to worry about wasting several qubits just to test all the input. However, applying U_f to (H∣0⟩)^(⊗n) doesn’t change anything (as the states just get swapped around), so what superposition should we use?

One possible answer is to consider n different superpositions

for i ∈ {0,1,…,n-1}, where qᵢ denotes the i-th (rightmost) bit of q. This way, we only need n² qubits for the ‘second’ register.

An example circuit for the case p = 5 and n = 3.

Assuming that applying U_f to these n states for b and c times give the same result, then for each, i ∈ {0,1,…,n-1}, we have the equation

Consider the phase of the state ∣fᵇ⁺ᶜ(q)⟩ on both sides of the equation. For the left-hand side, this phase is -1 to the power of the i-th bit of fᶜ(q). However, the power is changed to the i-th bit of fᵇ(q) instead. For these to be equal the i-th bit of both numbers must be the same. Since this is true for all i and q, we get that fᵇ(q)=fᶜ(q) for all q. Thus, fᵇ and fᶜ must be identical and so b ≡ c (mod k), where k is the total order. On the other hand, the congruence relation would immediately imply that the equation is true for all i. This means that we end up in the same situation as in Shor’s subroutine and we can use the same finishing steps to solve the problem.

Result of running above circuit on the permutation with cycle (0 3) and (1 4 7 2 5 6). The total order is 6 and the peaks correspond to the multiples of 2⁵/6.

Limitations

For this algorithm to be efficient, we need to be able to implement U_f and its power quickly, which may not be true for a random permutation f. That said, it may be possible to do so for some restricted class of permutation (for example, the class of function of form f(x) ≡ ax (mod N) where (a,N) = 1, which is used in Shor’s algorithm).

Another issue we need to consider is that the order of f may be very large; consider the permutation where each cycle has length p₁,p₂,…,pₘ, where pᵢ is the i-th smallest prime number and m is sufficiently large, then N = p₁+p₂+…+pₘ < mpₘ/2 ≈ (m²log m)/2 and k = e^(log p₁p₂…pₘ) ≥ e^(mlogm) >> e^√N [1]. In this case, we would need p >> √N and the gate complexity of inverse QFT would become p² >> N, making the algorithm inefficient.

Notes

*Two caveats: firstly, the states will most likely not be exactly ∣m(2ᵖ/k)⟩, as this may not be an integer. Rather, every state will be possible, but only those very close to ∣m(2ᵖ/k)⟩ could be detected with significant probability. Secondly, for the same reason, we can’t simply take the gcd of the measurements to 2ᵖ/k. The details about how to solve this issue can be found in Shor’s original paper ([2]).

References

[1] Rosser, J. B., & Schoenfeld, L. (1975). Sharper bounds for the Chebyshev functions 𝜃 (𝑥) and 𝜓 (𝑥). Mathematics of Computation, 29(129), 243–269.

[2] Shor, P. W. (1994, November). Algorithms for quantum computation: discrete logarithms and factoring. In Proceedings 35th annual symposium on foundations of computer science (pp. 124–134). IEEE.