Quantum Image Analysis. Possible Speedup for Detecting the Skew Angle of Documents.

Sorin Bolos
Transilvania Quantum
Jan 5, 2020

Image processing is one of the most researched fields in computer science. This is because images and visual cues have the most impact on human behavior and human-computer interaction.

From basic operations like rotation, color inversion, and filtering, to rule-based algorithms like edge detection, to AI applications like face recognition and OCR, a great deal of research has been published on manipulating images using classical computation. Now that we are approaching the quantum computing age and quantum computers are becoming a reality, it’s only natural to ask ourselves: can quantum computing and quantum information bring an advantage to image processing?

Research has already started and the answer seems to be yes. But, as in every promising new field, there are challenges to overcome. Quantum information processing is already beginning to show some patterns: ingesting information into a quantum computer is fairly complex, not extremely difficult, but certainly not easy; manipulating information inside a quantum computer is fairly easy and in many cases brings an advantage; extracting information, however, is a very difficult task.

An analogy (although not perfect) can be made to the MapReduce model in big-data problems. There the data (information) is too large to be stored on a single server, so it is distributed among the nodes of a cluster. Each node can independently manipulate its pieces of data in parallel (the Map operation). But one must be careful when outputting information, because sending all the data back to a central server would crash the system, or take an enormous amount of time. Therefore a Reduce operation is performed, in which information is aggregated on the nodes, and only the aggregated result, containing a vastly smaller amount of information, is returned to the user.
In quantum computing a large amount of information can be contained in entangled superpositions of a small number of qubits. Operations can be performed on all data points via quantum parallelism. But when extracting data by measurement, only a small amount of information can be obtained. Therefore we must either repeat the circuit and measurement many times, which kills the advantage, or reduce the amount of information we are interested in.

I’ll try to show here one way of representing images using qubits. I’ll also show what operations can easily be performed on images represented in this way. Finally, I’ll show how we can extract useful information in a way that does not require countless repetitions. I’ll do this for a narrow domain, document processing: namely, extracting the skew angle of a typical scanned document.

Representing an image

There are several ways to represent an image using qubits. The first proposed way was via a qubit lattice [5]. This stores one pixel per qubit, so it doesn’t really bring an advantage in terms of size. Another approach is the Flexible Representation of Quantum Images (FRQI) [3]. For an image of 2ⁿ x 2ⁿ pixels this approach uses 2n qubits to store pixel positions and one more qubit to store the intensity, so 2n+1 qubits in total. Yet another way of representing images on a quantum computer is the Novel Enhanced Quantum Representation (NEQR) [4], which is similar to FRQI but uses a register of qubits instead of a single qubit to store the pixel intensity.

The approach I used is a little different, but it’s the simplest and most obvious to me. I found it described in two separate papers, as QPIE [1] and as 2D-QSNA [10]. This representation just uses the probability amplitudes to store the pixel values of an image. With n qubits we can store 2ⁿ values. An image of height 2ᵐ and width 2ⁿ is normally represented as a 2ᵐ x 2ⁿ matrix, but it can be reshaped as a vector of 2ᵐ⁺ⁿ values. The values are the pixel intensities. We can normalize those values and use a register of m+n qubits to store the resulting vector in a superposition state. The basis states of the superposition represent the location of the pixel: the first m qubits represent the pixel row and the last n qubits represent the pixel column. The probability amplitude of each basis state represents the normalized intensity of the pixel.
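
To make the encoding concrete, here is a minimal NumPy sketch of the classical pre-processing (the function name is mine, and the exact flattening order just has to stay consistent with how the row and column registers are interpreted later):

```python
import numpy as np

def image_to_amplitudes(img):
    """Flatten a 2^m x 2^n grayscale image into a unit-norm amplitude vector.

    Index r * 2^n + c holds the pixel at row r, column c, so the row index
    ends up in the more significant bits of the (m + n)-qubit basis state.
    """
    rows, cols = img.shape
    assert rows & (rows - 1) == 0 and cols & (cols - 1) == 0, "dimensions must be powers of two"
    vec = img.astype(float).reshape(-1)   # row-major flattening
    return vec / np.linalg.norm(vec)      # amplitudes must have unit L2 norm

# A 4 x 4 image becomes a 16-element state vector, i.e. 4 qubits.
amplitudes = image_to_amplitudes(np.random.randint(1, 256, size=(4, 4)))
print(amplitudes.shape)                   # (16,)
```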

Representing an image in this way has its pros and cons. The most obvious disadvantage is that, because the probability amplitudes must be normalized, we can only store relative pixel intensities; the overall brightness scale is lost. Images that have the same intensity in all pixels (all white, all black, etc.) will look the same when encoded in this way.
Another con is that preparing an image requires initializing an arbitrary superposition state, which is not trivial. We must also note that operations that require manipulating pixel intensities, like inverting the image luminosity, applying filters, etc., are difficult to implement.
As mentioned in the introduction, extracting a classical image from this state is very difficult, if not impossible, because extracting the whole information from an unknown quantum state would need an infinite number of measurements.

There are also advantages. Most notably, the number of qubits needed is logarithmic in the number of pixels. Also, operations that manipulate pixel locations, like rotating the image, translating lines/columns, flipping the image, etc., show an advantage because, with a single quantum operation, we can act on all pixels at the same time. Most importantly, we can use the known algorithms that exhibit a quantum advantage while acting on the probability amplitudes, like the quantum Fourier transform (QFT), the quantum Haar wavelet transform and the Hadamard transform.
As for extracting the classical image, we will not need that, since we only want information about the image (i.e. the skew angle), not the image itself.

Preparing the image

As I said before, preparing the image in the appropriate quantum state requires arbitrary state initialization. This can be done with complexity polynomial in the image size [11].

Since Qiskit already provides arbitrary state initialization via the `circuit.initialize()` method, I didn’t bother much with that. All I needed to do was prepare the column vector: concatenate all the columns of the image into a single vector and normalize the pixel intensities.
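
As a minimal sketch of that preparation (assuming a 2ᵐ x 2ⁿ grayscale NumPy array, and glossing over Qiskit’s little-endian qubit ordering, which has to be kept consistent with the decoding later):

```python
import numpy as np
from qiskit import QuantumCircuit

def prepare_image_circuit(img):
    rows, cols = img.shape
    m, n = int(np.log2(rows)), int(np.log2(cols))

    # Concatenate the columns of the image into a single vector and normalize it.
    vec = img.astype(float).flatten(order="F")
    vec = vec / np.linalg.norm(vec)

    # Arbitrary state initialization on an (m + n)-qubit register.
    qc = QuantumCircuit(m + n)
    qc.initialize(vec, list(range(m + n)))
    return qc
```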

Retrieving the image

As stated, recovering the classical image from the quantum state is extremely inefficient, if not impossible. But, for the purpose of visualizing the operations we perform, we need to try to extract an approximate image.
The algorithm for doing that is simple. Start with a black image. Run the experiment for a number of shots. For every bit-string in the result, decode the pixel location: the first m bits encode the row, the last n bits encode the column. The number of times the bit-string was measured gives the pixel value: the more times it was measured, the brighter the pixel.
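
In code, the reconstruction from a counts dictionary might look like the sketch below (assuming a single classical register, so the bit-strings contain no spaces; whether they need to be reversed first depends on the endianness convention used when encoding):

```python
import numpy as np

def counts_to_image(counts, m, n):
    """Turn measurement counts into an approximate 2^m x 2^n grayscale image."""
    img = np.zeros((2 ** m, 2 ** n))      # start with a black image
    for bitstring, hits in counts.items():
        row = int(bitstring[:m], 2)       # first m bits: row
        col = int(bitstring[m:], 2)       # last n bits: column
        img[row, col] = hits
    return img / img.max()                # the more hits, the brighter the pixel
```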

Image manipulation

Next I want to present some operations that can easily be performed when the image is represented in this way, as a quantum state.

Rotation by 180° around the center: this can be done by applying an X gate to each qubit.

Flipping around the main diagonal (only for square images): this can be done by swapping the row qubits with the column qubits, one by one. This can come in handy when doing edge detection, since we can efficiently perform edge detection along a single direction, vertical or horizontal. If we choose to do it in the vertical direction, we can flip the image around the diagonal, detect the edges, and then flip back, so we get the horizontal edges too. But that is a subject for another article.
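
As an illustration of both operations (assuming the first m qubits hold the row and the next n the column, with endianness details glossed over), each one is only a handful of gates:

```python
from qiskit import QuantumCircuit

def rotate_180(m, n):
    # Flipping every address bit maps pixel (r, c) to (2^m - 1 - r, 2^n - 1 - c),
    # i.e. a rotation by 180 degrees around the center of the image.
    qc = QuantumCircuit(m + n)
    for q in range(m + n):
        qc.x(q)
    return qc

def flip_main_diagonal(m):
    # For a square image (n = m), swapping row qubit i with column qubit i
    # exchanges the row and column addresses, i.e. transposes the image.
    qc = QuantumCircuit(2 * m)
    for i in range(m):
        qc.swap(i, m + i)
    return qc
```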

Translating the lines of the image: this can be done by adding (modulo the number of lines) a scalar to the quantum register that holds the rows of the image. Remember that we have m qubits representing the rows of the image and n qubits representing the columns. We can add a scalar to the qubits that encode the lines, i.e. if that register is in state |x⟩ and we want to add the scalar a, we end up with the state |x+a (mod 2ᵐ)⟩; the addition is done modulo the number of rows, 2ᵐ. The translation can be done in the horizontal direction too.
This will help in some post-processing after doing the QFT for extracting the skew angle of a document.
Note: Qiskit does not have an arithmetic module, so I had to implement the addition myself, taking inspiration from Q# (which has a great arithmetic module) and from QArithmetic.
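
As one possible illustration (a Draper-style adder built from Qiskit’s QFT circuit, not necessarily how my own implementation works), adding a classical constant a to an m-qubit register modulo 2ᵐ needs only single-qubit phase rotations in the Fourier basis:

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.circuit.library import QFT

def add_constant(m, a):
    """Map |x> to |x + a mod 2^m> on an m-qubit register (Draper-style adder)."""
    qc = QuantumCircuit(m)
    qc.append(QFT(m), list(range(m)))
    for j in range(m):
        # In the Fourier basis, adding `a` becomes a phase of 2*pi*a*2^j / 2^m
        # on the qubit that carries the bit of weight 2^j.
        qc.p(2 * np.pi * a * 2 ** j / 2 ** m, j)
    qc.append(QFT(m).inverse(), list(range(m)))
    return qc

# Example: shift the row addresses by 1 (modulo 8) on a 3-qubit row register.
row_shift = add_constant(3, 1)
```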

Classical Skew Angle Detection

There are many algorithms for detecting the skew angle of scanned documents. Some of the most popular ones are based on the 2D Fourier Transform (implemented via FFT) [8][9] and we will focus on them.

First, the 2D FFT is applied to the image. This is done by first applying the FFT on the rows of the image and then applying the FFT along the columns of the intermediate result. After this transform, in order to bring the low frequencies to the center of the resulting image, the image is split into four quadrants (just like the coordinate-system quadrants) and diagonally opposite quadrants are swapped: quadrant 1 with quadrant 3, and quadrant 2 with quadrant 4.
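
For reference, this classical pipeline is only a few lines of NumPy:

```python
import numpy as np

def fft_spectrum(img):
    """Magnitude of the 2D FFT with the low frequencies shifted to the center."""
    freq = np.fft.fft2(img)        # FFT over the rows, then over the columns
    freq = np.fft.fftshift(freq)   # swap diagonally opposite quadrants
    return np.abs(freq)            # keep only the modulus for visualization
```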

Without going into the mathematical details: because a typical document has the repetitive pattern of parallel text lines, the visual representation of the 2D Fourier transform shows a dominant line that is perpendicular to the original text lines. If we compute the slope of that line, we have found the skew angle of the document. (The visual representation of the 2D FFT only takes into consideration the modulus of the complex elements.)

Quantum Skew Angle Detection

For the quantum algorithm we follow the same steps as the classical one. We first compute the QFT on the rows by applying the algorithm to the register that encodes the rows of the image. Then we apply it to the register that encodes the columns. After that we swap the quadrants by translating the image up by 2ᵐ⁻¹ rows and then to the right by 2ⁿ⁻¹ columns.
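
A sketch of the circuit-level structure (m row qubits, n column qubits; the adder details and the endianness bookkeeping are left out):

```python
from qiskit import QuantumCircuit, QuantumRegister
from qiskit.circuit.library import QFT

def quantum_2d_fft(m, n):
    rows = QuantumRegister(m, "row")
    cols = QuantumRegister(n, "col")
    qc = QuantumCircuit(rows, cols)
    qc.append(QFT(m), rows[:])   # QFT along the rows
    qc.append(QFT(n), cols[:])   # QFT along the columns
    # The quadrant swap then translates by half the image size in each
    # direction, e.g. with the constant adder sketched in the previous section
    # (add 2^(m-1) to the row register and 2^(n-1) to the column register).
    return qc
```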

Now we have the transformed image, just like in the classical case. But how do we find the slope of the line that encodes the skew angle? How do we get that information out without an infinite amount of repetition?

Remember that the way we encode an image is such that the brightest points are more likely to appear in a measurement. So if we do a relatively small number of measurements, we have a high probability of getting pixels that lie along the bright line.
We do a small number of measurements, say 1024. We order the resulting bit-strings by count. We decode x and y from each bit-string/point. We then start from the highest-ranked point and find the next point that is a certain distance apart (to avoid errors). We then compute the slope of the line that passes through the two points, arctan((y2-y1)/(x2-x1)). We do this for a number of high-ranked points and take the angle that comes up most often. We are now done and have found the skew angle of the document.

Detecting the skew angle of documents is vital for further processing like classification or extracting data. This approach could provide a significant speedup for large documents.

The full code can be found here.

References

[1] https://arxiv.org/pdf/1801.01465.pdf

[2] https://arxiv.org/pdf/1812.11042.pdf

[3] https://www.jstage.jst.go.jp/article/fss/25/0/25_0_185/_pdf

[4] https://www.researchgate.net/publication/257641933_NEQR_A_novel_enhanced_quantum_representation_of_digital_images

[5] https://www.researchgate.net/publication/253158806_Storing_Processing_and_Retrieving_an_Image_using_Quantum_Mechanics

[6] https://arxiv.org/abs/1606.06792

[7] https://arxiv.org/abs/quant-ph/0407010v1

[8] http://www.iraj.in/journal/journal_file/journal_pdf/12-255-14641570301-6.pdf

[9] https://www.daaam.info/Downloads/Pdfs/proceedings/proceedings_2018/071.pdf

[10] https://arxiv.org/pdf/1305.2251.pdf

[11] https://arxiv.org/pdf/quant-ph/0406176.pdf
