# PyCUDA, the FFT and the Gerchberg-Saxton algorithm

The problem of recovering a two-dimensional function of which the amplitude and the amplitude of the transform are known is a classic problem of optics.

The first, simple algorithm to solve this problem is that of Gerchberg and Saxton. It consists in assigning an initial guess to the phase of the function whose amplitude is known. Subsequently, the function thus obtained is Fourier transformed. The phase of the transform is retained, while the amplitude is replaced with the one known as the problem data. The transform thus obtained is transformed back to the space domain. In a very similar way to what done in the transformed domain, in the space domain, the phase of the function is retained, while the amplitude is replaced by the problem data. The process continues until convergence is reached. One criterion to stop the iterations can be to return a result when the unknown function does not change significantly from one iteration to another.

The Gerchberg and Saxton algorithm is also also classified as a *phase retrieval* algorithm as its goal is to recover the phase of the unknown function starting from its amplitude and from that of its transform.

At the Retro Refractions blog, it is possible to retrieve a lot of useful information on the problem we are talking about and on the algorithm in question.

The purpose of this post is to show a simple PyCUDA implementation of the Gerchberg and Saxton algorithm that gives us also the opportunity to point out a possible routine for computing parallel FFTs and IFFTs on Graphics Processing Units (GPU).

In the example below, we will use the square root of the intensity of a black and white image of a butterfly as the amplitude of the unknown function, while the amplitude of the transform will be considered uniformly equal to one.

Before diving into the PyCUDA implementation, let’s walk through a sequential Python version first. Both the sequential and the parallel versions are available at the following link.

The function implementing the Gerchberg and Saxton algorithm in Python is as follows:

Notice that `measuredAmplitudeSpace`

and `measuredAmplitudeFourier`

are the data of the problem, that is, the amplitudes of the function and of the transformed function, and that the phase of the unknown function is chosen completely random. The random phase is generated thanks to the functions of the `numpy`

library.

Inside the `for`

loop, the amplitudes of the function and of its transform are iteratively replaced with the problem data. In the sequential case, the FFT and IFFT are implemented thanks to the `numpy`

library.

The function implementing the Gerchberg and Saxton algorithm in PyCUDA is the following:

It assumes that the `d_img`

input image is already a PyCUDA `gpuarray`

. Opposite to that, it returns a `numpy`

array.

The function follows the same lines as the Python implementation. The phase of the unknown function is still chosen randomly, this time thanks to the facilities of the PyCUDA library which allows to generate random arrays directly on the GPU. The parallel FFT is obtained thanks to the `fft`

function of the `skcuda`

library which is essentially a wrapper around the CUDA `cuFFT`

library. As with the `cuFFT`

library routines, the `skcuda`

FFT library also needs a `plan`

. Unlike the sequential version, the phase is extracted by dividing the function by its amplitude thanks to the elementwise operations made available by PyCUDA. For more details, see Elementary operations in PyCUDA.

Finally, in the repository at the link, another parallel version of the Gerchberg and Saxton function is implemented as follows:

In this slightly different version, the *projection *operation, i.e., the replacement of the amplitude of the functions involved as the problem data, is carried out thanks to an explicit CUDA kernel: