Fast-CNN: Substitution of Convolution Layers with FFT Layers

Pushkar Khetrapal
Published in Analytics Vidhya · 5 min read · Apr 13, 2020

I have been thinking a lot lately about Convolutional Neural Networks. The main idea is to convolve kernels over an image, however many channels it has, to extract as many useful features as possible so that we can distinguish the target objects. More kernels generally mean higher accuracy, but only up to a point, and more kernels also mean higher computational cost. Although the CNN is still the best algorithm we have for classifying images, it has two major disadvantages. The first is computational cost: every pixel is multiplied with the kernel, which is time-consuming (roughly O(n²)), and if we increase the number of filters, training a model can take several weeks. Real-time prediction with a CNN at high accuracy is next to impossible without powerful GPUs. The second drawback is that, after convolution, the output is smaller than the input unless we use padding.

Figure: output dimensions after convolution (source)

Convolution loses edge information; to prevent this we use padding. The padding can be anything, e.g., zeros or wrapping the image around, but even with padding there is still some loss.

Figure: how a CNN works (source)

Here I propose a new way of convolving an image with a kernel: instead of direct convolution, we can use the Fourier Transform. By using the convolution property of the Fourier transform, we can save a great deal of time and preserve the edges.

Figure: the Fourier Transform formula
Figure: the convolution property of the Fourier Transform (source)
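
For reference, the two formulas shown in the figures are the standard Fourier transform and the convolution theorem, which in LaTeX notation read:

```latex
% Fourier transform of a signal f(t)
F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i \omega t}\, dt

% Convolution theorem: convolution in the spatial domain equals
% pointwise multiplication in the frequency domain
\mathcal{F}\{ f * g \} = \mathcal{F}\{ f \} \cdot \mathcal{F}\{ g \}
\qquad \Longrightarrow \qquad
f * g = \mathcal{F}^{-1}\{ \mathcal{F}\{ f \} \cdot \mathcal{F}\{ g \} \}
```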

The property says that if we take the Fourier transform of both the image and the kernel, multiply them in the frequency domain, and then take the inverse Fourier transform, we get a result that is the same as, or very similar to, the result of convolution. Computing the Fourier transform directly is itself a tedious process, but the Fast Fourier Transform is really fast: we get the output in O(n log n) time. For backpropagation I am referencing this paper: http://ecmlpkdd2017.ijs.si/papers/paperID11.pdf. For now, though, we take the weights of an already-trained model, so we only need to design the forward propagation.
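
As a quick numerical check, here is a minimal NumPy sketch (not the code from the repository) showing that multiplication in the frequency domain reproduces an ordinary 2-D convolution, provided both arrays are zero-padded to the full output size:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.standard_normal((256, 256))
kernel = rng.standard_normal((3, 3))

# Direct (spatial-domain) convolution, full output: (256+3-1) x (256+3-1)
direct = convolve2d(image, kernel, mode="full")

# FFT-based convolution: zero-pad both arrays to the full output size,
# multiply elementwise in the frequency domain, then invert.
out_shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
fft_result = np.real(np.fft.ifft2(np.fft.fft2(image, s=out_shape) *
                                  np.fft.fft2(kernel, s=out_shape)))

print(np.allclose(direct, fft_result))  # True, up to floating-point error
```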

We could spend several hours training a model, but for real-time analysis a CNN is very slow, so here we replace the forward convolution pass with an FFT-based algorithm. I got very good results, though not exactly identical, because of how the edges are handled. This is very helpful for real-time algorithms such as YOLO, R-CNN, and Faster R-CNN, where each image is divided into many small regions and the network has to run on each fragment to produce a prediction. Just imagine how long it would take merely to tell whether your face is or isn't in the image. It is also very useful for video analysis: the "brain" of a self-driving car, for example, relies heavily on image processing and needs a huge amount of computation. With the FFT we generate the same (or very similar) output as convolution with far less work.

How does FFT reduce the time?

Let's say we have a 256 × 256 image and we want to convolve it with a 3 × 3 kernel at stride 1. That takes 254 × 254 = 64,516 kernel positions just to generate one output feature map (not counting the individual multiplications at each position). A general-purpose CPU can handle this easily. But imagine we have 1,000 kernels in a single network: that becomes roughly 64,516 × 1,000 steps (keeping the image at its original size), which is a lot of work. Now let each step take t seconds, where t is very small; the total time the CNN spends on just one image is about 64,516,000 × t seconds. Although t is tiny, multiplying it by that many steps drives the time much higher, and training a neural network requires many images, which pushes the cost to 64,516,000 × t × m seconds for m images.

So our main goal is to reduce the number of steps so that the time consumption drops. Here is where the FFT comes in. Taking the FFT of an image might take around 50–60 steps, depending on the dimensions and size of the image, and computing the FFT of the kernel takes the same number of steps or fewer. Before taking the FFT of the kernel, we zero-pad it so that it has the same size as the image. We then multiply the image and the kernel in the frequency domain, and the next step is to take the inverse FFT of the multiplication result. The result of the inverse FFT is the same as, or very similar to, the output of the convolution that took 64,516 steps, but the FFT approach needs far fewer steps and far less time. That is an enormous reduction in the number of steps, so we can use the Fourier approach to get fast results without hurting accuracy. Since we get essentially the same output as convolution, all we need to do is replace each convolutional layer with an FFT layer; nothing else has to change.
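
The procedure above is only a few lines of code. Here is a hedged NumPy sketch of a single FFT "layer" in which the kernel is zero-padded to the image size; strictly speaking this computes a circular convolution, which matches ordinary convolution everywhere except near the borders, where values wrap around:

```python
import numpy as np

def fft_conv_same_size(image, kernel):
    """Zero-pad the kernel to the image size, multiply in the frequency
    domain, and invert. The output has exactly the same height and width
    as the input image, so no padding of the image itself is needed."""
    H, W = image.shape
    freq = np.fft.fft2(image) * np.fft.fft2(kernel, s=(H, W))
    return np.real(np.fft.ifft2(freq))

rng = np.random.default_rng(0)
image = rng.standard_normal((256, 256))
kernel = rng.standard_normal((3, 3))

out = fft_conv_same_size(image, kernel)
print(out.shape)  # (256, 256) -- same dimensions as the input
```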

Output of Convolution
Output of FFT

I took the Brain Tumor Dataset from Kaggle and trained a deep learning model with 3 convolution layers (1 kernel each), 3 max-pooling layers, and a 640-neuron dense layer. The model overfits and its accuracy is not great, but that is fine, since it still lets us compare the output of the convolutional layers with the output of the FFT layers, and our main concern here is replacing the convolution layer with an FFT layer.

Layers of trained model
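
For a concrete picture of that architecture, here is a hedged Keras sketch; the input shape, kernel sizes, and activations are assumptions for illustration only and may differ from the actual trained model in the repository:

```python
from tensorflow.keras import layers, models

# Sketch of the architecture described above: 3 conv layers with a single
# kernel each, 3 max-pooling layers, then a 640-neuron dense layer.
# Input shape, kernel sizes, and activations are assumptions.
model = models.Sequential([
    layers.Conv2D(1, (3, 3), activation="relu", input_shape=(256, 256, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(1, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(1, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(640, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # tumor / no tumor
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```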

These are the steps I followed to replace the convolution layers (a code sketch follows the list):

  1. Took the trained weights and an image.
  2. Took the FFT of both.
  3. Multiplied them in the frequency domain.
  4. Took the inverse FFT of the multiplication result.
  5. Fed the output of this layer into the next one.
  6. Repeated these steps for each layer.
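
Putting those steps together, here is a minimal NumPy sketch of how the FFT-based forward pass could look (the function names and the pooling/activation choices are my own, for illustration; the actual implementation is in the repository linked below):

```python
import numpy as np

def fft_conv2d(image, kernel):
    # Steps 2-4: FFT both, multiply in the frequency domain, inverse FFT.
    freq = np.fft.fft2(image) * np.fft.fft2(kernel, s=image.shape)
    return np.real(np.fft.ifft2(freq))

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    # Non-overlapping max pooling over size x size blocks.
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def forward(image, trained_kernels):
    """Steps 1, 5 and 6: take the trained weights, feed the output of each
    FFT layer into the next one, and repeat for every layer."""
    out = image
    for kernel in trained_kernels:
        out = max_pool(relu(fft_conv2d(out, kernel)))
    return out
```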

After repeating these steps, I got very similar results.

I predicted 253 different images with both models. The CNN took 188.15 seconds to predict the class for all 253 images, while the FFT model took 39.13 seconds. That is a reduction in time by a factor of about 4.8.

Conclusion: the FFT layers are roughly 5 times faster than the convolution layers, and the output of an FFT layer keeps the same dimensions as the input without any padding.

Here is my GitHub Repository https://github.com/pushkar-khetrapal/Fast-CNN.

Thanks for reading! I hope you enjoyed the article and gained some additional insights. If you did, feel free to leave a clap! Constructive feedback is appreciated. Feel free to reach out to me here or pushkarkhetrapal12@outlook.com.
