Principal Component Analysis Pooling in Tensorflow with Interactive Code [PCAP]

GIF from this website

The idea is simple: max/average pooling operations in convolutional neural networks are used to reduce the dimensionality of the input. And while more sophisticated pooling operations, such as Max-Avg (Mixed) pooling, have been introduced, I was wondering if we could do the same thing with Principal component analysis (PCA).


Brief Overview of PCA and Simple Tutorial

The paper above does a great job of explaining what PCA is and gives simple examples of the mathematics involved. It would be a good idea to go over the PDF before reading on.


PCA / Singular Value Decomposition

Image from this website

Red Box → How we are going to perform PCA in TensorFlow

This blog post does a great job of explaining how we can perform PCA using singular value decomposition (SVD). Thankfully, TensorFlow already has the tf.svd() operation to perform it.
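As a rough sketch of that approach (using the TF 1.x API, where the op is tf.svd(); in later versions it lives under tf.linalg.svd), PCA on a centered data matrix could look something like the following. The pca_reduce helper is hypothetical and not the code from the linked post.

```python
import tensorflow as tf

def pca_reduce(x, n_components):
    """Reduce a (num_samples, num_features) matrix down to
    (num_samples, n_components) via SVD. Hypothetical helper."""
    # Center the data so that the SVD of x corresponds to PCA.
    mean = tf.reduce_mean(x, axis=0, keepdims=True)
    x_centered = x - mean

    # tf.svd returns singular values s, left vectors u, right vectors v.
    s, u, v = tf.svd(x_centered, full_matrices=False)

    # Project onto the first n_components right singular vectors
    # (the principal directions).
    return tf.matmul(x_centered, v[:, :n_components])
```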

Video from this website

I personally had to watch/read additional materials to concretely understand what singular value decomposition is, and I have linked the video that I watched above. (This Quora post and this Medium post were good sources to read as well.)


(Stupid) Idea Behind Principal Component Analysis Pooling

Image from this video

Red Box → Original Matrix
Blue Box → Left singular vectors
Purple Box → Singular values (Diagonal Matrix)
Green Box → Right singular vectors

I try to understand hard concepts as simply as possible, so here is my simple (and VERY dumbed-down) version of SVD. Let’s say we have a cake called A, and thankfully this cake A is composed of only two ingredients: milk M and sugar S. So A = M + S.

However, the trick here is to know the exact combination of milk and sugar we need to make the original cake A. Let’s say we need 30 liters of milk and 50 mg of sugar in order to properly make this cake A; this means we need to keep that information somewhere as well.

The good thing about this is that we don’t have to carry around a cake (which requires a lot of storage); we can simply carry around the milk, the sugar, and the instruction manual (telling us what proportion of milk and sugar we need) as a more compact form. This was my understanding of SVD, and from here we can move on to PCA.

Image from this website

As seen above, in PCA we drop the right singular vectors and truncate the singular value matrix in order to reduce the dimension. We can take advantage of this to make PCA act as a pooling operation.
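Written out (a standard truncated-SVD identity, added here for reference; A is the original matrix, U / Σ / V are the boxes in the figure above, and k is the number of components we keep):

```latex
A = U \Sigma V^{T}, \qquad
A_{\text{reduced}} = A V_{k} = U_{k} \Sigma_{k} \in \mathbb{R}^{n \times k}
```

Here V_k contains the first k right singular vectors and Σ_k is the top-left k×k block of Σ, so the reduced data keeps only k columns per sample.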

For example, let’s say we have a batch of 200 images, each with dimension 8*8*1; in tensor format we can write this as (200,8,8,1). We already know that we can vectorize the images by reshaping the tensor into (200, 8*8*1) = (200,64). Now if we perform PCA to reduce the dimension from 64 to 16, we can reshape the result back into a 3D image, making the tensor (200,4,4,1). So the resulting tensor has the same dimensions as the output of an average pooling operation.
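A minimal sketch of that flatten → PCA → reshape trick might look like the following (the pca_pool name, the fixed batch size, and the centering step are my assumptions; the author’s actual layer is shown only in the screenshots):

```python
import tensorflow as tf

def pca_pool(x, out_hw):
    """Hypothetical PCA pooling sketch: flatten each image, reduce the
    feature dimension with SVD/PCA, then reshape into a smaller map."""
    batch, h, w, c = x.get_shape().as_list()   # e.g. [200, 8, 8, 1]
    flat = tf.reshape(x, [batch, h * w * c])   # (200, 64)

    # Center the batch and take its SVD (PCA on the flattened images).
    flat = flat - tf.reduce_mean(flat, axis=0, keepdims=True)
    s, u, v = tf.svd(flat, full_matrices=False)

    # Keep just enough components to fill the pooled map: 4 * 4 * 1 = 16.
    k = out_hw * out_hw * c
    reduced = tf.matmul(flat, v[:, :k])        # (200, 16)

    # Reshape back to an image-like tensor: (200, 4, 4, 1).
    return tf.reshape(reduced, [batch, out_hw, out_hw, c])
```

Calling pca_pool(images, out_hw=4) on a (200,8,8,1) tensor would then return a (200,4,4,1) tensor, matching the average-pooled shape described above.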


Exponentially Weighted Moving Average of the Singular Value Matrix / Alpha, Beta

Red Box → Updating the moving singular value matrix (sigma) during training; at test time we use the moving-average sigma to perform the dimensionality reduction.

One small detail we need to note is the exponentially weighted moving average of the singular value matrix. Just as in batch normalization, where we keep track of the moving mean and std during training, at test time we use the moving-average values rather than statistics computed from the test data.

During the testing phase we don’t want the prediction for a test image to depend on whatever else happens to be in its batch, so rather than using the singular value matrix computed from the test batch, we are going to use its exponentially weighted moving average. (If anyone wants to know more about batch normalization, please click here.)
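As a minimal sketch of this bookkeeping (assuming TF 1.x variables, a Python-level is_training flag, and an illustrative decay of 0.9; the names and values are mine, not the author’s):

```python
import tensorflow as tf

k = 16        # number of components kept by the pooling layer
decay = 0.9   # illustrative decay rate for the moving average

# Non-trainable variable tracking the moving singular values,
# analogous to batch norm's moving mean/variance.
moving_sigma = tf.Variable(tf.zeros([k]), trainable=False, name="moving_sigma")

def reduce_with_moving_sigma(flat, is_training):
    """flat: (batch, features) matrix. Returns a (batch, k) reduction."""
    s, u, v = tf.svd(flat, full_matrices=False)
    if is_training:
        # Training: use this batch's singular values and update the average.
        update = tf.assign(moving_sigma,
                           decay * moving_sigma + (1.0 - decay) * s[:k])
        with tf.control_dependencies([update]):
            return tf.matmul(u[:, :k], tf.diag(s[:k]))
    # Testing: substitute the moving-average singular values for sigma.
    return tf.matmul(u[:, :k], tf.diag(moving_sigma))
```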

Original Image from the paper

As seen above, in batch normalization we multiply the standardized data by a scale term (alpha) and add a shift term (beta). I will follow this idea and give our PCAP layer some ability to cancel the PCA operation, as seen below.

Red Box → Adding the alpha and beta terms to the reshaped PCA output
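A hedged sketch of that scale-and-shift step (the names and initial values are illustrative, not the author’s exact code):

```python
import tensorflow as tf

def scale_and_shift(reshaped_pca):
    """Apply learnable alpha/beta terms to the reshaped PCA output,
    mirroring batch normalization's scale and shift."""
    alpha = tf.Variable(1.0, name="pcap_alpha")
    beta = tf.Variable(0.0, name="pcap_beta")
    # If alpha is driven toward zero during training, the PCA pooling
    # contribution is effectively cancelled, leaving only the beta offset.
    return alpha * reshaped_pca + beta
```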


Network Architecture

Black Box → Convolution Layer
Yellow Box → Convolution + Batch Normalization
Green Box → Average Pooling Layer
Red Box → Principal Component Analysis Pooling Layer 
Pink Box → Global Average Pooling and Softmax

The base network that we are going to use is the All Convolutional Net, and as seen above (light blue boxes) the whole network is divided into three main parts.


Results

Left Image → Train Accuracy Over Time / Cost Over Time 
Right Image → Test Accuracy Over Time / Cost Over Time

Surprisingly, the model was learning how to classify MNIST images with PCA pooling layers, although we need to note that it barely achieved an accuracy of 50 percent (on both the training and test images).

Final accuracy of 53 percent after the 21st epoch.


Interactive Code

For Google Colab, you need a Google account to view the code. Also, you can’t run read-only scripts in Google Colab, so make a copy in your own playground. Finally, I will never ask for permission to access your files on Google Drive, just FYI. Happy coding! For transparency, I also uploaded all of the logs from training.

To access the code as well as the training logs please click here.


Final Words

The final results were pretty interesting, since the model was actually able to learn how to classify MNIST images. I’m glad to know that it is possible to insert more sophisticated operations such as PCA within the network (with end-to-end training). Finally, just as a side note, if anyone is interested in learning more about linear algebra, please watch the video series below. (3Blue1Brown is a YouTuber/mathematician who puts out top-quality content.)

Video From this website

If any errors are found, please email me at jae.duk.seo@gmail.com. If you wish to see the list of all of my writing, please view my website here.

Meanwhile, follow me on Twitter here, and visit my website or my YouTube channel for more content. I also implemented Wide Residual Networks; please click here to view that blog post.


Reference

  1. Li, E. (2018). PCA With Tensorflow. Abracadabra. Retrieved 25 May 2018, from https://ewanlee.github.io/2018/01/17/PCA-With-Tensorflow/
  2. tf.set_random_seed | TensorFlow. (2018). TensorFlow. Retrieved 25 May 2018, from https://www.tensorflow.org/api_docs/python/tf/set_random_seed
  3. The Iris Dataset — scikit-learn 0.19.1 documentation. (2018). Scikit-learn.org. Retrieved 25 May 2018, from http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html
  4. tf.cumsum | TensorFlow. (2018). TensorFlow. Retrieved 25 May 2018, from https://www.tensorflow.org/api_docs/python/tf/cumsum
  5. Tensorflow: Using tf.slice to split the input. Stack Overflow. Retrieved 25 May 2018, from https://stackoverflow.com/questions/39054414/tensorflow-using-tf-slice-to-split-the-input
  6. Lee, C., Gallagher, P., & Tu, Z. (2015). Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree. Arxiv.org. Retrieved 26 May 2018, from https://arxiv.org/abs/1509.08985
  7. Principal component analysis. (2018). En.wikipedia.org. Retrieved 26 May 2018, from https://en.wikipedia.org/wiki/Principal_component_analysis
  8. Smith, L. I. (2002). A Tutorial on Principal Components Analysis. Cs.otago.ac.nz. Retrieved 26 May 2018, from http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
  9. Li, E. (2018). PCA With Tensorflow. Abracadabra. Retrieved 25 May 2018, from https://ewanlee.github.io/2018/01/17/PCA-With-Tensorflow/
  10. Singular-value decomposition. (2018). En.wikipedia.org. Retrieved 26 May 2018, from https://en.wikipedia.org/wiki/Singular-value_decomposition
  11. tf.svd | TensorFlow. (2018). TensorFlow. Retrieved 26 May 2018, from https://www.tensorflow.org/api_docs/python/tf/svd
  12. Essence of linear algebra preview. (2018). YouTube. Retrieved 26 May 2018, from https://www.youtube.com/watch?v=kjBOesZCoqc&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab
  13. What is an intuitive explanation of singular value decomposition (SVD)?. Quora. Retrieved 26 May 2018, from https://www.quora.com/What-is-an-intuitive-explanation-of-singular-value-decomposition-SVD
  14. Deeper Understanding of Batch Normalization with Interactive Code in Tensorflow [ Manual Back…. (2018). Medium. Retrieved 26 May 2018, from https://medium.com/@SeoJaeDuk/deeper-understanding-of-batch-normalization-with-interactive-code-in-tensorflow-manual-back-1d50d6903d35
  15. What is an intuitive explanation of singular value decomposition (SVD)?. Quora. Retrieved 26 May 2018, from https://www.quora.com/What-is-an-intuitive-explanation-of-singular-value-decomposition-SVD
  16. Singular Value Decomposition (SVD) Tutorial: Applications, Examples, Exercises. (2017). Stats and Bots. Retrieved 26 May 2018, from https://blog.statsbot.co/singular-value-decomposition-tutorial-52c695315254
  17. Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Arxiv.org. Retrieved 26 May 2018, from https://arxiv.org/pdf/1502.03167v3.pdf
  18. [ ICLR 2015 ] Striving for Simplicity: The All Convolutional Net with Interactive Code [ Manual…. (2018). Towards Data Science. Retrieved 26 May 2018, from https://towardsdatascience.com/iclr-2015-striving-for-simplicity-the-all-convolutional-net-with-interactive-code-manual-b4976e206760