[ Archived Post ] Personal Notes for Auto-Encoders, Matrix Calculus, Kullback-Leibler Divergence, Variance

Jae Duk Seo
Sep 6, 2018
GIF from this website

Please note that I am making this post as a way of creating some personal notes (aka a trashy post) while practicing my skills in matrix calculus.


Auto Encoders

Right image from this website

When we have x as input (MNIST data), we can perform something like the above: encode x into a hidden representation and decode it back into a reconstruction of x.
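
A minimal sketch of that idea (my own illustration, not code from the original material), assuming PyTorch and flattened 784-dimensional MNIST inputs:

```python
import torch
import torch.nn as nn

# Minimal auto-encoder: compress a flattened 784-d MNIST image to a 32-d code and reconstruct it.
class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        code = self.encoder(x)        # h = f(W x + b)
        return self.decoder(code)     # x_hat = g(W' h + b')

model = AutoEncoder()
x = torch.rand(16, 784)                        # a fake batch standing in for MNIST pixels in [0, 1]
loss = nn.functional.mse_loss(model(x), x)     # reconstruction cost ||x_hat - x||^2
```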


Denoising Auto Encoders

Right image from this website

The difference is the input data as well as the cost function: the network receives a corrupted version of x, while the cost still compares the reconstruction against the clean x.
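
Reusing the AutoEncoder sketch above, only those two things change (the noise level 0.3 is an illustrative assumption):

```python
x_noisy = (x + 0.3 * torch.randn_like(x)).clamp(0.0, 1.0)   # corrupted input ...
loss = nn.functional.mse_loss(model(x_noisy), x)            # ... but still reconstruct the clean x
```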


Sparse Auto Encoders

Image from this website

The above equation holds when we are going to use the KL divergence, but we can also use an L1 sparsity penalty. (more info here).
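
A sketch of the KL sparsity penalty applied to the average hidden activation, reusing the AutoEncoder sketch from above (the target sparsity rho = 0.05 and the penalty weight 1e-3 are illustrative assumptions):

```python
def kl_sparsity_penalty(code, rho=0.05):
    # code: (batch, hidden) sigmoid activations; rho: target average activation per hidden unit
    rho_hat = code.mean(dim=0)
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

code = model.encoder(x)
loss = nn.functional.mse_loss(model.decoder(code), x) + 1e-3 * kl_sparsity_penalty(code)
# an L1 alternative would simply be code.abs().sum()
```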

When we wish to make the weights more sparse, we can use the above equation on the weights themselves. (more info here.) We can also use an L2 penalty, which makes the values small rather than sparse.
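
The same idea applied to the weights themselves, as a one-line sketch for each penalty:

```python
W = model.encoder[0].weight            # the encoder's Linear weight matrix
l1_weight_penalty = W.abs().sum()      # pushes individual weights toward exactly zero (sparse)
l2_weight_penalty = (W ** 2).sum()     # only shrinks the weights, keeping them small but dense
```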

For k-sparse auto-encoders, we keep only the top k elements after the first layer. (The back-propagation is exactly the same as for plain auto-encoders, except the activation values for the discarded units are just zero, like dropout.)
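
A sketch of that k-sparse step: keep the top k activations per example and zero the rest, so gradients only flow through the kept units (k = 10 is an illustrative choice):

```python
def k_sparse(code, k=10):
    # keep only the k largest activations per example; the rest contribute zero, like dropout
    topk = torch.topk(code, k, dim=1)
    mask = torch.zeros_like(code).scatter_(1, topk.indices, 1.0)
    return code * mask

sparse_code = k_sparse(model.encoder(x))
loss = nn.functional.mse_loss(model.decoder(sparse_code), x)
```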


Contractive Auto Encoders

This is the first time I am seeing this, and it looks incredibly similar to adding a sparse penalty.

video from this website

One very important thing to note here is that to update the weight W we need to take the derivative with respect to the activation function as well as some additional terms, because the contractive penalty (the Frobenius norm of the Jacobian of the hidden activations with respect to the input) itself depends on W.
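
A sketch of that penalty for a sigmoid encoder h = sigmoid(Wx + b), where the squared Frobenius norm of the Jacobian dh/dx reduces to a simple closed form (the 1e-4 weight is an illustrative assumption):

```python
def contractive_penalty(code, W):
    # ||dh/dx||_F^2 for a sigmoid encoder: sum_j (h_j * (1 - h_j))^2 * sum_i W_ji^2
    dh = (code * (1 - code)) ** 2          # (batch, hidden)
    w2 = (W ** 2).sum(dim=1)               # (hidden,)
    return (dh * w2).sum()

code = model.encoder(x)
W = model.encoder[0].weight                # encoder Linear weight, shape (hidden, input)
loss = nn.functional.mse_loss(model.decoder(code), x) + 1e-4 * contractive_penalty(code, W)
```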


Topographic sparse coding

Video from this website

Although Andrew Ng’s course does not use the auto-encoder approach, it can be learned that way as well.
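
A rough sketch of the topographic sparsity term: squared activations are pooled over neighbouring groups by a grouping matrix V, and the smooth L1 of each pooled value is penalized (the wrap-around neighbour grouping here is a made-up illustration, not the one from the course):

```python
hidden_dim = 32
# toy grouping matrix: each row pools a hidden unit with its immediate neighbour (wrap-around)
V = torch.zeros(hidden_dim, hidden_dim)
for j in range(hidden_dim):
    V[j, j] = V[j, (j + 1) % hidden_dim] = 1.0

code = model.encoder(x)                        # (batch, hidden)
pooled = (code ** 2) @ V.t()                   # (batch, n_groups): group-wise sums of squared activations
topographic_penalty = torch.sqrt(pooled + 1e-8).sum()
```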


Kullback-Leibler Divergence

Paper from this website

KL divergence is a measurement of the distance between distributions. (Closely related to histograms, as well as statistical independence. More info here.)

Image from this website

The above shows a very good example, simple yet effective.

A good side note is that KL divergence is not a true metric for distance measurement, since it is not symmetric.
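
A tiny numeric sketch of the definition D_KL(P||Q) = Σ p_i log(p_i / q_i), which also shows the asymmetry that keeps it from being a true metric:

```python
import numpy as np

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i); assumes q_i > 0 wherever p_i > 0
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p, q = [0.4, 0.6], [0.5, 0.5]
print(kl_divergence(p, q), kl_divergence(q, p))   # the two directions give different values
```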

Paper from this website

Finally, here are some good questions regarding KL: one, two, three.


Variance and Co-Variance / Random Variable and Expectation

Using the definition of variance below, we can check that it is true.

Now let’s think about co-variance: if we have three different random variables, as seen below, we can assign the exact same events to those random variables.

We will get something like the above. Now let’s think about the fact that the co-variances among the random variables are the same up to the constants that were multiplied in.

But if we take the individual variance of each variable into account, we can calculate the co-variance.
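
A numeric sketch of this (my own illustration): three random variables built from the exact same events, differing only by a constant, have covariances that differ only by those constants, and normalizing by the individual standard deviations removes that scale.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)          # one shared set of underlying events
y, z = 2 * x, 3 * x                   # the same events scaled by different constants

# Cov(X, cX) = c * Var(X), so the covariances only differ by the constants
print(np.var(x), np.cov(x, y)[0, 1], np.cov(x, z)[0, 1])    # roughly 1, 2, 3

# dividing by the individual standard deviations removes the scale (correlation = 1 here)
print(np.cov(x, y)[0, 1] / (np.std(x) * np.std(y)))
```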

A picture of what a random variable can be is seen below.

Image from this website

Using the notation above, we can think of it this way as well: an event happens, there is a dog, a camera captures that dog, and the generated random variables are the pixel values. (more info here.)
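
A small sketch of a random variable as a function from outcomes to numbers, with the expectation as the probability-weighted average (a hypothetical toy example, not from the linked post):

```python
# a random variable is just a mapping from outcomes (events) to numbers
outcomes = {"no dog": 0.7, "dog": 0.3}        # outcome -> probability
X = {"no dog": 0, "dog": 1}                   # outcome -> value of the random variable

expectation = sum(prob * X[event] for event, prob in outcomes.items())
print(expectation)                            # 0.3
```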

Cute dog

Reference

  1. Deriving gradients using the backpropagation idea — Ufldl. (2018). Deeplearning.stanford.edu. Retrieved 6 September 2018, from http://deeplearning.stanford.edu/wiki/index.php/Deriving_gradients_using_the_backpropagation_idea
  2. Unsupervised Feature Learning and Deep Learning Tutorial. (2018). Ufldl.stanford.edu. Retrieved 6 September 2018, from http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/
  3. Sparse Coding: Autoencoder Interpretation — Ufldl. (2018). Deeplearning.stanford.edu. Retrieved 6 September 2018, from http://deeplearning.stanford.edu/wiki/index.php/Sparse_Coding:_Autoencoder_Interpretation
  4. UFLDL Sparse Coding Topographic Learning. (2018). YouTube. Retrieved 6 September 2018, from https://www.youtube.com/watch?v=yoVtwSIJD_o
  5. Unsupervised Feature Learning and Deep Learning Tutorial. (2018). Ufldl.stanford.edu. Retrieved 6 September 2018, from http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/
  6. (2018). Arxiv.org. Retrieved 6 September 2018, from https://arxiv.org/pdf/1404.2000.pdf
  7. (2018). Herbrich.me. Retrieved 6 September 2018, from http://www.herbrich.me/papers/KL.pdf
  8. Kullback–Leibler divergence. (2018). En.wikipedia.org. Retrieved 6 September 2018, from https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
  9. Why do we use Kullback-Leibler divergence rather than cross entropy in the t-SNE objective function? (2018). Cross Validated. Retrieved 6 September 2018, from https://stats.stackexchange.com/questions/265966/why-do-we-use-kullback-leibler-divergence-rather-than-cross-entropy-in-the-t-sne
  10. Kurt, W. (2017). Kullback-Leibler Divergence Explained. Count Bayesie. Retrieved 6 September 2018, from https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained
  11. What is the difference Cross-entropy and KL divergence? (2018). Cross Validated. Retrieved 6 September 2018, from https://stats.stackexchange.com/questions/357963/what-is-the-difference-cross-entropy-and-kl-divergence
  12. Kurt, W. (2015). Random Variables and Expectation with Robots and Stuff!. Count Bayesie. Retrieved 6 September 2018, from https://www.countbayesie.com/blog/2015/2/20/random-variables-and-expectation
  13. Kurt, W. (2015). Expectation and Variance from High School to Grad School. Count Bayesie. Retrieved 6 September 2018, from https://www.countbayesie.com/blog/2015/3/19/expectation-and-variance-from-high-school-to-grad-school
  14. Neural networks [6.7] : Autoencoder — contractive autoencoder. (2018). YouTube. Retrieved 6 September 2018, from https://www.youtube.com/watch?v=79sYlJ8Cvlc
