[ Archived Post ] Personal Notes for Auto-Encoders, Matrix Calculus, Kullback-Leibler Divergence, Variance

Please note that I am making this post as a means of keeping some personal notes (aka a rough post) while practicing my skills in matrix calculus.
Auto Encoders


When we have x as input (MNIST data), we can perform something like the above.
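For my own reference, here is a minimal sketch of a one-hidden-layer auto-encoder in NumPy. The layer sizes, the sigmoid activations, and the squared-error cost are my own assumptions, not necessarily what the original figure used.

```python
import numpy as np

# Minimal auto-encoder sketch (assumed sizes: 784 -> 128 -> 784).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random((1, 784))          # a flattened 28x28 MNIST-like input

W1 = rng.normal(scale=0.01, size=(784, 128))   # encoder weights
b1 = np.zeros(128)
W2 = rng.normal(scale=0.01, size=(128, 784))   # decoder weights
b2 = np.zeros(784)

h = sigmoid(x @ W1 + b1)          # encoder: compress x into a hidden code
x_hat = sigmoid(h @ W2 + b2)      # decoder: reconstruct the input

loss = np.mean((x - x_hat) ** 2)  # reconstruction (squared-error) cost
print(loss)
```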
Denoising Auto Encoders


The difference lies in the input data as well as the cost function.
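A rough sketch of that difference, assuming Gaussian corruption (masking noise is another common choice): the encoder sees the noisy input, but the cost compares the reconstruction against the clean input.

```python
import numpy as np

# Denoising set-up: encode a corrupted input, target the clean one.
# (Corruption scale 0.3 and layer sizes are assumptions.)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x_clean = rng.random((1, 784))
x_noisy = x_clean + rng.normal(scale=0.3, size=x_clean.shape)

W1, b1 = rng.normal(scale=0.01, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.01, size=(128, 784)), np.zeros(784)

h = sigmoid(x_noisy @ W1 + b1)          # encode the *noisy* input
x_hat = sigmoid(h @ W2 + b2)            # reconstruct
loss = np.mean((x_clean - x_hat) ** 2)  # ...but the cost targets the *clean* input
print(loss)
```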
Sparse Auto Encoders



For a k-sparse auto-encoder we keep only the top k elements after the first layer. (The backpropagation is exactly the same as for regular auto-encoders, except that the activation values for the dropped units are just zero, like dropout.)
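A small sketch of the top-k selection step (k = 10 and the hidden-layer size are arbitrary choices here); the zeroed units simply pass no gradient during backprop.

```python
import numpy as np

# Keep only the k largest hidden activations per example, zero the rest.
def k_sparse(h, k):
    out = np.zeros_like(h)
    idx = np.argsort(h, axis=1)[:, -k:]        # indices of the k largest activations
    rows = np.arange(h.shape[0])[:, None]
    out[rows, idx] = h[rows, idx]              # keep top-k, zero everything else
    return out

h = np.random.default_rng(0).random((2, 128))  # hidden activations for 2 examples
h_sparse = k_sparse(h, k=10)
print((h_sparse != 0).sum(axis=1))             # -> [10 10]
```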
Contractive Auto Encoders


This is the first time I am seeing this, and it looks incredibly similar to adding a sparsity penalty.
One very important thing to note here is the fact that to update the weight W we need to take the derivative with respect to the activation function as well as some additional terms.
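A hedged sketch of that penalty for a sigmoid hidden layer: for h = sigmoid(Wx + b), the Jacobian of h with respect to x is diag(h(1 - h)) W, so the Frobenius-norm penalty has a closed form, and its gradient with respect to W is the extra term mentioned above. The layer sizes and the penalty weight lam are assumptions.

```python
import numpy as np

# Contractive penalty: squared Frobenius norm of the Jacobian dh/dx.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(784)
W = rng.normal(scale=0.01, size=(128, 784))
b = np.zeros(128)

h = sigmoid(W @ x + b)
jacobian = (h * (1 - h))[:, None] * W   # dh_j / dx_i for sigmoid units
penalty = np.sum(jacobian ** 2)         # ||J||_F^2
print(penalty)

lam = 0.1  # assumed penalty weight
# total cost = reconstruction_cost + lam * penalty
# When backpropagating to W, this penalty contributes its own gradient term.
```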
Topographic Sparse Coding

Although Andrew Ng's course does not use the auto-encoder approach, it can be learned that way as well.
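For my own reference, a rough sketch of the topographic sparsity term in the UFLDL formulation: squared feature activations are summed within neighbouring groups via a grouping matrix V, and the square root of each group sum is penalized. The 2x2 wrap-around groups, the grid size, and epsilon below are all my own assumptions.

```python
import numpy as np

# Topographic sparsity term: sum_g sqrt( (V @ s**2)_g + eps ).
def topographic_penalty(s, V, eps=1e-2):
    return np.sum(np.sqrt(V @ (s ** 2) + eps))

# Toy grouping matrix: 16 features on a 4x4 grid, each group covers a
# 2x2 neighbourhood with wrap-around.
n, side = 16, 4
V = np.zeros((n, n))
for g in range(n):
    r, c = divmod(g, side)
    for dr in (0, 1):
        for dc in (0, 1):
            V[g, ((r + dr) % side) * side + (c + dc) % side] = 1.0

s = np.random.default_rng(0).random(n)   # feature activations
print(topographic_penalty(s, V))
```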
Kullback-Leibler Divergence


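Since the figures are missing here, a minimal reminder of the discrete KL divergence, D_KL(P || Q) = sum_i P(i) log(P(i) / Q(i)): it is non-negative and zero only when P equals Q, but it is not symmetric. The example distributions below are arbitrary.

```python
import numpy as np

# Discrete KL divergence between two probability vectors.
def kl_divergence(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # terms with P(i) = 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(kl_divergence(p, q))   # not equal to kl_divergence(q, p)
```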
Variance and Co-Variance / Random Variable and Expectation


Using the definition of variance below, we can check that it is true.
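A quick numerical check of the usual identity Var(X) = E[X^2] - (E[X])^2 (the sample values are arbitrary, and the population variance, dividing by N, is used):

```python
import numpy as np

x = np.array([1.0, 3.0, 3.0, 7.0, 9.0])

var_def = np.mean((x - x.mean()) ** 2)      # E[(X - E[X])^2]
var_alt = np.mean(x ** 2) - x.mean() ** 2   # E[X^2] - (E[X])^2

print(var_def, var_alt)                     # the two agree
```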

Now let's think about co-variance. If we have three different random variables, as seen below, we can assign the exact same events to those random variables.


We will get something like the above. Now let's think about the fact that the co-variances among these random variables are all the same, up to constant multiples.

But if we take into account the individual variance of each variable, we can calculate the co-variance.
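A small sketch of that idea, assuming the three variables are constant multiples of one underlying variable (the constants and the sample are arbitrary): the co-variance equals the product of the constants times Var(X), and it can also be recovered from the individual variances since the correlation is +1.

```python
import numpy as np

# Three random variables built from the same events: x, y = a*x, z = b*x.
x = np.array([1.0, 2.0, 4.0, 8.0])
a, b = 2.0, 3.0
y, z = a * x, b * x

var = lambda v: np.mean((v - v.mean()) ** 2)
cov = lambda u, v: np.mean((u - u.mean()) * (v - v.mean()))

print(cov(y, z))                 # co-variance of the scaled variables
print(a * b * var(x))            # same value: a*b*Var(X)
print(np.sqrt(var(y) * var(z)))  # same value, from the individual variances
```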

A picture of what a random variable is can be seen below.
Using the notation above, we can think of it this way as well: an event happens (there is a dog), a camera captures that dog, and the generated random variables are the pixel values. (More info here.)

Reference
- Deriving gradients using the backpropagation idea — Ufldl. (2018). Deeplearning.stanford.edu. Retrieved 6 September 2018, from http://deeplearning.stanford.edu/wiki/index.php/Deriving_gradients_using_the_backpropagation_idea
- Unsupervised Feature Learning and Deep Learning Tutorial. (2018). Ufldl.stanford.edu. Retrieved 6 September 2018, from http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/
- Sparse Coding: Autoencoder Interpretation - Ufldl. (2018). Deeplearning.stanford.edu. Retrieved 6 September 2018, from http://deeplearning.stanford.edu/wiki/index.php/Sparse_Coding:_Autoencoder_Interpretation
- UFLDL Sparse Coding Topographic Learning. (2018). YouTube. Retrieved 6 September 2018, from https://www.youtube.com/watch?v=yoVtwSIJD_o
- (2018). Arxiv.org. Retrieved 6 September 2018, from https://arxiv.org/pdf/1404.2000.pdf
- (2018). Herbrich.me. Retrieved 6 September 2018, from http://www.herbrich.me/papers/KL.pdf
- Kullback–Leibler divergence. (2018). En.wikipedia.org. Retrieved 6 September 2018, from https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
- Why do we use Kullback-Leibler divergence rather than cross entropy in the t-SNE objective function? (2018). Cross Validated. Retrieved 6 September 2018, from https://stats.stackexchange.com/questions/265966/why-do-we-use-kullback-leibler-divergence-rather-than-cross-entropy-in-the-t-sne
- Kurt, W. (2017). Kullback-Leibler Divergence Explained. Count Bayesie. Retrieved 6 September 2018, from https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained
- What is the difference Cross-entropy and KL divergence? (2018). Cross Validated. Retrieved 6 September 2018, from https://stats.stackexchange.com/questions/357963/what-is-the-difference-cross-entropy-and-kl-divergence
- Kurt, W. (2015). Random Variables and Expectation with Robots and Stuff!. Count Bayesie. Retrieved 6 September 2018, from https://www.countbayesie.com/blog/2015/2/20/random-variables-and-expectation
- Kurt, W. (2015). Expectation and Variance from High School to Grad School. Count Bayesie. Retrieved 6 September 2018, from https://www.countbayesie.com/blog/2015/3/19/expectation-and-variance-from-high-school-to-grad-school
- Neural networks [6.7] : Autoencoder — contractive autoencoder. (2018). YouTube. Retrieved 6 September 2018, from https://www.youtube.com/watch?v=79sYlJ8Cvlc



