[Archived Post] Personal Notes About Contractive Auto-Encoders — part 1

GIF from this website

Please note that this post is a set of notes for my future self, to review the material presented here.

Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

paper from this website

Just from the abstract I learned that a carefully designed penalty term can result in extracting more useful and effective features that give insight into the given data. The penalty term introduced by the authors of this paper makes the auto-encoder's learned features locally invariant, without any preference for particular directions. (Invariance is obtained in the directions that make sense in the context of the training data, i.e., the variations that are present in the data should be captured in the learned representation, while the other directions may be contracted away in the learned representation.) The description of the penalty term can be seen below.

Additionally, one very interesting question the authors ask is: how can we extract robust features? (i.e., features that are robust to small changes in the input). Their approach is to add a penalty term that measures the sensitivity of the features to the input; as the network trains, its objective is to make that sensitivity smaller and smaller.

And as seen above, the added term is the derivative of the first layer (the encoder) with respect to the input. Since this term measures sensitivity to the input, minimizing it makes the network extract features that are robust to small changes in the input. (The full loss function can be seen below.)
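Since the equation images did not survive archiving, here is the objective as written in the ICML 2011 paper (reference 6 below): the reconstruction loss plus λ times the squared Frobenius norm of the encoder's Jacobian.

```latex
\mathcal{J}_{\mathrm{CAE}}(\theta)
  = \sum_{x \in D_n} \Big( L\big(x,\, g(f(x))\big) + \lambda \,\lVert J_f(x) \rVert_F^2 \Big),
\qquad
\lVert J_f(x) \rVert_F^2 = \sum_{ij} \left( \frac{\partial h_j(x)}{\partial x_i} \right)^{2}
```

Here f is the encoder producing hidden units h, g is the decoder, and λ trades off reconstruction against contraction.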

Note that when the auto-encoder has no activation function (i.e., a linear encoder), the loss above is the same as having weight decay (an L2 penalty). Additionally, the authors of this paper investigated the case where the weights are tied. Another interesting fact is that sparse auto-encoders, which output many zeroed-out activation units, achieve a highly contractive mapping even without an explicit objective for it. One difference between de-noising auto-encoders (DAE) and the CAE is that the DAE makes both the encoder and the decoder robust, while the CAE only makes the encoder portion robust. (The CAE's robustness is achieved analytically, while the DAE's is achieved stochastically.)
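The linear case can be checked numerically: with no activation, the encoder's Jacobian is just the weight matrix, so the contractive penalty equals the ordinary L2 weight decay. A minimal sketch (weight shapes are arbitrary, picked for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 10))          # hypothetical encoder weights
b = rng.normal(size=5)

def encode(x):
    return W @ x + b                  # linear encoder: no activation

# Approximate the Jacobian dh/dx by finite differences, one input dim at a time.
x, eps = rng.normal(size=10), 1e-6
J = np.stack([(encode(x + eps * np.eye(10)[i]) - encode(x)) / eps
              for i in range(10)], axis=1)

penalty = np.sum(J ** 2)              # contractive term ||J_f(x)||_F^2
weight_decay = np.sum(W ** 2)         # ordinary L2 penalty on the weights
print(np.isclose(penalty, weight_decay, rtol=1e-4))  # True
```

The penalty matches ||W||² regardless of the input x, which is exactly why the linear CAE degenerates to weight decay.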

If we use a sigmoid activation, the objective term takes the closed form seen above, alongside the compared networks. (The RBM was trained via Contrastive Divergence.) The authors pre-trained the weights using the different methods and used them to initialize an MLP to perform classification.
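For the sigmoid encoder, the Jacobian has a simple closed form, which is what makes the CAE penalty cheap to compute. A small sketch checking the closed form against the exact Jacobian (sizes are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))           # hypothetical encoder weights
b = rng.normal(size=4)
x = rng.normal(size=6)
h = sigmoid(W @ x + b)                # hidden units

# For a sigmoid encoder: dh_j/dx_i = h_j (1 - h_j) W_ji, so the penalty is
# ||J_f(x)||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2
analytic = np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1))

J = (h * (1 - h))[:, None] * W        # exact Jacobian: row j scaled by h_j(1-h_j)
print(np.isclose(analytic, np.sum(J ** 2)))  # True
```

Note the penalty only ever needs the hidden activations and the encoder weights, no extra forward passes.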

From the results above we can see that a two-layer CAE is even able to outperform other types of networks that have three layers.

Machine learning for vision

PPT from this website

From this ppt I learned that the universal-approximation proof for a single-layer neural network relies on an exponentially large number of neurons, hence it is not practical. Also, weight sharing is a method to reuse the same weights in different parts of the network; the representations they produce still differ, since each use receives a different gradient. Finally, auto-encoders with regularization learn to keep only the sensitivity to variations on the manifold. (Reconstruction → forces sensitivity to variations on the manifold; regularization → wants to remove variations.)

Additionally, with the contraction loss, the network tries to find features that are robust to small changes in the input. And with tied-weights auto-encoders, we can see the decoder weights as a smoothed version of the encoder weights (in terms of the original data). Here are some links on why we might use tied auto-encoders: one, two.
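A tied-weights auto-encoder is simple to sketch: the decoder reuses the transpose of the encoder's weight matrix, so there is only one weight matrix to learn (biases usually stay separate). A minimal sketch, with made-up sizes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
W = 0.1 * rng.normal(size=(3, 8))     # the single, shared weight matrix
b_h, b_r = np.zeros(3), np.zeros(8)   # encoder / decoder biases stay separate

def encode(x):
    return sigmoid(W @ x + b_h)

def decode(h):
    return W.T @ h + b_r              # decoder reuses the transposed encoder weights

x = rng.normal(size=8)
r = decode(encode(x))                 # reconstruction has the input's shape
print(r.shape)  # (8,)
```

Tying halves the parameter count and acts as a mild regularizer, which is one of the motivations discussed in the linked answers.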

Finally, the ppt covers the relationship between auto-encoders and RBMs; for the details, I highly encourage reading the ppt.

Contractive AE Implementation

Image from this website

If anyone wishes to use the auto differentiation method to implement contractive auto-encoders, here is a great tutorial.
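If no auto-differentiation framework is at hand, the same penalty can be approximated for any encoder by finite differences. This is only a stand-in sketch (the helper name is mine, not from the tutorial), useful mostly for sanity-checking an autodiff implementation:

```python
import numpy as np

def jacobian_penalty(encode, x, eps=1e-5):
    """Approximate ||J_f(x)||_F^2 for an arbitrary encoder f by finite
    differences: perturb one input dimension at a time."""
    h0 = encode(x)
    total = 0.0
    for i in range(x.size):
        x_p = x.copy()
        x_p[i] += eps
        total += np.sum(((encode(x_p) - h0) / eps) ** 2)
    return total

# Sanity check on a linear encoder, where the penalty must equal ||W||_F^2.
rng = np.random.default_rng(3)
W = rng.normal(size=(4, 7))
pen = jacobian_penalty(lambda x: W @ x, rng.normal(size=7))
print(np.isclose(pen, np.sum(W ** 2), rtol=1e-3))  # True
```

In practice the closed-form or autodiff versions are preferred, since finite differences cost one encoder pass per input dimension.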


  1. (2018). Iro.umontreal.ca. Retrieved 6 September 2018, from http://www.iro.umontreal.ca/~memisevr/teaching/ift6268_2013/notes10.pdf
  2. Deriving Contractive Autoencoder and Implementing it in Keras — Agustinus Kristiadi’s Blog. (2018). Wiseodd.github.io. Retrieved 6 September 2018, from https://wiseodd.github.io/techblog/2016/12/05/contractive-autoencoder/
  3. Tied weights in Autoencoder. (2018). Stack Overflow. Retrieved 6 September 2018, from https://stackoverflow.com/questions/36889732/tied-weights-in-autoencoder
  4. Google Groups. (2018). Groups.google.com. Retrieved 6 September 2018, from https://groups.google.com/forum/#!topic/theano-users/QilEmkFvDoE
  5. Auto Encoder Regularisation using Tie weights. (2018). Cross Validated. Retrieved 6 September 2018, from https://stats.stackexchange.com/questions/328502/auto-encoder-regularisation-using-tie-weights?noredirect=1&lq=1
  6. (2018). Icml-2011.org. Retrieved 6 September 2018, from http://www.icml-2011.org/papers/455_icmlpaper.pdf
  7. Chen, F., Wu, Y., Zhao, G., Zhang, J., Zhu, M., & Bai, J. (2013). Contractive De-noising Auto-encoder. Arxiv.org. Retrieved 6 September 2018, from https://arxiv.org/abs/1305.4076
  8. (2018). Yann-ollivier.org. Retrieved 6 September 2018, from http://www.yann-ollivier.org/rech/publs/aagen.pdf

https://jaedukseo.me I love to make my own notes my guy, let's get LIT with KNOWLEDGE in my GARAGE