[ Archived Post ] Visualizing and Understanding Deep Neural Networks by Matt Zeiler

Jae Duk Seo
Jun 17 · 5 min read

Please note that this post is for my own educational purposes.

Deconvolution → the key idea here. CNNs are not a new idea → LeCun originally proposed them → and then everything took off.

In a real example → we get results like the above → now possible thanks to GPUs. (Build up a lot of parameters.) (Much more data and bigger models.)

The breakthrough came from Geoffrey Hinton's group → on the ImageNet data. (Minimize the error rate → and the results were very good.)

There was a huge performance boost → and when compared with humans → the model only fell short in selected cases.

Take an image as the input and try to reconstruct it → this is the deconvolutional network → originally an unsupervised learning method.

Here it is used as a tool → to project features back into image space → so the max pooling operation has to be reversed.

Max pooling → we need to know where the strongest element was → so during reconstruction → the locations (the "switches") must be stored → so that values are placed back correctly.
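The switch idea above can be sketched in plain numpy. This is a minimal illustration, not the paper's implementation: `max_pool_with_switches` and `unpool` are hypothetical helper names, and a single-channel input with a square window is assumed.

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k-by-k max pooling that also records where each max came from
    (the 'switches'), so the operation can be approximately reversed."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    switches = np.zeros((h // k, w // k), dtype=int)  # flat index of the max
    for i in range(h // k):
        for j in range(w // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            switches[i, j] = np.argmax(window)
            pooled[i, j] = window.flat[switches[i, j]]
    return pooled, switches

def unpool(pooled, switches, k=2):
    """Reverse of max pooling: put each pooled value back at its
    recorded location; every other position stays zero."""
    h, w = pooled.shape
    out = np.zeros((h * k, w * k))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(switches[i, j], k)
            out[i*k + di, j*k + dj] = pooled[i, j]
    return out
```

The key point is that unpooling is only approximately invertible: the non-max positions are lost, so the reconstruction is sparse.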

In the first layer → we see filters like those above → very similar to V1 in the visual cortex → color blobs as well as edges. (Features learned directly from pixels.) (Visualizations are projected from the higher layers back down.)

Select a single feature map → and reconstruct it back to pixel space → super cool.

We need to use the same filters and the same weights → as well as the recorded activations → to create the visualizations.
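One step of that reversal can be sketched as: rectify the activations, then convolve with the same filter flipped (a transposed convolution), padded so the output regains the input's size. A rough single-channel numpy sketch, with `deconv_step` as a hypothetical name and a square filter assumed:

```python
import numpy as np

def conv2d_valid(x, f):
    """Plain 2-D cross-correlation with 'valid' padding."""
    h, w = x.shape
    fh, fw = f.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+fh, j:j+fw] * f)
    return out

def deconv_step(act, f):
    """Approximate reversal of one conv layer: relu the activations,
    then apply the *same* filter flipped, with 'full' padding so the
    output grows back to the pre-convolution size."""
    act = np.maximum(act, 0)                    # rectification
    padded = np.pad(act, (f.shape[0] - 1,))     # 'full' padding
    return conv2d_valid(padded, f[::-1, ::-1])  # flipped (transposed) filter
```

Stacking `unpool` and `deconv_step` layer by layer, from the chosen feature map down to the input, produces the visualizations described above.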

The first layer → shown via the image patches that most activate each filter → different colors as well as edges → this is what the first layer is looking at. (Super cool.) (Same with the colors.)

In layer 2 → the network learns much more complex patterns → lines, blobs, and more → very complex → and each feature covers a much larger region of the image. (Broader scope.)

When we look at the weaker activations → the groupings are much more complicated. (It is hard to tell what the feature is looking at → but we can crop out the responsible image patches.)

Skin colors → or some kind of flame-like shapes → these features are the building blocks of the neural network.

The features become much more complex as we go deeper → the network starts to learn object parts → dog faces, human faces, and more → eventually whole objects. (Some neurons detect clouds → and grass is also a strong feature.)

The content is more object specific → this is pretty cool → the matched patches are semantically related → the pose can be different → yet the feature still activates.

Among the top-5 activations → the variation is very large → this is a more abstract representation inside the model. (Super interesting → the model just learns this.)

See how the activations change → when we translate or occlude the image. (When a block is covering the face → the model is not able to classify well.)
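The occlusion experiment can be sketched as sliding a gray square across the image and recording the model's confidence at each position. A minimal numpy sketch, assuming a hypothetical `model_prob` callable that maps an image to the probability of the true class:

```python
import numpy as np

def occlusion_map(image, model_prob, patch=8, stride=4, fill=0.5):
    """Slide a gray patch over the image and record the model's
    probability for the true class at each position. Low values in the
    returned heatmap mark regions the model relies on."""
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            occluded = image.copy()
            y, x = r * stride, c * stride
            occluded[y:y+patch, x:x+patch] = fill  # gray square
            heat[r, c] = model_prob(occluded)
    return heat
```

If the probability collapses whenever the square covers the dog's face, the model really is using the face, not the background.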

The strongest feature in one example → responded to the text on the image → so this fifth-layer unit is a text detector → this is very interesting. (And in some images → when the faces are blocked out → the probability of the correct class actually increases.)

Now → we can use the method to improve the model → the normalization turned out to be the problem → producing overly specific filters as well as dead filters. (Renormalizing the filters that grew too large → and using smaller filters → increased the performance.)
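The renormalization step can be sketched as clamping each filter's RMS back to a fixed radius so no single filter dominates. A hedged numpy sketch; the function name and the `radius=0.1` value are assumptions, not the paper's exact constants:

```python
import numpy as np

def renormalize_filters(filters, radius=0.1):
    """For each filter, if its root-mean-square value exceeds a fixed
    radius, rescale it back down to that radius; small filters are
    left untouched."""
    out = filters.copy()
    for i, f in enumerate(out):
        rms = np.sqrt(np.mean(f ** 2))
        if rms > radius:
            out[i] = f * (radius / rms)
    return out
```

Run after each weight update, this keeps a few filters from soaking up all the signal while the rest die.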

This simple fix worked very well → super cool!

The learned filters also apply to different datasets → so the model can do transfer learning.

Transfer learning → achieved very good results.
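The transfer-learning recipe amounts to freezing the pretrained convolutional features and training only a new classifier on top. A minimal numpy sketch, where `features` stands in for the frozen network's output and the softmax classifier is trained by gradient descent (all names here are illustrative):

```python
import numpy as np

def transfer_train(features, labels, n_classes, lr=0.1, steps=200):
    """Train only a new softmax layer on fixed pretrained features:
    the convolutional weights are never touched."""
    rng = np.random.default_rng(0)
    W = rng.normal(0.0, 0.01, (features.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = features @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)              # softmax
        W -= lr * features.T @ (p - onehot) / len(labels)
    return W
```

Because only the small classifier is trained, this works even with little labeled data on the new task, which is why the transferred filters give such strong results.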

Very interesting results.

Written by Jae Duk Seo

https://jaedukseo.me | Your everyday Seo, who likes kimchi