Deconvolution is the key idea here. The CNN itself is not a new idea → LeCun originally proposed it → and then the field took off.
In a real example we now get results like the above, thanks to GPUs (which let us build up many more parameters) and to much more data and much bigger models.
The breakthrough came from Geoffrey Hinton's group on the ImageNet dataset (train to minimize the error rate → and it worked very well). There was a huge performance boost, and it was compared with human performance only in selected settings.
Take an image as the input and try to reconstruct it → this is the deconvolution network → originally posed as an unsupervised learning problem.
Here it is used as a tool → to see what the network has learned from the image → the max pooling operation has to be reversed. Max pooling → we need to know where the strongest element was → so for the reconstruction the location (the "switch") must be stored → otherwise we cannot undo the pooling correctly.
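The switch idea can be sketched in a few lines of numpy: pool while recording the argmax location in each window, then place values back at those locations on the way down. This is a minimal illustration (function names and the 2x2 window are my own choices, not from the notes):

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k x k max pooling that also records the argmax ("switch") per window.
    x: 2-D array whose sides are divisible by k."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k), dtype=x.dtype)
    switches = np.zeros((h // k, w // k), dtype=np.int64)  # flat index inside window
    for i in range(h // k):
        for j in range(w // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            switches[i, j] = window.argmax()
            pooled[i, j] = window.max()
    return pooled, switches

def unpool_with_switches(pooled, switches, k=2):
    """Reverse of max pooling: put each value back where the max came from;
    everything else stays zero."""
    h, w = pooled.shape
    out = np.zeros((h * k, w * k), dtype=pooled.dtype)
    for i in range(h):
        for j in range(w):
            di, dj = divmod(int(switches[i, j]), k)
            out[i*k + di, j*k + dj] = pooled[i, j]
    return out
```

Without the stored switches, the unpooling step would have to guess a location (e.g. the top-left corner), and the reconstruction would smear.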
In the first layer we see something like the above → very similar to V1 → colors as well as edges (features built directly from pixels, projected back down from the higher layers).
Select a single feature map → and reconstruct it back to pixel space → super cool.
We need to use the same filters and the same weights, as well as the recorded activations, to create the visualizations.
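Reusing the same filter on the way down amounts to applying it transposed, i.e. flipped. A minimal numpy sketch of this idea (single channel, no nonlinearity; the helper names are mine, and this is an illustration of filter transposition rather than the exact pipeline in the notes):

```python
import numpy as np

def correlate2d_valid(x, f):
    """Plain cross-correlation in 'valid' mode: what a conv layer computes."""
    fh, fw = f.shape
    h, w = x.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+fh, j:j+fw] * f)
    return out

def deconv_step(activ, f):
    """Reversal step: apply the SAME filter, flipped (transposed),
    with zero padding so the output regains the input's spatial size."""
    fh, fw = f.shape
    padded = np.pad(activ, ((fh - 1, fh - 1), (fw - 1, fw - 1)))
    return correlate2d_valid(padded, f[::-1, ::-1])
```

The key point is that no new weights are introduced: the visualization path shares the trained filter with the forward path.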
For the first layer, the image patches that activate each unit show different colors as well as edges → that is what the first layer is looking at (same with the colors). Super cool.
In layer 2 the network is learning much more complex patterns → lines, blobs, and more → and each unit covers a much larger region of the image (a broader receptive field).
When we look at the weaker activations, the groupings are much more complicated (it is hard to tell what part of the image a unit is responding to, but we can crop it out and inspect it).
Skin color, or some kind of fire-like shape → these are the building blocks the network composes.
The features get much more complex as we go deeper → the network starts to learn object parts → a dog's face, a human face, and eventually the whole object. Some neurons detect clouds, and grass is also a strong feature.
The content is more object-specific → this is pretty cool → the patches are semantically related rather than just visually similar → the pose can differ and the unit still activates.
Looking at the top-5 activating patches → the receptive fields are very large → this is a more abstract representation inside the model (super interesting → the model just learns this on its own).
See how the activations change when we translate the image, or when a block occludes part of it (when the block covers the face, the model can no longer classify well).
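The occlusion experiment can be sketched generically: slide a blanked-out patch over the image and record the model's score at each position. Here `score_fn` is a stand-in for the real classifier (the function names, patch size, and fill value are my own choices for illustration):

```python
import numpy as np

def occlusion_map(image, score_fn, patch=4, stride=4, fill=0.0):
    """Slide a blank patch over the image and record how the model's
    score (score_fn) changes at each occluder position."""
    h, w = image.shape
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            occluded = image.copy()
            occluded[r*stride:r*stride+patch, c*stride:c*stride+patch] = fill
            heat[r, c] = score_fn(occluded)
    return heat
```

Positions where the score drops sharply mark the regions the model actually relies on, such as a face.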
The strongest feature was the number → a fifth-layer unit acts as a text detector → this is very interesting. (And when the faces are blocked out, classification accuracy for some classes actually increases.)
Now we can use the method to improve the model. Normalization turned out to be the problem → some filters were overly specific and others were dead. Renormalizing the first-layer filters and moving to smaller filters increased the performance.
That worked very well → a very good method → super cool!
The learned filters can be applied to a different dataset → this is transfer learning → and it works with very good success.
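The transfer-learning recipe can be sketched abstractly: keep the pretrained network frozen as a feature extractor and fit only a new head on the target dataset. Everything below is a toy stand-in (random "pretrained" weights, least-squares head) just to show the structure, not the actual experiment:

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_features(images, W):
    # stand-in for a pretrained network's penultimate-layer output (ReLU)
    return np.maximum(images @ W, 0.0)

images = rng.normal(size=(100, 64))     # new dataset (flattened images)
labels = rng.integers(0, 2, size=100)   # binary labels for the new task
W = rng.normal(size=(64, 32))           # "pretrained" weights, kept frozen

X = frozen_features(images, W)
# fit only the new linear head (here by least squares, for simplicity),
# instead of retraining the whole network on the small new dataset
head, *_ = np.linalg.lstsq(X, labels.astype(float), rcond=None)
preds = (X @ head > 0.5).astype(int)
```

The point is that only the head is trained on the new data; the transferred filters stay fixed.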
Very interesting results.