Deep Learning Cage Match: Max Pooling vs Convolutions
When building a CNN there are many design choices. Let's compare max pooling to strided convolutions in the context of building an auto-encoder for compressing Atari images.
The max pooling operation is typically implemented with a 2x2 kernel and a stride of 2: it slides over the input, takes the maximum value in each window, and writes that value to the output feature map.
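In PyTorch this is a one-liner; here's a minimal sketch (the 84x84 frame size is just an illustrative Atari-style shape):

```python
import torch
import torch.nn as nn

# A 2x2 max pool with stride 2 halves each spatial dimension,
# keeping only the largest activation in each window.
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 3, 84, 84)  # a batch of one RGB Atari-sized frame
print(pool(x).shape)  # torch.Size([1, 3, 42, 42])
```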

But wait, why do max pooling at all? Since we already have convolutions, why not just downsample with a 2x2 convolution with a stride of 2? In theory this should be better, because its weights can adapt to minimize the loss, whereas a fixed max pool cannot.
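The learned alternative looks like this (channel counts here are an illustrative assumption, not the exact configuration used in the experiment):

```python
import torch
import torch.nn as nn

# A 2x2 convolution with stride 2 also halves each spatial dimension,
# but its weights are trained rather than fixed like a max pool.
conv_pool = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=2, stride=2)

x = torch.randn(1, 3, 84, 84)
print(conv_pool(x).shape)  # torch.Size([1, 3, 42, 42])
```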
Well.. for Science! Let’s run an experiment and see.
To test this, we will create two auto-encoders: one uses 2x2 max pooling to reduce dimensionality, the other uses a 2x2 strided convolution.
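A sketch of the two encoder variants, side by side. The layer sizes and depths are assumptions for illustration, not the exact networks from this experiment; the only difference between the two is how each downsampling step is done:

```python
import torch
import torch.nn as nn

# Variant 1: fixed 2x2 max pooling for each downsampling step.
max_pool_encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                              # fixed downsampling
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)

# Variant 2: a learned 2x2 stride-2 convolution in place of each pool.
conv_pool_encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=2, stride=2),   # learned downsampling
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=2, stride=2),
)

x = torch.randn(1, 3, 84, 84)
print(max_pool_encoder(x).shape)   # torch.Size([1, 32, 21, 21])
print(conv_pool_encoder(x).shape)  # torch.Size([1, 32, 21, 21])
```

Both encoders reduce an 84x84 frame by a factor of four per side; the decoders (not shown) mirror them with upsampling.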

Above is the reconstruction MSELoss from auto-encoding images on the test set after 20 epochs of training on 9,000 images. The images were generated by a random policy playing Space Invaders in OpenAI Gym. "convolutionalpooling" uses 2x2 convolutions (3 filters) in place of the normal max pooling operation.
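For reference, the reconstruction loss being plotted is just the mean squared error between the decoded image and the original frame (the tensor shapes here are illustrative):

```python
import torch
import torch.nn as nn

# MSE between the auto-encoder's output and the input frame.
criterion = nn.MSELoss()

original = torch.rand(1, 3, 84, 84)        # input frame
reconstruction = torch.rand(1, 3, 84, 84)  # decoder output
loss = criterion(reconstruction, original)
print(loss.item())  # a scalar; lower means a more faithful reconstruction
```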
The convolutional-pooling network outperforms the max-pooling network.
Let's check out some reconstructed images.

Max pooling is in the center. As you can see, max pooling creates a lot of cross-hatch artifacts that are not present when a 2x2 convolution is used instead.
Well, that looks like a win for convolutional pooling to me!
For details on the methods and networks used, check out the project on GitHub.