Deep Learning Cage Match: Max Pooling vs Convolutions

Duane Nielsen
Sep 8, 2018 · 2 min read

When building a CNN there are many design choices. Let's compare Max Pooling to Convolutions in the context of building an auto-encoder for compressing Atari images.

The Max Pooling operation is typically implemented as a 2x2 kernel with a stride of 2: it takes the maximum value in each 2x2 window of the input and writes it to the output feature map, halving each spatial dimension.
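A quick sketch of this in PyTorch (the tensor values here are made up for illustration):

```python
import torch
import torch.nn as nn

# 2x2 max pooling with stride 2: keeps the maximum of each
# non-overlapping 2x2 window, halving height and width.
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [9., 1., 2., 3.],
                    [4., 5., 6., 7.]]]])  # shape (1, 1, 4, 4)

y = pool(x)
print(y)        # tensor([[[[4., 8.], [9., 7.]]]])
print(y.shape)  # torch.Size([1, 1, 2, 2])
```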

But wait, why do Max Pooling at all? Since we already have convolutions, why not just use a 2x2 convolution with a stride of 2? Theoretically this would be better because its weights could adapt to minimize the loss, whereas a dumb Max Pool cannot.
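The learned alternative looks like this. A 2x2 convolution with stride 2 produces the same halving of the spatial dimensions, but with trainable weights (the channel count and input size below are illustrative, matching a typical RGB Atari frame):

```python
import torch
import torch.nn as nn

# A 2x2 convolution with stride 2 also halves each spatial
# dimension, but its kernel weights are learned by backprop.
conv_pool = nn.Conv2d(in_channels=3, out_channels=3,
                      kernel_size=2, stride=2)

x = torch.randn(1, 3, 84, 84)  # a batch of one RGB frame
y = conv_pool(x)
print(y.shape)  # torch.Size([1, 3, 42, 42])
```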

Well… for Science! Let's run an experiment and see.

To test this, we will create two auto-encoders: one uses 2x2 max pooling to reduce dimensionality, the other a 2x2 convolution.
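The exact architectures are in the project repo; a minimal sketch of the two down-sampling choices in an encoder stage might look like this (channel counts and the 3x3 feature convolution are my own illustrative assumptions, not taken from the project):

```python
import torch
import torch.nn as nn

def max_pool_block(in_ch, out_ch):
    # Feature extraction followed by fixed 2x2 max pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

def conv_pool_block(in_ch, out_ch):
    # Feature extraction followed by a learned 2x2 stride-2
    # convolution in place of max pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=2, stride=2),
    )

x = torch.randn(1, 3, 84, 84)
a = max_pool_block(3, 16)(x)
b = conv_pool_block(3, 16)(x)
print(a.shape, b.shape)  # both torch.Size([1, 16, 42, 42])
```

Both blocks halve the spatial dimensions; the only difference is whether the down-sampling step has parameters that gradient descent can tune.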

Above is the reconstruction MSELoss from auto-encoding images on the test set after 20 epochs of training on 9,000 images. Images are generated by a random policy playing Space Invaders in OpenAI Gym. "convolutionalpooling" uses 2x2 convolutions in place of the normal max pooling operation (3 filters).

The Convolutional pooling network outperforms the max-pooling network.

Let's check out some reconstructed images.

Left: Test image, Center: Max pooling, Right: Convolutional pooling

Max pooling is in the center. As you can see, max pooling creates a lot of cross-hatch artifacts that are not present when a 2x2 convolution is used instead.

Well, that looks like a win for convolutional pooling to me!

For details on the methods and networks used, check out the project on GitHub.
