Can you tell us something about ‘Global Average Pooling’?

Rahul S · Published in Aaweg Interview · Nov 29, 2022

A typical classifier is, except for its last few layers, nothing but a feature extractor.

A feature extractor is made of many convolution layers with activation functions, interleaved with occasional spatial downsampling operations such as MaxPooling. It produces multiple feature maps that flow down the network.
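For concreteness, here is a minimal sketch of such a feature extractor in Keras (the layer sizes and input shape are illustrative assumptions, not taken from any particular architecture):

```python
from tensorflow.keras import layers, models

# A toy feature extractor: stacked convolutions with ReLU activations,
# interleaved with MaxPooling for spatial downsampling.
feature_extractor = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D(),                     # halves spatial resolution
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),  # deeper layers, richer features
])
feature_extractor.summary()  # output: a stack of feature maps, not class scores
```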

During training, the feature extractor learns to represent the important features of the objects in the image across different feature maps. While the first few layers are limited to simple features like edges and basic shapes, the later layers learn complex patterns like ‘blue eyes’, ‘hair’, or ‘legs’, which provide the ‘meat’ the output layers use to classify images.

After this so-called feature extractor, the classifier has either a Flatten() or a Global Average Pooling layer before the final Sigmoid/Output layer.
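In Keras terms, the two options look roughly like this (a sketch; the 7×7×512 feature-map shape and the 10-class output are assumptions made for illustration):

```python
from tensorflow.keras import layers, Input

# Assume the extractor ends with 512 feature maps of size 7x7 (illustrative).
feature_maps = Input(shape=(7, 7, 512))

# Option A: Flatten() unrolls every pixel of every map into one long vector.
flat = layers.Flatten()(feature_maps)                    # (7, 7, 512) -> (25088,)

# Option B: Global Average Pooling reduces each map to its mean value.
pooled = layers.GlobalAveragePooling2D()(feature_maps)   # (7, 7, 512) -> (512,)

# Either vector then feeds the final output layer (sigmoid/softmax).
outputs = layers.Dense(10, activation="softmax")(pooled)
```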

The flattening layer

The flattening layer itself has no weights; it is the fully connected layer after it that learns the best coefficients for linearly combining the attribute intensities in a way that predicts the object class. For example, the coefficients the classifier learns for combining the ‘tail’, ‘fur’ and ‘four legs’ features will be such that a strong intensity in all three features results in the class prediction: ‘cat’.
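As a toy illustration of that linear combination (all numbers invented):

```python
import numpy as np

# Invented attribute intensities produced by the feature extractor
# for one image: 'tail', 'fur', 'four legs'.
features = np.array([0.9, 0.8, 0.7])

# Invented learned coefficients of the 'cat' output unit, plus a bias.
cat_weights = np.array([1.2, 0.9, 1.1])
cat_bias = -1.5

# The output unit is just a linear combination passed through a sigmoid.
logit = features @ cat_weights + cat_bias
p_cat = 1 / (1 + np.exp(-logit))
print(p_cat)  # strong intensity in all three features -> high 'cat' score
```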

And that is the case with many classical CNN architectures: the final set of layers is often fully connected, i.e. Flatten() followed by dense layers and a sigmoid/softmax output. This is like bolting an MLP onto the convolutional feature extractor.
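A quick parameter count makes the ‘bolted-on MLP’ point concrete; the feature-map shape below is again an assumption for illustration:

```python
from tensorflow.keras import layers, models, Input

feat = Input(shape=(7, 7, 512))  # assumed final feature-map shape

# Flatten head: every one of the 7*7*512 = 25,088 values gets its own weight
# per class, exactly like the first layer of an MLP.
flatten_head = models.Model(feat, layers.Dense(10)(layers.Flatten()(feat)))
print(flatten_head.count_params())  # 25088*10 + 10 = 250,890

# GAP head: one value per feature map, so only 512 weights per class.
gap_head = models.Model(
    feat, layers.Dense(10)(layers.GlobalAveragePooling2D()(feat))
)
print(gap_head.count_params())  # 512*10 + 10 = 5,130
```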
