Capsule Neural Network (CapsNet)

Kayathiri Mahendrakumaran · Published in Analytics Vidhya · Dec 26, 2020

Convolutional neural networks (CNNs) have played an important role in neural network development in recent years, and variants of the CNN architecture have performed well in classification tasks. However, there are two main drawbacks in using CNNs: they fail to consider spatial hierarchies between features, and they lack rotational invariance. Both cause false positives and false negatives to increase. A test image may contain all the relevant features, yet they are ignored because of their spatial orientation, which increases false negatives; and because of the rotational-invariance issue, there is a high possibility of assigning an object an incorrect label, which increases false positives.

If we give images with different orientations and sizes to a CNN, it will quite likely fail.

Let’s discuss this with an example. If we rotate a face upside down, so that the eyes, nose and mouth are flipped, and give it to a CNN, the model will not be able to identify it. But if we swap the eyes with the mouth and give that to the CNN, it will still recognize it as a face, even though it is not a face anymore.

Issues with Pooling

You need a little background on the CNN workflow to understand what the pooling layer is used for. Convolutional layers extract features of an input image such as curves, edges and colors, while pooling layers downsample the resulting feature maps, keeping only the strongest responses and discarding their precise positions. Finally, a fully connected layer uses these extracted high-level features for classification.
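A minimal NumPy sketch (not from the article) of why pooling is lossy: two feature maps that place the same features at different positions can pool to an identical result, so the information about *where* the feature occurred is gone.

```python
import numpy as np

# Illustrative 2x2 max pooling: each window keeps only its strongest
# activation, discarding where inside the window that activation was.
def max_pool_2x2(feature_map):
    h, w = feature_map.shape
    pooled = np.zeros((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            pooled[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
    return pooled

# Two maps with the same features at different positions inside each window:
a = np.array([[0., 9., 0., 0.],
              [0., 0., 0., 0.],
              [0., 0., 0., 0.],
              [0., 0., 7., 0.]])
b = np.array([[9., 0., 0., 0.],
              [0., 0., 0., 0.],
              [0., 0., 0., 7.],
              [0., 0., 0., 0.]])

print(max_pool_2x2(a))  # identical to the pooled version of b
print(max_pool_2x2(b))
```

Both inputs pool to the same 2×2 map, even though the spatial arrangement of their features differs; this is exactly the detail loss the article attributes to pooling.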

How Does CapsNet Work?

CapsNet was proposed by Hinton et al. to address the issues mentioned above, and it performs well in many fields, especially natural language processing and image recognition.

CapsNet possesses two key features that distinguish it from a traditional CNN: dynamic routing and reconstruction regularization. CNN architectures use dropout to reduce the effect of overfitting, whereas CapsNet uses an autoencoder-style decoder for reconstruction and regularization.

A Capsule Neural Network is a form of artificial neural network (ANN) that helps in modeling hierarchical relationships. Capsule-like structures are added to the convolutional neural network (CNN), and the output of these capsules is reused to form a stable representation for higher-level capsules. The final output is a vector that contains the probabilities for an observation, similar to the output of a classification approach using CNNs.
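The dynamic routing step mentioned above can be sketched roughly as follows: a simplified NumPy version of routing-by-agreement in the spirit of Sabour et al. (2017), where the numbers of capsules, vector dimensions and iteration count are illustrative assumptions, not values from the article.

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Squash nonlinearity: keeps the vector's direction but maps its
    # length into (0, 1), so length can act as a probability.
    sq = (v ** 2).sum(axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def dynamic_routing(u_hat, iterations=3):
    # u_hat: prediction vectors, shape (num_lower, num_upper, dim).
    num_lower, num_upper, dim = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits
    for _ in range(iterations):
        # Softmax over upper capsules: coupling coefficients per lower capsule.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum per upper capsule
        v = squash(s)                            # upper-capsule outputs
        b += (u_hat * v[None, ...]).sum(axis=-1) # agreement increases the logits
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 4))  # 6 lower capsules, 3 upper, 4-D poses
v = dynamic_routing(u_hat)
print(np.linalg.norm(v, axis=-1))   # each length lies in (0, 1)
```

Lower capsules whose predictions agree with an upper capsule's output get larger coupling coefficients on the next iteration, which is the "routing by agreement" idea.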

Most importantly, the concept of capsule networks addresses a well-known issue in image recognition, the "Picasso problem": images may contain all the required parts, but not in the correct order or position. Consider, for example, a human face in which the mouth and eyes are swapped.

Pooling in CapsNet

Capsule networks do not have the pooling layers used in CNNs. Pooling reduces the detail passed on from one layer to the next, so CapsNet omits it to avoid throwing away features that the higher-level layers need to process.

Capsules in CapsNet

Capsules are groups of neurons that can be activated individually for different features such as size, hue and position. Each group generates an activity vector, with one component per neuron holding that neuron's instantaneous value; the capsule network learns to produce these vectors from its input data.
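As a hypothetical illustration of such a vector: its squashed length can be read as the probability that the entity is present, and its direction as the entity's pose (size, hue, position, and so on). The `squash` function follows the CapsNet formulation; the raw activations below are made-up numbers.

```python
import numpy as np

def squash(v, eps=1e-8):
    # Maps a capsule's raw vector so that its length lies in (0, 1)
    # while its direction is preserved.
    sq = float((v ** 2).sum())
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

raw = np.array([2.0, -1.0, 0.5])      # raw activations of one capsule's neurons
out = squash(raw)

presence_prob = np.linalg.norm(out)   # length: how likely the entity is present
pose = out / presence_prob            # unit direction: the entity's properties
print(presence_prob)                  # ~0.84 for this made-up input
```

Note that scaling `raw` changes `presence_prob` but not `pose`: confidence and pose are decoupled, which is what lets capsules represent "the same entity, differently positioned".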

Training

CapsNet is a supervised learning algorithm. The model is trained by minimizing the Euclidean distance between the input image and the reconstruction built from the capsule outputs, which serves as a regularization term alongside the classification loss.
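A rough sketch of this training objective, assuming the margin-plus-reconstruction loss of Sabour et al. (2017): the constants 0.9, 0.1, 0.5 and 0.0005 come from that paper, while the capsule lengths, labels and images below are made-up examples.

```python
import numpy as np

def margin_loss(lengths, labels, m_pos=0.9, m_neg=0.1, lam=0.5):
    # lengths: (num_classes,) capsule output lengths; labels: one-hot target.
    # Penalizes the true class's capsule for being shorter than m_pos and
    # every other capsule for being longer than m_neg.
    present = labels * np.maximum(0.0, m_pos - lengths) ** 2
    absent = lam * (1.0 - labels) * np.maximum(0.0, lengths - m_neg) ** 2
    return float((present + absent).sum())

def reconstruction_loss(image, reconstruction):
    # Squared Euclidean distance between input and decoder output.
    return float(((image - reconstruction) ** 2).sum())

lengths = np.array([0.85, 0.20, 0.05])  # made-up capsule lengths
labels = np.array([1.0, 0.0, 0.0])      # one-hot ground truth
image = np.zeros(784)                    # made-up flattened 28x28 input
recon = np.full(784, 0.01)               # made-up decoder output

# Reconstruction term is scaled down so it regularizes rather than dominates.
total = margin_loss(lengths, labels) + 0.0005 * reconstruction_loss(image, recon)
print(total)
```

The small 0.0005 factor keeps the Euclidean reconstruction term from dominating the margin loss during training.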

