CapsNet in action for social media data analysis (imaginary), generated via DALL-E.

Capsule Neural Network (CapsNet): A Forgotten ANN Architecture

Dec 25, 2023

CapsNets are artificial neural networks (ANNs) designed to better capture hierarchical relationships. They’re inspired by biological neural organization. A CapsNet uses “capsules”, groups of neurons whose outputs are routed to higher-level capsules to stabilize their representations, improving on CNN performance. Like a CNN performing classification with localization, a CapsNet outputs a vector that encodes both the likelihood of an observation and its pose.
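As a quick illustration of that output convention, here is a small NumPy sketch using the “squash” non-linearity from the capsule literature (the input values are made up): the vector’s length is read as the probability that an entity is present, and its direction encodes the pose.

```python
import numpy as np

def squash(s, eps=1e-8):
    # Capsule non-linearity: maps a vector to length in [0, 1)
    # while preserving its direction (the pose).
    sq_norm = np.sum(s ** 2)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

raw = np.array([2.0, -1.0, 0.5])   # hypothetical pre-activation capsule output
v = squash(raw)
presence = np.linalg.norm(v)       # ~0.84: likelihood that the entity is present
pose = v / presence                # unit vector encoding the entity's pose
```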

One of CapsNet’s advantages is its ability to address the “Picasso problem” in image recognition: the correct parts of an object are all present, but they are not arranged correctly in space. In image recognition, CapsNet exploits the fact that viewpoint changes have linear effects at the part/object level, as opposed to nonlinear effects at the pixel level. This enables more robust identification and better handling of spatial discrepancies in object arrangements, akin to inverting the rendering of an object from its multiple parts. [1]

In 2017, Geoffrey Hinton and his colleagues introduced a dynamic routing mechanism for capsule networks. This innovation aimed to reduce MNIST error rates and training set sizes, and it yielded significantly better results than convolutional neural networks (CNNs), particularly in scenarios involving highly overlapping digits. In Hinton’s original concept, each minicolumn was intended to represent and detect a single multidimensional entity. [2]
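To make the routing-by-agreement idea concrete, here is a minimal PyTorch sketch of the iterative procedure from the 2017 paper. It assumes the prediction vectors (`u_hat`) have already been computed by the layer below; the tensor shapes, the three-iteration default, and the function names are illustrative choices, not code from the paper.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Capsule non-linearity: short vectors shrink toward zero length,
    # long vectors approach unit length; direction (pose) is preserved.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iterations=3):
    # u_hat: prediction vectors from lower-level capsules,
    # shape (batch, in_caps, out_caps, out_dim).
    b = torch.zeros(u_hat.shape[:3] + (1,), device=u_hat.device)  # routing logits
    for _ in range(num_iterations):
        c = F.softmax(b, dim=2)         # coupling coefficients over output capsules
        s = (c * u_hat).sum(dim=1)      # weighted sum over input capsules
        v = squash(s)                   # candidate output capsules
        # Agreement step: raise the logit where a prediction aligns with the output.
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1, keepdim=True)
    return v                            # shape (batch, out_caps, out_dim)

# Example: 1152 primary capsules routed to 10 digit capsules of dimension 16.
u_hat = torch.randn(2, 1152, 10, 16)
print(dynamic_routing(u_hat).shape)     # torch.Size([2, 10, 16])
```

The key design choice is that coupling coefficients are not learned weights: they are recomputed at inference time, so parts “vote” for the whole they agree on.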

But the question here is whether CapsNet is also effective for data other than images, such as text.

Evaluating The Effectiveness of Capsule Neural Network in Toxic Comment Classification using Pre-trained BERT Embeddings [3]

In the dynamic landscape of natural language understanding (NLU) and natural language generation (NLG), Large Language Models (LLMs) have dominated headlines since their debut, overshadowing the now largely forgotten Capsule Neural Network (CapsNet). This project seeks to rekindle interest in CapsNet by revisiting past research and exploring its potential. The study employs CapsNet to classify toxic text, leveraging pre-trained BERT embeddings (bert-base-uncased) on a large multilingual dataset.

In this experiment, CapsNet takes on the task of categorizing toxic text. Its performance is compared against other architectures, including DistilBERT, vanilla neural networks (VNN), and convolutional neural networks (CNN). CapsNet achieves an accuracy of 90.44%, demonstrating its ability to handle text data. This result underscores CapsNet’s distinct advantages and suggests promising avenues for further refinement, potentially bringing its performance in line with established models such as DistilBERT.

High-level CapsNet architecture.

The presented CapsNet architecture takes word IDs as input and employs pre-trained BERT embeddings to extract context from the text. A spatial dropout layer is applied to the BERT embeddings to prevent overfitting. The capsule layer receives the regularized embeddings and learns to represent the input text as a collection of capsules, where each capsule captures a particular characteristic or attribute of the text. The capsule outputs are then fed into dense layers to learn higher-level text representations, and the final dense layer generates the output prediction, i.e., the classification label of the input text. A sketch of this pipeline appears below.
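The authors’ implementation is in the linked repository; as a rough illustration of the pipeline described above, here is a minimal PyTorch sketch. The class name `CapsNetToxicClassifier`, the capsule counts and dimensions, the dropout rate, and the choice to freeze BERT are my assumptions for readability, not the paper’s exact configuration; `dynamic_routing` is the function from the earlier snippet.

```python
import torch
import torch.nn as nn
from transformers import AutoModel  # pip install transformers

class CapsNetToxicClassifier(nn.Module):
    """Hypothetical sketch of the described pipeline: BERT embeddings ->
    spatial dropout -> capsule layer -> dense layers -> prediction."""

    def __init__(self, num_capsules=10, capsule_dim=16, num_labels=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():
            p.requires_grad = False           # use BERT as a frozen embedding extractor
        self.spatial_dropout = nn.Dropout2d(0.3)  # drops whole embedding channels
        hidden = self.bert.config.hidden_size     # 768 for bert-base-uncased
        # One prediction vector per (token, output capsule) pair.
        self.primary = nn.Linear(hidden, num_capsules * capsule_dim)
        self.num_capsules, self.capsule_dim = num_capsules, capsule_dim
        self.head = nn.Sequential(                # dense layers over the capsule outputs
            nn.Linear(num_capsules * capsule_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_labels),            # final prediction layer
        )

    def forward(self, input_ids, attention_mask):
        emb = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        # Spatial dropout: treat embedding dimensions as channels.
        emb = self.spatial_dropout(
            emb.permute(0, 2, 1).unsqueeze(-1)).squeeze(-1).permute(0, 2, 1)
        batch, seq_len, _ = emb.shape
        u_hat = self.primary(emb).view(batch, seq_len,
                                       self.num_capsules, self.capsule_dim)
        v = dynamic_routing(u_hat)  # routing-by-agreement from the earlier sketch
        return self.head(v.flatten(1))
```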

Source code is available at https://github.com/TashinAhmed/HATE.

References

[1] https://en.wikipedia.org/wiki/Capsule_neural_network
[2] https://github.com/manuelsh/capsule-networks-pytorch
[3] https://ieeexplore.ieee.org/abstract/document/10322429
