What will happen when we apply CapsNet with dynamic routing to NLP?

Madhusudan Verma
3 min read · Jul 29, 2018


In this post I will summarize what researchers found when they applied capsule networks (CapsNet) to text classification.

Before moving further we need to understand the following layers as well as algorithms.

N-gram Convolutional Layer- This is a standard convolutional layer that extracts n-gram features at different positions of a sentence through various convolutional filters.
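A minimal NumPy sketch of this idea, with all sizes (sentence length, embedding dimension, filter width, filter count) chosen purely for illustration:

```python
import numpy as np

# Hypothetical sizes: a sentence is a matrix of L word embeddings of dim d.
L, d = 10, 8           # sentence length, embedding size (assumed)
n, num_filters = 3, 4  # n-gram width, number of filters (assumed)

rng = np.random.default_rng(0)
sentence = rng.standard_normal((L, d))
filters = rng.standard_normal((num_filters, n, d))
bias = np.zeros(num_filters)

def ngram_conv(x, W, b):
    # Each filter slides over the sentence and extracts one n-gram
    # feature per position, giving an (L - n + 1) x num_filters map.
    steps = x.shape[0] - W.shape[1] + 1
    out = np.empty((steps, W.shape[0]))
    for i in range(steps):
        window = x[i:i + W.shape[1]]  # the i-th n-gram of the sentence
        out[i] = np.tensordot(W, window, axes=([1, 2], [0, 1])) + b
    return np.maximum(out, 0.0)       # ReLU nonlinearity

features = ngram_conv(sentence, filters, bias)
print(features.shape)  # (8, 4): one scalar feature per filter per n-gram position
```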

Primary Capsule Layer- This is the first capsule layer, in which the capsules replace the scalar-output feature detectors of CNNs with vector-output capsules to preserve instantiation parameters such as the local order of words and semantic representations of words.
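One way to sketch this step in NumPy: group the scalar feature channels from the convolutional layer into small vectors and apply the capsule "squash" non-linearity, so each capsule's length can act as an existence probability. The sizes below are assumptions, not the paper's settings:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Capsule non-linearity: short vectors shrink toward 0, long vectors
    # approach (but never reach) unit length; direction is preserved.
    sq_norm = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * v / np.sqrt(sq_norm + eps)

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 32))  # 8 positions x 32 channels (assumed)

capsule_dim = 8
# Reshape the 32 scalar channels at each position into 4 vector capsules.
primary_caps = squash(feature_map.reshape(8, 32 // capsule_dim, capsule_dim))

lengths = np.linalg.norm(primary_caps, axis=-1)
print(lengths.max() < 1.0)  # True: every capsule length lies in [0, 1)
```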

Convolutional Capsule Layer- In this layer, each capsule is connected only to a local region in the layer below. The capsules in that region are multiplied by transformation matrices to learn child-parent relationships, followed by routing-by-agreement to produce parent capsules in the layer above.
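The transformation step can be sketched as follows: each child capsule in the local region casts a prediction ("vote") for every parent capsule via its own transformation matrix. All sizes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
num_children, child_dim = 6, 8   # capsules in the local region (assumed)
num_parents, parent_dim = 4, 16  # parent capsules in the layer above (assumed)

children = rng.standard_normal((num_children, child_dim))
# One transformation matrix per (child, parent) pair.
W = rng.standard_normal((num_children, num_parents, parent_dim, child_dim))

# u_hat[i, j] = W[i, j] @ u[i]: child i's prediction vector for parent j.
u_hat = np.einsum('ijpc,ic->ijp', W, children)
print(u_hat.shape)  # (6, 4, 16)
```

These prediction vectors are what routing-by-agreement (described below under Dynamic Routing) combines into the parent capsules.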

Fully Connected Capsule Layer- The capsules in the layer below are flattened into a list of capsules and fed into a fully connected capsule layer, in which each capsule is multiplied by a transformation matrix, followed by routing-by-agreement to produce the final capsule and its probability for each category.
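A rough NumPy sketch of this final layer, using a single uniform-weight pass as a stand-in for full routing-by-agreement (which is covered below) and hypothetical sizes throughout:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Capsule non-linearity: output length lies in [0, 1).
    sq = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

rng = np.random.default_rng(0)
# Flatten all capsules from the layer below into one list (assumed sizes).
caps_below = rng.standard_normal((3, 4, 8)).reshape(-1, 8)  # 12 capsules of dim 8

num_classes, class_dim = 5, 16  # one final capsule per category (assumed)
W = rng.standard_normal((12, num_classes, class_dim, 8))
votes = np.einsum('ijpc,ic->ijp', W, caps_below)

# Uniform routing weights: parent j is the squashed mean of its votes.
class_caps = squash(votes.mean(axis=0))

# The length of each class capsule serves as that category's probability.
probs = np.linalg.norm(class_caps, axis=-1)
print(probs.shape)  # (5,)
```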

Dynamic Routing- The basic idea of dynamic routing is to construct a non-linear map in an iterative manner, ensuring that the output of each capsule gets sent to an appropriate parent in the subsequent layer. For each potential parent, the capsule network can increase or decrease the connection strength by dynamic routing. This is more effective than primitive routing strategies such as max-pooling in CNNs, which essentially detects whether a feature is present at any position of the text but loses spatial information about the feature. The authors explored three strategies to boost the accuracy of the routing process by alleviating the disturbance of noisy capsules:

Orphan Category- An additional “orphan” category is added to the network, which can capture the “background” information of the text, such as stop words and words that are unrelated to specific categories, helping the capsule network model child-parent relationships more efficiently. Adding an “orphan” category is more effective for text than for images, since there is no single consistent “background” object in images, while background words such as predicates and pronouns are consistent across texts.

Leaky-Softmax- They explored Leaky-Softmax in place of the standard softmax when updating the connection strengths between child capsules and their parents. Besides the orphan category in the last capsule layer, a lightweight method is also needed between two consecutive layers to route noisy child capsules to an extra dimension without additional parameters or computational cost.

Coefficients Amendment- They attempted to use the probability of existence of the child capsules in the layer below to iteratively amend the connection strengths.
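The iterative routing loop, with a Leaky-Softmax variant, can be sketched in NumPy as follows. This is a simplified illustration (the leak is a single zero logit per child, and the coefficient-amendment strategy is omitted), not the paper's exact implementation:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Capsule non-linearity: output length lies in [0, 1).
    sq = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def leaky_softmax(logits, axis=-1):
    # Softmax over the parents plus one extra zero "leak" logit, so a
    # noisy child can dump routing weight into the leak dimension
    # instead of being forced to pick a real parent.
    leak = np.zeros(logits.shape[:-1] + (1,))
    padded = np.concatenate([logits, leak], axis=axis)
    e = np.exp(padded - padded.max(axis=axis, keepdims=True))
    p = e / e.sum(axis=axis, keepdims=True)
    return p[..., :-1]  # drop the leak column

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: (num_children, num_parents, parent_dim) prediction vectors.
    b = np.zeros(u_hat.shape[:2])  # routing logits, start uniform
    for _ in range(num_iters):
        c = leaky_softmax(b)                       # connection strengths
        s = np.einsum('ij,ijp->jp', c, u_hat)      # weighted sum of votes
        v = squash(s)                              # candidate parents
        b = b + np.einsum('ijp,jp->ij', u_hat, v)  # reward agreement
    return v

rng = np.random.default_rng(0)
u_hat = rng.standard_normal((6, 4, 16))  # assumed sizes
parents = dynamic_routing(u_hat)
print(parents.shape)  # (4, 16)
```

Children whose predictions agree with a parent's output accumulate larger logits, so their connection strengths grow over the iterations, while disagreeing (noisy) children are routed away.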

Conclusion

Extensive experiments on six text classification benchmarks showed the effectiveness of capsule networks for text classification. More importantly, capsule networks also showed significant improvement over strong baseline methods when transferring from single-label to multi-label text classification.

From the paper titled “Investigating Capsule Networks with Dynamic Routing for Text Classification”, published on 20 June 2018.


Madhusudan Verma

I am a deep learning engineer at Pucho Technology Information Private Limited.