A Visual Representation of Capsule Network Computations
Many discussions of the paper focus on a big-picture view of how capsule networks improve on standard neural nets: they represent more nuanced part-whole relationships at each layer by using vectors in place of scalars. The idea is that a vector can model the “pose” of an entity, and entities with similar poses belong together. As an analogy: if you see two eyes, a mouth, and a nose in a particular spatial relationship, all oriented in the same direction, that’s pretty good evidence for a face with the same orientation. Standard convolutional neural nets can model similar relationships, but they do so less compactly, with more parameters or layers (and with less ability to generalize, the paper argues).
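To make the face analogy concrete, here is a toy sketch of the “agreement” idea. Each detected part predicts the pose of the whole face as a 2-D vector, and agreement is measured as the mean pairwise cosine similarity of those predictions. The pose vectors are made-up numbers, and the names `cosine` and `agreement` are hypothetical. In a real capsule network each part's prediction would come from a learned transformation of the part's own pose, which this toy skips entirely.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two 2-D pose vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def agreement(predictions):
    """Mean pairwise cosine similarity of the parts' predicted poses for the whole."""
    pairs = list(combinations(predictions, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical predictions of the face's orientation, one per detected part.
consistent = {
    "left_eye": (0.0, 1.0),
    "right_eye": (0.05, 1.0),
    "nose": (-0.05, 0.98),
    "mouth": (0.0, 0.95),
}
scrambled = {
    "left_eye": (0.0, 1.0),
    "right_eye": (1.0, 0.0),
    "nose": (0.0, -1.0),
    "mouth": (-1.0, 0.1),
}

print(f"consistent: {agreement(list(consistent.values())):.3f}")
print(f"scrambled:  {agreement(list(scrambled.values())):.3f}")
```

Parts whose predictions point the same way score close to 1, which is the “good evidence for a face” case. Scrambled parts score near or below 0.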
To get a better feel for exactly what capsule networks compute, I made a diagram of the capsule-to-capsule connections in the paper. This diagram is intended for those who have read the paper and are looking for a summary reference image.
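For readers who prefer code to pictures, below is a minimal NumPy sketch of a single capsule-to-capsule layer. It assumes the commonly described routing-by-agreement formulation: each lower capsule's vector is multiplied by a learned matrix to predict each higher capsule's pose, the predictions are combined with softmax coupling coefficients, the sum is passed through a “squash” nonlinearity, and coupling logits grow where predictions agree with the output. The function names `squash` and `route`, the array shapes, the three routing iterations, and the random inputs are my own assumptions for illustration. This is not a reference implementation of the paper.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Shrink a vector's length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def route(u, W, num_iters=3):
    """One capsule layer with routing by agreement (a sketch).

    u: (num_in, d_in) pose vectors of the lower-level capsules.
    W: (num_in, num_out, d_out, d_in) transformation matrices (learned in practice).
    Returns v: (num_out, d_out) output capsule vectors, and c: (num_in, num_out)
    coupling coefficients from the last iteration.
    """
    # Each input capsule i predicts the pose of each output capsule j: u_hat[i, j] = W[i, j] @ u[i].
    u_hat = np.einsum("ijkl,il->ijk", W, u)
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits; all zeros means uniform coupling
    for _ in range(num_iters):
        # Coupling coefficients: softmax over output capsules for each input capsule.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        # Coupling-weighted sum of predictions, then squash to get output vectors.
        s = np.einsum("ij,ijk->jk", c, u_hat)
        v = squash(s)
        # Raise a logit when input i's prediction agrees (dot product) with output j.
        b = b + np.einsum("ijk,jk->ij", u_hat, v)
    return v, c


# Usage with random stand-in data: 6 input capsules of size 8, 3 output capsules of size 16.
rng = np.random.default_rng(0)
u = rng.normal(size=(6, 8))
W = rng.normal(size=(6, 3, 16, 8)) * 0.1
v, c = route(u, W)
print(v.shape)                      # one 16-D vector per output capsule
print(np.linalg.norm(v, axis=-1))   # lengths lie in [0, 1) thanks to squash
```

The length of each output vector can be read as how strongly that higher-level entity is present, and its direction as the entity's pose, which matches the vector-as-pose idea described above.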