Reverse engineering RNN reveals line attractor dynamics — Read a Paper
Following the recent trend of not stopping at performance analysis but extending it to understand the interpretability and dynamics of deep learning models, this paper studies the dynamics of Recurrent Neural Networks (RNNs) from a dynamical systems point of view. The observations put forth in the work are:
Training reduces the dimensionality of hidden unit activity
The RNN is trained on benchmark sentiment analysis datasets, and the hidden unit activities are collected at each stage of training. At each stage, PCA (Principal Component Analysis) is applied to check how many components are required to capture the activity. As training proceeds, the number of PCA components required to explain the variance decreases. This shows that training not only implicitly imposes parsimony on the model but also manages to achieve generalization.
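A minimal sketch of this measurement, assuming the hidden states at one checkpoint have been stacked into a matrix (the array `hidden_states` is a stand-in, not the paper's data):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical collected data: hidden states of shape (num_samples, hidden_dim),
# gathered by running the RNN over the dataset at a given training checkpoint.
hidden_states = np.random.randn(5000, 128)  # stand-in for real recorded activity

pca = PCA().fit(hidden_states)
cum_var = np.cumsum(pca.explained_variance_ratio_)
# Number of components needed to explain, say, 90% of the variance;
# the paper's observation is that this count shrinks as training proceeds.
n_components = int(np.searchsorted(cum_var, 0.90) + 1)
print(n_components)
```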
State spaces for different classes physically separate
When training starts, the state vectors for examples of different classes lie very close to each other; as training proceeds, they drift further apart. This shows that the feature extraction underlying the model's ability to distinguish classes is also reflected in the geometry of the state space, and it is clearly interpretable in a low-dimensional visualization.
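As a hedged illustration of such a visualization (again with stand-in arrays), one can project the final hidden state of each example onto the top two principal components and color the points by class:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Stand-ins: final hidden states per example and their sentiment labels.
final_states = np.random.randn(1000, 128)
labels = np.random.randint(0, 2, size=1000)  # 0 = negative, 1 = positive

proj = PCA(n_components=2).fit_transform(final_states)
for cls, color in [(0, "tab:red"), (1, "tab:blue")]:
    pts = proj[labels == cls]
    plt.scatter(pts[:, 0], pts[:, 1], s=5, c=color, label=f"class {cls}")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.show()
# Early in training the two clouds overlap; later they separate cleanly.
```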
Existence of fixed points
As the trained RNN processes inputs, its state vector moves along a set of fixed points. From a dynamical systems point of view, a stable fixed point acts like a sink that attracts the system and holds it at that particular point. To kick the system out of the point, a sufficiently large force must be applied; otherwise the state either stays put, or moves away briefly and rapidly falls back. From a deep learning perspective, this force corresponds to the saliency of the input example. When a data sample is neutral or has low salience (a feeble positive or negative sentiment), the model sitting at the fixed point is unaffected. On the other hand, when a strongly positive or negative review is encountered, the model jumps out of the fixed point and settles at another one. This also accounts for the robustness of the model against noise.

Moreover, the manifold of all such fixed points is one-dimensional. That gives another super cool conclusion: the trained model not only settles into a low-dimensional state space but also forms a 1D manifold of fixed points, i.e. a line attractor.
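To make the picture concrete, here is a toy linear system (my own illustration, not the paper's trained RNN) with a line attractor: one eigenvalue of the transition matrix is exactly 1, so inputs are integrated along that direction, while perturbations in the other direction decay:

```python
import numpy as np

# Toy 2D linear dynamics h_{t+1} = A h_t + b x_t.
# A has eigenvalues {1.0, 0.5}: a line of fixed points along [1, 0] and a
# decaying transverse direction, i.e. a line attractor.
A = np.array([[1.0, 0.0],
              [0.0, 0.5]])
b = np.array([1.0, 1.0])

h = np.zeros(2)
inputs = [0.0, 0.0, 2.0, 0.0, 0.0, -1.0, 0.0]  # mostly neutral, two salient tokens
for x in inputs:
    h = A @ h + b * x
    print(h)
# Neutral inputs (x = 0) leave the position along the line unchanged, while
# salient inputs shift the state along the line, where it then stays put.
```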
Analysis of fixed points
To find approximate fixed points numerically, the paper defines a loss function q(h) = (1/N)·‖h − F(h, 0)‖², where F is the RNN state update and 0 is the zero input. The criterion simply minimizes the difference between a hidden state and the state obtained when a zero input is applied to it. When h is close to F(h, 0), the point can be treated as approximately fixed, and the function is minimized over h using standard optimization techniques. This is consistent with the definition of a fixed point: it is immune to neutral (zero) input and quickly attracts the system back. The authors also verified that the points are indeed fixed by perturbing the system with slightly positive/negative examples and checking the rate at which it falls back.
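A minimal sketch of such a fixed point finder, assuming a GRU cell stands in for the model's update function F (in the paper, the optimization runs over many candidate states sampled from real trajectories of the trained network):

```python
import torch

# Stand-in for the trained update function F(h, x); in practice this would be
# the trained model's recurrent cell, and h would be initialized from hidden
# states visited on real data rather than at random.
rnn = torch.nn.GRUCell(input_size=64, hidden_size=128)
for p in rnn.parameters():
    p.requires_grad_(False)  # freeze the network; we optimize only the states

h = torch.randn(32, 128, requires_grad=True)  # 32 candidate fixed points
x0 = torch.zeros(32, 64)                      # the neutral (zero) input

opt = torch.optim.Adam([h], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    q = ((h - rnn(x0, h)) ** 2).mean()  # q(h) = (1/N) * ||h - F(h, 0)||^2
    q.backward()
    opt.step()
# Candidates where q is ~0 are approximate fixed points of the zero-input dynamics.
```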
Stability of fixed point
The authors also performed a stability analysis to characterize the nature of each fixed point. Briefly, a first-order Taylor expansion of the hidden state update around a fixed point h*, namely F(h* + Δh, 0) ≈ h* + J·Δh with Jacobian J = ∂F/∂h evaluated at h*, linearizes the dynamics. Eigendecomposition of J then yields eigenvalues with left and right eigenvectors, which are used to check the stability of the fixed point: modes with eigenvalue magnitude below one decay back toward the point, while a mode with magnitude near one is marginally stable and traces the attractor.
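Continuing the sketch above, the Jacobian at one approximate fixed point can be obtained with autograd and then eigendecomposed (variable names carry over from the previous block and are illustrative):

```python
import torch

h_star = h[0].detach()     # one approximate fixed point from the sketch above
x_zero = torch.zeros(64)

def F(state):
    # GRUCell expects a batch dimension; add and strip it around the update.
    return rnn(x_zero.unsqueeze(0), state.unsqueeze(0)).squeeze(0)

# First-order Taylor expansion: F(h* + dh) ~ h* + J dh, with J = dF/dh at h*.
J = torch.autograd.functional.jacobian(F, h_star)  # shape (128, 128)

eigvals, right_vecs = torch.linalg.eig(J)
# Left eigenvectors are the rows of the inverse of the right-eigenvector matrix.
left_vecs = torch.linalg.inv(right_vecs)
# |eigenvalue| < 1: perturbations along that mode decay back to the fixed
# point; a mode with |eigenvalue| ~ 1 is the slow direction along the attractor.
print(eigvals.abs().sort(descending=True).values[:5])
```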
The experiments are repeated across various RNN types, architectures, and datasets to demonstrate universality. The parsimony and linear approximability of the learned dynamics of RNNs invite many parallel analogies. One interesting parallel with neuroscience is the attractor dynamics believed to be observed in the hippocampus [Ref 1] (known to play a vital role in memory and learning), which is thought to use simple low-dimensional grid cell codes to form complex higher-dimensional representations [Ref 2].
Hope you enjoyed reading :)
References
- Tom J. Wills, Colin Lever, Francesca Cacucci, Neil Burgess, John O’Keefe (2005). Attractor Dynamics in the Hippocampal Representation of the Local Environment. Science, Vol. 308, Issue 5723, pp. 873–876.
- Klukas M, Lewis M, Fiete I (2020). Efficient and flexible representation of higher-dimensional cognitive variables with grid cells. PLoS Comput Biol 16(4): e1007796.