The Recurrent Artificial Neuron

The Building Block of a Recurrent Neural Network

Meghna Asthana PhD MSc DIC
Analytics Vidhya
4 min read · Mar 9, 2020


The previous article in the Introduction to Artificial Neural Network series explained the Recurrent Neural Network (RNN), the architecture behind the translation applications we use today. Today, we will discuss how each neuron of this network works.

RNN Unit [3]

The Recurrent Artificial Neuron is the most fundamental part of an RNN. Besides receiving the current input, it has a connection that feeds the output of its activation back into its own linear input at the next time step. At any time step, the output of the neuron can therefore be written as

y⁽ᵗ⁾ = Φ(x⁽ᵗ⁾ · wₓ + y⁽ᵗ⁻¹⁾ · w_y + b)

where y⁽ᵗ⁾ is the output at time t, x⁽ᵗ⁾ is the input at time t, y⁽ᵗ⁻¹⁾ is the output of the previous time step, b is the bias term and Φ is the activation function, typically tanh or ReLU. The input and the recurrent output have their own sets of weights, wₓ and w_y respectively [1].
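To make the recurrence concrete, here is a minimal NumPy sketch of a single recurrent unit stepped through a short sequence; the variable names (w_x, w_y, b) mirror the symbols above, and the random inputs and weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs = 3                        # i: number of input features
w_x = rng.normal(size=n_inputs)     # input weights
w_y = rng.normal()                  # recurrent weight (scalar for a single unit)
b = 0.0                             # bias term

y_prev = 0.0                        # y(t-1), initialised to zero
for t in range(4):                  # walk through 4 time steps
    x_t = rng.normal(size=n_inputs)                  # x(t)
    y_t = np.tanh(x_t @ w_x + y_prev * w_y + b)      # y(t) = Φ(x·w_x + y(t-1)·w_y + b)
    y_prev = y_t                                     # feed the output back in
    print(f"t={t}, y={y_t:.3f}")
```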

The same computation can be carried out for a whole layer over a mini-batch in vectorized form:

Y⁽ᵗ⁾ = Φ(X⁽ᵗ⁾ Wₓ + Y⁽ᵗ⁻¹⁾ W_y + b)

where Y⁽ᵗ⁾ is the output matrix of size (m, n), with m the number of instances in the batch and n the number of units in the layer, X⁽ᵗ⁾ is the input matrix of size (m, i), with i the number of input features, Wₓ is the matrix of input weights for the current time step, of size (i, n), and W_y is the matrix of weights applied to the outputs of the previous step, of size (n, n).
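A vectorized version for a whole layer and a mini-batch, following the shapes listed above, could look like the sketch below; the sizes and names are illustrative assumptions rather than values from the article.

```python
import numpy as np

rng = np.random.default_rng(1)

m, i, n = 32, 10, 5                  # batch size, input features, units in the layer
X_t = rng.normal(size=(m, i))        # X(t): inputs at the current time step
Y_prev = np.zeros((m, n))            # Y(t-1): previous outputs, zeros at t = 0

W_x = rng.normal(size=(i, n)) * 0.1  # input weights, shape (i, n)
W_y = rng.normal(size=(n, n)) * 0.1  # recurrent weights, shape (n, n)
b = np.zeros(n)                      # one bias per unit

Y_t = np.tanh(X_t @ W_x + Y_prev @ W_y + b)  # Y(t), shape (m, n)
print(Y_t.shape)                             # (32, 5)
```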

For every layer, the final value of the current memory hᵗ and the output y’ᵗ at time step t are calculated as

hᵗ = tanh(W_hh h⁽ᵗ⁻¹⁾ + W_hx xᵗ)

y’ᵗ = softmax(W_s hᵗ)

where W_hx, W_hh and W_s are the input-to-hidden, hidden-to-hidden and hidden-to-output weight matrices.

The tanh activation function squashes the memory values between -1 and 1, whereas softmax turns the output into a probability distribution. The output y’ᵗ is a vector of the same dimension as the (one-hot encoded) input xᵗ, with all its elements summing to 1. The element with the highest probability corresponds to the predicted next word.
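As a rough sketch of this language-modelling step, the snippet below computes the memory and the softmax output for a single one-hot encoded word; the vocabulary size and the weight names W_hx, W_hh and W_s are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

vocab_size, hidden_size = 8, 4
W_hx = rng.normal(size=(hidden_size, vocab_size)) * 0.1   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
W_s  = rng.normal(size=(vocab_size, hidden_size)) * 0.1   # hidden-to-output weights

def softmax(z):
    z = z - z.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

x_t = np.eye(vocab_size)[3]          # one-hot encoding of the current word
h_prev = np.zeros(hidden_size)       # previous memory, zeros at t = 0

h_t = np.tanh(W_hh @ h_prev + W_hx @ x_t)   # current memory, squashed to (-1, 1)
y_hat = softmax(W_s @ h_t)                  # probability distribution over the vocabulary
print(y_hat.sum(), y_hat.argmax())          # sums to 1; index of the predicted next word
```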

Subsequently, in order to measure how accurate the prediction is, the predicted distribution y’ᵗ is compared with the actual word yᵗ. This is done by the loss function, in this case the cross-entropy loss

Eᵗ(yᵗ, y’ᵗ) = −Σⱼ yᵗⱼ log y’ᵗⱼ

where the sum runs over all words j in the vocabulary and the per-step losses are averaged over the sequence.
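A minimal sketch of the cross-entropy computation for one time step, assuming a one-hot target vector y_true and a predicted distribution y_hat like the one above:

```python
import numpy as np

def cross_entropy(y_true, y_hat, eps=1e-12):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    return -np.sum(y_true * np.log(y_hat + eps))   # only the true word's term survives

y_true = np.eye(8)[5]                  # the actual next word (one-hot, vocabulary of 8)
y_hat = np.full(8, 1 / 8)              # a uniform prediction, used here as an example
print(cross_entropy(y_true, y_hat))    # ≈ log(8) ≈ 2.079
```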

The final step in the process is backpropagation through time, where the algorithm traverses backwards through all the time steps to update the weights and biases of the network [2].

A major reason behind an RNN’s failure to make good predictions on long and complex sequences is the vanishing/exploding gradient problem, which prevents effective learning. To update its parameters, the network has to compute derivatives of the loss function through every time step, and the same set of recurrent weights enters this chain of derivatives over and over again. Repeated multiplication can make the resulting gradients extremely small or extremely large, leading to negligible updates or numerically undefined weights and biases and, in turn, to no significant learning. This issue was addressed by the introduction of LSTM networks.
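The mechanism can be illustrated with a toy calculation: during backpropagation the gradient is repeatedly multiplied by the same recurrent weight matrix (scaled by the activation derivative, which is omitted here), so its norm either decays towards zero or blows up depending on the scale of that matrix. The sketch below only illustrates this effect and is not the actual BPTT computation of a trained network.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
grad = rng.normal(size=n)                 # gradient arriving at the last time step

for scale in (0.5, 1.5):                  # "small" vs "large" recurrent weights
    W_y = rng.normal(size=(n, n)) * scale / np.sqrt(n)
    g = grad.copy()
    for t in range(50):                   # propagate back through 50 time steps
        g = W_y.T @ g                     # repeated multiplication by the same matrix
    print(f"scale={scale}: gradient norm after 50 steps = {np.linalg.norm(g):.3e}")
# small weights -> the norm collapses towards zero (vanishing gradient)
# large weights -> the norm blows up (exploding gradient)
```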

This concludes our section on Recurrent Neural Networks, in which we reviewed the basic structure of the network and its elementary unit, the Recurrent Artificial Neuron. We also explored the reason for the network’s performance drawbacks, which can be attributed to its inherent structure. In the coming chapters, we will present an alternative, the Long Short-Term Memory network, which addresses the problem of vanishing/exploding gradients.

[1] Julian, D., 2018. Deep learning with Pytorch quick start guide: learn to train and deploy neural network models in Python, Birmingham: Packt.

[2] Kostadinov, S., 2018. Recurrent Neural Networks with Python Quick Start Guide, 1st ed., Birmingham: Packt Publishing.

[3] Medium. (2020). Language modelling with Penn Treebank. [online] Available at: https://towardsdatascience.com/language-modelling-with-penn-treebank-64786f641f6 [Accessed 3 Mar. 2020].
