From ANNs (Artificial Neural Networks) to RNNs (Recurrent Neural Networks)

Ijaz Khan · Published in unpack · Jan 31, 2021

Artificial Neural Networks (ANNs)

A single neuron (or perceptron) with inputs and an output is essentially a logistic regression unit. An Artificial Neural Network (ANN) is a group of such neurons arranged in layers. ANNs are also called feedforward neural networks because they process the input in the forward direction only.

ANN or Feed Forward Neural Network

In the figure above, we can see three layers, i.e., the input, hidden, and output layers. Essentially, each layer tries to learn certain weights:

  1. Input layer: takes the data as input
  2. Hidden layer: processes the inputs and extracts features from the data
  3. Output layer: produces the results

ANNs are mostly used to solve problems involving image, textual, and tabular data.

Advantages of ANNs

ANNs are popularly known as universal function approximators because of their capability to learn any kind of nonlinear function. In addition, they can learn weights that map any input to the corresponding output.

The activation function (ReLU is a common choice) is one of the main components of an ANN because it introduces nonlinearity into the network. These activation functions are what allow the network to learn complex relationships between inputs and outputs.

Perceptron

The animated GIF of the perceptron above shows that the weighted sum of the inputs is passed through an activation before reaching the output. Without an activation function, the network cannot learn complex patterns, only linear functions; that is why the activation function is called the powerhouse of the neural network.
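To make this concrete, here is a minimal sketch of a single perceptron step in PyTorch (the input values, weights, and bias below are made-up placeholders):

```python
import torch

# A perceptron: weighted sum of the inputs plus a bias,
# passed through a nonlinear activation (ReLU here).
x = torch.tensor([0.5, -1.0, 2.0])   # inputs (placeholder values)
w = torch.tensor([0.8, 0.3, -0.5])   # weights (placeholder values)
b = torch.tensor(0.1)                # bias

z = torch.dot(w, x) + b              # weighted sum of the inputs
out = torch.relu(z)                  # activation introduces the non-linearity

print(z.item(), out.item())
# Without the activation, stacking such neurons can only ever
# represent a linear function of the inputs.
```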

Challenges with ANNs:

There are some challenges associated with ANNs, described below.

  1. Image classification can be solved with an ANN, i.e., by converting 2D images into 1D vectors before training. This has the following two drawbacks.
Image Classification
  • If the image size increases, the number of trainable parameters also increases. For example, flattening a 224×224×3 image gives 150,528 input features, so even a hidden layer with just 4 neurons needs 602,112 trainable weights, and that's a huge number (see the sketch after this list).
  • Another drawback of converting 2D images into 1D vectors is that the image loses its spatial features, i.e., the arrangement of pixels. We will not discuss this in detail here, as it is beyond the scope of this topic.

2. The backpropagation algorithm can suffer from vanishing and exploding gradients. Backpropagation is used to compute the gradients, and in large, deep neural networks these gradients sometimes vanish or explode as they propagate in the backward direction.

ANN architecture

3. ANNs cannot be used for sequential data because they cannot capture the useful information in such data, for example the order of words in a sentence.
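To see the parameter blow-up from the first challenge concretely, here is a quick sketch in PyTorch (assuming, as in the example above, a 224×224×3 image and a hidden layer of 4 neurons):

```python
import torch.nn as nn

# Flattening a 224x224x3 image gives 150,528 input features.
flat_features = 224 * 224 * 3        # 150,528
hidden = nn.Linear(flat_features, 4) # a hidden layer with just 4 neurons

# 150,528 * 4 = 602,112 weights (plus 4 biases).
n_params = sum(p.numel() for p in hidden.parameters())
print(n_params)                      # 602116
```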

Feedforward Neural Networks or ANNs in PyTorch:
Below is the PyTorch code for a simple feedforward neural network. As can be seen, there is one hidden layer in this example; we can use more hidden layers as well, but remember to apply a ReLU after every hidden layer to create the non-linearity.

PyTorch code for Simple Feedforward Neural Network
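Since the code image is not reproduced here, the following is a minimal sketch of such a network (the layer sizes are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class FeedForwardNet(nn.Module):
    # A simple feedforward network with one hidden layer.
    def __init__(self, in_features=10, hidden_size=32, out_features=2):
        super().__init__()
        self.hidden = nn.Linear(in_features, hidden_size)
        self.relu = nn.ReLU()              # non-linearity after the hidden layer
        self.output = nn.Linear(hidden_size, out_features)

    def forward(self, x):
        x = self.relu(self.hidden(x))      # hidden layer + activation
        return self.output(x)              # output layer

model = FeedForwardNet()
y = model(torch.randn(4, 10))              # batch of 4 samples, 10 features each
print(y.shape)                             # torch.Size([4, 2])
```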

Recurrent Neural Networks (RNNs)

Wikipedia’s definition of an RNN is:
A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence.
but in simple words, we can say that
a looping constraint on the hidden layer of an ANN turns it into an RNN.

RNN vs. ANN

In the figure above, the recurrent nature of the RNN can be seen: the hidden state has a recurrent connection, and this looping constraint ensures that the sequential information in the data is captured.

RNN in PyTorch:

A simple language model in PyTorch that takes three words and predicts the 4th word looks like this. The forward function uses three activations, one for each word:

Code source https://course.fast.ai/
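The code image is not reproduced here; the sketch below follows the fast.ai course's first language model (the names LMModel1, i_h, h_h, and h_o come from that course):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LMModel1(nn.Module):
    # Predicts the 4th word from the previous three.
    def __init__(self, vocab_sz, n_hidden):
        super().__init__()
        self.i_h = nn.Embedding(vocab_sz, n_hidden)  # input -> hidden (embedding)
        self.h_h = nn.Linear(n_hidden, n_hidden)     # hidden -> hidden
        self.h_o = nn.Linear(n_hidden, vocab_sz)     # hidden -> output

    def forward(self, x):
        # One activation per input word, reusing the same layers each time.
        h = F.relu(self.h_h(self.i_h(x[:, 0])))
        h = h + self.i_h(x[:, 1])
        h = F.relu(self.h_h(h))
        h = h + self.i_h(x[:, 2])
        h = F.relu(self.h_h(h))
        return self.h_o(h)
```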

The code below refactors the RNN as a loop, so we can simply give it the length of the token sequence and do not have to repeat all those steps for each word.

Code source https://course.fast.ai/
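Again a sketch following the course code; a seq_len parameter replaces the three hard-coded steps above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LMModel2(nn.Module):
    # The same model refactored as a loop over the sequence length.
    def __init__(self, vocab_sz, n_hidden, seq_len=3):
        super().__init__()
        self.i_h = nn.Embedding(vocab_sz, n_hidden)
        self.h_h = nn.Linear(n_hidden, n_hidden)
        self.h_o = nn.Linear(n_hidden, vocab_sz)
        self.seq_len = seq_len

    def forward(self, x):
        h = 0
        for i in range(self.seq_len):  # one step per token
            h = h + self.i_h(x[:, i])
            h = F.relu(self.h_h(h))    # the same weights are reused at every step
        return self.h_o(h)
```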

For a better understanding of the code, visit the link below.

https://colab.research.google.com/github/fastai/fastbook/blob/master/12_nlp_dive.ipynb#scrollTo=_qbdmYcG8u0p

Advantages of RNNs

  • RNNs are mostly used for Natural Language Processing (NLP) tasks, which typically involve a lot of text. RNNs therefore need to capture the dependencies between words in a sentence or text corpus while making predictions.
Parameter Sharing

In the animation above, the outputs O1, O2, O3, and O4 depend on all the previous words, not just the current word, which means that the information in the word sequence is captured.

  • RNNs are well known for their parameter sharing, which saves computational cost: the same parameters are shared across time steps in the hidden layers, which reduces the number of parameters to train.
Unfolding RNN

In the figure above, there are three weight matrices, i.e., U, V, and W, which are shared between the layers across the time steps.
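A minimal sketch of this sharing in PyTorch (the names U, W, V follow the figure; the sizes are arbitrary placeholders):

```python
import torch
import torch.nn as nn

# The same three weight matrices are reused at every time step:
#   h_t = tanh(U x_t + W h_{t-1})
#   o_t = V h_t
n_in, n_hidden, n_out = 8, 16, 5
U = nn.Linear(n_in, n_hidden, bias=False)      # input  -> hidden
W = nn.Linear(n_hidden, n_hidden, bias=False)  # hidden -> hidden
V = nn.Linear(n_hidden, n_out, bias=False)     # hidden -> output

xs = torch.randn(10, n_in)        # a sequence of 10 input vectors
h = torch.zeros(n_hidden)
outputs = []
for x_t in xs:                    # no new parameters appear per time step
    h = torch.tanh(U(x_t) + W(h))
    outputs.append(V(h))
print(len(outputs), outputs[0].shape)  # 10 torch.Size([5])
```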

Challenges with RNNs

Vanishing and exploding gradients are a very common problem in most neural networks. RNNs also suffer from this problem when the number of time steps is too large.

As the gradient is propagated backward from the last time step, it can shrink until it effectively vanishes by the time it reaches the first time step.
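A toy sketch of the effect: a scalar "RNN" with a small recurrent weight (the value 0.5 and the 50 steps are chosen just to make the vanishing visible):

```python
import torch

w = torch.tensor(0.5, requires_grad=True)  # recurrent weight
h = torch.tensor(1.0)                      # initial hidden state
for _ in range(50):                        # 50 time steps
    h = torch.tanh(w * h)                  # repeated squashing shrinks gradients
h.backward()
print(w.grad)                              # ~0: the gradient has vanished
```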

For more articles on Machine Learning and Deep learning, please follow and clap for appreciation. Peace!
