Introduction to Recurrent Neural Networks

The core of Natural Language Processing

Meghna Asthana PhD MSc DIC
Analytics Vidhya
2 min read · Mar 5, 2020


In this chapter of our Artificial Neural Network introduction series, we will talk about Recurrent Neural Networks (RNNs), which are the building blocks of Natural Language Processing (NLP) and Machine Translation technologies. They owe this role to the ease with which they comprehend sequential data like sentences and voice snippets.

An RNN aims to make predictions on sequential data by utilising a memory-based architecture. In contrast to a feed-forward network, where no two inputs share knowledge, an RNN incorporates additional memory states, so its predictions depend not only on the current input but also on previous information. RNNs find their purpose in a variety of tasks: translating text to speech, predicting the next word in a sentence, converting audio to text, language translation and image/video captioning.
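To make the "memory" idea concrete, here is a minimal sketch of the recurrence in NumPy. The dimensions, the weight names (W_xh, W_hh), and the tanh activation are illustrative assumptions on my part, not taken from the referenced book:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent update: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy dimensions (assumed for illustration): 4-dim inputs, 3-dim hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))   # input-to-hidden weights
W_hh = rng.normal(size=(3, 3))   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(3)

h = np.zeros(3)                          # initial memory state
for x in rng.normal(size=(5, 4)):        # a sequence of 5 input vectors
    h = rnn_step(x, h, W_xh, W_hh, b_h)  # same weights reused at every step
```

The key point is that h is carried across iterations of the loop: each new state is computed from the current input and the previous state, which is exactly the memory described above.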

Outline of RNN [1]

A classic RNN architecture contains input, hidden and output layers but, unlike a feed-forward network, the hidden states are cyclic in nature, allowing data to propagate from one time step to the next. To clearly understand the working mechanism of an RNN architecture, a detailed exploration of an unfolded RNN with one layer is sufficient. The sets of (input - hidden RNN unit - output) shown in the figure above represent the network at each time step [1]. The following events occur at each time step (a runnable sketch of all three follows the list):

  • At time step t-1, the network encodes a word using a word embedding technique like one-hot encoding, word2vec or GloVe to produce a vector xᵗ⁻¹.
  • The encoded input word xᵗ⁻¹ is fed into the RNN cell, which produces an output y’ᵗ⁻¹ and a memory state hᵗ⁻¹. Note that the memory state is a result of both the input xᵗ⁻¹ and the previous memory state hᵗ⁻².
  • The actual word prediction at time step t-1 is obtained by decoding the output y’ᵗ⁻¹ against the text corpus (the vocabulary) created at the beginning of training.
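The sketch below walks through these three steps over a toy four-word vocabulary. The vocabulary, dimensions and random weights are my own illustrative assumptions; in a real system the weights are learned and the vocabulary comes from a large corpus:

```python
import numpy as np

vocab = ["the", "cat", "sat", "down"]
V, H = len(vocab), 8                       # vocabulary and hidden sizes (assumed)

def one_hot(word):
    """Step 1: encode a word as a vector x (one-hot, the simplest scheme)."""
    x = np.zeros(V)
    x[vocab.index(word)] = 1.0
    return x

rng = np.random.default_rng(42)
W_xh, W_hh = rng.normal(size=(H, V)), rng.normal(size=(H, H))
W_hy = rng.normal(size=(V, H))

h = np.zeros(H)                            # memory state before the first step
for word in ["the", "cat", "sat"]:
    x = one_hot(word)                      # step 1: encode the current word
    h = np.tanh(W_xh @ x + W_hh @ h)       # step 2: new memory from x and old h
    y = W_hy @ h                           # step 2: raw output scores y'
    predicted = vocab[int(np.argmax(y))]   # step 3: decode against the vocabulary
    print(f"after '{word}' the network predicts '{predicted}'")
```

Because the weights here are random, the printed predictions are meaningless; training would adjust them so that decoding y’ᵗ yields the correct next word.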

This concludes our introduction to Recurrent Neural Networks, where we discussed their basic structure and the flow of information through them. In the next chapter of this series, we will go in depth into how a Recurrent Artificial Neuron, the constituent of the modern-day RNN, works. (This is a rather advanced topic and I would suggest skipping it if you are not comfortable with mathematical concepts like matrices and activation functions.)

[1] Kostadinov, S., 2018. Recurrent Neural Networks with Python Quick Start Guide, 1st ed. Birmingham: Packt Publishing.

[2] Julian, D., 2018. Deep Learning with PyTorch Quick Start Guide: Learn to Train and Deploy Neural Network Models in Python. Birmingham: Packt Publishing.
