Implementation of RNNs in Text Classification Tasks

Rahul Vamusani
Jun 16, 2018 · 4 min read

Why are RNNs good for Text Classification?

In text classification tasks, the sequence information is important for interpreting the class of the text. RNNs capture this sequence information, which results in more accurate classification of the text data.

There are many variations of RNNs, such as Many-One, Many-Many, and bidirectional. In this case the aim is to classify the input text into a positive or a negative class. A set of Many-One LSTM units achieves this, as only one value needs to be output to determine the polarity of the review.

Workflow of RNN implementation on Text Data

Various stages involved:

Stage 1 — Acquiring the data:

Not much happens at this stage. The data is available as sentences, and each sentence carries a label: positive (0) or negative (1). The data looks as follows:

Sample Dataset

Here the data column holds the review text of a movie, and the label indicates whether the review is positive (0) or negative (1). Note that the reviews are of arbitrary length. In later stages this data is converted to sentence vectors of equal length.

Stage 2 — Building Sentence vectors :

This is the most important stage in the workflow. Here the text data obtained in Stage 1 is converted to numerical sentence vectors. The conversion takes several steps:

Step-1: The data is tokenized into a set of words, and the number of occurrences of each word across the whole text corpus is counted. As seen in the figure, the frequency of each word is noted.
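Step-1 could be sketched in plain Python as below. This is only an illustration: the three reviews are made-up toy data, and the whitespace tokenizer `tokenize` is an assumption, since the article does not say which tokenizer was used.

```python
from collections import Counter

# Hypothetical toy reviews, not taken from the article's dataset.
reviews = [
    "the movie was good",
    "the movie was bad",
    "good acting good story",
]

def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

# Count how often each word occurs across the whole corpus.
word_counts = Counter(word for review in reviews for word in tokenize(review))

for word, count in word_counts.most_common():
    print(word, count)
```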

Step-2: The words are ranked by their number of occurrences. The most frequent word gets the top rank, and the least frequent word gets the lowest. Two words can share the same rank; in that case the two different words are treated as one and the same. A lookup table is built from the rank of each word and is used in later stages of the workflow.
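A minimal sketch of such a lookup table follows. The word counts are hypothetical, the helper name `build_lookup` is made up for this example, and starting the ranks at 1 (so that 0 stays free for padding) is an assumption rather than something the workflow states.

```python
from collections import Counter

# Hypothetical word frequencies, as produced by the counting step.
word_counts = Counter({"good": 3, "the": 2, "movie": 2, "was": 2, "bad": 1})

def build_lookup(counts):
    """Map each word to a rank; words with equal counts share a rank.

    Rank 1 goes to the most frequent word. Rank 0 is left unused so it
    can serve as the padding value later (an assumption of this sketch).
    """
    distinct_counts = sorted(set(counts.values()), reverse=True)
    rank_of_count = {c: rank for rank, c in enumerate(distinct_counts, start=1)}
    return {word: rank_of_count[c] for word, c in counts.items()}

lookup = build_lookup(word_counts)
print(lookup)
```

Because "the", "movie" and "was" occur equally often, they end up with the same rank and become indistinguishable to the model, which is the tie behaviour described above.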

Step-3: The sentence vector for each review in the dataset is constructed by looking up the rank of every word in the lookup table built in the previous step. One point to note: the vectors are not all the same length.
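As a rough illustration of Step-3, the snippet below turns two toy sentences into rank vectors. The lookup table is hypothetical, and silently skipping words that are missing from the table is an assumption of this sketch.

```python
# Hypothetical lookup table (word -> rank) from the previous step.
lookup = {"good": 1, "the": 2, "movie": 2, "was": 2, "bad": 3}

def to_vector(sentence, table):
    """Replace each known word by its rank; words missing from the table are skipped."""
    return [table[word] for word in sentence.lower().split() if word in table]

sentences = ["the movie was good", "bad movie"]
vectors = [to_vector(s, lookup) for s in sentences]

print(vectors)                    # one list of ranks per sentence
print([len(v) for v in vectors])  # the lengths differ from sentence to sentence
```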

Stage 3 — Building equal Length of Sentence Vectors:

In this stage the vectors are brought to the same length by padding with zeros at the start (pre-padding). Equal-length sentence vectors make the learning process easier.

Why not train different length sentence vectors?

The sentence vectors have different lengths, which makes it hard to stack them into a single matrix and train on them in batches. Padding is a computational hack to speed up the learning process of LSTM recurrent networks.

The result is one equal-length vector per sentence. These sentence vectors are passed to the next stage.
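Pre-padding can be sketched in a few lines of plain Python. The function name `pre_pad` and the choice to drop items from the front of vectors that are longer than the target length are assumptions of this example, not details given in the workflow.

```python
def pre_pad(vectors, length, pad_value=0):
    """Left-pad every vector with pad_value so that all have the same length.

    Vectors longer than `length` keep only their last `length` items
    (an assumption of this sketch).
    """
    padded = []
    for vec in vectors:
        if len(vec) > length:
            vec = vec[len(vec) - length:]
        padded.append([pad_value] * (length - len(vec)) + list(vec))
    return padded

padded = pre_pad([[2, 2, 2, 1], [3, 2]], length=4)
print(padded)
```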

Stage 4 — Converting the Sentence Vectors:

In this stage an embedding layer converts the sentence vector matrix into an embedded sentence vector matrix as follows:

(number of sentences * length of sentence) → (number of sentences * length of sentence * output dimension)

Here each word, represented by a single value in the sentence vector, is converted to a vector of size output dimension.
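The shape change can be illustrated without any deep learning library. In the sketch below the embedding table is filled with random numbers purely for illustration (in a real model these values are learned), and `vocab_size` and `output_dim` are hypothetical values.

```python
import random

random.seed(0)

vocab_size = 4   # ranks 0..3, where 0 is the padding value (hypothetical)
output_dim = 3   # hypothetical embedding size

# One vector per rank; random here, learned during training in a real model.
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(output_dim)]
    for _ in range(vocab_size)
]

# Padded sentence vectors: 2 sentences of length 4.
padded = [[2, 2, 2, 1], [0, 0, 3, 2]]

# Replace every rank by its embedding vector.
embedded = [[embedding_table[rank] for rank in sentence] for sentence in padded]

shape = (len(embedded), len(embedded[0]), len(embedded[0][0]))
print(shape)  # (2, 4, 3)
```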

Why use embedding layers in case of Text data?

  1. The embedding layer learns the vector representations as part of the deep learning model itself.
  2. It also allows loading a pre-trained word embedding model.

The embedding layer is part of the data preparation for the RNN. After this stage the word embeddings and the sentence labels of the reviews are sent to the next stage.

Stage 5 — Building the model:

Building the Deep Learning Model:

In this stage the deep learning model is built. The model consists of 100 LSTM cells in parallel. Each LSTM cell receives the same data.

Each LSTM outputs one value (Many-One), so 100 values are output in total. These 100 values are passed to a single sigmoid unit, which gives the polarity of the review (a value between 0 and 1).
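The final step, from 100 values to one polarity score, might look roughly like this. Only the sigmoid unit is sketched: the 100 LSTM outputs and the weights are random stand-ins rather than the result of a real LSTM, and the 0.5 decision threshold is an assumption.

```python
import math
import random

random.seed(0)

n_units = 100

# Stand-in for the 100 LSTM outputs (a real LSTM would compute these).
lstm_outputs = [random.uniform(-1.0, 1.0) for _ in range(n_units)]

# Weights and bias of the single sigmoid unit; random here, learned in practice.
weights = [random.uniform(-0.1, 0.1) for _ in range(n_units)]
bias = 0.0

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Weighted sum of the 100 values, then the sigmoid.
score = sigmoid(sum(w * h for w, h in zip(weights, lstm_outputs)) + bias)

# With the dataset's encoding, 0 means positive and 1 means negative.
label = 1 if score >= 0.5 else 0
print(score, label)
```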

Stage 6 — Defining the Parameters to Train the model:

Defining the training parameters:

The model is trained with the following parameters:

Metric = “Accuracy”

Accuracy measures the correctness of the deep learning model: the fraction of reviews that are classified correctly.
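For a sigmoid output, accuracy could be computed as sketched here. The labels and scores are made-up values, and the 0.5 threshold is an assumption.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for pred, true in zip(predictions, y_true) if pred == true)
    return correct / len(y_true)

# Hypothetical labels and sigmoid outputs for four reviews.
acc = accuracy([0, 1, 1, 0], [0.2, 0.9, 0.4, 0.1])
print(acc)  # 0.75
```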

Loss = “Binary CrossEntropy”

As this is a binary classification task, “Binary Cross Entropy” is the appropriate loss for measuring the error of the model.
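To make the loss concrete, here is a small plain-Python version of binary cross entropy averaged over a batch. The clipping constant `eps` is an assumption added to avoid taking log(0); the example labels and scores are made up.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Confident, correct predictions give a small loss; wrong ones a large loss.
good = binary_cross_entropy([1, 0], [0.9, 0.1])
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
print(good, bad)
```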

Optimizer = “Adam”

The optimizer is the key to minimizing the loss in any deep learning model. Adam is an adaptive learning rate algorithm that minimizes the loss and thereby increases the accuracy of the model.
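The "adaptive" part of Adam can be seen in a single-parameter sketch. The update rule below follows the standard Adam formulation with its usual default hyperparameters; the toy objective f(x) = x² and the learning rate of 0.05 are assumptions chosen only to make the behaviour visible.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at time step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize f(x) = x^2, whose gradient is 2x.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)

print(x)  # ends up much closer to the minimum at 0 than the start value 1.0
```

The step size is scaled by the running statistics of the gradient, so each parameter effectively gets its own learning rate.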
