Neural Network for Sentimental Analysis [Part-2: Neural_Network Architecture (in detail) & Parameter Tuning]

Vedant Dave
8 min read · May 8, 2020

--

[Hey there! Welcome back. If you have not read “Part-1 [Feature Extraction]” (just a 7-min. read), then I advise you to go through it before Part-2; it will give you a better understanding of the topic idea and the data preprocessing steps.]

Use my Google Colab Notebook for interactive learning!

In each stage, I will first explain the applied fundamentals/hypothesis, then the relevant code with its logic, and then the resulting outputs, first for unit-test models and then applied to the main model. This sequence is maintained until the end and is separated by Step: X.x (where X = stage, x = substep). By the way, I am partitioning this series into the following 3 stages: [1] Feature Extraction & I/P-O/P creation. [2] Model Implementation (NN). [3] Noise Reduction.

OK, so until now we have discussed the idea behind sentiment analysis, the reasons for using a neural network, and the text-data preprocessing steps. Now, let's move on to the Neural_Network architecture.

The sequence of phases:

Create the class class Sentimental_Network: and encapsulate the neural network architecture with the following methods…

  • Data preprocessing method: def pre_processing_data()
  • Network architecture method: def network_architecture()
  • Training method: def train()
  • Testing method: def test()
  • Run method (single-review forward pass): def run()

Step: 2.0 (define class)

Define class and call supportive methods [Continue…]

Python is an object-oriented programming language, and a class here carries all four basic OOP elements (abstraction, encapsulation, inheritance, and polymorphism). You should check this Python OOP guide for more detail. We will use these concepts in our code.

Here, Sentimental_Network is the class that will contain the whole implementation of the network. __init__ is a reserved Python method known as the constructor: it runs automatically when an instance is created, so it is the place from which we call our other setup methods. The leading _ or __ in a name is Python's convention for encapsulation; it signals that an attribute or method is internal and should not be modified from outside the class. self is also reserved: it represents the instance itself. Finally, self.pre_processing_data() and self.network_architecture() are the supportive methods we will discuss below.
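For reference, here is a minimal skeleton of the structure described above. This is a sketch, not the full notebook code: the method bodies are filled in through Steps 2.1-2.4, and hidden_nodes as an explicit constructor parameter is my assumption here.

    import numpy as np

    class Sentimental_Network:
        def __init__(self, reviews, labels, hidden_nodes=10, learning_rate=0.1):
            # The constructor only wires things together by calling
            # the two supportive methods:
            self.pre_processing_data(reviews, labels)
            self.network_architecture(len(self.review_vocab), hidden_nodes,
                                      1, learning_rate)

        def pre_processing_data(self, reviews, labels): ...      # Step 2.1
        def network_architecture(self, input_nodes, hidden_nodes,
                                 output_nodes, lr): ...          # Step 2.2
        def train(self, training_reviews, training_labels): ...  # Step 2.3
        def test(self, testing_reviews, testing_labels): ...     # Step 2.4
        def run(self, review): ...    # forward pass for one review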

Step: 2.1 (pre-processing phase)

Define pre_processing_data() [Continue…]

This method is already called in __init__, and we discussed its logic in [Part-1: Feature Extraction]. Still, as a summary: first we create a set and populate it with every word used across all reviews. The same process is applied to the labels, creating a set of all POSITIVE and NEGATIVE labels in the same sequence. In the second step we convert each set into a collection of (word, index) pairs, so we can map between the input/output datasets and the network's layers after prediction.
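A condensed sketch of that logic (the Colab notebook has the full version; review_vocab, label_vocab, word2index, and label2index are the attribute names I assume here):

    def pre_processing_data(self, reviews, labels):
        # Collect every distinct word used across all reviews.
        review_vocab = set()
        for review in reviews:
            review_vocab.update(review.split(' '))
        self.review_vocab = list(review_vocab)

        # Same process for the labels (POSITIVE / NEGATIVE).
        self.label_vocab = list(set(labels))

        # Second step: convert each set into (word, index) pairs,
        # i.e. a lookup from each word/label to its position.
        self.word2index = {word: i for i, word in enumerate(self.review_vocab)}
        self.label2index = {label: i for i, label in enumerate(self.label_vocab)}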

Step: 2.2 (network_architecture phase)

Again, I will cover everything in detail, so this section is lengthy; sorry for that!

Network Architecture [Continue…]
  • With the method def network_architecture(self, #input, #output, lr) we create the required input parameters of the neural network. Here, as defined, we have a number of input nodes and output nodes. First of all, the input layer is generated from input_nodes: we have 74,074 words, so it creates a long tensor of shape [1, 74074] for the input layer. The other layers are built up during the training phase.
  • Weight initialization is one of the important choices: how well you initialize your weights defines your network's performance.
  • If you apply this rule as hidden_nodes**-0.5, it will give you a better result than output_nodes**-0.5, because it accounts for the larger number of neurons in the hidden layer; it helps the network train faster with a 0.01 learning rate than with 0.001 (with output_nodes). [Please check it yourself if you want.]
  • We will talk about the learning_rate later, during the execution phase.
  • For the output, we just need to classify: “Is this review tagged positive or negative?” For that, the sigmoid function is used. (A sketch of the whole method follows this list.)
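Here is the sketch, assuming a single hidden layer; the zero initialization of the first weight matrix and the hidden_nodes**-0.5 scaling of the second follow the discussion above:

    def network_architecture(self, input_nodes, hidden_nodes, output_nodes, lr):
        self.input_nodes = input_nodes      # 74,074 words in our case
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes
        self.learning_rate = lr

        # Input layer: one long tensor of shape [1, 74074].
        self.layer_0 = np.zeros((1, input_nodes))

        # Input -> hidden weights start at zero; hidden -> output weights
        # are drawn from a normal distribution scaled by hidden_nodes**-0.5.
        self.weights_0_1 = np.zeros((input_nodes, hidden_nodes))
        self.weights_1_2 = np.random.normal(0.0, hidden_nodes ** -0.5,
                                            (hidden_nodes, output_nodes))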

Sigmoid function:

Sigmoid activation function (probability in (0.0, 1.0))

Here, the sigmoid function takes a single value x, raises e to the negative of it, adds 1, and returns the reciprocal: sigmoid(x) = 1 / (1 + e^-x). The result is a ratio whose value always lies between 0 and 1, so it can be read as a probability. If the probability is above 0.5, the review is classified as POSITIVE (1), otherwise as NEGATIVE (0). [Sigmoid function in detail]
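In code it is a one-liner:

    def sigmoid(x):
        # Squashes any real-valued input into the open interval (0, 1).
        return 1 / (1 + np.exp(-x))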

Step: 2.3 (define Training phase)

Neural_Net Training [Continue…]

For training, inputs and outputs are mandatory parameters. Here we must make sure that the number of inputs and the number of outputs match each other; otherwise, even after training, we cannot pair them up properly and we get errors such as “index out of bounds”.

Every neural network works on individual examples. In simple classification modeling, the network effectively tries to plot our data in a 2D or 3D space and classify (divide) it with a line (linear/nonlinear) or a plane. To do that, the data passes through different layers, each time reducing the layer size and forming different patterns. During prediction, these patterns are associated with specific outputs; the degree of closeness of a pattern to an actual output gives us a probability, and the most probable scenario is taken as the output. [Please read this twice.]

Neural_Network working process.
Neural_Network (forward and backpropagation)

As shown above,

  • First, the dot product of the input (X) and W_0_1 gives the intermediate output layer_1; if there is an activation function, it acts as a filter according to its formula. The same flow then repeats with the weights of the next layer, and the network finally gives the output (pos/neg) depending on the value x (x < 0.5 or x > 0.5).
  • When you get the actual output, the forward pass is complete. But when we compare the network's output with the actual targets (the real labels), we get an error, and at the start the errors are much bigger than our expectations. This is where the main essence of backpropagation comes in.
  • Backpropagation takes the total error and reflects it back through the layers in reverse order: it distributes the total error across the layers and across each layer's individual neurons, according to their weight distribution. This way, each neuron gets a chance to adjust its weights to maximize the output probability.
  • Let's understand it in terms of patterns. Each neuron's weights update themselves in such a way that the weights of neurons carrying more important patterns become bigger than the others. The total effect of such important neurons is therefore higher, which gives better output.

In this way our neural network improves its performance by adjusting its weights and biases (here, bias = 0), which is called tuning the parameters.
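A condensed sketch of one training step under the description above. Note that update_input_layer() (fills layer_0 from a review's words) and get_target_for_label() (turns POSITIVE/NEGATIVE into 1/0) are hypothetical helper names, and bias = 0 throughout:

    def train_step(self, review, label):
        # Forward pass: X . W_0_1 gives layer_1, then sigmoid(layer_1 . W_1_2).
        self.update_input_layer(review)               # hypothetical helper
        layer_1 = self.layer_0.dot(self.weights_0_1)
        layer_2 = sigmoid(layer_1.dot(self.weights_1_2))

        # Backward pass: take the output error and reflect it back
        # through the layers in reverse order.
        layer_2_error = layer_2 - self.get_target_for_label(label)
        layer_2_delta = layer_2_error * layer_2 * (1 - layer_2)  # sigmoid derivative
        layer_1_error = layer_2_delta.dot(self.weights_1_2.T)
        layer_1_delta = layer_1_error    # no activation on layer_1 here

        # Weight updates, scaled by the learning rate (discussed below).
        self.weights_1_2 -= layer_1.T.dot(layer_2_delta) * self.learning_rate
        self.weights_0_1 -= self.layer_0.T.dot(layer_1_delta) * self.learning_rate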

Step: 2.4 (define the testing phase)

Model Testing [Continue…]

The testing phase contains only forward propagation, and by defining the run() method we get the final output for a single review. In testing, we just run the forward pass (the same as in training) and compare the achieved output with the real one. Here, instead of computing an error, we simply record true or false for each result.
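A sketch of both methods, using the same hypothetical update_input_layer() helper as in training:

    def run(self, review):
        # Forward pass only.
        self.update_input_layer(review)
        layer_1 = self.layer_0.dot(self.weights_0_1)
        layer_2 = sigmoid(layer_1.dot(self.weights_1_2))
        return 'POSITIVE' if layer_2[0, 0] >= 0.5 else 'NEGATIVE'

    def test(self, testing_reviews, testing_labels):
        # Count how many predictions match the real labels.
        correct = 0
        for review, label in zip(testing_reviews, testing_labels):
            if self.run(review) == label:
                correct += 1
        print("Accuracy:", correct / len(testing_reviews))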

Step: 2.5 Execution Phase:

Model Execution [Demo]

Here, I first define model_0 with the proper parameters. I take all the data except the last 1,000 reviews for defining and training the network, because testing data must always be new (unknown) to the network. During execution, learning_rate = 0.1; as I declared before, this is one of the hyperparameters of a neural network, so we should understand the learning rate in depth.
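In code, the split and the calls look roughly like this (reviews and labels are the lists produced by Part-1's preprocessing):

    # Everything except the last 1,000 reviews trains the network;
    # the held-out last 1,000 stay unseen for testing.
    model_0 = Sentimental_Network(reviews[:-1000], labels[:-1000],
                                  learning_rate=0.1)
    model_0.train(reviews[:-1000], labels[:-1000])
    model_0.test(reviews[-1000:], labels[-1000:])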

Learning_rate

In our code, we use the learning rate to update the weights: in the mathematical model, it is applied during the weight update in backpropagation. It defines how fast the weights adjust themselves. Let's understand this with an example.

Example: if our network has a previous weight W_old = 0.5, and backpropagation now advises resetting it by some part of the error (let's take 0.1), then the new weight W_new would become [0.5 - 0.1 = 0.4]. But our formula says to multiply that part by the learning rate, so the same update becomes W_new = [0.5 - (learning_rate = 0.1) * 0.1 = 0.49]. And what if our learning rate is 0.01? Obviously it gives W_new = [0.5 - (learning_rate = 0.01) * 0.1 = 0.499]. In this way, the learning rate decides how much each update really changes the weights.
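The same arithmetic, spelled out:

    w_old, error_part = 0.5, 0.1
    print(w_old - error_part)          # 0.4   (no learning rate)
    print(w_old - 0.1 * error_part)    # 0.49  (learning_rate = 0.1)
    print(w_old - 0.01 * error_part)   # 0.499 (learning_rate = 0.01)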

If you think about it mathematically, the learning rate is the step size of the network's gradient descent: it decides how far we move down the slope of the network's loss function at each step, and therefore how fast we reach the lowest point of the loss curve (the smallest error, which means maximizing the likelihood of the real values), as below.

Generally, ML practitioners use a range from 0.001 to 0.9, but my personal choice is to first check 0.01, then try 0.1 and 0.001, and choose the learning rate further from the slope of the loss graph. When you try (0.0001, 0.001, 0.01, 0.1, 1), you are actually searching on a specific scale, a logarithmic grid. (How? 1/0.1 = 10, 1/0.01 = 100, 1/0.001 = 1000: each step is a factor of 10.)

Now let's see the performance of our model with different learning rates.

Note: in our case, the performance of a model means the accuracy and speed of the network.
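One way to reproduce the comparison below is a simple loop over the three learning rates:

    for lr in (0.1, 0.01, 0.001):
        model = Sentimental_Network(reviews[:-1000], labels[:-1000],
                                    learning_rate=lr)
        model.train(reviews[:-1000], labels[:-1000])
        model.test(reviews[-1000:], labels[-1000:])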

Model_0 => learning_rate : 0.1

Training [lr = 0.1]
Testing [lr = 0.1]

Model_1 => learning_rate : 0.01

Training [lr = 0.01]
Testing [lr = 0.01]

Model_2 => learning_rate : 0.001

Training [lr = 0.001]
Testing [lr = 0.001]

Please observe the accuracy and speed of the different models, and you will get a clear idea of the learning rate.

As I mentioned before, this topic is quite lengthy, so I am separating it into three parts to give a clear understanding. In the next part we will discuss [Click here] >> Noise Reduction for the Sentimental Neural Network; it covers each hypothesis in detail, and then we will discuss its effect on the model we trained here.

Thank you for reading. I tried my best; still, if you have any suggestions, please let me know in a comment. If you like my work, then please show your sentiments by giving me a “clap” and share it with your connections; it helps to keep me motivated.

The motto of my life: “Keep Learning, Enjoy Empowering”

--