Neural Network for Sentiment Analysis [Part 1: Feature Extraction]

Vedant Dave
7 min read · May 8, 2020


Use my Google Colab Notebook for interactive learning!

How are you? Even your answer to that simple question carries a sentiment.

What is Sentiment Analysis?

Every day, whatever we do is linked with our emotions. In every situation, we hold some specific attitude towards the event. The way we express such emotions may differ, but our reactions always tell a story.

Applying analytical methods to determine an individual’s opinion and categorize it into a specific emotion (positive, negative, neutral, happy, sad, angry, humble, etc.) is what this field does. So, in summary, “Sentiment analysis is the interpretation and classification of emotions.”

Applications:

  • Natural language processing (NLP), text analysis, and computational linguistics.
  • Systematically identifying, extracting, quantifying, and studying affective states and subjective information, e.g. for security applications such as biometrics.
  • Analysing voice-of-the-customer materials such as reviews and survey responses, online and social media, and healthcare materials, for applications ranging from marketing to customer service to clinical medicine and many more…

Now, let’s come to the point: how can recent developments in machine learning help us to “dig gold out of the trash” (the raw dataset)?

My main focus in this topic is to discuss neural networks in depth. But still, let’s start with the methods commonly used for sentiment analysis.

Why Neural Network?

Normally, we see heavy use of the NLTK library to preprocess text-mining datasets, followed by direct application of a classifier to get the output. But nowadays the scenario has changed…

In the world of big data, everyone faces the 5 V problems when practically dealing with data. One of them is the velocity of data: the speed of data generation is higher than our data-analytic capability, in terms of hardware, human resources, analytic methodologies, and algorithmic limitations.

To push past these limitations, we use neural networks, which are able to find complex combinations in text data using RNN/LSTM models. Even if our data comes as images (say, a face photo attached as a first product review) or video, we can still find a good approach using CNNs.

Well, the main motto of this article is to give a deep understanding of the fundamentals: preprocessing (for neural network models), an explanation of the network architecture, and applying noise-reduction concepts to the data.

Commence:

Here, I will use the review and label data in the form of text files (.txt). You can download the dataset and my Google Colab Notebook for an interactive tutorial, then just run the following code:

Load Data
Preview Nature of Data

I present each and every step in detail, so please forgive the lengthy reading. But I am sure you will understand the whole application in detail.

At each stage, I will first explain the applied fundamental/hypothesis, then the relevant code with its logic and the resulting outputs for unit-test models, and then apply them to the main model. This sequence is maintained until the end, separated as Step: X.x (where X = stage, x = substep). By the way, I will partition this into the following 3 stages: [1] Feature Extraction & I.P.-O.P. creation. [2] Model Architecture (NN). [3] Noise Reduction.

Stage 1: Feature Extraction & IP/OP Creation:

Features are inherent properties of an object that tell us how it differs from others. As an example:

  1. “I am Happy”
  2. “I am sad”
  • Here, “Happy” and “Sad” are two different words that tell us the sentiment of the writer. So, for any text, its words are the ultimate representatives.
  • That means if we separate the words from the text and generate total counts, we can create one vector of words as input.
  • In Python, the .split(“ ”) function will separate the words, as they are separated in the text by blanks.
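The split step above, as a one-line sketch:

```python
# Splitting a review on blanks yields its word features.
review = "I am Happy"
words = review.split(" ")
print(words)  # prints ['I', 'am', 'Happy']
```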

Step: 1.0

Generate Counters and Label Separators

I use Counters to count positive, negative, and total word occurrences under their respective labels. This separates the words so we can identify what a specific word represents in each distinctive label. And as you see below, we get the relevant words with their counts, as a collection of tuples in the form (word, count).
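A hedged sketch of this counting step, assuming `reviews` and `labels` are the parallel lists loaded earlier:

```python
from collections import Counter

# Count how often each word appears under each label, plus overall.
def build_counters(reviews, labels):
    positive_counts = Counter()
    negative_counts = Counter()
    total_counts = Counter()
    for review, label in zip(reviews, labels):
        for word in review.split(" "):
            if label == "POSITIVE":
                positive_counts[word] += 1
            else:
                negative_counts[word] += 1
            total_counts[word] += 1
    return positive_counts, negative_counts, total_counts
```

Calling `positive_counts.most_common()` then yields the (word, count) tuples shown in the notebook output.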


Step: 1.1

But from the above result we do not get meaningful information that can lead us to a conclusion: words like “the”, “and”, “a”, “of”, and “to” are used far more often but do not relate to the sentiment of the writer. So it is better to compute a “pos_neg_ratio” instead, to reduce the word overlap between the counters.

  • It will reduce computational and time complexity.
  • The positive-to-negative ratio for a given word can be calculated as positive_counts[word] / float(negative_counts[word]+1). Notice the +1 in the denominator: it ensures we don’t divide by zero in case a word is only seen in positive reviews.
  • We will only consider words that appear more than 100 times in total; this is one kind of hypothesis that helps us filter out words tied to specific products, brand names, or product types.

Result Interpretation: Here our ratio is positive to negative, meaning that if a word belongs to the positive label it has a ratio value > 1, and for negative terms the value is < 1 (near zero). Common words sit around 1. You can see this in the code result.

Step: 1.2

Now, our problem is the range of our pos_neg_ratio, which runs from +14 to 0.008. This gives us a wide data distribution and makes analysis hard in graphical representations. Also, larger values make neural networks harder to train and use more computing power. So if we can map the ratios to a smaller range mathematically, that is better for our model.

log(n) is one such mapping, because log₂(1,000,000) ≈ 20.00 (19.9315685693), so it gives us a much better distribution band. [click here for great answers]

Observation: “Matthau”: current value = 2.80, reduced from 16.55 (previous value)

Great! Now our range runs from +2.80 to -3.32, reduced from 16.55 to 0.02, and another advantage is that the distribution, referenced to the Y-axis, now crosses the X-axis at zero, which gives us leverage for better distribution, classification, and visualization modeling.
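A minimal sketch of this rescaling, assuming pos_neg_ratios from Step 1.1 (note that np.log of a zero ratio is -inf, which is why the min-count filter of Step 1.1 matters):

```python
import numpy as np
from collections import Counter

# Log-rescale: ratios > 1 map to positive values, ratios < 1 to negative
# ones, and neutral words (ratio ~ 1) land near 0.
def log_rescale(pos_neg_ratios):
    return Counter({word: np.log(ratio) for word, ratio in pos_neg_ratios.items()})
```

For example, log_rescale brings a ratio of 16.55 down to about 2.80, matching the “Matthau” observation.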

INPUT / OUTPUT DATA CREATION:

For this, we must understand: what are the input and output in terms of a neural network?

Simple FeedForward Neural-Network

It is understandable that our words are the input, since we identified them as the features having the power to decide the writer’s sentiment, and our output must be either a positive or a negative review. That means we need an activation function that gives the output as a probability; with a single output unit this is a sigmoid. If the network's output probability is > 0.5 it represents a positive label, and < 0.5 a negative label. (I will discuss this in detail during the network architecture stage.) So, let’s decide on the input layer (layer_0).
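The thresholding described above can be sketched with a sigmoid on a single output unit (just a sketch here; the actual network comes in the architecture stage):

```python
import numpy as np

# Squash the network's raw output into (0, 1) and threshold at 0.5.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def classify(raw_output):
    return "POSITIVE" if sigmoid(raw_output) > 0.5 else "NEGATIVE"
```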

Step : 1.3 (Input generation) :

Here, our goal is to create the input layer (layer_0) and use it to update layer_1 (the hidden layer).

Interpretation: First we create one set of the total words and generate an index to update the information in layer_0. It is obvious that layer_0 should have an input length equal to our set size, i.e. shape (1, 74074).
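A sketch of this indexing step, assuming `reviews` is the list of review strings:

```python
import numpy as np

# Build the vocabulary and a word -> column index map, then create the
# (1, len(vocab)) input row vector.
def build_vocab(reviews):
    vocab = set(word for review in reviews for word in review.split(" "))
    word2index = {word: i for i, word in enumerate(vocab)}
    return vocab, word2index

# vocab, word2index = build_vocab(reviews)
# layer_0 = np.zeros((1, len(vocab)))   # shape (1, 74074) for this dataset
```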

Step: 1.4 (Layer update):

Array[* * *] represents word repetition, so the first word is repeated 18 times.

First, I use layer_0 *= 0 to reset the layer, then update it with a for loop. layer_0[0][word2index[word]] += 1 counts each word of the review, incrementing for each repetition of the word, so we get the real weightage of a word in each label. As an example, if a customer uses the word “satisfied” or “happy” 3 times in a specific review, its weight towards the output increases and gives a better result.
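That update loop as a self-contained sketch, with word2index and layer_0 as in Step 1.3:

```python
import numpy as np

# Reset layer_0, then count every word occurrence in the review.
def update_input_layer(review, layer_0, word2index):
    layer_0 *= 0                              # reset the layer
    for word in review.split(" "):
        layer_0[0][word2index[word]] += 1     # +1 per repetition of the word
    return layer_0
```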

Step: 1.5 (Output):

A neural network uses mathematical models and gives its output in numeric form, so it is better to convert our output labels to 1 and 0 for “POSITIVE” and “NEGATIVE”.
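The conversion itself is one line (a sketch matching the label file's POSITIVE/NEGATIVE strings):

```python
# Map text labels to the numeric targets the network can learn from.
def get_target_for_label(label):
    return 1 if label == "POSITIVE" else 0
```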

Output [Label Conversion]

So, here we completed our first milestone (the feature extraction and preprocessing stage) for the given dataset. Our next stage is building the neural network.

As I mentioned before, this topic is quite lengthy, so I am separating it into three parts to give a clear understanding. In the next part [Click here] we will discuss the architecture of the neural network and parameter tuning and its effect on model performance; I will also cover phenomena such as underfitting and overfitting in machine learning.

Thank you for reading. I tried my best; still, if you have any suggestions, please let me know in a comment. If you like my work, please show your sentiments by giving me a “clap” and share it with your connections; it helps keep me motivated.

The motto of my life: “Keep Learning, Enjoy Empowering”

