Going Beyond Traditional Sentiment Analysis Techniques

Where do traditional methods like TF-IDF, CountVectorizer, etc. fail?

Gagandeep Singh
Analytics Vidhya
7 min read · Jul 8, 2019


Sentiment analysis is still a very challenging task because of issues like sarcastic feedback, subjective versus objective feedback, and many more.

First, let's see where methods like TF-IDF, CountVectorizer or bag-of-words (BOW) fail.

To train any TF-IDF model, we first remove stopwords and other redundant words, then use TF-IDF or CountVectorizer to encode the remaining words as numbers. These methods pay little attention to the order of words. So, where do they fail?

  1. Handling Comparison

These methods fail when the feedback involves a comparison.

Example — Product XYZ is better than ABC.

  2. Handling Negation

Words like not, no and never are very difficult to handle.

Example — The product was not bad.
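To see concretely what goes wrong with the two cases above, here is a minimal sketch using scikit-learn's CountVectorizer (any bag-of-words encoder behaves the same way): the vectors for a negated and a non-negated sentence differ only by a single count, and word order is lost entirely.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the product was not bad", "the product was bad"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))  # ['bad', 'not', 'product', 'the', 'was']
print(X.toarray())
# [[1 1 1 1 1]
#  [1 0 1 1 1]]
# The only difference is the count for "not"; the fact that "not" flips the
# polarity of "bad" is invisible to the model, because word order is discarded.
```

The same applies to comparison: "XYZ is better than ABC" and "ABC is better than XYZ" produce exactly the same bag-of-words vector.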

To overcome such issues, we are going to use Word2Vec embeddings with an LSTM in Keras. The code I will be showing works best with Keras because of a predefined function, and I promise it will be one of the easiest approaches you have seen.

What makes this article unique is that we are going to use Word2Vec’s Embedding in Keras with a simple function.

So, in brief, these are the things that we are going to cover:

  1. Embeddings and why they are better
  2. Understanding how LSTM works
  3. Understanding the Embedding layer in Keras
  4. Coding a sentiment analysis program.

Understanding Word2Vec Model

Let's begin by understanding the Word2Vec model.

We need to convert text into a numerical form because computers, and by extension our machine learning models, only understand numbers.

Methods like Term Frequency-Inverse Document Frequency (TF-IDF), CountVectorizer or one-hot encoding create a sparse matrix, which is not very efficient.

[Figure: sparse matrix representation. Source: ganto.github.io]

To overcome this issue and create a relationship between words, word embedding is used.

[Figure: example word embedding vectors. Source: https://www.medium.com/jayeshbahire]

Above, you can see that each word is represented by a 4-dimensional vector. The greatest advantage of embeddings is that they create a relationship between words.

[Figure: relationships between word vectors. Source: https://mc.ai]

These embeddings work very well when you are training your network. I will not go into much detail for now; we'll discuss more when we learn to use embeddings in Keras.

Understanding how LSTM works

LSTM stands for Long Short-Term Memory. It is an upgraded version of the RNN (Recurrent Neural Network) that doesn't suffer from the vanishing gradient problem (which, in plain RNNs, appears when gradients are propagated back through many time steps).

[Figure: LSTM cell]

There are 4 gates in an LSTM:

  1. Forget Gate: uses the previous long-term memory (LTM) to forget what is unnecessary.
  2. Learn Gate: uses the previous short-term memory (STM) together with the current input (x) to learn new information.
  3. Remember Gate: combines the forget gate and learn gate outputs to decide what the new LTM should hold.
  4. Use Gate: uses the LTM, STM and input (x) to create the new STM, which is passed on to the next step (a rough sketch of these computations follows below).
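To make the gates less abstract, here is a rough NumPy sketch of what a single LSTM step computes, using the standard gate formulation (the "learn", "remember" and "use" gates above correspond to the input gate, the cell-state update and the output gate):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U and b are dicts of weight matrices / biases for
    the forget (f), input/"learn" (i), candidate (g) and output/"use" (o) gates."""
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate: drop parts of the old LTM
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # learn gate: how much new info to take in
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate values from the current input
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # use gate: what to expose as the new STM
    c_new = f * c_prev + i * g      # "remember": new long-term memory (cell state)
    h_new = o * np.tanh(c_new)      # new short-term memory (hidden state / output)
    return h_new, c_new
```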

Now, the question is: why is there a need to move to LSTMs for sentiment analysis?

Long Short-Term Memory networks have memory built into the model. Having memory in a network is useful because, when dealing with sequential data such as text, the meaning of a word depends on the context of the preceding text. These dependencies have a great influence on the meaning and overall polarity of a document. You can read more about it here.

Understanding the Embedding layer in Keras

Embeddings are fixed-length vectors which can either be trained from scratch or taken from the pre-trained embeddings that are publicly available.

Let's write some code to understand this better.

I've created a Google Colab file. You can either perform all the steps locally on your computer or create a copy of the Colab file. The pre-trained embedding we are going to use is big, so I would recommend using Colab.

  1. Download the GloVe model from nlp.stanford.edu

This pre-trained embedding has a 300-dimensional vector representation.
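In a Colab cell the download looks something like this (the glove.6B archive is an assumption; any of the GloVe files from the Stanford NLP page will work, as long as you pick the matching dimensionality):

```python
# Download and unzip the pre-trained GloVe vectors (~800 MB unzipped).
!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip -q glove.6B.zip   # contains 50d, 100d, 200d and 300d files
```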

  2. The next task is to convert the GloVe embedding to Word2Vec format. It will take a while to convert and then load the file.

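A minimal sketch of the conversion using gensim (the glove2word2vec helper, and the get_keras_embedding function used later, are available in gensim 3.x; the file paths here are assumptions):

```python
from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.models import KeyedVectors

# Convert the GloVe text file into the word2vec text format that gensim can load.
glove2word2vec("glove.6B.300d.txt", "glove.6B.300d.word2vec.txt")

# Loading the 300d vectors takes a few minutes.
glove_model = KeyedVectors.load_word2vec_format("glove.6B.300d.word2vec.txt")
```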

  3. Now, let's see what it can do.


Word2Vec locates the words most similar to "dog" based on their positions in the vector space.
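For example (the similarity scores below are indicative, not exact):

```python
print(glove_model.most_similar("dog", topn=5))
# Something like: [('dogs', 0.85), ('puppy', ...), ('cat', ...), ('pet', ...), ...]
```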

Now, let’s understand how it is useful in Sentiment Analysis.

If the words 'dog' and 'dogs' occur while training, the model will know that they are about 85% similar.

So, why don't we just use a lemmatizer, which would automatically convert 'dogs' to 'dog'? Well, it won't handle pairs like 'deliver' and 'delivering', which are almost similar, while the embedding captures their similarity directly.
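You can check this directly on the loaded vectors (again, the numbers are approximate):

```python
print(glove_model.similarity("dog", "dogs"))           # roughly 0.85
print(glove_model.similarity("deliver", "delivering")) # also high, with no lemmatizer involved
```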

Now, you have seen what embedding can do. We can utilize its power in sentiment analysis.

There are 2 ways to use embedding in Keras.

  1. Create a new array and copy the embedding values from glove_model into it. This is probably not the best idea because it consumes a lot of memory.
  2. Use Word2Vec's get_keras_embedding function. This function is very easy to use and directly returns a Keras Embedding layer. A rough sketch of both options is shown below.
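Here is that sketch; max_features and tokenizer are defined further down in the article, and glove_model is the KeyedVectors object loaded earlier:

```python
import numpy as np
from keras.layers import Embedding

# Option 1: build an explicit weight matrix aligned with the Keras tokenizer's word_index.
embedding_dim = 300
embedding_matrix = np.zeros((max_features, embedding_dim))
for word, idx in tokenizer.word_index.items():
    if idx < max_features and word in glove_model:
        embedding_matrix[idx] = glove_model[word]
embedding_layer = Embedding(max_features, embedding_dim,
                            weights=[embedding_matrix], trainable=False)

# Option 2: let gensim build the layer for you (one line, gensim 3.x).
embedding_layer = glove_model.get_keras_embedding(train_embeddings=False)
```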

Code for Sentiment Analysis

Let’s start coding

  1. Open this Google Colab Notebook or use a Jupyter Notebook on your local system.
  2. Download the pre-trained GloVe model.
  3. Convert the GloVe embedding to Word2Vec format (as shown above).

4. Now, let's make our dataset ready.

The above package is optional. We will use it for cleaning the data.

4.a) Import all packages
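The exact imports depend on your notebook, but a plausible set for the steps below is:

```python
import re
import numpy as np
import pandas as pd

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

from nltk.corpus import stopwords
```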

4.b) Download a dataset

4.c) Import training set

4.d) I'm using binary classification here. All ratings above 5 are converted to 1 and the rest to 0.
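As a sketch (the file name and column names are assumptions; substitute whatever dataset you downloaded, as long as it has a review text and a numeric rating):

```python
# Hypothetical file and column names.
train = pd.read_csv("train.csv")            # assumed columns: "review", "rating"
texts = train["review"].astype(str).values

# Binarise the ratings: above 5 -> positive (1), otherwise negative (0).
labels = (train["rating"] > 5).astype(int).values
```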

The above step is completely optional.

4.e) Convert our labels into categorical (one-hot) form
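With Keras this is a one-liner; to_categorical turns the 0/1 labels into two-column one-hot vectors:

```python
y = to_categorical(labels, num_classes=2)   # shape: (num_samples, 2)
```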

5. Finding the most frequent words, replacing them in the training set with integer indices, and padding the sequences.

5.a) max_features: the number of unique words the tokenizer keeps. More is generally better.

5.b) max_len: the maximum length of a sentence. It is used during padding to bring all sentences to the same length.

If you look carefully, while fitting on the text (tokenizer.fit_on_texts) we are doing some preprocessing. This preprocessing removes all stopwords, numbers, HTML tags and punctuation. This is important because you are selecting important features for your dataset, and you don't want them to be random words that are of no use.
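A minimal cleaning function along those lines (using NLTK's stopword list; the exact regexes are an assumption, not the author's original code):

```python
import nltk
nltk.download("stopwords")
stop_words = set(stopwords.words("english"))

def clean_text(text):
    """Remove HTML tags, numbers, punctuation and stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)            # HTML tags
    text = re.sub(r"[^a-zA-Z]", " ", text.lower())  # numbers and punctuation
    words = [w for w in text.split() if w not in stop_words]
    return " ".join(words)

cleaned_texts = [clean_text(t) for t in texts]
```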

Why do we pad sentences?

All sentences are not of equal length, and this can be a problem. To avoid it, all sentences are padded with zeros at the beginning or the end (depending on the implementation).
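A minimal version of the tokenisation and padding step (the values of max_features and max_len are illustrative):

```python
max_features = 20000   # keep the 20,000 most frequent words
max_len = 200          # every review becomes exactly 200 tokens

tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(cleaned_texts)

sequences = tokenizer.texts_to_sequences(cleaned_texts)  # words -> integer indices
X = pad_sequences(sequences, maxlen=max_len)             # zero-pads at the front by default
```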

6. Building the Model

We are using glove_model.get_keras_embedding() to create the embedding layer in Keras.
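A sketch of the model (layer sizes are assumptions). One caveat: get_keras_embedding() indexes words by the Word2Vec model's own vocabulary order, so the integer ids you feed the network must follow that vocabulary (or you build the embedding matrix yourself, as in option 1 above):

```python
model = Sequential()
model.add(glove_model.get_keras_embedding(train_embeddings=False))  # frozen GloVe vectors
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation="softmax"))   # two outputs, matching the one-hot labels
```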

7. Let’s compile the model.

Now, wait for the model to train.

After training, you can use its predict function to generate predictions.
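Putting it together (the batch size and epoch count are illustrative):

```python
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, batch_size=64, epochs=3, validation_split=0.2)

# Predictions come back as class probabilities; take the argmax for the label.
probs = model.predict(X[:5])
print(probs.argmax(axis=1))   # 1 = positive, 0 = negative
```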

If you have reached this far, Good Job!!

Everything has a flaw

So far we have only talked about the good things about this model and why it outperforms traditional sentiment analysis techniques. Let's look at a case where it fails.

  1. The Word2Vec embedding is static. Let's understand with an example why it matters to have a dynamic (contextual) embedding.

Example 1 — The robber robbed the bank.

Example 2 — Fishermen are spotted catching fish on the bank of the river.

The word "bank" has a different meaning in the two sentences. Static embeddings assign a single fixed vector to each word and cannot change its meaning depending on the surrounding words. This is why we need dynamic (contextual) embeddings, which adapt to the words around them. You can read more about dynamic embeddings here.

Conclusion

Word embeddings are way better than TF-IDF and CountVectorizer for sentiment analysis, given that you have enough data. Training a neural network on top of word embeddings can yield state-of-the-art accuracy. If you are looking for more accuracy in your existing system or model, it is probably a good idea to shift to neural networks.

Thank You!

References

  1. http://kavita-ganesan.com/gensim-word2vec-tutorial-starter-code/#.XSL483Uzbix
  2. https://machinelearningmastery.com/what-are-word-embeddings/
  3. https://adventuresinmachinelearning.com/keras-lstm-tutorial/
  4. https://skymind.ai/wiki/word2vec
