Is that text SARCASTIC … ?🤔

Rishit Toteja
Published in Analytics Vidhya
Sep 10, 2021

As we all know, Artificial Intelligence and Machine Learning are transforming the world. They have numerous applications in fields ranging from medical science to video games.

Areas such as e-commerce and social media have used AI profitably and have benefited the most from it. The AI market is expected to grow to 126 billion dollars by 2026.

In this project, however, I decided to use AI for a fun task: building a model that detects sarcasm in text.


It may sound hard to believe, but with the research and improvements being made in AI every day, it is possible to build such a model. I was able to create a model that detects sarcasm in text with 83 % accuracy.

Before explaining how the model was built, let’s try to spot the sarcasm in some famous lines from movies and web series:

1. Movie : Deadpool 2

2. Big Bang Theory (S04 E24)

EXPLAINING THE MODEL :

1. Libraries and Dataset Used :

The whole project was written in Python and run in the Google Colab environment. The neural network was designed with the Keras Sequential API, with TensorFlow as the backend.

For data analysis and visualization, the pandas and matplotlib libraries were used.

The dataset is publicly available on Kaggle and comes in .json format, so Python's json and requests libraries were used to fetch it and load it for the model.
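A minimal sketch of the loading step, assuming each record carries the `headline` and `is_sarcastic` fields used by the Kaggle dataset listed in the references, and that the file is either one JSON array or one JSON object per line. `parse_headlines` and `fetch_headlines` are hypothetical helper names, the URL is whatever copy of the file you point it at, and this is not the code from the Colab notebook.

```python
import json


def parse_headlines(raw_text):
    """Turn the raw dataset text into parallel lists of headlines and 0/1 labels.

    Accepts either a single JSON array or one JSON object per line
    (assumption: records have 'headline' and 'is_sarcastic' keys).
    """
    raw_text = raw_text.strip()
    if raw_text.startswith("["):
        records = json.loads(raw_text)
    else:
        records = [json.loads(line) for line in raw_text.splitlines() if line.strip()]
    headlines = [record["headline"] for record in records]
    labels = [int(record["is_sarcastic"]) for record in records]
    return headlines, labels


def fetch_headlines(url):
    """Download the dataset file from `url` and parse it."""
    import requests  # third-party; imported here so parse_headlines works without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return parse_headlines(response.text)
```

Keeping the parsing separate from the download makes it easy to check the parser on a few hand-written records before fetching the full file.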

2. Tokenizing and Padding the text data

For the model to make predictions, the text data has to be in numerical form, so it was first preprocessed through tokenization and padding.
Tokenization was done by creating an instance of the Tokenizer class in TensorFlow (Keras) and fitting it on the training texts to build a word index. To convert the raw text into sequences of numbers, I used the tokenizer's texts_to_sequences method.
After converting all the text strings into sequences, padding was performed so that all the training sequences have the same length.
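To show what the Keras `Tokenizer` (with `oov_token`), `fit_on_texts`, `texts_to_sequences` and `pad_sequences` calls do conceptually, here is a simplified pure-Python approximation. It is not the Keras implementation: the word-splitting regex is an assumption (Keras strips a fixed set of punctuation instead), frequency ties are broken alphabetically (Keras uses first occurrence), and `pad_to_length` defaults to padding and truncating at the end, whereas `pad_sequences` defaults to `"pre"` for both. All function names are hypothetical.

```python
import re
from collections import Counter

# Simplified word pattern (assumption): lowercase letters, digits and apostrophes.
WORD_PATTERN = re.compile(r"[a-z0-9']+")


def split_words(text):
    """Lowercase the text and split it into words."""
    return WORD_PATTERN.findall(text.lower())


def build_word_index(texts, oov_token="<OOV>"):
    """Assign integer ids to words, most frequent first.

    0 is left free for padding and 1 is given to the out-of-vocabulary token.
    Ties in frequency are broken alphabetically (Keras uses first occurrence).
    """
    counts = Counter(word for text in texts for word in split_words(text))
    ordered = sorted(counts, key=lambda word: (-counts[word], word))
    word_index = {oov_token: 1}
    for index, word in enumerate(ordered, start=2):
        word_index[word] = index
    return word_index


def to_sequences(texts, word_index, oov_token="<OOV>"):
    """Replace each word with its id; unseen words map to the OOV id."""
    oov_id = word_index[oov_token]
    return [[word_index.get(word, oov_id) for word in split_words(text)] for text in texts]


def pad_to_length(sequences, maxlen, padding="post", truncating="post", value=0):
    """Cut or fill every sequence so it has exactly maxlen entries (maxlen >= 1)."""
    padded = []
    for seq in sequences:
        seq = seq[:maxlen] if truncating == "post" else seq[-maxlen:]
        fill = [value] * (maxlen - len(seq))
        padded.append(seq + fill if padding == "post" else fill + seq)
    return padded
```

For example, fitting on `["the cat sat", "the dog"]` gives `the` the id 2 (it is the most frequent word), and `"a dog sat"` becomes `[1, 4, 5]` because `a` was never seen and falls back to the OOV id. The word index is built from the training texts only, so validation and test texts are mapped with the same vocabulary.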

3. Building a Long Short-Term Memory Neural Network :

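Long Short-Term Memory (LSTM) networks are a type of recurrent neural network and are most commonly used for sequence prediction problems. Their gated memory cells let them keep track of information across long sequences, such as the words of a sentence.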
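For reference, these are the standard LSTM cell equations (the Keras LSTM layer uses tanh and sigmoid as its default activation and recurrent activation, matching this form). Here $x_t$ is the embedding of the $t$-th word, $h_{t-1}$ and $c_{t-1}$ are the previous hidden and cell states, $\sigma$ is the sigmoid function and $\odot$ is element-wise multiplication:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate memory}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate decides how much of the old memory $c_{t-1}$ to keep, which is what lets the network carry context across many words. A bidirectional LSTM runs one such cell left to right and another right to left over the sentence and, by default in Keras, concatenates their outputs.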

The model was sequential and consisted of an embedding layer, a bidirectional LSTM layer, a dense layer with 24 neurons and ‘relu’ activation, and an output layer with sigmoid activation.
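In Keras this stack corresponds to `Embedding`, `Bidirectional(LSTM(...))`, `Dense(24, activation='relu')` and `Dense(1, activation='sigmoid')`. Building it needs TensorFlow, so the sketch below instead works out how many trainable parameters such a stack has. The vocabulary size, embedding dimension and number of LSTM units are hypothetical values, since the article does not state them.

```python
def embedding_params(vocab_size, embedding_dim):
    # One trainable vector of length embedding_dim per vocabulary entry.
    return vocab_size * embedding_dim


def lstm_params(input_dim, units):
    # Four gate blocks, each with an input kernel, a recurrent kernel and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per neuron.
    return input_dim * units + units


def sarcasm_model_params(vocab_size, embedding_dim, lstm_units):
    """Parameter counts for Embedding -> Bidirectional LSTM -> Dense(24) -> Dense(1)."""
    embedding = embedding_params(vocab_size, embedding_dim)
    bilstm = 2 * lstm_params(embedding_dim, lstm_units)  # forward and backward copies
    hidden = dense_params(2 * lstm_units, 24)  # both directions are concatenated
    output = dense_params(24, 1)  # single sigmoid neuron
    return {
        "embedding": embedding,
        "bilstm": bilstm,
        "dense_24": hidden,
        "output": output,
        "total": embedding + bilstm + hidden + output,
    }


# Hypothetical sizes, not taken from the article.
print(sarcasm_model_params(vocab_size=10000, embedding_dim=16, lstm_units=32))
# {'embedding': 160000, 'bilstm': 12544, 'dense_24': 1560, 'output': 25, 'total': 174129}
```

With these sizes most of the parameters sit in the embedding table, which is why the vocabulary size matters so much for model size. Calling `model.summary()` on a Keras model built with the same sizes reports the same per-layer counts.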

Since the task was binary classification, i.e., predicting whether a text is sarcastic or not, the output layer has a single neuron with sigmoid activation, and ‘binary crossentropy’ was used as the loss function.
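The sigmoid squashes the output neuron's value into a probability $p$ between 0 and 1, and binary cross-entropy averages $-[y \log p + (1 - y) \log(1 - p)]$ over the examples, where $y$ is the 0/1 label. A small pure-Python sketch of both (the clipping constant `eps` is an assumption chosen to avoid `log(0)`):

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping any real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for very negative z
    return e / (1.0 + e)


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

For example, a sarcastic headline (label 1) predicted with probability 0.9 costs -ln(0.9) ≈ 0.105, while the same prediction for a non-sarcastic headline costs -ln(0.1) ≈ 2.303, so confident mistakes are penalized heavily.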

To train the network with gradient descent, I used the Adam optimizer.
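Adam keeps running averages of each gradient and of its square, and scales every step by them. Below is a scalar sketch of the update rule from the original Adam paper (β1 = 0.9, β2 = 0.999, ε = 1e-8), minimizing the toy function f(w) = (w - 3)², which is an assumption for illustration only. Keras' `Adam` optimizer applies the same idea to every weight, with a default learning rate of 0.001 rather than the 0.1 used here to keep the demo short.

```python
import math


def adam_minimize(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    """Minimize a one-parameter function with Adam, given its gradient function."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g  # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)  # bias corrections for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Gradient of the toy loss f(w) = (w - 3) ** 2.
w_final = adam_minimize(lambda w: 2 * (w - 3), w0=0.0, steps=500)
```

The very first update moves `w` by almost exactly `lr`, whatever the gradient's size, because the bias-corrected averages are then just g and g². This per-parameter normalization is a large part of why Adam works well with default settings.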

4. Evaluating the model

Visualisation :

To visualize the accuracy and loss on the training and validation datasets after each epoch, I plotted them with the matplotlib library.
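A small sketch of the plotting step, assuming `history` is the object returned by Keras `model.fit(..., validation_data=...)` with `metrics=['accuracy']`, so that `history.history` holds per-epoch lists under `accuracy`, `val_accuracy`, `loss` and `val_loss` (older standalone Keras versions used `acc`). The function names are hypothetical.

```python
def metric_curves(history_dict, metric):
    """Return epoch numbers plus the training and validation values for one metric."""
    train = history_dict[metric]
    val = history_dict["val_" + metric]
    epochs = list(range(1, len(train) + 1))
    return epochs, train, val


def plot_metric(history_dict, metric):
    """Plot training vs. validation curves for `metric` (e.g. 'accuracy' or 'loss')."""
    import matplotlib.pyplot as plt  # imported here so metric_curves works without it

    epochs, train, val = metric_curves(history_dict, metric)
    plt.figure()
    plt.plot(epochs, train, label="training " + metric)
    plt.plot(epochs, val, label="validation " + metric)
    plt.xlabel("Epoch")
    plt.ylabel(metric)
    plt.legend()
    plt.show()
```

Usage would be `plot_metric(history.history, "accuracy")` followed by `plot_metric(history.history, "loss")`. A validation curve that flattens or turns upward while the training curve keeps improving is the usual sign of overfitting.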

Finally, the model was evaluated on the test set, i.e., on texts it had never seen before, and achieved an accuracy of 81.71 %.
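The test-set figure is plain accuracy: the fraction of texts whose predicted label matches the true one, where a predicted probability above 0.5 counts as sarcastic (the default threshold of Keras' binary accuracy metric). A minimal sketch with a hypothetical function name:

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the 0/1 label."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    # A probability strictly above the threshold is read as "sarcastic" (1).
    correct = sum(int(p > threshold) == int(y) for y, p in zip(y_true, y_prob))
    return correct / len(y_true)
```

Multiplying the result by 100 gives a percentage like the one reported above. In Keras the same number comes from `model.evaluate(test_sequences, test_labels)`, which returns the loss followed by the compiled metrics; `test_sequences` and `test_labels` are placeholder names.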

Now let’s test the model on some other famous movie and web-series dialogues:

1) Avengers : Age of Ultron

2) Friends (S06 E05)

3) The Office (S09 E23)

4) Eminem : “Music to be Murdered By SIDE B” — Alfred’s Theme

Lyrics — “So call me Santa Clause, cause at the present, I out-rap ’em all”

REFERENCES :

  1. Google Colab Project : https://colab.research.google.com/drive/1knz8La6hkzkq_qssz9JxM-Tzuk_v8UNX?usp=sharing
  2. Dataset : https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection


Hi there, I am Rishit Toteja. I have profound knowledge of Deep Learning, Data Science and Python. I have a keen interest in Electronics, and I love to play chess.