Teaching machines irony: Approaches to sarcasm detection
Sarcasm and irony: the bane of any novice language learner. Chances are, you’ve encountered sarcasm in the wild and have skillfully detected it. Chances are even greater that you’ve had your fair share of “Wait, was that sarcasm?” moments despite extensive experience with the subject.
These nuanced rhetorical devices convey meaning that transcends the literal interpretation of words. Understanding irony requires a firm grasp of tone, contextual clues, and gestures. However, the machine learning models used today to detect sarcasm are generally trained on enormous amounts of text, and text lacks the gestures and tonal cues humans rely on unless they are explicitly described. So, how does one go about teaching a machine to detect sarcasm from text alone?
In this blog post, we embark on a journey to explore how natural language processing (NLP) is used to overcome the challenges posed by irony.
What sarcasm is and why it used to be hard to detect:
As we touched on earlier, you’ve probably bumped into sarcasm and irony before and have a pretty good sense of what they’re all about. To save you the Google search, sarcasm is defined as “the use of irony to mock or convey contempt”, and irony as “the expression of one’s meaning by using language that normally signifies the opposite”.
As evident from these definitions, sarcasm relies on the listener/reader to understand the discrepancy between the literal and intended meanings of a statement, causing a sarcastic statement to be ambiguous in and of itself.
Before dedicated sarcasm detection systems, sentiment analysis algorithms had (and still have) great performance when it came to deciphering the meaning of straightforward passages and phrases. They could determine whether a sentence carried positive or negative meaning with performance on par with humans. However, most of this went out the window with sarcastic sentences and phrases. A sarcastic passage whose actual meaning is negative often contains words that, taken on their own, carry positive sentiment, and this really threw computers off. Machines also struggled with sarcasm that required some general world knowledge or critical thinking to understand.
Take, for example, “Oh, fantastic! Another traffic jam”. A traffic jam is an unpleasant experience and the statement has negative connotations, but the use of “fantastic” gives the sentence a globally positive surface sentiment, which used to leave computers confused.
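To see why a naive sentiment scorer gets fooled, here is a minimal sketch of lexicon-based scoring; the word list and scores below are invented for illustration, not taken from any real sentiment lexicon:

```python
import re

# Toy sentiment lexicon -- words and scores are invented for illustration.
LEXICON = {"fantastic": 2.0, "great": 1.5, "jam": -0.5, "awful": -2.0, "terrible": -2.0}

def lexicon_sentiment(text: str) -> float:
    """Sum per-word scores; a positive total reads as positive overall sentiment."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(t, 0.0) for t in tokens)

score = lexicon_sentiment("Oh, fantastic! Another traffic jam")
print(score)  # 1.5 -- the scorer reads this sarcastic complaint as positive
```

The strong positive score of “fantastic” swamps the mild negativity of “jam”, so the sarcastic complaint comes out positive overall, which is exactly the failure mode described above.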
One contributing problem was that algorithms designed to classify sentiment were generally trained on text. Textual data lacks the intonation and tone changes that are fundamental to detecting sarcasm, and the issue was compounded by the need for context clues, world knowledge, and cultural nuances to properly detect sarcasm.
Engineers looked at this, gave it some thought, and decided that the solution was more data. It wasn’t that textual data was bad, but that there wasn’t enough of it. With great amounts of data come great amounts of context, world knowledge, and cultural nuance.
Deep learning models are great at identifying patterns, and with enough data, these models can learn to understand linguistic and pragmatic features that indicate sarcasm without being told about said features. And guess what? The internet is a goldmine full of the data they need.
To make the most of this enormous amount of data, it must first be preprocessed. Various preprocessing methods are used in conjunction to break large datasets down into forms that machine learning and deep learning algorithms can digest.
Making the data digestible :
Preprocessing data before it is given to a machine learning or deep learning model for training or testing happens in multiple stages: the data must first be cleaned, then tokenized, and finally stemmed.
The cleaning step consists of removing unnecessary characters and words that don’t contribute to the meaning of the text. The text is then split into tokens (words) to aid the model’s understanding of the structure of the text, and finally all the tokens are reduced to their roots (stemmed or lemmatized) to unify similar words.
For example, “thinking” and “thinks” are both reduced to their root, “think”; an irregular form like “thought” needs lemmatization, which maps words to dictionary forms rather than just stripping suffixes.
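The three stages above can be sketched in plain Python. The suffix list here is a deliberately crude stand-in for a real stemmer such as Porter’s:

```python
import re

def clean(text: str) -> str:
    """Lowercase the text and strip URLs, digits, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^a-z\s]", " ", text)

def tokenize(text: str) -> list[str]:
    """Split cleaned text into word tokens."""
    return text.split()

def stem(token: str) -> str:
    """Crude suffix stripping -- a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

print([stem(t) for t in tokenize(clean("Thinking aloud, she thinks and blinks!"))])
# ['think', 'aloud', 'she', 'think', 'and', 'blink']
```

A production pipeline would swap the toy `stem` for NLTK’s `PorterStemmer` or a lemmatizer; as noted above, irregular forms like “thought” are beyond what pure suffix stripping can handle.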
Following this, features are extracted from the data. That is, the data is distilled into measurable properties that models use to make classifications. Various kinds of features are used in this process. Lexical features are the most common and straightforward: unique words, phrases, noun phrases, and even named entities that contribute to the sentiment of a passage.
Stemmed features offer another perspective on the data. They operate on the idea that words with similar roots tend to convey similar meanings, which reduces the number of distinct words and simplifies the analysis. Symbolic tokens like emoticons and emojis are referred to as pragmatic features, and these are powerful indicators of sarcasm. Additionally, frequency-related features reflect the significance of particular words within a passage, allowing words to be labelled as high-frequency or content words.
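A minimal feature extractor combining these ideas might look like the following; the emoticon and interjection cue lists are hypothetical examples, whereas real systems learn such cues from data:

```python
from collections import Counter

# Hypothetical cue lists for illustration -- real systems learn these from data.
EMOTICONS = {":)", ":(", ";)", ":/"}
INTERJECTIONS = {"oh", "wow", "yeah", "sure"}

def extract_features(tokens: list[str]) -> dict:
    """Map a token list to a small dictionary of measurable properties."""
    counts = Counter(tokens)
    return {
        # Lexical features: raw unigram counts
        "unigrams": dict(counts),
        # Pragmatic features: emoticon and interjection cues
        "has_emoticon": any(t in EMOTICONS for t in tokens),
        "n_interjections": sum(counts[w] for w in INTERJECTIONS),
        # Frequency-related features: words repeated within the passage
        "high_freq_words": [w for w, c in counts.items() if c > 1],
    }

print(extract_features("oh sure great great plan :)".split()))
```

For the sample input, the extractor flags the emoticon, counts two interjections (“oh” and “sure”), and marks the repeated word “great” as high-frequency.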
In short, sections and phrases in the text that capture the semantic relationships between words based on context are identified and labelled. These, along with punctuation, interjections, quotes, emojis, and word frequencies, are converted into numerical representations called word embeddings so that models can detect patterns and work with the data in more meaningful ways (computers work with numbers, not words).
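As a simplest-case illustration of turning words into numbers, here is one-hot encoding; real systems replace these sparse vectors with dense learned embeddings (for example word2vec or GloVe), where similar words end up with similar vectors:

```python
def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token a stable integer index (first-seen order)."""
    return {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}

def one_hot(token: str, vocab: dict[str, int]) -> list[float]:
    """Simplest possible word vector: all zeros except at the token's index."""
    vec = [0.0] * len(vocab)
    vec[vocab[token]] = 1.0
    return vec

tokens = "oh fantastic another traffic jam".split()
vocab = build_vocab(tokens)
print(one_hot("fantastic", vocab))  # [0.0, 1.0, 0.0, 0.0, 0.0]
```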
Once the data is preprocessed and features are extracted, both are fed into models that use them to learn to detect sarcasm. Among the various approaches tried, deep learning models have been the most successful, with up to 95% accuracy, and among these, transformer models like BERT and GPT take the cake.
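To make the “features fed into a model” step concrete, here is a tiny perceptron trained on two hand-crafted features: this is a toy stand-in for the deep models the text describes, and the cue words, training sentences, and labels are all invented for illustration:

```python
# Toy sarcasm classifier: a perceptron over two hand-crafted features.
# Sketch only -- real systems use transformers (BERT, GPT), not this.

def features(text: str) -> list[float]:
    t = text.lower()
    return [
        1.0 if "fantastic" in t or "great" in t else 0.0,  # positive word present
        1.0 if "jam" in t or "delay" in t else 0.0,        # negative situation word
    ]

def train(data, epochs: int = 20, lr: float = 0.5):
    """Classic perceptron update: nudge weights toward each misclassified example."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in data:
            x = features(text)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = label - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

data = [
    ("Oh, fantastic! Another traffic jam", 1),  # sarcastic
    ("This cake is fantastic", 0),              # sincerely positive
    ("Stuck in a traffic jam again", 0),        # negative but not sarcastic
    ("Great, another flight delay", 1),         # sarcastic
]
w, b = train(data)

def predict(text: str) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, features(text))) + b > 0 else 0

print(predict("Great, more delays"))  # 1 -> flagged as sarcastic
```

The model learns the pattern the blog keeps returning to: a positive word co-occurring with a negative situation is the tell. Deep models learn far richer versions of this cue directly from raw text, without hand-crafted features.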
Thus, snarky comments and sarcasm are detected and machines come closer to understanding comedy.
Conclusion :
The applications of sarcasm detection extend beyond recognizing clever wordplay. It has immense potential for revealing what the public truly thinks of products, services, companies, politicians, and even celebrities. Sarcasm detection could help us tell the difference between a dark joke and a subtle cry for help. With more accurate models, we may be able to tell the difference between legitimate threats and jokes that have gone too far.
Teaching machines to grasp the nuances of sarcasm is a journey in which we have to look within to understand what truly constitutes “sarcasm”. Detecting these rhetorical devices requires reading between the lines and realizing that there exist meanings beyond the literal interpretation of words (or we just give the models more data).
References :
https://spectrum.ieee.org/sentiment-analysis
https://anebz.eu/sarcasm-detection-nlp
https://www.emerald.com/insight/content/doi/10.1108/IDD-01-2023-0002/full/html
https://ieeexplore.ieee.org/abstract/document/9404585
https://journals.sagepub.com/doi/full/10.1177/1470785320921779