Emoji Understanding — An Overview

Sanjaya Wijeratne
Published in Holler Developers · Feb 15, 2022

AI Research at Holler Technologies

Image Source — https://www.macrumors.com/2021/12/02/most-popular-emoji-2021/

With the rise of social media, pictographs, commonly referred to as “emoji,” have become one of the world’s fastest-growing forms of communication [Parkin, 2016]. First introduced three decades ago, emoji became mainstream after their standardization by The Unicode Consortium in 2009, which enabled emoji adoption on mobile platforms. The rapid growth of emoji began in 2011, when the Apple iPhone added an emoji keyboard to iOS, and accelerated again in 2013, when the Android mobile platform started to support emoji on its devices [Dimson, 2015]. Since then, emoji have become an important part of online communication. Today, they are used billions of times across all major social media platforms to express sentiment, emotion, and sarcasm, among other non-verbal cues. Thus, understanding what is expressed through emoji in text messages is extremely important to modern-day Natural Language Processing (NLP) and Natural Language Understanding (NLU) applications.

As analysis and modeling of written text by natural language processing techniques have enabled important advances such as machine translation [Weaver, 1955], word sense disambiguation [Navigli, 2009], semantic similarity [Gomma and Fahmy, 2013], and search [Guha et al., 2003], the transfer of such methods (or the development of new ones) to emoji is only beginning to be explored [Wijeratne et al., 2017a]. Only recently have there been efforts to adapt standard natural language processing techniques for semantic similarity, sense disambiguation, and search to the realm of emoji. Such emoji understanding tasks are essential to automatically process, derive meaning from, and interpret text fused with emoji. In this article, we will look at four emoji understanding tasks, namely, calculating emoji similarity, emoji sense disambiguation, emoji prediction, and identifying changes in emoji meaning over time. We will go through the past research conducted on these tasks and discuss what more remains to be done.

Calculating Emoji Similarity

Calculating similarity over a vocabulary of words is one of the basic problems in language understanding research [Gomma and Fahmy, 2013]. Similarly, calculating emoji similarity is one of the basic research problems in emoji understanding research [Wijeratne, 2018]. The notion of the similarity of two emoji is very broad. One can imagine a similarity measure based on the pixel similarity of emoji pictographs, yet this may not be useful since the pictorial representation of an emoji varies by mobile and computer platform [Miller et al., 2016]. Two similar-looking pictographs may also correspond to emoji with radically different senses (e.g., twelve-thirty 🕧 and six o’clock 🕕, raised hand ✋ and raised back of hand 🤚, etc.). Similarly, one could define the semantic similarity of emoji such that the measure reflects the likeness of their meaning, interpretation, or intended use. We prefer semantic similarity measures based on the meanings of the emoji. Below, we briefly go over past research in this space.
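Embedding-based semantic similarity is typically measured with cosine similarity between the learned vectors. The sketch below uses tiny hand-made 4-dimensional vectors purely for illustration; real models such as emoji2vec learn vectors with hundreds of dimensions from large corpora, and the specific values here are assumptions.

```python
import math

# Toy 4-dimensional embeddings; real models (e.g., emoji2vec) use 300+
# dimensions learned from data. These vectors are illustrative only.
embeddings = {
    "✋": [0.9, 0.1, 0.0, 0.2],     # raised hand
    "🤚": [0.85, 0.15, 0.05, 0.2],  # raised back of hand
    "🕧": [0.0, 0.1, 0.9, 0.4],     # twelve-thirty
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# The two "hand" emoji score much higher than the hand/clock pair.
print(cosine_similarity(embeddings["✋"], embeddings["🤚"]))  # ~0.997
print(cosine_similarity(embeddings["✋"], embeddings["🕧"]))  # ~0.098
```

Under such a measure, ✋ and 🤚 come out nearly identical while ✋ and 🕧 are nearly orthogonal, which matches the intuition about their meanings rather than their pixels.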

Barbieri et al. were the first to study the emoji similarity problem [Barbieri et al., 2016]. They collected a sample of 10 million tweets originating from the USA and trained an emoji embedding model using the tweets as input. Then, using 50 manually-generated emoji pairs annotated by humans for emoji similarity and relatedness, they evaluated how well the learned emoji embeddings align with the human annotations. They reported that the learned emoji embeddings align more closely with the relatedness judgment scores of the human annotators than with the similarity judgment scores. Eisner et al. used a word embedding model learned over the Google News corpus, applied it to emoji names and keywords extracted from the Unicode Consortium website, and learned an emoji embedding model which they called emoji2vec [Eisner et al., 2016]. Using t-SNE for data visualization, Eisner et al. showed that the high-dimensional emoji embeddings learned by emoji2vec could group emoji into clusters based on their similarity. They also showed that their emoji embedding model could outperform Barbieri et al.’s model in a sentiment analysis task. Pohl et al. studied the emoji similarity problem using two methods: one based on emoji keywords extracted from the Unicode Consortium website and the other based on emoji embeddings learned from a Twitter message corpus [Pohl et al., 2017]. They used the Jaccard coefficient on the extracted emoji keywords to find the similarity of two emoji, and evaluated their approach using 90 manually-generated emoji pairs. Wijeratne et al. studied semantic similarity of emoji through embedding models [Wijeratne et al., 2017a] that are learned over machine-readable emoji meanings extracted from the EmojiNet [Wijeratne et al., 2017b] knowledge base.
Using emoji descriptions, emoji sense labels, and emoji sense definitions, and with different training corpora obtained from Twitter and Google News, they developed and tested multiple embedding models to measure emoji similarity. To evaluate their work, they created a new dataset called EmoSim508, which assigned human-annotated semantic similarity scores to a set of 508 carefully selected emoji pairs. After validation with EmoSim508, they further evaluated their emoji embedding models using a sentiment analysis benchmark and showed that their models outperform Barbieri et al.’s and Eisner et al.’s models, achieving state-of-the-art results in emoji similarity.
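Pohl et al.’s keyword-based measure is straightforward to sketch: treat each emoji as a set of keywords and compute the Jaccard coefficient, the size of the intersection over the size of the union. The keyword sets below are made-up stand-ins for the Unicode Consortium keyword lists, chosen only to illustrate the computation.

```python
# Stand-in keyword sets; the real measure uses keywords from the
# Unicode Consortium's published emoji keyword lists.
keywords = {
    "😀": {"face", "grin", "smile", "happy"},
    "😄": {"face", "grin", "smile", "eye", "happy"},
    "🔫": {"gun", "weapon", "pistol", "tool"},
}

def jaccard(a, b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|.
    Returns 0.0 for disjoint sets and 1.0 for identical sets."""
    return len(a & b) / len(a | b)

print(jaccard(keywords["😀"], keywords["😄"]))  # 4 shared / 5 total = 0.8
print(jaccard(keywords["😀"], keywords["🔫"]))  # no overlap = 0.0
```

Unlike embedding models, this measure needs no training corpus, but it can only capture similarity that the curated keywords happen to encode.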

Emoji Sense Disambiguation

Figure 1 — Emoji Usage on Social Media with Multiple Senses

When emoji were first introduced, they were defined with no rigid semantics attached [Unicode FAQ], which allowed people to develop their own use and interpretation. Without rigid semantics attached to them, emoji symbols can take on different meanings based on the context of a message. Thus, for natural language processing and understanding tasks, machines must learn to disambiguate the meaning or the ‘sense’ of an emoji in text messages. For example, consider the three emojis 😂, 🔫, and 💰 and their use in multiple tweets shown in Figure 1. Depending on the context, we see that each of these emojis can take on wildly different meanings. People use the 😂 emoji to mean laughter, happiness, and humor; the 🔫 emoji to discuss killings, shootings, or anger; and the 💰 emoji to express that something is expensive, working hard to earn money, or simply to refer to money. Knowing the meaning of an emoji can significantly enhance applications that study, analyze, and summarize electronic communications. For example, rather than stripping away emoji in a preprocessing step, the sentiment analysis application reported in [Novak et al., 2015] uses emoji to improve its sentiment score. Similarly, emoji can be used to detect sarcasm in text messages [Felbo et al., 2017]. Thus, the task of emoji sense disambiguation, which is the ability to identify the meaning of an emoji in the context of a message in a computational manner [Wijeratne et al., 2016], plays an important role in natural language processing and understanding applications.

Emoji sense disambiguation hasn’t received much attention from the research community, as it is considered one of the more difficult problems to solve [Miller et al., 2017] for several reasons. For example, [Miller et al., 2017] discussed how difficult it is to reach a consensus on emoji meanings among emoji users on social media. Since the same emoji can appear differently on different mobile and web platforms, and given the difficulty of reaching consensus on emoji meanings, it is hard to collect a properly annotated training dataset for supervised learning of emoji meanings. Such issues could be avoided if we had access to a list of possible meanings for each emoji (e.g., an emoji meaning dictionary or knowledge base), which could serve as a starting point for tackling the emoji sense disambiguation problem. Based on this idea, an attempt has been made to solve the emoji sense disambiguation problem in a knowledge-based setting using EmojiNet [Wijeratne et al., 2017a]. EmojiNet [Wijeratne et al., 2016; Wijeratne et al., 2017a] is an emoji meaning dictionary consisting of: (i) 12,904 sense labels (emoji meanings) over 2,389 emoji, which were extracted from the web and linked to machine-readable sense definitions in BabelNet; (ii) context words associated with each emoji sense, inferred through word embedding models trained over the Google News corpus and a Twitter message corpus for each emoji sense definition; and (iii) in recognition of discrepancies in how emoji are rendered across platforms, the most likely platform-specific sense for a selected set of emoji [Download EmojiNet]. To show how EmojiNet can be used to solve the emoji sense disambiguation problem, Wijeratne et al. illustrate disambiguating the sense of the 🙏 emoji, one of the most misunderstood emoji on social media, in two example tweets.

Tweet 1 : Pray for my family 🙏 God gained an angel today.

Tweet 2 : Hard to win, but we did it man 🙏 Lets celebrate!

EmojiNet lists high five(noun) and pray(verb) as valid senses for the 🙏 emoji. For high five(noun), EmojiNet lists three definitions and for pray(verb), it lists two definitions. Wijeratne et al. take all the words that appear in their corresponding definitions as possible context words that can appear when the corresponding emoji (or sense) is being used in a sentence (tweets 1 and 2 in this case). For each sense, they extract the following sets of words from EmojiNet:

pray(verb) : {worship, thanksgiving, saint, pray, higher, god, confession}

high five(noun) : {palm, high, hand, slide, celebrate, raise, person, five}

To determine the sense of the emoji in each tweet, they calculate the overlap between the words that appear in the tweet and the words associated with each emoji sense listed above. This method is called the Simplified Lesk Algorithm [Vasilescu et al., 2004]. The sense with the highest word overlap is assigned to the emoji at the end of a successful run of the algorithm. We can see that the 🙏 emoji in Tweet 1 will be assigned pray(verb) based on the overlap of the words {god, pray} with words retrieved from the sense definition of pray(verb), and the same 🙏 emoji in Tweet 2 will be assigned high five(noun) based on the overlap of the word {celebrate} with words retrieved from the sense definition of high five(noun). The above example only shows the minimal set of words that one could extract from EmojiNet. Since EmojiNet senses are linked with their corresponding BabelNet senses using BabelNet sense IDs, one could easily utilize other resources available in BabelNet, such as related WordNet senses, VerbNet senses, and Wikipedia pages, to collect an improved set of context words for emoji sense disambiguation tasks. Wijeratne et al. further discussed how this idea can be extended with word embeddings learned from Twitter and Google News text corpora and reported their findings in [Wijeratne et al., 2017a]. We encourage readers to go through Section 6.2 of [Wijeratne et al., 2017a] for more information on their latest studies.
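The Simplified Lesk procedure described above can be sketched in a few lines. The context-word sets are taken directly from the EmojiNet example; the tokenization and function names are assumptions for illustration, not the authors’ implementation.

```python
import re

# Context words per sense, taken from the EmojiNet example in the text.
SENSE_WORDS = {
    "pray(verb)": {"worship", "thanksgiving", "saint", "pray", "higher",
                   "god", "confession"},
    "high five(noun)": {"palm", "high", "hand", "slide", "celebrate",
                        "raise", "person", "five"},
}

def disambiguate(tweet, sense_words):
    """Simplified Lesk: return the sense whose context words overlap
    most with the words in the message."""
    tokens = set(re.findall(r"[a-z']+", tweet.lower()))
    return max(sense_words, key=lambda sense: len(sense_words[sense] & tokens))

tweet1 = "Pray for my family 🙏 God gained an angel today."
tweet2 = "Hard to win, but we did it man 🙏 Lets celebrate!"

print(disambiguate(tweet1, SENSE_WORDS))  # pray(verb)
print(disambiguate(tweet2, SENSE_WORDS))  # high five(noun)
```

Tweet 1 overlaps on {pray, god} with pray(verb) and on nothing with high five(noun), so pray(verb) wins; Tweet 2 overlaps only on {celebrate}, so high five(noun) wins.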

Emoji Prediction and Semantic Change of Emoji Meanings

Word prediction is a popular, widespread functionality found in many computer and mobile interfaces; however, emoji prediction software was not well developed until recently. Barbieri et al. carried out one of the first studies to address the problem of predicting an emoji given a text input [Barbieri et al., 2017]. Using a dataset of tweets each containing exactly one emoji, the research aimed to guess the emoji used in a tweet given its text. They developed a computational model based on Long Short-Term Memory networks that relied on two different embedding models for representing the input sequence: word embeddings and character embeddings, the latter to better model noisy social media text. Overall, their neural model outperformed competitive baselines based on word-level information. The good performance of word-level information is notable, since word-emoji associations are common in the dataset they used (e.g., the word “love” and ❤️); however, modeling word sequences still offered a competitive advantage. They extended their work to study the use of visual information (i.e., images) in combination with text to predict emoji on the Instagram social network [Barbieri et al., 2018a]. The interest in the emoji prediction task is evident from the large number of research teams that took part in the first Multilingual Emoji Prediction challenge organized at the SemEval 2018 evaluation forum, where 49 and 22 teams took part in the English and Spanish emoji prediction tasks, respectively [Barbieri et al., 2018b].
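To make the task setup concrete, here is a deliberately simple word-emoji co-occurrence baseline, not Barbieri et al.’s LSTM model: count which emoji co-occurs with which words in training messages, then score a new message by summing per-word counts. The tiny training set is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy training set of (text, emoji) pairs; in the SemEval task each
# message contains one emoji, which is removed from the input text.
train = [
    ("i love you so much", "❤️"),
    ("love this song", "❤️"),
    ("that joke was hilarious", "😂"),
    ("laughing so hard", "😂"),
    ("pray for them", "🙏"),
]

# Count how often each word co-occurs with each emoji.
word_emoji = defaultdict(Counter)
for text, emoji in train:
    for word in text.split():
        word_emoji[word][emoji] += 1

def predict(text):
    """Sum per-word emoji counts and return the highest-scoring emoji."""
    scores = Counter()
    for word in text.split():
        scores.update(word_emoji.get(word, Counter()))
    return scores.most_common(1)[0][0] if scores else None

print(predict("love that"))  # ❤️ ("love" is strongly tied to ❤️)
```

A baseline like this captures exactly the strong word-emoji associations noted above; what a sequence model adds is sensitivity to word order and context beyond individual words.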

Of the problems we’ve looked at so far, research on how emoji meanings change over time has received the least attention. There is only one recent study, by Robertson et al., on Quantifying Change in Emoji Meaning from 2012–2018 [Robertson et al., 2021], that looks at this issue. They identified five patterns in emoji semantic development and reported that the less abstract an emoji is, the more likely it is to undergo semantic change. They also provided a web interface to experiment with the semantic meaning changes of any emoji found in their dataset [Emoji Semantic Change Over Time]. We encourage the reader to explore the [Emoji Semantic Change Over Time] web application to learn more about changes in emoji meanings over time.
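One common way to quantify this kind of semantic change, sketched below under simplifying assumptions, is to train a separate embedding model per year, align the models to a common space, and measure how far an emoji’s vector drifts between years (1 minus cosine similarity). The yearly vectors here are invented stand-ins; this is a sketch of the general technique, not Robertson et al.’s exact method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Invented vectors standing in for per-year embedding models already
# aligned to a shared space; real studies train one model per year.
yearly = {
    2014: {"🙏": [0.9, 0.1, 0.1], "❤️": [0.1, 0.9, 0.2]},
    2018: {"🙏": [0.4, 0.6, 0.3], "❤️": [0.1, 0.88, 0.22]},
}

for emoji in ["🙏", "❤️"]:
    # Drift = 1 - cosine similarity; larger drift = greater meaning change.
    drift = 1 - cosine(yearly[2014][emoji], yearly[2018][emoji])
    print(emoji, round(drift, 3))
```

In this toy example 🙏 drifts far more than ❤️, mirroring the intuition that some emoji keep a stable meaning while others are repurposed over time.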

Future Research Directions

We will now look at possible future research directions along the lines of the four emoji understanding problems discussed in this article. Of the four, calculating emoji similarity and emoji prediction have received the most attention from the research community. The emoji similarity problem has seen the most promising solutions; however, one could experiment with recent pre-trained language models such as BERT and XLNet to learn better emoji embedding representations and improve the performance of downstream tasks such as sentiment analysis and emoji-based search. The emoji prediction problem has also received a lot of attention recently; however, due to the challenging nature of the problem, the results reported in the latest papers have great room for improvement. For example, solutions that look at the positional embeddings of emoji along with word and character embeddings could be one promising direction to study, as research reports that the position of an emoji in a sentence affects the way it is used and interpreted. Emoji sense disambiguation could also be improved by incorporating platform-specific emoji datasets and meanings into the disambiguation process. For example, the recent emoji sense disambiguation results reported by Wijeratne et al. on Twitter data could be further improved by utilizing the source attribute returned by the Twitter API with each tweet at data collection time. The source attribute reports the platform from which the tweet originated, allowing a computer system to use that information in the sense disambiguation process. Similarly, recent language models such as BERT and XLNet could also help learn better emoji embedding models, which could improve the sense disambiguation process. We hope this article motivates the reader to take on the emoji understanding problems discussed above and extend the state of the art.


Applied Scientist at Nexon America, working on NLP/LLMs. Creator of EmojiNet; Co-organizer of the Emoji Workshop. https://www.linkedin.com/in/sanjayawijeratne/