Emoji-nomics: Quantifying Emoji Sentiment in Financial Discourse on Twitter

Published in

Nerd For Tech

16 min read · May 24, 2023

Emojis have become an essential part of modern communication: millions of people use them daily to express their thoughts and feelings. Although emojis were originally designed as a simple way to add emotion to text messages, they have evolved into a complex system of symbols that conveys a wide range of feelings and tones. In recent years, emoji use has exploded on social media platforms like Twitter. Yet while many researchers have turned to social media to analyze text sentiment using Natural Language Processing, emojis have largely been overlooked, in part because few emoji sentiment lexicons are available.

At the same time, the availability of online brokerages like Fidelity, Robinhood, E-Trade, and TD Ameritrade has made it easier than ever for average people to invest in the stock market. This has led to a surge in retail investors, who do research and communicate their trading positions on social media sites, such as Wall Street Bets on Reddit and other forums. These investors have also established a presence on Twitter, where they share their thoughts about the stock market and use emojis to convey their feelings and opinions about particular stocks and investments. For example, someone might use the 📈 emoji to indicate that they think a particular stock is going to rise in value, or the 💸 emoji to indicate that they see a good opportunity to make money from a particular investment. The financial community has even created its own vocabulary of emojis, with certain emojis taking on specific meanings. For example, the 💎 emoji is often used to indicate that a user has “diamond hands,” meaning that they are holding onto an investment and are not selling their position. For more information on retail investor lingo, check out Stash’s Guide to Reddit Wall Street Bets Emoji Lingo.

Despite the growing prevalence of emojis in financial discussion, there has been limited research on their usage and significance in this context. I aimed to address this gap by scraping over 750,000 financial tweets to better understand the sentiments associated with emojis and using natural language processing techniques to identify the sentiment expressed in a financial tweet, such as whether it is positive, negative or neutral. By analyzing large volumes of tweets related to a specific stock, market or industry, it is possible to identify patterns or trends in sentiment that can provide insights into market sentiment and potentially predict market movements.

For this specific study, the main areas of interest were:

1. What emojis are most used on Twitter when talking about the stock market and what sentiments are associated with them?

2. Which sentiment estimates for emojis are significant?

3. What sentiment do “Finance-Specific” emojis carry?

Methods

Data for the analysis was collected through Twitter’s API. I collected tweets that contained the tickers of the 200 stocks with the largest market capitalization on the NASDAQ exchange. A stock’s market capitalization is a measure of a company’s size. The data was filtered by market capitalization because the largest companies are likely the most widely followed and most discussed on Twitter, so tweets about them give a good sense of how emojis are used in these discussions. This includes tech giants like Apple ($AAPL), Microsoft ($MSFT), Netflix ($NFLX), Tesla ($TSLA) and others.

Tweets were then pulled from Twitter using the {rtweet} package in R. To ensure that I was collecting a diverse and non-duplicative dataset, I waited at least 10 days between each data pull. The first collection of tweets was pulled on June 13th, 2022, and additional tweets were added on June 26th, September 7th, September 16th, November 7th, November 15th, November 30th, and December 20th. In total, 755,710 tweets were pulled for the analysis.

By collecting tweets over the course of several months, I was able to capture a wide range of discussions about the stock market and specific stocks across different financial periods and fluctuations. For example, the data collected in June might have included discussions about inflation being at its peak (9.1%), while data collected in late November and early December might have included discussions about the crypto market implosion, with Bitcoin down 75% over the prior year and the FTX scandal rocking the cryptocurrency market.

Example code for filtering the NASDAQ stocks and pulling tweets using the {rtweet} library is provided below:

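library(dplyr)
library(rtweet)

## NASDAQ screener export: https://www.nasdaq.com/market-activity/stocks/screener
nasdaq_screener <- read.csv("nasdaq_screener.csv")

## keep the 200 largest companies by market cap and prefix each ticker with "$"
nasdaq <- nasdaq_screener %>%
  arrange(desc(Market.Cap)) %>%
  slice_head(n = 200) %>%
  select(Symbol) %>%
  mutate(Symbol = paste0("$", Symbol))

newtweets <- data.frame()

## pull up to 25,000 non-retweet tweets per ticker and tag each with its symbol
for (i in seq_len(nrow(nasdaq))) {
  pull <- search_tweets(q = nasdaq$Symbol[i],
                        n = 25000,
                        include_rts = FALSE,
                        retryonratelimit = TRUE)

  pull <- pull %>% mutate(Symbol = nasdaq$Symbol[i])
  newtweets <- rbind(newtweets, pull)
}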

Data Cleaning and Wrangling

Pre-processing was conducted on all tweets so that they could be analyzed. The {rtweet} package in R has a built-in data frame from Unicode.org called `emojis` that, for 2,622 common emojis, includes the emoji itself and a description of what the emoji is.

I replaced spaces in the emoji description with hyphens and I added the word “emoji” to the end of the description so that each emoji could be read as a single token. I subsequently added a loop and a new column in the tweets data frame so that each emoji was replaced with its text equivalent.
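The replacement step described above can be sketched roughly as follows. This is a minimal illustration in base R, not the loop actually used in the study: the `emoji_lookup` table, its three rows, and the toy tweets are hypothetical stand-ins for the full emoji data frame and the scraped text.

```r
# Hypothetical stand-in for the emoji lookup table: a `code` column holding the
# emoji character and a `description` column holding its plain-English name.
emoji_lookup <- data.frame(
  code = c("\U0001F680", "\U0001F48E", "\U0001F4C8"),
  description = c("rocket", "gem stone", "chart increasing"),
  stringsAsFactors = FALSE
)

# Hyphenate each description and append "-emoji" so it reads as one token
emoji_lookup$token <- paste0(gsub(" ", "-", emoji_lookup$description), "-emoji")

# Toy tweets standing in for the scraped text column
tweets <- data.frame(
  text = c("$TSLA to the moon \U0001F680\U0001F680", "holding $AAPL \U0001F48E"),
  stringsAsFactors = FALSE
)

# Replace each emoji with its token, padded with spaces so that adjacent
# emojis do not run together into one word
tweets$textclean <- tweets$text
for (k in seq_len(nrow(emoji_lookup))) {
  tweets$textclean <- gsub(
    emoji_lookup$code[k],
    paste0(" ", emoji_lookup$token[k], " "),
    tweets$textclean,
    fixed = TRUE
  )
}

# Collapse the extra white space introduced by the padding
tweets$textclean <- trimws(gsub("\\s+", " ", tweets$textclean))
print(tweets$textclean)
```

Padding each token with spaces matters for tweets like the first one, where two rocket emojis sit side by side and would otherwise fuse into a single unrecognizable word.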

Next, I filtered out tweets that did not contain an emoji. I removed duplicate tweets based on status_id (a unique numeric identifier for each tweet). I replaced sentence punctuation (periods, question marks, exclamation points and commas) with a space and then collapsed any extra white space. I kept the “-” because it is used in the emoji tokens. Finally, the text was converted to lowercase. On October 14th, 2022, the {rtweet} library renamed the status_id column to id_str. For that reason, the first mutate step in the code copies any id_str value into the status_id column so that all of the numeric identifiers can be found in the same column. Example code is provided below:

library(dplyr)
library(stringr)

## create a new dataset with tweets that contain emojis
tweets_emojis <- tweets %>%
  ## keep only rows where textclean contains an "-emoji" token
  filter(grepl("-emoji", textclean, fixed = TRUE)) %>%
  ## fill missing status_id values with the corresponding id_str value
  mutate(status_id = ifelse(is.na(status_id), id_str, status_id)) %>%
  ## remove duplicate rows based on status_id, keeping all columns
  distinct(status_id, .keep_all = TRUE) %>%
  ## replace . ? ! , with a space, collapse extra white space, and lowercase
  ## the text; the "-" is kept because the emoji tokens rely on it
  mutate(textclean = str_replace_all(textclean, "[.?!,]", " ") %>%
           str_squish(),
         textclean = str_to_lower(textclean))

Once the tweets were filtered to those containing at least one emoji, the data frame shrank from 755,710 tweets to only 98,191 tweets (13% of the original).

Afinn Sentiment Library

One area of investigation in this study is the sentiment of emojis in these 98,191 tweets. Sentiment analysis is the process of taking a piece of text, in this case the text of a financial tweet, and assigning its words numeric values in order to gauge the emotion and sentiment it expresses.

I chose the AFINN lexicon, developed by Finn Årup Nielsen, to assign sentiment scores to the tweets. The lexicon contains over 3,300 “sentimentally charged” words, each with a pre-assigned score between -5 and +5, where -5 is very negative and +5 is very positive. I loaded it with the get_sentiments function from the {tidytext} package. Note: words that were not sentimentally charged were left as NA rather than scored as “0,” because I did not want a tweet’s average sentiment to be artificially pulled toward zero, especially in longer tweets with many non-charged words.

To determine the sentiment of a tweet, it is necessary to tokenize every word, assign each token a sentiment score from the AFINN lexicon, and average those scores into a single numerical value.

For example, if a tweet read “Just sold my $AAPL stock. Made a good profit, but feeling a bit nervous about missing out on future gains,” the string would be broken into tokens:

Then sentiment scores would be matched to each word using the AFINN lexicon. Finally, the average sentiment score would be computed (for those curious, the sentiment score was 0.4 on a scale from -5 to 5, where a positive number means the sentiment is positive).
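To make the tokenize, score, and average steps concrete, here is a small base R sketch. The `toy_lexicon` scores are hand-picked for illustration and are not the real AFINN entries (they were chosen so that the toy result lands on the same 0.4 quoted above); the study itself used {tidytext} for this step.

```r
# Toy lexicon: scores are hand-picked for illustration, NOT real AFINN entries
toy_lexicon <- c(good = 3, profit = 2, nervous = -2, missing = -2, gains = 1)

tweet <- paste(
  "Just sold my $AAPL stock. Made a good profit, but feeling a bit",
  "nervous about missing out on future gains"
)

# Lowercase, replace everything except letters and spaces, split on whitespace
clean  <- gsub("[^a-z ]", " ", tolower(tweet))
tokens <- strsplit(trimws(clean), "\\s+")[[1]]

# Look up each token; words absent from the lexicon come back as NA
scores <- unname(toy_lexicon[tokens])

# Average only the scored words so that neutral words do not dilute the mean
sent_mean <- mean(scores, na.rm = TRUE)

# For comparison: the mean if unscored words were counted as 0 instead of NA
sent_mean_zero_filled <- sum(scores, na.rm = TRUE) / length(tokens)

print(sent_mean)
print(sent_mean_zero_filled)
```

With these toy values, the five scored words average to 0.4, whereas counting the other fifteen words as 0 would pull the mean down to 0.1. That gap is the reason unscored words are kept as NA rather than 0.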

Now, what if the tweet read “Just sold my $AAPL stock 🍎. Made a good profit, but feeling a bit nervous about missing out on future gains 📈😬”

Finding sentiment for these emojis is more complex than traditional tweet analysis. There is no widely used sentiment lexicon like AFINN that assigns a number to each emoji. Moreover, emojis are hard to reduce to a single universal score because they often have multiple meanings depending on context. Accordingly, in this study I create sentiment estimates for emojis using a linear regression, so that each emoji has a numerical value representing its sentiment in the specific context of finance on social media. Understanding how emojis convey emotion within the finance community can help market participants make informed decisions, yet little research has been done in this area.

To find sentiment for emojis, I used a linear regression model in which the emojis are the predictors and the sentiment of each tweet’s text is the outcome variable. In other words, the sentiment associated with each emoji is estimated from the overall sentiment of the tweets that contain it, so existing word-based sentiment lexicons can be carried over to emoji sentiment analysis.

Preparing the Linear Regression Model

To prepare the linear regression model:

1. Create a new column that stores the text of each tweet with all emojis removed.
2. Calculate sentiment scores for the tweets in this new column.
3. Condense the data to the key pieces of information: the original tweets with emojis, the tweets without emojis, and the sentiment scores for the tweets without emojis.
4. Add count variables for how many times each emoji appeared in a tweet.

As outlined above, I created a new column in the dataset for the tweets with emojis removed. This was performed using a simple loop (that took out all instances of the `-emoji` string that was attached to each emoji’s description in the Data Cleaning and Wrangling section). The loop can be found in sample code below.

library(stringr)

## create text_no_emoji, starting from the cleaned version of each tweet
tweets_emojis$text_no_emoji <- tweets_emojis$textclean

## remove every emoji description literally (fixed), longest first so that
## "whale-emoji" cannot strip part of "spouting-whale-emoji"
for (i in emojis$description[order(-nchar(emojis$description))]) {
  tweets_emojis$text_no_emoji <- str_remove_all(tweets_emojis$text_no_emoji, fixed(i))
}

The unnest_tokens function from the {tidytext} library was used to split each tweet into words, and all words were joined to the AFINN lexicon and assigned a sentiment score. Then, the average sentiment score of each tweet was calculated. Sample code for this tokenization process can be found below:

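library(dplyr)
library(tidytext)

afinn <- get_sentiments("afinn") ## AFINN lexicon: columns `word` and `value`

tokens <- tweets_emojis %>%
  select(status_id, text_no_emoji) %>%
  unnest_tokens(output = word, input = text_no_emoji, token = "words") %>%
  left_join(afinn, by = "word") %>% ## unmatched words get NA
  group_by(status_id) %>%
  dplyr::summarize(sent_mean = mean(value, na.rm = TRUE)) %>%
  ## tweets with no scored words give NaN; treat them as neutral (0)
  mutate(sent_mean = ifelse(is.nan(sent_mean), 0, sent_mean))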

To add count variables for how many times each emoji appeared in a given tweet, I first filtered out all 755 emojis that were not used in the analysis. Only emojis that appeared in at least 0.1% of all tweets were included, to ensure a sufficient sample size for each emoji when building the model. After this filtering was applied, I found that only 306 of the 2,622 emojis (11.7%) met this requirement.
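The count variables and the frequency filter could be built along these lines. This is a base R sketch on four made-up tweets, with hypothetical object names (`counts`, `share`, `min_share`); it is not the code used in the study.

```r
# Toy cleaned tweets in which emojis already appear as "<name>-emoji" tokens
textclean <- c(
  "$tsla rocket-emoji rocket-emoji",
  "$aapl gem-stone-emoji rocket-emoji",
  "$nflx chart-decreasing-emoji",
  "$msft rocket-emoji"
)
emoji_tokens <- c("rocket-emoji", "gem-stone-emoji",
                  "chart-decreasing-emoji", "fire-emoji")

# Split each tweet into whole words so that one emoji token can never be
# counted as part of a longer one
words <- strsplit(textclean, "\\s+")

# One row per tweet, one column per emoji: number of appearances
counts <- sapply(emoji_tokens, function(tok) {
  vapply(words, function(w) sum(w == tok), integer(1))
})

# Share of tweets in which each emoji appears at least once
share <- colMeans(counts > 0)

# Keep only emojis that clear the threshold. 0.5 is used here because the toy
# data is tiny; the 0.1% cutoff described above corresponds to 0.001.
min_share <- 0.5
counts_kept <- counts[, share >= min_share, drop = FALSE]

print(counts)
print(colnames(counts_kept))
```

Counting on whole tokens rather than substrings avoids double counting emojis whose names overlap, such as a whale and a spouting whale.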

The Linear Regression Model

The linear regression model was run with the selected emojis as predictor variables and the sentiment of each tweet as the outcome variable. This allows an association to be drawn between each emoji and the sentiment of the tweets it appears in. The assumption is that the emojis used in these tweets reflect the tone of the tweets. Therefore, any implicit sarcasm, irony or humor in emoji usage cannot be detected. As in many NLP applications, figurative language is very difficult to detect.

Once the linear regression model was implemented, coefficients for each emoji were found. The sentiment estimates were found for all emojis as well as Standard Errors, t-values and p-values.
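As a rough sketch of what such a model looks like in R, the example below fits `lm` on simulated data. The emoji columns, the effect sizes, and the noise level are all invented for illustration, so the output says nothing about the study's actual estimates.

```r
set.seed(42)
n <- 500

# Simulated per-tweet emoji counts (hypothetical; not the study's data)
model_data <- data.frame(
  rocket_emoji      = rpois(n, 0.6),
  closed_book_emoji = rpois(n, 0.2),
  fire_emoji        = rpois(n, 0.4)
)

# Simulated text sentiment: the rocket pushes it up, the book pulls it down,
# and fire has no built-in effect
model_data$sent_mean <- 0.5 * model_data$rocket_emoji -
  1.0 * model_data$closed_book_emoji + rnorm(n, sd = 1)

# Regress tweet sentiment on every emoji count column
fit <- lm(sent_mean ~ ., data = model_data)

# Estimate, standard error, t-value and p-value for each emoji
coef_table <- as.data.frame(summary(fit)$coefficients)
names(coef_table) <- c("estimate", "std_error", "t_value", "p_value")
coef_table$significant <- coef_table$p_value < 0.05

# Drop the intercept row and show the emoji rows
print(round(coef_table[-1, 1:4], 3))
```

Each coefficient can be read as the expected change in a tweet's text sentiment for one additional appearance of that emoji, holding the other emoji counts fixed.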

A full table of these values was created. Here is a preview of the first ten emojis:

Results

The most surprising finding was that a majority of finance-related emojis differed in sentiment from their canonical meaning. How we use them over text and on Twitter is fundamentally different from their use within a financial context.

Interesting trends were found in the identification of:

emojis with the most extreme sentiment estimates (both positive and negative),
emojis with the highest usage, and
emojis with specific definitions/meanings in the finance community.

The plot above shows the sentiment estimates for the five most positive and five most negative emojis. The bars represent the 95% confidence intervals for each emoji. According to the linear regression, the emoji with the most negative sentiment was the 📕, with a sentiment estimate of -2.7. It appeared 420 times in tweets, and only five tweets that included the 📕 emoji had positive sentiment. The 8️⃣ emoji had the second lowest estimate at -1.49, but its 95% confidence interval is very wide and includes zero: with a p-value of 0.085, the estimate is not statistically significant at the conventional alpha level of 0.05.
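The link between a p-value above 0.05 and a confidence interval that crosses zero can be checked with a few lines of arithmetic. The sketch below backs out an implied standard error from the reported estimate and p-value using a normal approximation, which is an assumption on my part (reasonable with tens of thousands of tweets); the standard error itself is not reported in the article.

```r
estimate <- -1.49   # reported estimate for the keycap-8 emoji
p_value  <- 0.085   # reported two-sided p-value

# Back out the implied standard error under a normal approximation
z_stat    <- qnorm(1 - p_value / 2)
std_error <- abs(estimate) / z_stat

# Approximate 95% confidence interval around the estimate
ci_95 <- estimate + c(-1, 1) * qnorm(0.975) * std_error

# TRUE when the interval straddles zero
includes_zero <- ci_95[1] < 0 && ci_95[2] > 0

print(round(ci_95, 2))
print(includes_zero)
```

An interval that straddles zero is simply another way of saying that the estimate is not significant at the 0.05 level.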

The beating-heart-emoji, 💓, had the most positive sentiment estimate of all emojis at 1.24, though it should be noted that it appeared only 132 times, so its sample size is small. Nonetheless, the heart emoji usually conveys a sense of love or affection, which is consistent with its high sentiment estimate. While 💓 has a positive sentiment estimate, it is interesting that other heart-related emojis do not follow the same trend. In the table below, all five heart emojis that met the 0.1% threshold are listed with their sentiment estimates. As shown, the green-heart-emoji, red-heart-emoji and blue-heart-emoji all have negative estimates, which is quite counterintuitive, so the beating-heart-emoji stands alone in being positive in the financial context.

Next, I analyzed the ten most frequently used emojis to examine their sentiment estimates, which revealed some surprising findings. The 🚀 emoji was the most popular with 35,708 appearances, followed by 📈, 📉, and 🔥. Interestingly, the chart-decreasing emoji, 📉, had a positive sentiment of 0.27, even higher than the 📈 emoji’s score of 0.19. This is counterintuitive and there is no clear explanation for such a phenomenon. These findings are further evidence that emojis as used in the financial context differ considerably from their traditional meaning.

The 🔥, which is the fourth most used emoji, has a negative sentiment score of -0.09. This is unusual because it is colloquially known to signify something positive, exciting or “fire.” It would be expected that this emoji would have a positive sentiment, but the results of the regression show the opposite. Although 🔥 is usually used to signify excitement, it is possible that it has a different connotation in the financial community. For example, it might represent market volatility or the rapid decline of a security, much like being “on fire” in a bad way. Thus, the fire emoji can encapsulate a wide range of emotions, from excitement about financial prospects to the danger of a collapsing stock. Take a look at the following tweet by Wall Street Gold, which uses the fire emoji in a very negative way to convey the message that bank stocks are burning down and crashing.

From these examples, it is evident that emojis used in financial discourse carry different meanings than they do elsewhere on Twitter, and that the finance community on Twitter has its own jargon and conventions that may not align with the conventional use of emojis in everyday communication. This finding highlights the need for additional research on emojis in this realm. Just as the 🔥 no longer means 🔥, the 🚀 does not have anything to do with actual rockets or outer space.

Finance-Specific Emojis

Because certain words and symbols take on context-specific meanings, this analysis uses a dictionary of finance-specific terms provided by Stash’s Guide to Reddit Wall Street Bets Emoji Lingo. The entry on the 🚀 emoji, for example, explains that it “[indicates] which stocks they’re hoping to ‘send to the moon,’ or quickly increase in stock price.”

A flag was added to the results of the regression that would label each emoji as either finance-specific or not. This identification was subjective, but was compiled after consulting with Stash’s Guide, Emojipedia, Hyemoji, Reddit’s Wall Street Bets forum, Stably’s article Understanding Whales, Bulls & Bears, Investopedia and Toby Mathis’ article The Ultimate Guide to Reddit’s Wallstreetbets Slang about the lingo and emoji usage. The following chart contains the fifteen emojis I deemed as “finance-specific” and includes definitions for each.

These emojis’ sentiment estimates were plotted on the graph below, where green values represent statistically significant estimates:

Sentiment Estimate Intervals of Financial-Related Emojis

The gorilla-emoji can refer to “apes,” which are retail investors who feel ‘bullish’ about a stock expected to increase in price. The fact that this emoji has the most positive sentiment score (1.12), with a statistically significant estimate, suggests that the gorilla-emoji is associated with optimism. The use of this emoji may be particularly effective in communicating how retail investors feel, and may serve as a signal to other investors on social media who are also considering investing in that stock. So, a gorilla-emoji implies that people like a stock and view it positively.

It is also interesting to note the stark contrast between the “whale-emoji” (🐋) and the “spouting-whale-emoji” (🐳). Both emojis have statistically significant estimates and presumably the same definition in the finance space: whales refer to individuals or institutions who have enough money and power to influence the price of a cryptocurrency or stock in a negative way. The whale-emoji has a sentiment score of 0.80, while the spouting-whale-emoji has the most negative sentiment score in the graph at -1.176. After further investigation, it seems that the spouting-whale-emoji is specifically tied to a popular finance Twitter account called Unusual Whales. The account publishes news about the stock market and often includes stories that are negative, such as the following tweet that was scraped in the data:

This Twitter account uses the spouting-whale-emoji in almost all of its tweets, alongside words such as “worst,” “challenges,” “ends,” “reverse,” “not,” and “dip,” all of which have negative sentiment. This discovery explains why the spouting-whale-emoji has such an unusually negative sentiment score. It is likely that its association with the Unusual Whales account and their negative news has influenced the use and sentiment of the emoji.

Three emojis with specific finance-related meanings, the 💎, the 🤑, and the 💰, do not have statistically significant sentiment estimates. This means that there is not enough evidence to support the notion that these emojis carry a specific sentiment in the finance context. The sentiments associated with them are likely too neutral or ambiguous to be captured by the regression model. While these three emojis may not carry significant sentiment on their own, they may still be used in combination with other emojis to convey meaning.

For example, in the following tweet, the 🤑 and the 💰 are used in conjunction with other “sentimentally-charged” (and statistically significant) emojis to convey positive sentiment.

Discussion

This study aimed to demonstrate the various meanings that emojis take on within the finance community on Twitter, and it is the first of its kind to explore this topic. However, the study’s scope is limited by a relatively small dataset, with 750,000 tweets analyzed, and only 83,000 of them containing emojis. Additionally, the time periods were treated equally, which may not accurately reflect trends and changes in sentiment over time. Future research could expand on this study by exploring other sectors on Twitter, such as Food Twitter, Sports Twitter, and MedTwitter, to see how emojis are used within those contexts. Furthermore, the lack of research on this topic underscores the need for continued development of NLP emoji lexicons and models, building off the findings from this study to refine emoji sentiment analysis not just for FinTwit, but for social media more broadly. Despite these limitations, this study provides valuable insights into the unique jargon and emoji usage within the finance community on Twitter, opening up new avenues for further exploration and analysis.

Emoji sentiment on Twitter can provide valuable insight into market sentiment and help investors and financial analysts make informed decisions. This study found that how people use emojis on FinTwit is often fundamentally different from how they are used in other areas of social discourse. Trends were found in the emojis with extreme sentiment estimates, the emojis with the highest usage and the emojis with finance-specific definitions. Specifically, the 📕 had the most negative sentiment while the 💓 had the most positive sentiment. The most frequently used emojis were 🚀, 📈, 📉, and 🔥, and it was found that the 📉 had a more positive sentiment than the 📈. The 🔥 had a negative sentiment, which is contrary to the way it is generally used. The finance community on Twitter has its own unique set of jargon that does not align with conventional emoji usage, and I identified fifteen emojis that are finance-specific. It was found that the 🦍, representing the “apes,” has the most positive sentiment in the FinTwit lexicon, carrying a clear meaning to retail investors that the communicator is bullish on a stock. It was also found that there is a meaningful difference between 🐳 and 🐋.

Emojis are becoming increasingly popular in financial discourse, particularly on social media platforms like Twitter. This study highlights the potential for using emoji sentiment as a tool to understand investor behavior. Moreover, this analysis sheds light on the unique language and culture of the finance community on Twitter and the need for a specialized lens with which to interpret such communication within this community. As the use of emojis continues to grow in popularity, their significance in financial analysis is likely to increase as well.


Written by Nate Yellin