
#Masks Throughout COVID-19

A Twitter Sentiment Analysis with Eric Blander

Joshua Szymanowski · Published in The Startup · Jul 18, 2020 · 11 min read

Thinking back on the early days of the pandemic, and where we are today, I still struggle to process the stark shift in the narrative around whether or not the public should wear masks. I remember it being strange then, and it seems even stranger now. Take, for example, the following two tweets from the Surgeon General:

February 29, 2020

Seriously people- STOP BUYING MASKS! They are NOT effective in preventing general public from catching #Coronavirus, but if healthcare providers can’t get them to care for sick patients, it puts them and our communities at risk!

May 23, 2020

As our communities reopen, wearing a face covering is something we all can — and should — do to protect each other. So no matter how you’re spending #MemorialDayWeekend, wear a face covering when out in public. Comment w/ a selfie & encourage others to do so too

In a major reversal that can only be described as too-little-too-late, the Surgeon General — in the span of a few crucial months — finally implores the country to wear masks after previously shouting about their ineffectiveness.

With this in mind, we wanted to investigate the messaging around masks and the overall sentiment of tweets about masks on Twitter. The sentiment scores of the two tweets above (scores can range from -1 to 1) tell a big story: the first tweet has a score of -0.7536, whereas the second has a score of 0.7108. In other words, the first tweet is overwhelmingly negative in tone and the second is overwhelmingly positive.

So we began to wonder: was there a similar shift in sentiment on all of Twitter?

The data

We scraped tweets from January through May that contained the word mask or masks, and any one of the following:

covid|dead|death|doctor|infect|novel|nurse|outbreak|rona|sars|viral|virus|wuhan
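The scraping tool itself isn't the focus here, but the keyword filter is simple enough to sketch in Python (the function name is ours, for illustration):

```python
import re

# A tweet qualifies if it mentions "mask"/"masks" AND at least one
# pandemic-related term from the alternation above.
MASK_RE = re.compile(r"\bmasks?\b", re.IGNORECASE)
TOPIC_RE = re.compile(
    r"covid|dead|death|doctor|infect|novel|nurse|outbreak|rona|sars|viral|virus|wuhan",
    re.IGNORECASE,
)

def matches_query(text: str) -> bool:
    return bool(MASK_RE.search(text)) and bool(TOPIC_RE.search(text))
```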

We eventually pared that down to 5,000 tweets per day because our dataset was getting a bit too large to manage. We ended up with about 580,000 total tweets, along with some Twitter statistics including number of likes, replies, and retweets.

Using VADER, we found the sentiment scores of the raw tweets. Using these scores, we split tweets into one of three categories: negative, neutral, and positive. These categories became the target variable for our prediction model. There was a bit of a class imbalance — about 40% of the tweets were positive, 40% were negative, and only around 20% were neutral.
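For a rough sense of how this works, here is a minimal sketch of the scoring and bucketing, using VADER's commonly cited ±0.05 compound cutoffs for illustration:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def sentiment_label(text: str, threshold: float = 0.05) -> str:
    # polarity_scores returns 'neg', 'neu', 'pos', and a normalized
    # 'compound' score ranging from -1 (most negative) to 1 (most positive).
    compound = analyzer.polarity_scores(text)["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```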

Outline

After cleaning our data and adding the target variable, we processed the text and engineered some features. Then we explored the data and created visualizations. Finally, we ran a few prediction models to provide us with more insights and confirm or deny what we had found during EDA.

Text processing and feature engineering

To process the text, we:

  • made the tweets lowercase
  • removed mentions (i.e. @ + [insert_username])
  • removed links
  • converted emoticons to words using Sanket Doshi’s dictionary
  • converted emojis to words using Emoji for Python
  • removed punctuation (except apostrophes)
  • tokenized words, keeping contractions intact so we could convert those to full words (thanks again, Sanket Doshi!)
  • removed stop words

Our stop words included NLTK’s base list, plus our search terms described above (and their hashtag versions), as well as words we found to be either too close to mask or too common to provide much interpretive value.
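Roughly, the full processing pipeline looks like this in Python (the emoticon and contraction dictionaries are shown with toy entries standing in for Sanket Doshi's full versions, and the tokenizer regex is an illustrative choice):

```python
import re
import emoji                       # the "Emoji for Python" package
from nltk.corpus import stopwords  # requires nltk.download("stopwords")

# Toy stand-ins for the borrowed emoticon/contraction dictionaries.
EMOTICONS = {":)": "happy_face", ":(": "sad_face"}
CONTRACTIONS = {"don't": "do not", "can't": "can not"}

SEARCH_TERMS = ("covid|dead|death|doctor|infect|novel|nurse|outbreak|"
                "rona|sars|viral|virus|wuhan").split("|")
STOP_WORDS = (set(stopwords.words("english"))
              | set(SEARCH_TERMS)
              | {"#" + term for term in SEARCH_TERMS})  # hashtag versions

def process_tweet(text: str) -> list[str]:
    text = text.lower()
    text = re.sub(r"@\w+", "", text)              # strip mentions
    text = re.sub(r"http\S+|www\.\S+", "", text)  # strip links
    for emoticon, word in EMOTICONS.items():      # emoticons -> words
        text = text.replace(emoticon, f" {word} ")
    text = emoji.demojize(text, delimiters=(" ", " "))  # emojis -> words
    # Tokenize on letters, apostrophes, and hashtags; this drops other
    # punctuation while keeping contractions intact...
    tokens = re.findall(r"[a-z'#]+", text)
    # ...so contractions can be expanded before stop-word removal.
    tokens = [piece for tok in tokens
              for piece in CONTRACTIONS.get(tok, tok).split()]
    return [tok for tok in tokens if tok not in STOP_WORDS]
```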

After processing, we used TF-IDF vectorization. Computational power was a definite hurdle throughout this project, and in order to decrease training time, we only included words with at least 250 appearances in our corpus.
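With scikit-learn, that looks roughly like the following; note that min_df is technically a document-frequency cutoff, which we use here as a close approximation of the 250-appearance rule:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(
    tokenizer=process_tweet,  # the pipeline sketched above
    lowercase=False,          # process_tweet already lowercases
    min_df=250,               # drop rare words to cut training time
)
X = vectorizer.fit_transform(raw_tweets)  # raw_tweets: list of tweet strings
```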

From our TF-IDF vectors, we created a Gensim corpus and mapping dictionary, which we used in an LDA model to generate 10 topics. Below is a list of our summaries for these topics:

LDA Topic List

0. Protests, without masks
1. Mask ads
2. Biden, social distancing
3. Social distancing, protection
4. Fauci, regulations
5. Lockdown, testing
6. News, statistics
7. Trump, protests
8. News, protests, essential services, ads
9. Social distancing, death, Trump

As you can see, protesting makes an appearance in several topics. Unfortunately, it can be difficult to tell whether these refer to the anti-social-distancing protests or to the Black Lives Matter protests. Given that these tweets run only through the end of May, around when the recent BLM protests began en masse, the large majority most likely relate to the former.

Some other important subjects include key figures (Trump, Fauci, Biden) and social distancing regulations.
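For reference, converting the scikit-learn TF-IDF matrix into a Gensim corpus and fitting the LDA model looks roughly like this (the 100 passes are described in the EDA section below):

```python
from gensim.matutils import Sparse2Corpus
from gensim.models import LdaModel

# Wrap the sparse TF-IDF matrix (tweets as rows) as a Gensim corpus.
corpus = Sparse2Corpus(X, documents_columns=False)

# Map column indices back to words so the topics are readable.
id2word = {idx: word for word, idx in vectorizer.vocabulary_.items()}

lda = LdaModel(corpus=corpus, id2word=id2word,
               num_topics=10, passes=100, random_state=42)

for topic_id, top_words in lda.print_topics(num_topics=10, num_words=8):
    print(topic_id, top_words)
```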

The last feature we created was a subjectivity score via TextBlob, i.e. whether a tweet was based more in fact or in opinion. There appeared to be a fairly normal distribution of subjectivity, though it was skewed by a large number of tweets with a score of 0.0 (meaning they were entirely fact-based), which can most likely be ascribed to tweets from news sources.
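Computing the score is a one-liner with TextBlob:

```python
from textblob import TextBlob

# Subjectivity runs from 0.0 (entirely fact-based) to 1.0 (entirely opinion).
def subjectivity(text: str) -> float:
    return TextBlob(text).sentiment.subjectivity
```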

EDA

Sentiment over time

We found that the sentiment of tweets stayed relatively the same throughout the five months, with positive and negative sentiments staying around 40% each, and neutral tweets being anywhere between 10 and 20%.

A slight edge for positive tweets, but sentiment appears relatively constant overall.
  • Tweets do start off as overwhelmingly negative in the first half of January, though it should be noted that in this period we were not getting anywhere near 5,000 mask-and-coronavirus tweets per day, so this should be taken with a rather large grain of salt. (We didn’t consistently reach 5,000 tweets per day until about mid-March.)
  • Mid-February is another point of interest, with a large spike in negative tweets, perhaps corresponding to the first rumblings of COVID in the US.
  • Negativity appears to gradually decrease until the end of April, at which point there is a spike in Topic 7 (Trump and the social distancing protests), and then gradually increases until the end of May.
  • Positivity has its strongest moments in mid- to late-March, but stays fairly constant throughout.
  • Overall, we don’t see any overwhelming shift in sentiment, merely a fairly consistently polarized Twitterverse.

Looking at this on the scale of months gave us another look at the relative consistency of tweet sentiment over time.

  • Positive tweets gain a slight edge during March and April, but are generally fairly even with negative tweets.
  • April stands out for having noticeably more neutral tweets, perhaps a result of wider news coverage.

Top words

After cleaning and removing stop words that were either too common or too related to our search terms, we had about 8 million words and a unique vocabulary of about 240,000. We looked at word frequencies and found these to be our top 25 words (note: this is also after lemmatizing).
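Counting is straightforward once the tweets are tokenized (processed_tweets below refers to the token lists produced by the processing pipeline, after lemmatizing):

```python
from collections import Counter

# processed_tweets: one list of (lemmatized) tokens per tweet
counts = Counter(token for tweet in processed_tweets for token in tweet)
for word, n in counts.most_common(25):
    print(f"{word}: {n:,}")
```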

There are a lot of medical and viral terms, as well as some interesting though unsurprising ones like China and Trump.

Importantly, most of these words can be used in both positive and negative ways: you need to wear a mask or you don’t need to wear a mask; masks help/protect or masks don’t help/protect; and so on.

Looking at the top words within each sentiment, we see some stark differences. Negative tweets tend to focus on infection, death, China, and Trump; whereas positive tweets tend to talk about the safety and protection masks provide, and medical terms seem a bit larger overall here.

Word clouds for each sentiment class.

Neutral tweets are most likely “newsy” tweets, discussing numbers (million), the public, quotes, and next steps (imperative words like need, say, and make).

Positive tweets also seem to have a larger use of emojis (facewithtearsofjoy, facewithmedicalmask, rollingonthefloorlaughing). In the future, it may be worth removing emojis and comparing results, to find out if emojis skew sentiment scores toward the more positive or if emojis are indeed associated with more positive tweets.

LDA topic modeling

We built ten topics over 100 passes through our corpus (topic summaries can be seen above). There were some interesting results when we looked at the distribution of tweet sentiment across these topics.

  • Topic 1 (mask advertisements), topic 6 (news, statistics), and topic 8 (news, protests, essential services, ads) contain a large share of neutral tweets, which seems to make sense.
  • Topic 8 oddly contains hoax, though this may be in relation to news stories reporting on some people’s feelings toward COVID.
  • Topic 9 (social distancing, death, Trump) has a lot of negative tweets.
  • Topic 3 (social distancing, protection) has the most positive tweets, which also seems to make sense, given words like protect, help, others, etc. It appears to be about the common good and using masks to help keep others safe.
  • Topic 7 (Trump, protests) interestingly has about equal numbers for all sentiment categories, and it does indeed seem to be the most polarizing of topics.
  • Topics 0 (protests, without masks), 2 (Biden, social distancing), and 9 (social distancing, death, Trump) are relatively non-neutral.

The following graphs give a sense of how popular each topic was, which can also help in summarizing each topic.

These also provide a little more sense of how similarly sentiment is distributed across topics, with the sizes of these bars more or less the same across the three categories.

Some notable exceptions are:

  • Topic 5 (lockdown, testing) has a large relative share of neutral tweets.
  • Topic 3 (social distancing, protection) has a large relative share of positive tweets.
  • Topic 9 (social distancing, death, Trump) has a large relative share of negative tweets.

Modeling

We ran Naive Bayes and Decision Tree models and got decent accuracy/F1 scores given that we had three classes. Even our basic Decision Tree took over half an hour to run, so it was difficult to implement anything beyond that.

We also found that our models performed better (much better in the case of the Decision Tree) using only the word vectors, as opposed to word vectors plus our other engineered features and numerical data.

In each of our three most important models there was a fairly even split in true positives across the three classes. Our Bernoulli Naive Bayes model ran in about four minutes and achieved an F1 score of 71.5%. Our Decision Tree using only word vectors took an hour and fifteen minutes to run (!) and got an F1 score of 74.6%. And our Decision Tree using all of our features took 30 minutes to run, but only got an accuracy score of 67.1%.
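A rough sketch of the setup (the stratified split, random seeds, and weighted F1 averaging are illustrative choices, not necessarily the exact ones used):

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# X: TF-IDF word vectors; y: VADER-derived sentiment labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

nb = BernoulliNB().fit(X_train, y_train)
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

for name, model in [("Bernoulli NB", nb), ("Decision Tree", tree)]:
    print(name, f1_score(y_test, model.predict(X_test), average="weighted"))
```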

Here’s a breakdown of each model’s confusion matrices.

Model progression through confusion matrices

It’s very tempting to call it a day with a fairly accurate four-minute model, eh? Depending on the use case, however, the Decision Tree model may be worth the extra time.

But wait! It’s pruning time!

As is the case with most Decision Trees, our models were incredibly overfit. To combat this, we used a handy hyperparameter known as ccp_alpha, which allowed us to trim down the complexity of the decision tree considerably and produce a model that can hopefully generalize much better to unseen data.

After running sklearn’s cost_complexity_pruning_path, we determined a good value for ccp_alpha to be 0.0005, which brought the training and test scores nearly into line and actually increased our F1 score to 76.7%. Take a look at the new and improved confusion matrix below:

It still takes over an hour to run, but now we have a much stronger argument for choosing it over the Naive Bayes model.
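In code, the pruning step looks roughly like this:

```python
from sklearn.tree import DecisionTreeClassifier

# Enumerate the effective alphas at which subtrees collapse; larger
# alphas prune more aggressively.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)
print(path.ccp_alphas[:5], path.impurities[:5])

# Refit with the alpha we settled on.
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.0005, random_state=42)
pruned_tree.fit(X_train, y_train)
```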

Top features

Our top ten features for our most accurate model were an interesting mix of words that were frequently used in positive and negative tweets.

Protect, help, hand, and please were in the top 5 most frequently used words in the positive class.

Infected, death, and stop were in the top 5 in the negative class.

Fight was the only word here that didn’t appear in any top 25 list.
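These rankings come straight from the tree's impurity-based feature importances, for example:

```python
import numpy as np

# Rank TF-IDF features by the pruned tree's importances.
# (get_feature_names_out requires scikit-learn >= 1.0; older versions
# use get_feature_names instead.)
feature_names = np.array(vectorizer.get_feature_names_out())
top10 = np.argsort(pruned_tree.feature_importances_)[::-1][:10]
print(feature_names[top10])
```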

While more difficult to interpret, the top features for our best Naive Bayes model were fairly interesting:

The top features for positive tweets were all hashtags and emojis. The top features for neutral tweets also included a lot of emojis, as well as some strangely non-neutral words (in my mind, at least). And the top features for negative tweets were indeed negative words (other than declares, germany’s, and possibly iniran). None of these words were in the top 25 overall.

Conclusions

The power of words! Words themselves proved to be the most effective predictors of sentiment. While our other features were important during EDA, they did not prove effective in our models.

Tweets were generally more negative in January but relatively constant from February through May. Negativity hit a low point in March and April, but this was not accompanied by an opposite trend in positive tweets. Instead, neutral tweets increased in this time. Unlike in the words of the Surgeon General, we did not observe any stark shifts in attitude in tweets about masks and coronavirus.


Future considerations

Since words were so important in our models, playing around with stop words and text processing could potentially yield better results. And given the relative effectiveness and speed of our Bernoulli Naive Bayes model, it may be worth using all the tweets we scraped (rather than capping them at 5,000 per day) and building a model on that.

We could also try to get better accuracy via a deep learning model, possibly LSTM, although the amount of time that model would take (not to mention all the tweaking) seems rather prohibitive. Using spaCy, Word2Vec, and/or Doc2Vec to build vectors may also prove to be more fruitful.

And finally, we would like to try to further investigate sentiment toward the word mask or masks itself, as opposed to the tweet as a whole.

Project repo

You can check out our project repo on GitHub:
https://github.com/p-szymo/twitter-sentiment-analysis

