A Bird’s-Eye View of #WomensMarch
On January 21st, 2017, a historic number of people showed up for the Women’s March on Washington, a protest march rejecting the policies of the newly inaugurated President of the United States. In tandem with the march was a flurry of tweets magnifying the on-the-ground movement through the use of #WomensMarch. What were the most popular #WomensMarch tweets? How many people were tweeting? When did the tweeting spike? What other topics were people discussing? To answer these questions, I analyzed over 850,000 tweets from January 21st discussing the Women’s March on Washington.
The IDs of these tweets, along with the code used to generate the results and figures of this post, are available on GitHub. A version of this post has been cross-posted at the Computational Story Lab’s blog.
Tweeting the Women’s Marches
Crowd scientists and other statistical professionals estimate that nearly 470,000 marched on Washington, and that about 3.2 million people in total marched at over 300 sites across the United States. An equally voluminous number of people tweeted about the marches on the day following the presidential inauguration.
Chris Fusting (a fellow conspirator in applied mathematics) and I searched for tweets containing mentions of “Women’s March” or #WomensMarch. From Twitter’s Gardenhose, a 10% sample of Twitter’s entire stream, we extracted 854,811 tweets. Access to the Gardenhose is generously provided to us by the Vermont Complex Systems Center and Computational Story Lab.
The tweets were posted by 527,988 unique users, an average of approximately 1.6 tweets per user. About 86% of those tweets were retweets, which is standard in most samples of Twitter. Scaling appropriately (since our sample is 10% of the whole), there were an estimated 8.5 million tweets from 5.3 million unique users. This is likely an underestimate of the true number of tweets and users on January 21st, because most replies to tweets in our data set are probably not in our data set (many people may not explicitly mention “Women’s March” in a reply to a tweet about the Women’s March). Note that there is no easy way to use the Twitter API to collect the replies to the tweets.
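For readers who want to check the arithmetic, here is a minimal sketch of the scaling step. The variable names are made up for illustration, and multiplying the user count by ten follows the post's simple linear scaling rather than any more careful estimate of unique users.

```python
# Counts from the 10% Gardenhose sample, as reported in the post
sampled_tweets = 854_811
sampled_users = 527_988

# Assumption: a 10% sample means each sampled item stands for roughly 10
SCALE = 10

tweets_per_user = sampled_tweets / sampled_users
estimated_tweets = sampled_tweets * SCALE
estimated_users = sampled_users * SCALE

print(f"Tweets per user (sample): {tweets_per_user:.1f}")
print(f"Estimated tweets: {estimated_tweets / 1e6:.1f} million")
print(f"Estimated users:  {estimated_users / 1e6:.1f} million")
```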
We see that the number of tweets about the Women’s March grew steadily throughout the day, peaking in the mid-afternoon. In the roughly two-hour peak of tweeting, almost 2 million tweets were generated, about 22% of the tweets from the entire day. While the number of tweets tapered off following the official ends of the marches, people were still writing about 83 tweets per second around midnight.
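A rough sketch of how a time series like this could be produced: bin the tweet timestamps by hour and scale the counts up from the 10% sample. The function name and the toy timestamps are hypothetical, and the sketch assumes the timestamps have already been parsed into `datetime` objects.

```python
from collections import Counter
from datetime import datetime

SCALE = 10  # assumption: each tweet in the 10% sample stands for roughly 10


def estimated_tweets_per_hour(timestamps, scale=SCALE):
    """Bin tweet timestamps by hour and scale counts up to the full stream."""
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return {hour: n * scale for hour, n in sorted(counts.items())}


# Toy timestamps for illustration only (not taken from the data set)
sample_timestamps = [
    datetime(2017, 1, 21, 13, 5),
    datetime(2017, 1, 21, 13, 40),
    datetime(2017, 1, 21, 14, 15),
]

hourly = estimated_tweets_per_hour(sample_timestamps)
peak_hour = max(hourly, key=hourly.get)

for hour, n in hourly.items():
    # Dividing an hourly count by 3600 gives an average tweets-per-second rate
    print(hour.strftime("%H:%M"), n, f"{n / 3600:.4f} tweets/sec")
```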
With an estimated 120,000 retweets, Randi Mayem Singer had the most popular tweet on the day of the marches.
She was followed by Parks and Rec icon Ron Swanson, a.k.a. Nick Offerman, who had a tweet that picked up about 110,000 retweets.
The third most popular retweet was from Hillary Clinton, whose tweet depicted three of the major organizers of the Women’s March on Washington.
These tweets came from across the United States and beyond, as other cities held their own Women’s Marches. Users posted from Manhattan, Los Angeles, Chicago, and even London. There was also international support for the Women’s March on Washington from Canada, Brazil, France, and other countries.
Networks of Conversation around #WomensMarch
There are many advanced natural language processing and machine learning tools we could apply to get a sense of the texture of the #WomensMarch conversations. Here, we simply look at the most popular hashtags, since they act as good indicators of the broader topics.
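Counting hashtags needs very little machinery. The sketch below is one way to do it, not necessarily how the figures in this post were made: the regular expression, the lowercasing, and the choice to count each hashtag once per tweet are all assumptions, and the example tweets are invented.

```python
import re
from collections import Counter

# Assumption: a hashtag is "#" followed by word characters
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the set of lowercased hashtags in a tweet's text."""
    return {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}


def count_hashtags(tweets):
    """Count how many tweets each hashtag appears in."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_hashtags(text))
    return counts


# Invented example tweets
example_tweets = [
    "Marching today #WomensMarch #WhyIMarch",
    "#womensmarch in London #WomensMarchLondon",
    "No hashtags here",
]

hashtag_counts = count_hashtags(example_tweets)
for tag, n in hashtag_counts.most_common(3):
    print(f"#{tag}: {n}")
```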
The hashtags reinforce the solidarity of other cities with the Women’s March on Washington. As with the tweet locations, we see references to marches in Los Angeles, New York City, Denver, and Chicago. Somewhat surprisingly, #WomensMarchLondon was the second most popular hashtag after #WomensMarchOnWashington. Other hashtags appear, such as #WhyIMarch, #NastyWoman, and #SoBadEvenIntrovertsAreHere.
We can get a better sense of the larger web of conversation by examining a network of hashtags. We create a network similar to your social network on Facebook or Twitter, except here the entities are hashtags, not people, and we connect two hashtags if they appear together in the same tweet.
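As a sketch of that construction, the snippet below stores the network as a dictionary mapping each pair of hashtags to the number of tweets they share. The function name and the toy input are hypothetical; a graph library would work just as well.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_edges(tweet_hashtags):
    """Map each unordered hashtag pair to the number of tweets containing both."""
    edges = Counter()
    for tags in tweet_hashtags:
        # Sort so each pair has one canonical key; set() ignores repeated tags
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


# Toy input: one list of hashtags per tweet
tweet_hashtags = [
    ["whyimarch", "womensmarchnyc", "resist"],
    ["whyimarch", "womensmarchnyc"],
    ["resist"],  # a single hashtag creates no edge
]

edges = build_cooccurrence_edges(tweet_hashtags)
for (a, b), weight in sorted(edges.items()):
    print(f"#{a} -- #{b}: {weight}")
```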
Creating the network in this manner, we are left with a network of 14,500 hashtags with 58,000 connections between them. It is convenient to filter this network down to its most important aspects, so we apply the disparity filter, a fancy technique for extracting the backbone of a weighted network (details left to the curious). Applying the filter, we are left with 407 of the most important hashtags and 776 links between them. This network is pictured below.
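For the curious, here is a minimal sketch of the disparity filter in its common form: for a node with k links, a link carrying a fraction p of that node's total weight is kept when (1 - p)^(k - 1) falls below a significance level. The level of 0.05, the choice to keep a link if it is significant for either endpoint, and the treatment of single-link nodes are assumptions; the post does not say which settings were used.

```python
from collections import defaultdict


def disparity_filter(edges, alpha=0.05):
    """Keep edges whose weight is significant for at least one endpoint.

    edges: dict mapping (node_u, node_v) -> positive weight.
    alpha: significance level (an assumed value, not taken from the post).
    """
    strength = defaultdict(float)  # total weight attached to each node
    degree = defaultdict(int)      # number of links attached to each node
    for (u, v), w in edges.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    def significance(node, w):
        k = degree[node]
        if k < 2:
            # Assumption: a node with one link cannot make that link significant
            return 1.0
        return (1.0 - w / strength[node]) ** (k - 1)

    return {
        (u, v): w
        for (u, v), w in edges.items()
        if min(significance(u, w), significance(v, w)) < alpha
    }


# Toy star network: one heavy link and nine light ones around a hub
toy_edges = {("hub", f"leaf{i}"): 1 for i in range(9)}
toy_edges[("hub", "big")] = 100

backbone = disparity_filter(toy_edges)
print(backbone)
```

In this toy network only the heavy link survives, because it carries most of the hub's weight while each light link carries less than 1% of it.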
Not surprisingly, #WomensMarchOnWashington is the most popular hashtag in the core of the network. However, this view of the hashtags gives us a different perspective on the topics of conversation surrounding #WomensMarch. For instance, in the bottom left we see a cluster of anti-POTUS hashtags (#Trump, #TheResistance, #Resist, #StrongerTogether), which were not immediately apparent when just viewing the most popular hashtags. We also see that #SoBadEvenIntrovertsAreHere is heavily connected to #WomensMarchNYC, suggesting it originated from users in New York.
The hashtag topic network gives us alternative ways to rank the importance of these hashtags. Rather than ranking just by how popular a hashtag was, we can rank by how well a hashtag does at connecting to other hashtags. That is, we can look at its weighted degree: the number of hashtags it appeared with, weighted by the number of times it appeared with each hashtag.
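Weighted degree is quick to compute from a table of co-occurrence counts. The sketch below assumes the network is stored as a dictionary from hashtag pairs to weights; the names and the toy weights are illustrative only.

```python
from collections import Counter


def weighted_degree(edges):
    """Sum the weights of all links attached to each hashtag."""
    totals = Counter()
    for (u, v), w in edges.items():
        totals[u] += w
        totals[v] += w
    return totals


# Toy co-occurrence weights
toy_edges = {
    ("whyimarch", "womensmarchnyc"): 2,
    ("resist", "whyimarch"): 1,
    ("resist", "womensmarchnyc"): 1,
}

ranking = weighted_degree(toy_edges)
for tag, total in ranking.most_common():
    print(f"#{tag}: {total}")
```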
We see hashtags similar to those in the popularity ranking, but the magnitudes are quite different now. Here #WhyIMarch is nearly as “important” as #WomensMarchOnWashington in terms of how well-connected it is, while #WomensMarchLondon has fallen significantly in ranking. Meanwhile, new hashtags have entered the rankings, such as #HearOurVoice and #MAGA, indicating that although these hashtags may not have been particularly popular, they are more central to maintaining the structure of the topic network.
This touches on a much broader concept of network centrality, and there are far more nuanced ways of measuring the most topologically key parts of a network. Here, we leave a more detailed analysis of centrality as an exercise to the interested reader.
Through analysis of more than 850,000 tweets, we have gained a bird’s-eye view of the online conversation surrounding the Women’s March on Washington. We have formulated this view largely from simple count data and the construction of a hashtag topic network. Through the hashtag topic network, we see there is a complex web of solidarity, resistance, and support underlying the #WomensMarch conversations.
What more could we look at?
- We’ve gained some sense of how many people were tweeting, but who were those people? FiveThirtyEight’s Nate Silver discussed how physical turnout for the marches was driven by Clinton supporters. Is it the same story for those not at the marches but still tweeting about them?
- Hashtags are nice indicators of topics, but there is an abundance of language beyond the hashtags. Further textual analysis could apply topic modeling to understand the topics from a broader angle, or sentiment analysis to understand the online emotional dynamics during the day.
- Given the outcome of the election, there are certainly people who did not agree with the Women’s March. What were those people saying? How did they interact with #WomensMarch tweeters? Did any counter-protest coalition form specifically in response to the Women’s March?
If you would like to try your hand at answering any of these questions, the IDs of the tweets underlying this post are available on GitHub.
Thanks to Chris Fusting for support in collecting the tweets, and shout out to Chris Danforth for encouraging me to write this post in the midst of a massive amount of complex analysis homework.
Ryan J. Gallagher is a graduate student at the University of Vermont studying mathematics and complex systems. He works at the intersection of natural language processing and network science to study the dynamics of sociotechnical systems. When not traversing the horrifying depths of mathematics, he aspires to be a one-person band.