(Image source: https://www.womensmarch.com/graphics)

Exploring #WomensMarch

Nick Ruest
Feb 10, 2017

A couple of Saturday mornings ago, I was on the couch listening to records and reading a book when Christina Harlow and MJ Suhonos asked me about collecting #WomensMarch tweets. Little did I know at the time that #WomensMarch would be the highest-volume collection I had ever seen. By the time I stopped collecting a week later, we’d amassed 14,478,518 unique tweet ids from 3,582,495 unique users, and at one point hit around 1 million tweets in a single hour.

(Figure generated with Peter Binkley’s twarc-report)

This put #WomensMarch well over 1% of the overall Twitter stream. The Filter API drops tweets once a collection passes that threshold, so I collected with both the Filter and Search APIs. (If you’re curious to learn more about this, check out Kevin Driscoll and Shawn Walker’s “Big Data, Big Questions | Working Within a Black Box: Transparency in the Collection and Production of Big Twitter Data”, and Jiaul H. Paik and Jimmy Lin’s “Do Multiple Listeners to the Public Twitter Sample Stream Receive the Same Tweets?”) I’ve included the search and filter logs in the dataset. Running grep "WARNING" WomensMarch_filter.log, or grep "WARNING" WomensMarch_filter.log | wc -l for just the count, will give you a sense of the scale of dropped tweets. For a number of hours on January 22, I was seeing around 1.6 million cumulative dropped tweets!

(Image: output of `tail -f WomensMarch_filter.log | grep WARN`)
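To pull the peak figure out of the log rather than eyeballing it, something like the following could work. It is a minimal sketch that assumes each warning line contains WARNING followed by a count and the words "tweets undelivered", and that the count is cumulative. The log format, the regular expression, and the function names are assumptions for illustration, so check them against a few lines of WomensMarch_filter.log first.

```python
import re

# Assumed line shape (not confirmed by the post), for example:
#   2017-01-22 03:14:15,926 WARNING 1600000 tweets undelivered
UNDELIVERED = re.compile(r"WARNING\s+(\d+) tweets undelivered")


def undelivered_counts(lines):
    """Yield the integer reported on each matching WARNING line."""
    for line in lines:
        match = UNDELIVERED.search(line)
        if match:
            yield int(match.group(1))


def peak_undelivered(lines):
    """Return the largest count seen, or 0 if no line matched."""
    return max(undelivered_counts(lines), default=0)
```

Passing an open file handle for the filter log to peak_undelivered would return the highest cumulative count the log recorded.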

I collected with the Filter API from around 11AM EST on January 21, 2017 to 11AM EST on January 28, 2017, and ran two Search API queries. The final count before deduplication looked like this:

$ wc -l WomensMarch_filter.json WomensMarch_search_01.json WomensMarch_search_02.json 

Final stats after deduplication: 14,478,518 tweets in a 104GB JSON file!
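Because the Filter and Search collections overlap, the three files have to be merged on tweet id before counting. A minimal sketch of that step, assuming line-oriented JSON (one tweet per line) and Twitter's id_str field; the function names are hypothetical, and this is an illustration rather than the tool actually used for this dataset:

```python
import json


def deduplicate(lines):
    """Yield each tweet line once, keeping the first copy of every id_str."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet_id = json.loads(line)["id_str"]
        if tweet_id not in seen:
            seen.add(tweet_id)
            yield line


def deduplicate_files(paths, out_path):
    """Merge several tweet files into out_path; return the unique count."""
    def all_lines():
        for path in paths:
            with open(path, encoding="utf-8") as handle:
                yield from handle

    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for line in deduplicate(all_lines()):
            out.write(line + "\n")
            count += 1
    return count
```

The set of seen ids lives in memory and grows with the number of unique tweets, which is worth keeping in mind for a collection of this size.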

This puts us in the same range as what Ryan Gallagher projected in “A Bird’s-Eye View of #WomensMarch.”

Below I’ll give a quick overview of the dataset using utilities from Documenting the Now’s twarc, along with utilities described inline. This is the same approach Ian Milligan and I took in our 2016 Code4Lib Journal article, “An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter.”

This is probably all that I’ll have time to do with the dataset. Please feel free to use it in your own research. It’s licensed CC-BY, so please have at it!

If you want access to other Twitter datasets to analyse, check out http://www.docnow.io/catalog/.

Users

Tweets Username
5,375 paparcura
4,703 latinagirlpwr
1,903 ImJacobLadder
1,236 unbreakablepenn
1,212 amForever44
1,178 BassthebeastNYC
1,170 womensmarch
1,017 WhyIMarch
982 TheLifeVote
952 zerocomados

3,582,495 unique users.
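A table like this can be rebuilt from the deduplicated file by tallying the user.screen_name field of each tweet. The sketch below assumes line-oriented JSON and counts by screen name; the function names are hypothetical, and counting by user id instead would be more robust against renamed accounts.

```python
import json
from collections import Counter


def count_users(lines):
    """Tally tweets per screen name from line-oriented tweet JSON."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line)["user"]["screen_name"]] += 1
    return counts


def format_top(counts, n=10):
    """Return the n most frequent entries as 'count name' rows."""
    return [f"{total:,} {name}" for name, total in counts.most_common(n)]
```

The length of the returned Counter gives the number of unique screen names.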

Retweets

146,370 Retweets

141,111 Retweets

109,865 Retweets

84,161 Retweets

70,600 Retweets

62,591 Retweets

59,366 Retweets

56,365 Retweets

52,125 Retweets

50,944 Retweets

Clients

Tweets Client
7,098,145 Twitter for iPhone
3,718,467 Twitter for Android
2,066,773 Twitter for iPad
634,054 Twitter Web Client
306,225 Mobile Web (M5)
127,622 TweetDeck
59,463 Instagram
54,851 Tweetbot for iOS
47,556 Twitter for Windows
36,404 IFTTT
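Twitter delivers the client in each tweet's source field as an HTML anchor, so the tag has to be stripped before counting. A minimal sketch under that assumption, with hypothetical function names:

```python
import json
import re
from collections import Counter
from html import unescape

TAG = re.compile(r"<[^>]+>")


def client_name(source):
    """Strip the anchor markup from a source value, leaving the client label."""
    return unescape(TAG.sub("", source)).strip()


def count_clients(lines):
    """Tally tweets per client from line-oriented tweet JSON."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[client_name(json.loads(line).get("source", ""))] += 1
    return counts
```

For example, a source value of `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>` reduces to the label Twitter for iPhone.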

URLs

Tweets URL

29,223 https://www.facebook.com/cnn/videos/10155945796281509/

27,435 http://www.cnn.com/2017/01/21/politics/womens-march-donald-trump-inauguration-sizes/index.html?sr=twCNN012117womens-march-donald-trump-inauguration-sizes0205PMStoryGal

24,854 http://www.independent.co.uk/news/world/americas/womens-march-antarctica-donald-trump-inauguration-women-hate-donald-trump-so-much-they-are-even-a7538856.html

21,189 https://twitter.com/kayleighmcenany/status/822979246205403136

20,902 https://twitter.com/mcgregor_ewan/status/823805815488331776

14,857 http://www.cnn.com/2017/01/21/politics/womens-march-donald-trump-inauguration-sizes/index.html?sr=twpol012117womens-march-donald-trump-inauguration-sizes0832PMVODtopLink&linkId=33643748

12,630 https://www.womensmarch.com/sisters

11,244 https://twitter.com/tomilahren/status/822852245532319744

9,761 https://twitter.com/mstharrington/status/823190136200593408

9,585 http://www.cnn.com/2017/01/21/politics/womens-march-protests-live-coverage/index.html?sr=twCNN012117womens-march-protests-live-coverage1208PMVODtop

2,403,637 URLs tweeted, 527,350 of them unique.
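The total and unique figures both fall out of a single tally of the links in each tweet. This sketch assumes the links sit in entities.urls and prefers expanded_url over the t.co short link; it does not follow redirects, and the function names are hypothetical.

```python
import json
from collections import Counter


def tweet_urls(tweet):
    """Yield the expanded URL (or the short URL as a fallback) for each link."""
    for entry in tweet.get("entities", {}).get("urls", []):
        url = entry.get("expanded_url") or entry.get("url")
        if url:
            yield url


def count_urls(lines):
    """Tally every linked URL across line-oriented tweet JSON."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts.update(tweet_urls(json.loads(line)))
    return counts
```

With counts = count_urls(...), sum(counts.values()) is the number of URLs tweeted and len(counts) is the number of unique URLs.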

I’ve also set up a little bash script to feed all the unique URLs to the Internet Archive:

#!/bin/bash
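As an illustration of the general idea only, and not a reconstruction of the script above, here is a minimal Python sketch that submits a list of URLs to the Wayback Machine's Save Page Now endpoint (https://web.archive.org/save/ followed by the URL). The function names, the pause between requests, and the injectable opener are assumptions; the Internet Archive rate-limits this endpoint, so a large list would need a much more patient approach.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_url(url):
    """Build the Save Page Now request URL for a target URL."""
    # Keep reserved characters and existing percent-escapes intact.
    return SAVE_ENDPOINT + urllib.parse.quote(url, safe=":/?#[]@!$&'()*+,;=%")


def submit(urls, delay=5.0, opener=urllib.request.urlopen):
    """Request a capture of each URL; return a status or error per URL."""
    results = {}
    for url in urls:
        try:
            with opener(save_url(url), timeout=60) as response:
                results[url] = response.status
        except urllib.error.URLError as error:
            results[url] = str(error)
        time.sleep(delay)  # pause between requests to stay polite
    return results
```

The opener parameter exists so the loop can be exercised without touching the network; in normal use the default urllib.request.urlopen is fine.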

I’ve also set up a crawl with Heritrix, and I’ll make that data available here once it is complete.

Domains

Tweets Domain
1,219,747 twitter.com
159,087 instagram.com
134,309 cnn.com
68,479 facebook.com
50,561 womensmarch.com
43,219 youtube.com
36,946 nytimes.com
30,201 huffingtonpost.com
21,520 paper.li
21,476 cbsnews.com
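Domain counts like these can be derived from the URL list by reducing each link to its host. A minimal sketch, assuming a leading "www." is dropped and nothing else is normalized (so a subdomain such as edition.cnn.com would be counted separately from cnn.com); the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def count_domains(urls):
    """Tally URLs per domain, ignoring strings that have no host."""
    counts = Counter(domain_of(url) for url in urls)
    counts.pop("", None)
    return counts
```

Feeding it every tweeted URL, repeats included, gives per-domain tweet counts rather than counts of unique pages.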

Embedded Images

Tweets Image
146,442 http://pbs.twimg.com/media/C2tb_gnVQAA8HgI.jpg
81,139 http://pbs.twimg.com/media/C2tWPrKXEAApDaO.jpg
71,877 http://pbs.twimg.com/ext_tw_video_thumb/822861821958885377/pu/img/ikDaBv7bTSiqX0_z.jpg
64,149 http://pbs.twimg.com/media/C2uRgaeW8AAsR1g.jpg
59,214 http://pbs.twimg.com/ext_tw_video_thumb/822834122322112512/pu/img/YpJMCEa3NOxYT7qZ.jpg
58,599 http://pbs.twimg.com/tweet_video_thumb/C2taTx6VIAAu7Md.jpg
51,439 http://pbs.twimg.com/media/C2tK4Y5XcAAcQ23.jpg
44,611 http://pbs.twimg.com/media/C2t7hgBUAAI2ZzI.jpg
43,845 http://pbs.twimg.com/media/C2tugXLXgAArJO4.jpg
41,436 http://pbs.twimg.com/ext_tw_video_thumb/822822166295027712/pu/img/Ig0mf5AKTF2nJD-M.jpg

6,153,894 embedded image URLs tweeted, 390,298 of them unique.

I’ll be creating an image montage for #WomensMarch similar to what I did for #elxn42 and #panamapapers. It’ll take some time, and I have to gather resources to make it happen, since we’re looking at about five times as many images for #WomensMarch.

On Archivy

Occasional writings about the archive
