Word Cloud v.2.0: 2017 TIME 100 Most Influential People
Last week, I posted a word cloud of the names of people who made the TIME 100 Most Influential People list. Today, I have a follow-up.
I have kept streaming Twitter since April 21st. Ten days, or 20K tweets, later, after the Time100 Gala 2017 event, where videos such as the one above proliferated on social media (or at least on my timeline), I asked myself out of curiosity: did the names in the word cloud shift somehow? The answer is yes, they did. Here it is:
Demi Lovato bumped off Riz Ahmed.
Ryan Reynolds and Viola Davis gained more mentions this week as compared to last week.
Nothing has changed for John Legend and Ava DuVernay.
Ed Sheeran, Elizabeth Warren, Colin Kaepernick and Alessandro Michele seemingly shrank compared to last week.
I am also happy to share my tweet streaming code, which uses tweepy and persists the tweets in MongoDB, a NoSQL document database.
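For readers who only want the gist of the persistence step, here is a minimal sketch and not the exact code from my notebook. It assumes each raw payload from the stream arrives as a JSON string (as it does in a tweepy `on_data` callback) and that the target is any object with an `insert_one` method, such as a pymongo collection. The names `store_tweet` and `open_collection`, the `stored_at` field, and the database and collection names are all hypothetical.

```python
import json
from datetime import datetime, timezone


def store_tweet(collection, raw_data):
    """Parse one raw stream payload and persist it as a document.

    `collection` is anything exposing insert_one(dict), for example a
    pymongo collection. Returns True if a tweet was stored.
    """
    doc = json.loads(raw_data)
    # The stream also carries non-tweet messages (limit notices, deletes);
    # keep only payloads that look like tweets.
    if "text" not in doc and "full_text" not in doc:
        return False
    # Hypothetical bookkeeping field: when we saved the document.
    doc["stored_at"] = datetime.now(timezone.utc).isoformat()
    collection.insert_one(doc)
    return True


def open_collection(uri="mongodb://localhost:27017", db="time100", name="tweets"):
    """Return a MongoDB collection (URI, db and name are assumed defaults)."""
    # Imported lazily so store_tweet can be used and tested without MongoDB.
    from pymongo import MongoClient
    return MongoClient(uri)[db][name]
```

Inside a tweepy stream, the `on_data` callback would simply call `store_tweet(collection, raw_data)`. Keeping the parsing separate from the connection makes it easy to try out with a stand-in collection before pointing it at a real database.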
… and how I wrangled the data because, as we know, a tweet in its rawest JSON form can be painfully noisy.
I did not fret. I got a little help from my friend json_normalize here. For a very visual person like me, it made it so much easier to flatten the tweets from MongoDB document format into a DataFrame object and view them. I was able to wrangle the data the way I am used to, in tabular format.
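To show what that flattening looks like, here is a small sketch on two made-up, heavily trimmed tweet documents (real tweets carry many more fields). It uses `pd.json_normalize`, which is where the function lives in pandas 1.0 and later; in older versions it was imported from `pandas.io.json`.

```python
import pandas as pd

# Two invented, trimmed-down tweet documents shaped like what MongoDB returns.
tweets = [
    {
        "id": 1,
        "text": "Viola Davis at the #TIME100 gala",
        "user": {"screen_name": "alice", "followers_count": 10},
    },
    {
        "id": 2,
        "text": "John Legend performing tonight",
        "user": {"screen_name": "bob", "followers_count": 250},
    },
]

# Nested dictionaries become dotted column names, one row per tweet.
df = pd.json_normalize(tweets)

print(df.columns.tolist())
print(df[["user.screen_name", "text"]])
```

The nested `user` dictionary turns into flat columns such as `user.screen_name`, so the usual DataFrame filtering and grouping work right away.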
Also, to get the names without retyping all of them, I relied on TIME's own listing: I used BeautifulSoup to pull the data out of the page on their website.
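The scraping idea can be sketched as follows. The HTML snippet, the `honoree` class, and the `extract_names` helper are invented for illustration; the real TIME page has its own markup, so the selector would need to be adapted after inspecting it.

```python
from bs4 import BeautifulSoup

# Invented markup standing in for the downloaded page source.
html = """
<ul>
  <li class="honoree"><a href="#">Riz Ahmed</a></li>
  <li class="honoree"><a href="#"> Demi Lovato </a></li>
  <li class="honoree"><a href="#">John Legend</a></li>
</ul>
"""


def extract_names(page_html, selector="li.honoree a"):
    """Return the text of every element matching a CSS selector."""
    soup = BeautifulSoup(page_html, "html.parser")
    # get_text(strip=True) trims the whitespace around each name.
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


names = extract_names(html)
print(names)
```

With a live page, `page_html` would be the response body fetched from the site instead of the inline string.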
Here is my code in NBViewer: http://nbviewer.jupyter.org/github/HeyAlmer/Time100_2017/blob/master/TIME100%20Data%20Wrangling_Medium_Share.ipynb
Here is my other code for creating the word cloud, also in NBViewer: http://nbviewer.jupyter.org/github/HeyAlmer/Time100_2017/blob/master/TIME100%20word_cloud_masked_COMBINED_medium_share.ipynb
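As a rough illustration of what feeds a word cloud like this one, the sketch below counts how many tweets mention each name. It is a simplified stand-in, not the notebook's code: the `count_mentions` helper and the sample tweets are made up, and the matching is a plain case-insensitive substring check.

```python
from collections import Counter


def count_mentions(texts, names):
    """Count how many texts mention each name (case-insensitive)."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for name in names:
            if name.lower() in lowered:
                counts[name] += 1
    return counts


# Invented sample tweets.
sample_texts = [
    "Demi Lovato stuns at the #TIME100 gala",
    "demi lovato and John Legend on the red carpet",
    "What a night",
]
sample_names = ["Demi Lovato", "John Legend", "Riz Ahmed"]

mention_counts = count_mentions(sample_texts, sample_names)
print(mention_counts.most_common())
```

A name-to-count mapping like this can be handed to the wordcloud package's `WordCloud.generate_from_frequencies`, so names with more mentions are drawn larger.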