What my Twitter says about me: An adventure in data journalism
I’ve noticed lately that data visualization is really popular on Medium, and in the journalism industry in general. So I decided to do what I do best and jump on yet another social media bandwagon, because staying on-trend is the most fulfilling thing in any millennial’s menial life. And while I lack the skills to code and collect actual, meaningful data, I do possess two invaluable qualities: a sense of humor and a flair for narcissism.
I decided that my first foray into the world of data journalism should be related to my favorite topic of conversation: myself! But since I am a complex, three-dimensional, indescribable, and all-around humble human being, I struggled to find elements of myself that were quantifiable: able to be broken down into simple numbers and possibly (if the technology exists) graphs and charts.
Then it suddenly occurred to me. I could conduct a quantifiable research project based on the single most important extension of my personal brand, my sole source of self-confidence and synergistic peer validation: Twitter.
The case study
To begin my research, I requested my Twitter archive. For those who aren't in the know, your Twitter archive is essentially an offline file that contains every tweet you have ever tweeted, organized chronologically. For the average person, this may seem like a cringe-worthy nightmare! In fact, leaving your most embarrassing tweets in the past, never to be dug up again, may be best practice.
For a self-serving narcissist like me, however, requesting your Twitter archive is a dream-and-a-half. You can access all your raw, unfiltered thoughts — along with some half-baked thoughts and a handful of filtered photos — all with the click of a button! And the best part: the entire archive is completely searchable.
I decided to compile and incubate a list of popular keywords and analyze the frequency with which they appear in my tweets.
I did not bother to look up which 10 words appear most frequently in my tweets overall, as that seemed extremely tedious and, in general, too much work. Instead, I selected a wildly skewed sample of words that I just felt were the most frequently used (because feelings should really be what dictates data, don't you think?) and searched for those words without any palpable reasoning behind my choices.
If you’re worried about my selection being biased, don’t you worry! I also consulted my friend, Krista, on keyword choice, and I think it resulted in a pretty well-rounded list, one that is free of bias and a pretty inaccurate sample of my Twitter usage.
The list of selected keywords is as follows:
Every good data journalist has a barrage of graphs and charts to present their ground-breaking data. To follow in their footsteps, I have created 1 table, 2 charts, and 2 graphs, all of which contain the exact same information.
[Please read the following section in the voice of Dexter from Dexter’s Laboratory]
According to my calculations, I have come to a number of superfluous conclusions:
- Since joining Twitter in 2009, I have tweeted about “boys” 119 times. In that same timespan, I have tweeted about “bread” 122 times, thus ending my search for my one true love. I am officially pan-sexual (“pan” = the Spanish word for “bread”).
- I began tweeting about “Klaine” — the Glee couple featuring Kurt Hummel and Blaine Anderson — in 2010, just a year after I joined Twitter. Despite my frequent preoccupation with this couple (both online and in my everyday life), the data suggests that I have only explicitly tweeted about “Klaine” 74 times in the last 6 years. This seems completely inaccurate, as that number does not at all coincide with the frequency with which I think about “Klaine” (which is at least once a day).
- That being said, it is shocking to learn that "Hamilton," a Broadway musical that did not come to my attention until around September of last year (2015), has garnered 92 tweets in the last 8 months. An even more well-rounded data collection would have included variations on the title "Hamilton" (e.g., "Ham"). Then again, this data collection is anything but well-rounded.
- I would also like to note that I apparently have only mentioned “Hamilton” 2 more times than I have mentioned “sportz” (90), which is both disturbing and impressive.
- The results for the keyword "sad" may be rendered utterly useless, because many instances of the letter combination s-a-d have nothing to do with "sad" or "sadness": they sit inside the phrase "carne asada," usually referring to a burrito or a plate of specialty Mexican fries. Therefore, the 336 instances of the word "sad" are completely inaccurate and hopelessly skewed.
- It can be assumed that my top 4 search keywords, "sad," "god," "Glee," and "Krista," often appear in conjunction with one another, as I spent many tweets being "sad" over "Glee" and blaming "god" for this pain, while also @-replying "Krista," who was also "sad" over "Glee."
- I have apparently taken "god"'s name in vain on the Internet 598 times, which explains why I belong in the 8th circle of hell, according to this quiz. (My repeated spelling of "god" with a lowercase "g" may also be a factor in my damnation.)
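For anyone who actually can code, the "carne asada" problem above is the difference between substring matching (any run of the letters s-a-d counts) and whole-word matching. Here is a minimal Python sketch, assuming the tweets have already been pulled out of the archive into a plain list of strings. The sample tweets and the function names `substring_count` and `word_count` are hypothetical, invented for illustration; they are not real data from my archive.

```python
import re

# Hypothetical sample tweets (invented for illustration, not from the real archive).
tweets = [
    "so sad that glee is over",
    "carne asada fries for dinner",
    "sadness is a carne asada burrito with no guac",
]

def substring_count(texts, keyword):
    """Naive search: counts the letters anywhere, even inside other words like 'asada'."""
    return sum(text.lower().count(keyword.lower()) for text in texts)

def word_count(texts, keyword):
    """Whole-word search: \\b word boundaries only match 'sad' standing on its own."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in texts)

print(substring_count(tweets, "sad"))  # 4: "sad", "asada", "sadness", "asada"
print(word_count(tweets, "sad"))       # 1: only the standalone "sad"
```

The whole-word version stops counting burritos, but it also stops counting "sadness," so an honest search would have to list variants like that as separate keywords.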
Conclusion and synthesis
After carefully assessing and analyzing my findings, I conclude that collecting data is hard and I am not very good at it. It is, however, very appealing to employers right now, so hopefully this one subpar article will be enough for me to get endorsed for “data visualization” on LinkedIn.
Additionally, I have found, as a journalist, that there is an increasing emphasis on the need to stay "on trend" and "relevant," and a push to gain "more clicks" instead of creating "meaningful content" that audiences would benefit from reading. This project, of course, is not a desperate attempt to leverage my skills in a shrinking yet highly competitive field, but an attempt to make you think, "Wow, this girl is an idiot. I can't believe she spent all that time collecting fake data to try and make a moot point about an issue that has been plaguing the industry for ages."
Well, believe it. Because numbers don’t lie — even if my resumé does.