Accessing my 10 years on Twitter

Published in

Design, Strategy, Data & People

6 min read · Feb 10, 2017

22nd January 2017 marked my 10th anniversary on Twitter. My relationship with Twitter has been a little on and off over the years, but for the last 3 or 4 years I’ve been pretty active (some may go as far as to say addicted). As I’m a little geekish and like playing with data, I decided that this milestone was a good reason to download my Twitter archive and take a rummage through the 10K+ tweets. Unfortunately the archive doesn’t come with any analytics, so you don’t get data on how many times your tweets were RTed, liked, viewed etc. For that you need to have set up Twitter Analytics and pull the data separately (for me that meant I only had data from January 2014, which I had to pull manually in 90-day blocks; lesson learnt: download and archive monthly going forward).

Working with the data

Twitter allows you to download your entire archive, which is great because if you go via the API you are restricted to batches of tweets. Once you’ve got it (here are some instructions), you can explore it straight away via the bundled HTML files, or you can suck it into something else for some custom analysis, which is what I wanted to do. There are two options here: you can just grab the CSV file (tweets.csv), or you can use the JSON files ([twitter archive]/data/js/tweets/*.js). The JSON files have lots more information and useful things such as RT nesting, user IDs and links to media. They are organised as separate Year/Month files and aren’t immediately easy to work with, due to some odd formatting. However, I found a great little terminal script that cleans them up so that Python can read them. It’s not essential to mess with the JSON, though: the CSV has plenty of info if all you need is a link back to the tweet and the tweet’s textual content. Here are the fields in the CSV:

tweet_id, in_reply_to_status_id, in_reply_to_user_id, timestamp, source, text, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp, expanded_urls
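If you’d rather not depend on an external script, the JSON cleanup can be sketched in a few lines of Python. This is a minimal sketch, not the script mentioned above, and it rests on an assumption: that each monthly file is a JavaScript assignment such as `Grailbird.data.tweets_2017_01 = [ ... ]`, so everything after the first `=` is plain JSON. The helper names `parse_grailbird` and `load_archive` are hypothetical.

```python
import glob
import json
import os


def parse_grailbird(text):
    """Parse one monthly archive file's contents into a list of tweet dicts.

    Assumption: the file is a JavaScript assignment such as
    'Grailbird.data.tweets_2017_01 = [ ... ]', so everything after the
    first '=' can be handed straight to the JSON parser.
    """
    _, _, payload = text.partition("=")
    return json.loads(payload)


def load_archive(archive_dir):
    """Load every monthly file under data/js/tweets, in filename (year_month) order."""
    pattern = os.path.join(archive_dir, "data", "js", "tweets", "*.js")
    tweets = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            tweets.extend(parse_grailbird(f.read()))
    return tweets
```

Usage would be something like `tweets = load_archive("path/to/twitter-archive")`, where the path is a placeholder for wherever the archive was unzipped. Splitting on the first `=` is safe because tweet text only appears after it, inside the JSON.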

Here’s an example of a tweet from the JSON; as you can see, there’s lots more info:

I ended up using both the CSV and the JSON. As my tool of choice for the analysis was (as always) Qlik Sense, I needed to convert the JSON data I wanted into flat tables (CSVs) before I could work with it (currently there is no native static-file JSON reader in Qlik Sense, only via REST). The Twitter Analytics data was easy to add, as it was already in CSV format and linked to the tweets by tweet ID. I did a little wrangling to pull out the hashtags, people I’d mentioned, people I’d replied to, people I’d retweeted, external links and images I’d posted. Once I had that loaded into Qlik Sense I started exploring it. The nice thing is that it lets me search my tweets and view the links and images I posted.
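Here’s a minimal sketch of that kind of wrangling in Python, turning nested tweets into flat CSV tables keyed by tweet ID so they associate with tweets.csv and the analytics export. It is an illustration, not the exact process used here, and it assumes the archive JSON follows the Twitter API v1.1 tweet layout (`id_str`, `entities.hashtags`, `entities.user_mentions`, `entities.urls`, `entities.media`, `in_reply_to_screen_name`, `retweeted_status`). Those field names, and the helpers `flatten` and `write_tables`, are assumptions.

```python
import csv
import os


def flatten(tweets):
    """Split nested tweet dicts into flat tables, one row per (tweet, item).

    Field names assume the Twitter API v1.1 tweet layout in the archive JSON.
    """
    tables = {name: [] for name in
              ("hashtags", "mentions", "replies", "retweets", "links", "images")}
    for t in tweets:
        tid = t["id_str"]
        ents = t.get("entities", {})
        for h in ents.get("hashtags", []):
            # Lower-case so #DataViz and #dataviz count as the same tag.
            tables["hashtags"].append({"tweet_id": tid, "hashtag": h["text"].lower()})
        for m in ents.get("user_mentions", []):
            tables["mentions"].append({"tweet_id": tid, "screen_name": m["screen_name"]})
        for u in ents.get("urls", []):
            tables["links"].append({"tweet_id": tid, "url": u.get("expanded_url")})
        for media in ents.get("media", []):
            tables["images"].append({"tweet_id": tid, "url": media.get("media_url_https")})
        if t.get("in_reply_to_screen_name"):
            tables["replies"].append(
                {"tweet_id": tid, "screen_name": t["in_reply_to_screen_name"]})
        rt = t.get("retweeted_status")
        if rt:
            # A retweet nests the original tweet, including its author.
            tables["retweets"].append(
                {"tweet_id": tid, "screen_name": rt["user"]["screen_name"]})
    return tables


def write_tables(tables, out_dir="."):
    """Write each table to <out_dir>/<name>.csv for tools without a JSON file reader."""
    for name, rows in tables.items():
        fieldnames = list(rows[0]) if rows else ["tweet_id"]
        with open(os.path.join(out_dir, f"{name}.csv"), "w",
                  newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
```

Keeping one narrow table per entity type, rather than one wide table, mirrors the “one tweet, many hashtags/mentions” shape of the data and lets an associative tool link everything through `tweet_id`.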

What I found

First off, it was very apparent that I’d taken to Twitter much more in the last 3 years. I also discovered that Twitter made a fundamental change in how the IDs for users and tweets worked. Up until 2010 the tweet ID and user ID (the numeric ID behind your Twitter handle) were incremental, but they’ve since been replaced with much larger numbers (they look like 64-bit hashes, but are in fact Twitter’s time-based “Snowflake” IDs). However, the original numbering remains for those earlier members. I’m number 680,683, but one of the people I’ve interacted with, Ario, has the ID 294. Now that’s an early adopter (and he’s still active).
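Because Snowflake IDs are time-ordered rather than hashed, a creation time can be read straight out of the number. A minimal sketch, assuming the commonly documented layout: the top 41 bits are milliseconds since Twitter’s custom epoch (1288834974657 ms, i.e. 2010-11-04 01:42:54.657 UTC), then 10 bits of machine ID and 12 bits of per-millisecond sequence. The function name `snowflake_parts` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Twitter's Snowflake epoch: 1288834974657 ms after the Unix epoch
# (2010-11-04 01:42:54.657 UTC). Built with timedelta to avoid float rounding.
TWITTER_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=1288834974657)


def snowflake_parts(snowflake_id):
    """Split a 64-bit Snowflake ID into (created_at, machine_id, sequence)."""
    ms_since_epoch = snowflake_id >> 22          # top 41 bits: timestamp in ms
    machine_id = (snowflake_id >> 12) & 0x3FF    # next 10 bits: machine/worker
    sequence = snowflake_id & 0xFFF              # low 12 bits: counter within the ms
    created_at = TWITTER_EPOCH + timedelta(milliseconds=ms_since_epoch)
    return created_at, machine_id, sequence
```

This only makes sense for the newer, large IDs. An old sequential ID such as 294 or 680,683 is smaller than 2^22, so the shift yields 0 and the “timestamp” is just the epoch itself, which is meaningless.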

Here are some of the numbers:

Tweets made: 10,883
— of which were retweets: 6,512
Total number of characters tweeted: 1,156,062
— approx number of words: 152,129
Number of images I posted: 895
Different hashtags I used: 653 (exc. retweets)
Different people interacted with: 3,915
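Headline numbers like these can be pulled from tweets.csv alone. A minimal sketch, assuming a row counts as a retweet when its `retweeted_status_id` field is non-empty, “words” means whitespace-separated tokens, and hashtags are matched with a simple `#\w+`-style regex (an approximation, not Twitter’s own hashtag rules). It is not guaranteed to reproduce the exact figures above, and `summarise` and `load_csv` are hypothetical names.

```python
import csv
import re

# Simple approximation of a hashtag: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def load_csv(path):
    """Read tweets.csv into a list of dicts keyed by the CSV header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def summarise(rows):
    """Compute headline counts from tweets.csv rows."""
    own = [r for r in rows if not r["retweeted_status_id"]]
    texts = [r["text"] for r in rows]
    return {
        "tweets": len(rows),
        "retweets": len(rows) - len(own),
        "characters": sum(len(t) for t in texts),
        "approx_words": sum(len(t.split()) for t in texts),
        # Distinct hashtags in my own tweets only (retweets excluded), case-insensitive.
        "distinct_hashtags": len({h.lower() for r in own for h in HASHTAG.findall(r["text"])}),
    }
```

Usage: `summarise(load_csv("tweets.csv"))`. Counting “people interacted with” needs the mention/reply/retweet data, which is easier to get from the JSON.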

How things have changed

When it comes to activity, it’s clear that I wasn’t very committed before 2014. In fact, early on I only used it for automation and following, rarely engaging or sharing content. But since 2014 there’s been a steady increase in use, with my average tweets per day rising from 6.9 to 9.5, and 2016/17 seeing Friday leap ahead as my favourite day to tweet. Hashtag usage has increased significantly too: in the first 7 years there are only 404 hashtag occurrences, whereas there are 3,184 in the last 3 years. Unsurprisingly, considering what I do, #dataviz tops my personal all-time list, with #graphicdesign a close second (although that’s mainly due to the last year and a resurgence of interest in my graphic design past and blog: http://www.paperposts.me/).
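Per-year activity can be sketched from the `timestamp` and `text` columns. A minimal sketch under two assumptions: the CSV timestamps look like `2017-01-22 09:15:00 +0000`, and “tweets per day” means tweets per *active* day (days with at least one tweet); dividing by calendar days instead would give lower figures. Weekdays are computed in UTC unless the timestamps are converted first. `activity_by_year` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

HASHTAG = re.compile(r"#\w+")
TS_FORMAT = "%Y-%m-%d %H:%M:%S %z"  # assumed tweets.csv timestamp layout


def activity_by_year(rows):
    """Per year: tweets per active day, busiest weekday and hashtag occurrences."""
    per_day = defaultdict(Counter)      # year -> Counter of calendar dates
    per_weekday = defaultdict(Counter)  # year -> Counter of weekday names
    hashtags = Counter()                # year -> number of hashtag occurrences
    for r in rows:
        ts = datetime.strptime(r["timestamp"], TS_FORMAT)
        per_day[ts.year][ts.date()] += 1
        per_weekday[ts.year][ts.strftime("%A")] += 1
        hashtags[ts.year] += len(HASHTAG.findall(r["text"]))
    return {
        year: {
            "tweets_per_active_day": sum(per_day[year].values()) / len(per_day[year]),
            "busiest_weekday": per_weekday[year].most_common(1)[0][0],
            "hashtag_occurrences": hashtags[year],
        }
        for year in sorted(per_day)
    }
```

Summing `hashtag_occurrences` over groups of years gives before/after comparisons like the 404 versus 3,184 split above.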

Engagement and interactions

When it comes to the Twitter Analytics part (how others have engaged with my tweets), I only have data for the last 3 years. But as that makes up over 80% of my tweeting, it’s good enough.

Straight off the bat, it’s obvious that because of my pitiful number of followers I need to leverage other users to drive engagement (this is not a strategy, just an observation). My most successful tweets have included mentions and images, and have then been retweeted by the individual mentioned (who has lots of followers). Here’s the overview:

Images are very important to engagement; most of my top tweets have contained them:
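Comparing engagement for tweets with and without images or mentions comes down to a join on tweet ID. A minimal sketch, assuming the Twitter Analytics export has columns named `Tweet id` and `engagements` (treat both names as assumptions about that export), and that the image and mention tweet IDs have already been extracted into sets. `engagement_by_feature` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def engagement_by_feature(analytics_rows, image_ids, mention_ids):
    """Mean engagements grouped by (has_image, has_mention).

    analytics_rows: dicts from the Twitter Analytics CSV export; the
    "Tweet id" and "engagements" column names are assumptions.
    image_ids / mention_ids: sets of tweet IDs (strings) with an image / a mention.
    """
    groups = defaultdict(list)
    for r in analytics_rows:
        tid = r["Tweet id"]
        groups[(tid in image_ids, tid in mention_ids)].append(float(r["engagements"]))
    return {key: mean(values) for key, values in groups.items()}
```

Means are easily skewed by a single tweet that got picked up by a big account, so looking at medians or the top-N tweets per group alongside this is worthwhile.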

But it’s the people you interact with that make Twitter fun. That said, it does appear I interacted with myself quite a lot (cue the armchair psychologist):
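Ranking the people you interact with, yourself included, is a simple count over the extracted mention/reply/retweet rows. A minimal sketch, assuming each row is a dict with a `screen_name` key and that screen names are compared case-insensitively; `top_interactions` is a hypothetical name.

```python
from collections import Counter


def top_interactions(rows, me, n=10):
    """Rank accounts by interaction count, flagging my own account.

    rows: dicts with a "screen_name" key, one per mention, reply or retweet.
    me: my own screen name. Returns (name, count, is_me) tuples.
    """
    counts = Counter(r["screen_name"].lower() for r in rows)
    return [(name, count, name == me.lower()) for name, count in counts.most_common(n)]
```

Self-interactions typically come from replying to your own tweets to build threads, which is why they show up here at all.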

The 64 million dollar question: is this all worth it?

No, of course it isn’t. It’s a distraction at best. I do have a few genuine engagements and small niche sets of people that I share things with and interact with continuously, but mostly it’s just tilting at windmills. For others, those more committed to their “social media strategy”, I’m sure it makes sense, but for me it’s a folly. So much so that I’ve purged much of the online archive and I’m considering taking a leaf out of a friend’s book and deleting my account. We will see.

Tools and stuff used for this:

Twitter archive — how to get it
Script to fix the JSON — github
Qlik Sense — download
Qlik Sense Extensions:
MGO Image Grid v3 — get it from Qlik Branch
HTML box — https://github.com/seebach/it.HTMLBox
Single dimension animation — get it from Qlik Branch

Written by murraygm