Does The NSA Know When I Sleep?
An Analysis of My Gmail Metadata
News of the NSA’s programs motivates us to think harder about the information revealed by our digital activities. So what might the NSA see in my data? Better yet, what can I see?
Inspired by Stephen Wolfram’s post on personal analytics, I am trying to track and visualize more of my data. In May, I looked at my digital footprint on Twitter and Facebook, as well as my iPhone geolocation data from OpenPaths. The data I really wanted, though, was my email.
Immersion, an MIT Media Lab project, visualizes your Gmail as a network of people rather than a chronological sequence of text.
To build this visualization, Immersion extracts all of your email headers. The team also makes your data available to download (thank you!).
Selecting only the emails I sent, and merging the data set with tweets I posted, I created the following graphic. The plot is dense and reveals a lot about my habits and patterns over time.
In the fall of ‘10 you can see that I slept from 14:00 to 21:00 New York time. I was in China, 12 hours ahead of NYC, so that was actually 02:00 to 09:00 local time. You can see the earlier nights in both the ‘11 and ‘12 summers when I was interning. Off-cycle email times also show when I left the country in May ‘12, January ‘13, April ‘13, and May ‘13.
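If you want to make the same New York to China shift on your own timestamps, here is a minimal Python sketch. It assumes fixed UTC offsets (New York on daylight time at UTC-4, China at UTC+8), which gives the 12-hour difference mentioned above. The date is made up for illustration, and the function name `to_local` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets, valid for autumn 2010 while New York was on daylight time.
NYC_EDT = timezone(timedelta(hours=-4))  # New York, daylight time
CHINA = timezone(timedelta(hours=8))     # China Standard Time


def to_local(ts: datetime, local_tz: timezone) -> datetime:
    """Re-express a timezone-aware timestamp on the clock of another zone."""
    return ts.astimezone(local_tz)


# A quiet stretch that runs from 14:00 to 21:00 on the New York clock
# (the calendar date is a placeholder, not taken from the data).
gap_start = datetime(2010, 10, 15, 14, 0, tzinfo=NYC_EDT)
gap_end = datetime(2010, 10, 15, 21, 0, tzinfo=NYC_EDT)

print(to_local(gap_start, CHINA).strftime("%H:%M"))  # 02:00
print(to_local(gap_end, CHINA).strftime("%H:%M"))    # 09:00
```

For data that spans daylight-saving changes, named zones from the standard `zoneinfo` module would be a better fit than fixed offsets.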
Drilling in further, you can see changes in habits. You can see how my nights got later at the end of the semester with finals. You can also see the drop in traffic during the middle of March while I was on spring break.
Looking at junior year, you can see that from October ‘11 to April ‘12 I was regularly awake well after 3am. I didn’t want to admit it at the time, but reflecting on the graphic, I can acknowledge that the poor sleep pattern was detrimental to my academic performance. I wonder what correlations the NSA could give us between sleep and performance?
You can see the buildup and drop-off in email traffic at the beginning of 2012 as I was planning TEDxYale. In ‘12, like ‘13, you can also see a spring break drop-off in the middle of March.
The metadata is also useful for personal productivity. Looking at the distribution of my sent emails, it’s striking how email pervades almost every hour of the day.
When do I have class? Sadly, it’s hard to tell, because I am not as disciplined as I should be and don’t ignore email during class. Activity does spike on Monday and Wednesday around midday, when I had an hour-long break in a day of classes running from 10am to 5pm. Interestingly, the pattern has continued, and if you want to get a response from me, midday on Wednesday is still my peak email time.
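To find your own peak email time, one option is to count sent messages per weekday-and-hour slot. The sketch below is in Python rather than the R I used for the plots, the sample timestamps are invented, and the names `slot_counts` and `busiest_slot` are hypothetical.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def slot_counts(timestamps):
    """Count messages per (weekday name, hour of day) slot."""
    return Counter((DAYS[ts.weekday()], ts.hour) for ts in timestamps)


def busiest_slot(timestamps):
    """Return ((weekday, hour), count) for the most active slot."""
    return slot_counts(timestamps).most_common(1)[0]


# Invented sample of sent-mail timestamps, already in local time.
sample = [
    datetime(2013, 6, 5, 12, 10),  # a Wednesday
    datetime(2013, 6, 5, 12, 45),  # the same Wednesday
    datetime(2013, 6, 3, 12, 5),   # a Monday
    datetime(2013, 6, 6, 23, 30),  # a Thursday
]

print(busiest_slot(sample))  # (('Wednesday', 12), 2)
```

The same counter can feed a heatmap of weekday against hour if you prefer to see the whole distribution rather than just the peak.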
I could probably increase productivity by setting a regular email cadence, and only responding to emails at the beginning, middle and end of the day.
While I didn’t cover it here, looking at my Facebook data reveals that Sunday and Monday evenings are my least productive times of the week (as measured by an increase in Facebook use).
Plotting my email activity by week over time, a fascinating pattern of peaks and troughs emerges. It appears that I send lots of emails one week, then far fewer for the following 3-6 weeks, before surging again. This is the type of pattern I would like to see in my daily emails, but it is probably not good from week to week.
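A rough way to reproduce the week-by-week view is to bucket timestamps by the Monday that starts their week and then flag the unusually busy weeks. This is a Python sketch, not the R code behind the plot. The "at least twice the median week" rule is an arbitrary threshold chosen for illustration, and the sample data is invented.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median


def week_start(ts):
    """Return the date of the Monday that begins the week containing ts."""
    return (ts - timedelta(days=ts.weekday())).date()


def weekly_counts(timestamps):
    """Return sorted (week start, message count) pairs.

    Weeks with no messages at all do not appear in the result.
    """
    return sorted(Counter(week_start(ts) for ts in timestamps).items())


def surge_weeks(counts, factor=2.0):
    """Weeks whose volume is at least `factor` times the median week (assumed rule)."""
    typical = median(n for _, n in counts)
    return [week for week, n in counts if n >= factor * typical]


# Invented sample: one busy week followed by three quiet ones.
sample = (
    [datetime(2013, 6, 3, 9) + timedelta(hours=h) for h in range(6)]
    + [datetime(2013, 6, 12, 10)]
    + [datetime(2013, 6, 18, 10), datetime(2013, 6, 20, 16)]
    + [datetime(2013, 6, 28, 10)]
)

counts = weekly_counts(sample)
for week, n in counts:
    print(week, "#" * n)  # crude text bar chart, one row per week
print("surges:", surge_weeks(counts))
```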
As we live more of our lives on the web, we develop a rich personal history hidden in our metadata. This data is valuable for us, the individuals creating it, not just the NSA. Visualizing it at scale enables us to self-reflect and fully internalize our habits. We need more applications like Immersion that help us see these patterns.
I encourage you to try the above visualizations on your own data. Below are the steps I took to generate the graphs:
- Download your Gmail header data from https://immersion.media.mit.edu/getemails/n. Increment n from 0 by 1 until you get a 404. Save each page as n.json. I had 5 pages and I think there are roughly 10k emails per page.
- Use this Ruby script to combine the set of .json files into a single CSV.
- Use this R script to clean up the time stamps and plot the graphs.
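If you would rather not use Ruby for the combining step, the sketch below shows one way to do it in Python. It is not the Ruby script mentioned above. It assumes each saved `n.json` page holds a JSON array of flat objects, which I have not verified against Immersion's actual format, and the field names in the demo (`date`, `to`, `subject`) are placeholders rather than Immersion's real keys. The download step is left out.

```python
import csv
import json
import tempfile
from pathlib import Path


def combine_pages(directory, out_csv):
    """Merge 0.json, 1.json, ... from `directory` into one CSV; return the row count.

    Assumption: every page is a JSON array of flat objects (one per email).
    """
    rows = []
    n = 0
    # Read pages in order until the next numbered file is missing.
    while (page := Path(directory) / f"{n}.json").exists():
        rows.extend(json.loads(page.read_text(encoding="utf-8")))
        n += 1
    # Header is the union of all keys, in first-seen order.
    fields = list(dict.fromkeys(key for row in rows for key in row))
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Demo with two tiny made-up pages; the keys are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "0.json").write_text(
        json.dumps([{"date": "2013-06-05T12:10:00", "to": "a@example.com"}]),
        encoding="utf-8")
    Path(tmp, "1.json").write_text(
        json.dumps([{"date": "2013-06-06T23:30:00", "subject": "hi"}]),
        encoding="utf-8")
    out = Path(tmp, "emails.csv")
    total = combine_pages(tmp, out)
    header = out.read_text(encoding="utf-8").splitlines()[0]

print(total, header)
```

Fields missing from a record are written as empty cells, so pages with slightly different keys still end up in a single table.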