Why do they get all my data but I don’t?

In May 2018, the GDPR will come into full effect. Any EU citizen can demand their personal data from a company/organisation so that it can be used somewhere else.

Turns out, you can already request your data from some large service providers today, but the format you receive it in and how long you wait for it vary widely.

I requested my Facebook data from old accounts that I had barely ever used. Under the General Account Settings there is an option to “Download a copy of your Facebook data”, and clicking on that starts the process of preparing the data for download. A day or two later I was provided with a zip file. So what is in that data dump? Every photo I had posted was in a folder marked “photos”, every private message had its own HTML page in a folder marked “messages”, and every wall posting ever made was available via “timeline.htm”. It was all served to me as static HTML pages.

All of my data was there, but I couldn’t do much with it, other than navigate the pages as if I was navigating an offline version of Facebook. It is all of my data, but I can’t easily bring it elsewhere. I’ve made 700 postings in the WEEN appreciation group, but WEEN will never know.

If you want to do something more than click around on your static Facebook pages, Kyle Mathews made Facebook Export Parser, which at least allows you to analyze and inspect your data further.
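If you just want the raw text out of those static pages, a few lines of Python will get you started. This is a minimal sketch, not what Facebook Export Parser does: it assumes nothing about Facebook's markup and simply collects the visible text of any HTML page. The names TextExtractor and html_to_text, and the sample page, are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML document as a list of strings."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up stand-in for a page from the export (the real markup will differ).
sample = (
    "<html><head><style>p{}</style></head>"
    "<body><div>Me</div><p>Hello wall</p></body></html>"
)
lines = html_to_text(sample)
```

To run it over a real export you would read a file such as timeline.htm into a string and pass it to html_to_text. What you get back is unstructured text, so anything like dates or authors would still need to be picked out by hand.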

Next I went to get my data from Twitter. In your Twitter settings you can “request your archive”, and soon you’ll receive an email saying that your Tweet Archive is ready to download. You’ll get a zip file named with a random string of numbers, and it can be several megabytes large if you’re a prolific twitterer. It contains a nice archive of your tweets with an HTML interface, including search. All tweets are embedded in JavaScript files (as JSON), grouped by year and month.
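Because the tweets are JSON wrapped in JavaScript, you can read them without the HTML interface. Here is a minimal sketch in Python. It assumes that each monthly file consists of a single variable assignment followed by a JSON array; the function name load_tweet_file, the variable name in the sample and the sample tweet itself are made up for illustration, so check your own archive before relying on it.

```python
import json


def load_tweet_file(text):
    """Parse one archive file of the assumed form: SomeName = [ ...JSON... ]"""
    # Everything after the first "=" is assumed to be plain JSON.
    _, sep, payload = text.partition("=")
    if not sep:
        raise ValueError("no assignment found in file contents")
    return json.loads(payload.strip().rstrip(";"))


# Made-up sample in the assumed layout, not real archive data.
sample = (
    "Grailbird.data.tweets_2017_05 =\n"
    '[ {"id_str": "1", "text": "hello = world"} ]'
)
tweets = load_tweet_file(sample)
```

For a real archive you would read each monthly file into a string, pass it to load_tweet_file, and concatenate the resulting lists to get all your tweets in one place.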

Grailbird is a Twitter internal tool for converting tweet archives into text files, and Grailbird Updater is a neat tool that uses the Twitter API to complete the data with additional API queries.

“Turns out the contents in the archive are partial/trimmed API responses from the Twitter API, so it is actually possible to drop a whole API response in there, do some sorting and update the archive.”

Well, that was nice of GitHub user DeMarko to make that available. I’m noting that Twitter didn’t. Hmm.

SoundCloud does not allow me to access my data at all, except for the obvious public data that anybody is allowed to access. Google, on the other hand, has a “takeout” menu where you can choose what data you download, and how.

Spotify, which has built entire ad campaigns on insights from individuals’ data, digs through literally everything that everyone does, but does not make your data accessible to you.

“We do have ridiculous amounts of data. The geeky way of describing it is: What are the use cases? What’s going on in people’s lives that they are amplifying with music? That creates a treasure trove, but it’s very difficult to sift through. When we first started doing this, we came into it with an open mind. We have a group of people here that we’ve hired — analysts within the marketing group. Our creative team has complete access to all of this data.”

Why, it’s as if the music service is spying on you, whilst not allowing you to access this data. They do, however, open up all this data to advertisers. What if you are the guy who listened to “humble” 1,251 times and you wanted to prove to Kendrick Lamar that you’re such a super-fan they made billboards out of your listening habit? You can’t.

This needs to change, and with the GDPR this is going to change.