Coursera, third party tracking and transatlantic subject access requests

Paul-Olivier Dehaye
Published in PersonalData.IO
5 min read · Aug 8, 2016

As part of the Digital Pedagogy Lab Summer Institute (action track?), there was a conversation on Twitter today concerning the tech embedded in academic work. I think I can add some documentation and information to that conversation, with a global perspective (if you are only interested in the associated legal documents, skip the lengthy explanation and go all the way to the bottom).

Missing from this whiteboard is the whole analytics business, shadowed closely by the whole advertising mafia. For instance, a quick look at browser cookies used by the MOOC platform Coursera.com reveals that they rely on Google Analytics, Facebook Connect, Facebook Custom Audience Management, Google Adwords Conversion, Google Dynamic Remarketing, and Google Tag Manager cookies. You can see the output of the Ghostery extension for the Coursera landing page in the header image above.
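To make the idea of spotting third-party services more concrete, here is a minimal sketch in Python of what a tracker-detection extension does in spirit: it looks at the hosts a page loads resources from and matches them against a list of known tracker domains. The domain list, the function names and the example URLs are my own illustrative assumptions. This is not Ghostery's actual database or code, and the URLs are not a recording of what Coursera really loads.

```python
from urllib.parse import urlsplit

# Illustrative mapping only: a real extension ships a much larger,
# curated database. These entries are assumptions made for this sketch.
TRACKER_DOMAINS = {
    "google-analytics.com": "Google Analytics",
    "googletagmanager.com": "Google Tag Manager",
    "googleadservices.com": "Google AdWords Conversion",
    "facebook.net": "Facebook Connect",
}


def tracker_for(url):
    """Return the tracker name for a request URL, or None if unknown."""
    host = urlsplit(url).hostname or ""
    for domain, name in TRACKER_DOMAINS.items():
        # Match the domain itself or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return name
    return None


def third_party_trackers(request_urls):
    """Return the sorted names of known trackers among the request URLs."""
    found = set()
    for url in request_urls:
        name = tracker_for(url)
        if name is not None:
            found.add(name)
    return sorted(found)


# Hypothetical list of resources requested while loading a landing page.
example_requests = [
    "https://www.coursera.org/static/app.js",
    "https://www.google-analytics.com/analytics.js",
    "https://connect.facebook.net/en_US/sdk.js",
]
print(third_party_trackers(example_requests))
```

The point of the sketch is only that each of those extra hosts is a separate company that learns about the page visit.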

In each case, Coursera is placing a cookie on a student’s computer in order to do some analytics and/or personalisation, using third-party services. Usually Coursera gets to use those services for free, and the service provider gets to reuse this personal data for its own purposes (if your webmaster is not paying for a service, then their users are the product). Focusing on one of those cookies, Google has managed to convince 65% of webmasters to use its Google Analytics service, which I hear is by far superior to even commercial competitors. Meanwhile, your browser is running around shouting a ton of metadata (“I run Firefox version 17.33223532, from IP 123.456.789.012, etc”). This means Google can theoretically track individual users around the web (even those who don’t use Google.com or Gmail), hopping from website to website, and segment individuals for advertising purposes based on the content of the browsed pages (or it could decide to use that data for other purposes in the future). I say “theoretically” because Google does not openly acknowledge that it does this, but everyone at least understands it has that capacity.
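For readers who want to see why a shared identifier is enough to follow someone around, here is a toy model in Python. It assumes a single tracker embedded on several unrelated websites, which hands the browser an identifier on the first hit and then sees that same identifier come back on every later hit. All names here (`Hit`, `ThirdPartyTracker`, `browse`, the example URLs) are hypothetical, and this is a sketch of the general capacity, not a description of how Google Analytics or any other real service is implemented.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """What the browser sends to the tracker when a page embeds it."""
    page_url: str             # the page being read
    cookie_id: Optional[str]  # identifier set earlier by the tracker, if any
    user_agent: str           # browser metadata sent with every request
    ip: str


class ThirdPartyTracker:
    """Toy tracker that files every page view under one identifier."""

    def __init__(self):
        self.profiles = defaultdict(list)  # cookie_id -> visited pages

    def receive(self, hit):
        # First contact: mint a fresh identifier. Otherwise reuse the
        # identifier the browser sent back.
        cookie_id = hit.cookie_id or uuid.uuid4().hex
        self.profiles[cookie_id].append(hit.page_url)
        return cookie_id  # the browser stores this for the next request


def browse(tracker, pages, user_agent, ip):
    """Simulate one browser visiting pages that all embed the tracker."""
    cookie_id = None
    for page in pages:
        cookie_id = tracker.receive(Hit(page, cookie_id, user_agent, ip))
    return cookie_id


tracker = ThirdPartyTracker()
student = browse(
    tracker,
    ["https://mooc.example/course/algebra",
     "https://news.example/politics",
     "https://health.example/symptoms"],
    user_agent="ExampleBrowser/1.0",
    ip="192.0.2.10",
)
# One identifier now links page views on three unrelated sites.
print(tracker.profiles[student])
```

Nothing in this model requires the student to have an account with the tracker: embedding it on enough websites is sufficient.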

Third party tracking in Europe

Tracking web users in this way would most likely be a violation of consent for European users. European data protection laws are not exactly structured to make this free-service-to-webmaster-so-we-can-datamine-individual-users setup easy. Let’s consider an intra-Europe case first. Data protection laws then identify three types of actors: the “data subject”, the “data controller” (the entity deciding what will happen with the personal data, in charge of getting consent), and the “data processor” (an entity providing a service to the controller). Here the service has to be contractually very straightforward, and the controller has many responsibilities towards the data subject, essentially making sure consent is not violated. One of those responsibilities is to provide a copy of their personal data to any subject who requests it, including all the data processed by third parties and the result of that processing. Of course, in practice, this right is very hard to exercise. Even in situations where the lack of compliance is clear, it might cost quite a bit of money to get your case in front of a judge, an expense that will never be justified for an individual.

Transatlantic third party tracking

Now what happens when the subject stays in Europe, but the processor and/or the controller are in the United States? The Safe Harbor Privacy Principles then kick in. These are an uneasy marriage between transatlantic legal systems, both in terms of redress mechanisms and the harms they mean to prevent. In short: the US is reluctant to recognise many harms around privacy protection, but it is easier to get in front of a judge there (think class action), and actually free. Safe Harbor also gets around the thorny issue of defining “privacy” transatlantically by breaking down obligations of companies into seven atomic principles. It is worth insisting that while the framework agreement around Safe Harbor has recently encountered a rough patch, individual companies’ obligations carry on for a while longer (in Coursera’s case until January 6th 2017).

This means that the whole setup, with Coursera as controller and Google Analytics as processor, legally puts a tremendous amount of responsibility on Coursera. For instance, Coursera is supposed to make sure that Google will not use that data for any purpose beyond what has been permitted by Coursera (or at least have a contract limiting Google’s use of it to what the data subject has consented to). In Safe Harbor lingo, this is called the “Onward Transfer” principle, but it is extremely hard for the data subject to check this effectively. However, combining the “Onward Transfer” and “Access” principles offers a tangible test: can Coursera get me a copy of my Google Analytics data?

Coursera is liable for getting Europeans who request it a copy of all their personal data that Google Analytics has stored on Coursera’s behalf.

Steps towards legal action

At the end of June 2016, with a reminder sent in early July 2016, I asked Coursera for access to that Google Analytics data. In this process, I asked for assistance from PersonalData.IO, a service that helps trailblaze and then automate such requests.

On July 25th 2016, I received a response from Coursera’s lawyers (at MOFO.com), to which PersonalData.IO responded on August 6th. I am curious to see how this develops, but suspect it will end up in front of an International Center for Dispute Resolution arbitrator.

Some additional thoughts

  • Universities are also recipients of some of Coursera’s data (for instance for grading, for generating certificates, or for research purposes). Sometimes that data is anonymized, sometimes pseudonymized, but manipulating such data will soon carry particular legal risks for universities, exposing them to the obligation to explain much of the logic of the processing. Indeed, starting May 2018, a new law (the General Data Protection Regulation) will come into effect, extending the reach of EU law and introducing what some have dubbed a “right to explanation”.
  • I am pursuing a similar process with 23andMe, the direct-to-consumer genetics testing company.
  • One could wonder why I am wasting the time and/or money of companies that aim to educate the world or help medical advances through genetics. In fact, it is precisely because of these admirable goals. It is not Coursera or nothing. It is not 23andMe or nothing. These services can evolve in privacy-conscious ways, or others can claim their place. Privacy certainly always involves a tradeoff: what value do you personally get for the personal data you give away? While it might be easy to stop using shadowy apps that abuse your personal data, it will become much harder over time to stop using educational or health services, even if they abuse these data relationships on the side. Particularly in the education context, some are already dependent on these digital services.

Paul-Olivier Dehaye
PersonalData.IO

Mathematician. Co-founder of PersonalData.IO. Free society by bridging ideas. #bigdata and its #ethics, citizen science