Cambridge Analytica and Facebook data
We present here some of the evidence that didn’t make it into a recent series of Guardian articles: 1, 2, 3, 4, 5 (full disclosure: the author of this piece is credited for additional research on 1 and 3 and has provided some help on the others).
PersonalData.IO helps individuals take back control of their personal data (follow us on Twitter, LinkedIn, Facebook or on our mailing list, or here on Medium). For over a year, we have been investigating Cambridge Analytica’s actions in the US election, the Brexit campaign and all over the world. We started by filing a ton of FOI requests with Cambridge University (sample, much more) to try to get to the bottom of the terms of the technology transfer. As we grew increasingly concerned with Cambridge Analytica’s practices, we asked the UK Information Commissioner’s Office in August 2016 whether it was investigating the company. The response was negative.
We started writing blog posts, helping individuals exercise their data protection rights, and sharing a large amount of material with journalists. This eventually led to widespread coverage, with explicit credit (as researcher) to the author of this piece in Das Magazin, VICE and now the Guardian (twice).
The VICE piece, which closely followed the Das Magazin piece, included the following towards the end:
Kosinski is a Stanford University professor who developed a method to predict personality traits from Facebook Likes.
One of the Guardian pieces mentions a video relating to the use of Facebook Likes, and includes commentary by Cambridge Analytica and Facebook regarding that video and the general situation.
We will leave aside here the numerous questions that these two statements raise, even assuming they are fully true.
How was this dataset acquired? Was it bought? Was it already in anonymized form? Was it scraped? Who anonymized it? How was consent obtained? Did Cambridge Analytica perform due diligence on this? Do they have Likes for a subset of the population? Is the model still used? How?
As for Facebook:
Are they knowledgeable about UK electoral rules? Are they knowledgeable about data protection issues when the processing is done for the purpose of political campaigning? Was any wrongdoing discovered with respect to some of Cambridge Analytica’s (subcontracted) work?
Let’s backtrack a bit, back to the video itself. We were actually pointed to that video by someone who must have been sympathetic to our previous work. We immediately shared it with the Guardian, but the video was taken down before they could download it. Fortunately, we had already grabbed it…
This video indeed describes the work done by an intern in October-November 2015. What is described there is exactly Kosinski’s method. This directly contradicts Cambridge Analytica’s denial in the VICE piece. The big question now is whether to trust Cambridge Analytica’s (partial) denial in the Guardian piece, or really any of their public claims, particularly given that they are now under investigation.
In addition, another Cambridge Analytica/SCL intern gave a separate talk at the same type of event in April 2015.
The first and last tweets give some indication of the method used there. With a good knowledge of the network graph, and a sample of known political affiliations, one can extend the profiling to the whole population. To be clear, this model might also have been built on an anonymized dataset obtained legally, but there would still be a legal problem in using it for practical purposes.
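To make the idea concrete, here is a minimal sketch of how known affiliations can be spread over a network graph. It assumes a simple neighbour-averaging scheme (a basic form of label propagation), which is our illustration and not necessarily what was presented in the talk. The function name, the people and the scores are all hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, rounds=20):
    """Estimate a score for every node from a few known ones.

    edges: iterable of (node_a, node_b) pairs, e.g. friendship links.
    seeds: dict mapping a node to a known affiliation score in [-1.0, 1.0].
    Returns a dict mapping every node to an estimated score.
    """
    # Build an undirected adjacency list.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Unknown nodes start neutral; known nodes keep their seed score.
    scores = {node: 0.0 for node in neighbours}
    scores.update(seeds)

    for _ in range(rounds):
        updated = dict(scores)
        for node, adjacent in neighbours.items():
            if node in seeds:
                continue  # known affiliations stay fixed
            # An unknown node takes the average score of its neighbours.
            updated[node] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Hypothetical chain of four people; only the two ends are known.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
seeds = {"alice": 1.0, "dave": -1.0}
estimates = propagate_labels(edges, seeds)
print(estimates)
```

In this toy chain, bob ends up leaning towards alice and carol towards dave, even though neither of them ever disclosed an affiliation. That is the concern: the inference is made about people who did not provide the data.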
We have offered some constructive suggestions here.
This video was taken down when the Guardian asked questions about it as part of their reporting. I re-uploaded it, but got a takedown request through YouTube. I argued fair use in light of journalistic interest, and prevailed. This is documented here and here.
Thanks for reading! My name is Paul-Olivier Dehaye, and I am the co-founder of PersonalData.IO. I have written extensively about Cambridge Analytica.