Cambridge Analytica and Facebook data

We present here some of the evidence that didn’t make it into a recent series of Guardian articles: 1, 2, 3, 4, 5 (full disclosure: the author of this piece is credited for additional research on 1 and 3 and has provided some help on the others).

SCL/Cambridge Analytica employee and aspiring stand-up comedian Eyal Kazin does a comedy routine at the SCL Elections Summer party (Wimbledon Dog Races, June 26th 2015, standing left with a hat). The overlaid caption ‘SCL “We Rigg” Elections’ is in the original video and was added by him as a transcript. The two people standing closest to him are Alexander Nix, CEO at Cambridge Analytica and Director at SCL Group, and (most likely) Brittany Kaiser, Business Development Director at Cambridge Analytica and the SCL Group. The video was obtained from Kazin’s YouTube account. Although the video stayed up for over 18 months, it was taken down within the past month. Backup copy available on request.

PersonalData.IO helps individuals take back control of their personal data (follow us on Twitter, LinkedIn, Facebook or on our mailing list, or here on Medium). For over a year, we have been investigating Cambridge Analytica’s actions in the US election, the Brexit campaign and all over the world. We started by filing a ton of FOI requests with Cambridge University (sample, much more) to try to get to the bottom of the terms of the technology transfer. As we grew increasingly concerned with Cambridge Analytica’s practices, we asked the UK Information Commissioner’s Office in August 2016 whether it was investigating the company. The response was negative.

The ICO was not investigating Cambridge Analytica or SCL Group in August 2016. Since we were not affected individually, we didn’t think a complaint would have led to much, and instead focused on contacting journalists.

We started writing blog posts, helping individuals exercise their data protection rights, and sharing a large amount of material with journalists. This eventually led to widespread coverage, with explicit credit (as researcher) to the author of this piece in Das Magazin, VICE and now the Guardian (twice).

The VICE piece, which closely followed the Das Magazin piece, included the following towards the end:

From the end of the VICE article, referencing the Das Magazin article.

Kosinski is a Stanford University professor who developed a method to predict personality traits from Facebook Likes.

One of the Guardian pieces mentions a video relating to the use of Facebook Likes, and includes commentary by Cambridge Analytica and Facebook regarding that video and the general situation.

We will leave aside here the numerous questions these two statements raise, even assuming they are fully true.

How was this dataset acquired? Was it bought? Already in anonymized form? Scraped? Who anonymized it? How was consent obtained? Did Cambridge Analytica perform due diligence on this? Do they have Likes for a subset of the population? Is the model still used? How?

As for Facebook:

Are they knowledgeable about UK electoral rules? Knowledgeable on data protection issues when the processing is done for the purpose of political campaigning? Was any wrongdoing discovered with respect to some of Cambridge Analytica’s (subcontracted) work?

Let’s backtrack a bit, back to the video itself. We were actually pointed to that video by someone who must have been sympathetic to our previous work. We immediately shared it with the Guardian, but the video was taken down before they could download it. Fortunately, we had already grabbed it…

This video indeed describes the work done by an intern, in October-November 2015. What is described here is exactly Kosinski’s method. This directly contradicts Cambridge Analytica’s denial for the VICE piece. The big question now is whether to trust Cambridge Analytica’s (partial) denial in the Guardian piece, or really any of their public claims, particularly given that they are now under investigation.

Second technique

In addition, another Cambridge Analytica/SCL intern gave a separate talk at the same type of event in April 2015.

http://archive.is/pbyF4
http://archive.is/4Bh3s
http://archive.is/U4Q9A
http://archive.is/3TneC
http://archive.is/fV5ns

The first and last tweets give some indication of the method used there. With a good knowledge of the network graph, and a sample of known political affiliations, one can extend the profiling to the whole population. To be clear, this model might also have been built on an anonymized dataset obtained legally, but using it for practical purposes would still raise a legal problem.
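To make the general idea concrete, here is a toy sketch of how affiliations known for a few people can be spread across a social graph. This is only an illustration of the family of techniques the tweets hint at (often called label propagation), not a reconstruction of the model presented in the talk. The function name, the scoring scale and the four-person graph are all hypothetical.

```python
from collections import defaultdict


def propagate_affiliations(edges, seeds, iterations=50):
    """Spread known affiliation scores over an undirected graph.

    edges: iterable of (node_a, node_b) pairs (hypothetical social links)
    seeds: dict mapping node -> score in [-1.0, 1.0] for known affiliations
    Returns a dict mapping every node to an estimated score.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    nodes = set(neighbors) | set(seeds)
    # Unknown nodes start neutral; known nodes start at their seed score.
    scores = {node: seeds.get(node, 0.0) for node in nodes}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            if node in seeds:
                # Known affiliations stay fixed.
                updated[node] = seeds[node]
            elif neighbors[node]:
                # Everyone else takes the average score of their contacts.
                linked = neighbors[node]
                updated[node] = sum(scores[n] for n in linked) / len(linked)
            else:
                updated[node] = 0.0
        scores = updated
    return scores


# Hypothetical chain of four people; only the two ends are known.
example_edges = [("a", "b"), ("b", "c"), ("c", "d")]
example_seeds = {"a": 1.0, "d": -1.0}
example_scores = propagate_affiliations(example_edges, example_seeds)
```

In this toy chain, "b" ends up leaning towards "a" and "c" towards "d", even though neither disclosed anything. That is the point of concern: a score is inferred about people who never provided their affiliation.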

What now?

We have offered some constructive suggestions here.

Update

This video was taken down when the Guardian asked questions about it as part of their reporting. I re-uploaded it, but got a takedown request through YouTube. I argued fair use in light of journalistic interest, and prevailed. This is documented here and here.

Thanks for reading! My name is Paul-Olivier Dehaye, and I am the co-founder of PersonalData.IO. I have written extensively about Cambridge Analytica. Follow us on Twitter, LinkedIn, Facebook, our mailing list or here on Medium.