Facebook and Cambridge Analytica Data Scandal

A look into the ethics, social repercussions, privacy issues and how — if — GDPR could have prevented it.

Luca Giorgi
36 min read · Feb 13, 2020

--

Foreword

I wrote this essay for one of my university courses. Since I have already taken the final for that course, and received some interesting feedback from the Professor who graded it, I had been meaning to share it with however many people were interested in reading it, in order to gather more comments and possibly start fruitful discussions around the topics explored in it. What prompted me to actually put it here and openly share my writing, though, is the news that Ireland’s DPC blocked Facebook Dating from launching in the EU (HN discussion here), something that positively surprised me (for reasons that you’ll understand if you read the whole essay :).

One final tidbit I wanted to add here: based on the feedback I have already gathered, I could slightly modify this text to improve it. I have decided against that, because I wanted to capture my thoughts on specific issues in raw form, frozen at a point in time, something of a personal wayback machine for how my reasoning process worked when I wrote this. However, please feel free to share your own thoughts and feedback, because even if I never amend this text I am truly interested in other opinions on the matter.

P.S.: If you want to read this in a version formatted according to the IEEE standard, feel free to check it out here

I. INTRODUCTION

In this report I am going to explore the ramifications of the Facebook-Cambridge Analytica data scandal, looking at the underlying technical issues around data privacy and access inside Facebook that made the scandal possible, at the ethical aspects of the scandal and the public-opinion fallout that followed its disclosure, and at the potential impact the GDPR would have had on both companies before and after the scandal itself. In other words: would it have happened if the GDPR had already been in force? And if it had happened anyway, what would the consequences have been for both the Data Controller and the Data Processor?

Before delving into the analysis outlined in the previous paragraph, I believe that a small summary of what happened between Facebook and Cambridge Analytica is necessary:

Reports began surfacing in 2015 [1], 2016 [2] [3] and 2017 [4] that a new player in US political advertising, Cambridge Analytica, was collecting data on US voters through an app which abused Facebook’s OAuth login mechanism.

The scandal finally erupted into mainstream awareness in early 2018, when The New York Times [5] and The Guardian [6] simultaneously published the story after having worked for more than a year with a Cambridge Analytica whistleblower, Christopher Wylie.

The two articles did not contain any new information that wasn’t already public knowledge by this point (through earlier reporting), but they framed the story around two facts: that Cambridge Analytica had worked on Donald Trump’s 2016 election campaign, and that Facebook had known about the data collection since at least late 2015 [6].

Thanks to the reports that came out in 2018, we now know that the data was harvested by Aleksandr Kogan through an app called “This Is Your Digital Life” and sold to Cambridge Analytica via his own company, Global Science Research. He paid roughly 270,000 people on Amazon Mechanical Turk to take the personality quiz built into the app and, more importantly, to give the app permission to collect their data. Thanks to Facebook’s loose restrictions on data collection, Kogan was also able to collect the data of all those users’ friends, which, by Facebook’s estimates, could total 87 million people. Interestingly, by Facebook’s own admission the data collection was allowed by its terms of service; the company only banned the app from gathering more data once it became aware that Kogan was reselling it to Cambridge Analytica.

In the following months and years there has been much discussion about both the underlying issues at Facebook and the ethical issues stemming from such data collection, but something that has not really been talked about is the fact that dozens of companies like Cambridge Analytica operate day to day for political campaigns across the spectrum. So what is the difference between CA and all its competitors? Why did so much public attention focus on this small company gathering data on voters when, as we will see further down in the analysis, most people who worked with the firm discredited its work and said it does not perform as well as it claims?

I believe that the main issue in this whole situation is the massive number of people that might have been impacted by a single company through social media; this was the first time the general public had to confront the idea that all the information they had been dumping online for years could be collected and, in the worst-case scenario, weaponized against them. In this specific case, whether Cambridge Analytica actually succeeded in what it was trying to do is not important, because the mere threat of such an occurrence was enough to spark public outrage and force it out of business (even though it now operates under a different name).

The other important factor in this scandal was timing: the reports came out when the media was already drumming up news stories about political interference by national entities through the use of targeted social media ads, and a huge number of people became more and more disillusioned with the original promise of social networks improving everyone’s life.

So, when it comes down to what happened, is this scandal as bad as it was portrayed in popular discourse? How did it come about, and whose fault was it that it could happen in the first place? Finally, and maybe more importantly, is Cambridge Analytica the single black sheep ruthlessly exploiting personal data without any ethical qualms, or is it just a scapegoat sacrificed on the altar of public opinion in order to let countless other companies operate in peace and keep raking in user data to sell to the highest bidder?

Furthermore, amidst all these musings about the ethics of collecting personal data in the era of social media, would the biggest legislative change of our times regarding users’ ownership of their data, namely the GDPR, have come in handy to stop all of this from happening? If so, will it protect us from it ever happening again? If not, is it at least a step in the right direction that might have pinned culpability onto some of the protagonists of the scandal at hand?

II. Privacy Issues

The first aspect of the scandal I want to explore is the set of underlying issues with Facebook’s systems and procedures that enabled Kogan to collect all this data.

I believe it is important to underline that, contrary to what a lot of reports were saying as soon as the scandal came to light, this is not a case where Facebook’s systems were breached in order to gather data. There was no hacking involved, and no loopholes were found in their processes in order to gather more data than usual.

Facebook used this reasoning as its first line of defense against the accusations raised against it, but in my opinion the fact that a “bad” actor was able to gather so much data without abusing or breaking into any system is even more damning and telling than if the whole scandal had involved serious hacking on Kogan’s part.

To recap what happened in the lead-up to the scandal: Kogan developed an app, named “thisisyourdigitallife”, that used Facebook’s Login feature to gather data about its users and, more importantly, their friend networks. Most of the app’s users were paid through Amazon’s Mechanical Turk service to take the personality quiz inside it, but it is probable that some were ordinary Facebook users who used the app of their own volition.

The way Facebook enticed developers to create and maintain apps and games for its platform (and justified the 30% cut it took on all in-app purchases) was by giving developers access to the data it had collected on its users, primarily through Facebook Login. This was (and still is) an OAuth API that developers could implement in their apps so that users could log in without having to create a new account, thus lowering the barrier to entry for new services and apps. According to various reports, and to Facebook staff who weighed in on the matter at the time, Kogan used the feature exactly as Facebook intended and designed it to be used; in gathering the data, Kogan broke no laws and did not go against the platform’s Terms of Service, he simply took full advantage of the possibilities given to him.

This meant that, even though fewer than 300,000 users took the survey and consented to their data being shared, he could gather data on between 50 and 90 million users by taking advantage of a feature in Facebook Login that gave him access to the data of each user’s friends, with a level of granularity so high that Cambridge Analytica could eventually build its psychometric profiles, or at the very least link this additional information to its own in-house dataset for a more complete voter profile. The sketch below illustrates the fan-out.
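To make the mechanism concrete, here is a minimal sketch, in Python, of how a single consenting user’s access token could fan out to their entire friend network under the pre-2015 permission model. The endpoint paths and field names are illustrative of the old Graph API era rather than taken from its documentation, so treat them as assumptions.

```python
import requests

GRAPH = "https://graph.facebook.com"  # base URL; v1.0-era API shape assumed

def harvest(user_token: str) -> list[dict]:
    """Illustrative only: one consenting user's token fans out to the
    data of every one of their friends, none of whom saw a dialog."""
    # Example fields suggesting the granularity reportedly available via
    # the old friends_* permissions; exact names varied by API version.
    fields = "id,name,birthday,location,likes"

    # 1. Data of the user who actually consented.
    me = requests.get(f"{GRAPH}/me", params={
        "access_token": user_token, "fields": fields,
    }).json()

    # 2. Data of every friend, collected without their own consent.
    friends = requests.get(f"{GRAPH}/me/friends", params={
        "access_token": user_token, "fields": fields,
    }).json().get("data", [])

    return [me, *friends]

# Roughly 270,000 paid quiz-takers, each fanning out to hundreds of
# friends, is how a single app could reach tens of millions of profiles.
```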

It is important to note that this functionality was live from 2007 to 2014 and that Kogan was by no means the only one taking advantage of it; even when Facebook finally decided to police it more rigorously, it only decreased the granularity of the information developers could gather rather than cutting off their access completely.

In fact, as seen in [7], the feature had been criticized internally at Facebook since at least 2012 by employees at the managerial level, to no avail. From the same report we also learn that Facebook did not audit any of the developers to whom it gave data and had no way to control the data once it left its own servers. Even when what was clearly a problem for user privacy was brought to higher levels of management, nothing was done, simply because not knowing what was happening to the data was a better legal position for Facebook than trying to enforce rules on all its partner developers, which could have been read as an admission of culpability, and of failed oversight in its role as Data Controller, when a scandal eventually came to light.

It is easy to see, thanks to these revelations, that Facebook’s whole operation and business model revolved around data brokerage. The platform always needed more users, and needed to keep them glued to it for as long as possible; the way it did that was by giving those users things to do on its website other than scrolling through the News Feed. One of the main attractions became the countless games and quiz-like apps that started popping up, like the infamous FarmVille, which boosted Facebook’s user-retention numbers and gave the company even more control over users’ data and usage patterns. However, developing and maintaining a healthy and interesting suite of apps is not free, and the way Facebook financed and enticed developers to invest in its platform was access to the enormous amount of data the company had by then already amassed.

Even though the platform’s terms of service prohibited developers from reselling the information they obtained through these developer tools, the fact that almost no audits were conducted over the years meant that any number of bad actors, including even nation-backed groups, could have created a second-hand black market for user data collected through Facebook’s own tooling.

In the end, this feature was reworked and downsized only because management at Facebook started to realize that big developers who pushed multiple apps onto the platform, or who had very successful ones, could be gathering so much data that they could, in theory, rebuild their own social graphs internally and come to compete in the very space Facebook operated in.

Something that is not clear is for how long developers could keep accessing the information once consent had been given, and whether and how it was updated over time as users kept using Facebook. We know for a fact that apps using Facebook Login today face a time limit on data access (still in the months-to-years range) unless the user renews the consent they initially gave, but it is hard to find out whether, in the period when Kogan was using (and abusing) the system, there were any limits on the data a developer could keep siphoning off Facebook’s systems. This, I believe, is one of the reasons why it is hard to estimate how many users could have been affected by this data scandal, and why the initial reports and Facebook’s own admission differ by more than 30 million users; the back-of-the-envelope calculation below shows how wide the plausible range is.
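To see why the estimates vary so widely, a rough calculation helps. Only the installer count below comes from the reporting; the average friend-list size and the overlap factors are assumptions chosen purely for illustration.

```python
# Rough estimate of the harvest's reach. Only `installers` comes from
# the reporting; the friend count and overlap factors are assumptions.
installers = 270_000   # quiz-takers, mostly paid via Mechanical Turk
avg_friends = 340      # assumed average friend-list size
raw_reach = installers * avg_friends   # ~92 million, with duplicates

# Deduplication is the unknown: friends shared between installers shrink
# the real total, which is how both ~50M and ~87M can be plausible.
for overlap in (0.05, 0.25, 0.45):
    unique = raw_reach * (1 - overlap)
    print(f"assumed overlap {overlap:.0%}: ~{unique / 1e6:.0f}M profiles")
```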

I think we can then logically conclude that the main issue behind this whole debacle was not human error tied to a computing system, or high-level state-backed hacking to compromise a secure system, but rather the sorry state of a market built on capitalizing user data. We have to remember that Facebook became the behemoth it is today only through the accumulation of (sensitive) user data, something that has many times been referred to as “the new oil”, and that ever more companies are exploiting in new (and more terrifying) ways.

In the end it’s no wonder that someone took advantage of what the market had to offer in order to make a profit, something most companies nowadays strive to do. Kogan, in this situation, just played by the rules laid out by unchecked capitalism and by a legal framework that is still, to this day, not ready to deal with personal data ownership and with companies trying to capitalize on it as much as possible.

In this context it is hard even to tell, if we compare the current data market to the gold rush of the 19th century, whether Facebook is a very successful gold miner or the one selling the tools and profiting off the mass hysteria of a market in search of new troves of data. In a way we are in a completely new paradigm, novel for everyone, in which Facebook is both the controller of a massive amount of user data and the reseller of tools to access and gather that same data. That position lets it profit from the data hype twice: once through the data it owns and the potential value the market attaches to it, and once through the access it sells (or could sell) to companies that would like to compete with it in that same data market.

Keeping all of these issues in mind, the cause of the scandal seems pretty clear: market hysteria and the value it places on data. Whether Kogan was in it for research purposes or only to resell the data to Cambridge Analytica matters little, because the central focus of the whole operation was the perceived value of the data being accumulated. In the same fashion, whether Cambridge Analytica was actually able to use the data to accurately profile US voters is not really important either. For all we know, they could have been selling snake oil to political candidates on the back of buzzwords and data hype, but the simple fact that they held a massive dataset on US voters was incredibly valuable (or perceived to be), and that is the only thing that matters.

Finally, to come full circle, we also have to understand that the public outrage caused by this situation was tied more to the perceived value that data had started to acquire than to a feeling, on the users’ part, of having been robbed of personal data. Up until this point, and even after the scandal blew over, a huge majority of internet users did not care about their privacy or their data; most do not even realize that when a service is free, they are the product being sold in the background. But the fact that one focus of the story was the amount of money tied to all the data being accumulated and sold helped public opinion realize that something intangible, produced just by using the internet, could have huge monetary value for companies they did not even know existed.

In a way this is something that could be expected given the context in which this whole ordeal happened, a hyper-capitalistic society that attaches value to anything and everything, and it could be a glimpse of how we might finally make people understand that their data has to be protected from companies and bad actors trying to gather it: attach a monetary value to their every bit of information, and when it is eventually scooped up by ruthless companies or, even worse, stolen in a data breach, people will start feeling that they have been robbed of something of value. Maybe they will then become more interested in the movement trying to make personal data something to be protected rather than exploited.

It is clear that neither right now nor back in 2013–2014 could Facebook be trusted to be a steward of users’ rights and privacy, since its only incentive, as dictated by market capitalization and stock trading, was simply to turn a profit on the massive amount of data it was gathering by offering free services to people.

This brings us to a different line of reasoning, one that skews more towards the ethics of the whole operation Facebook runs (and the ethics of both Kogan and Cambridge Analytica) than towards innate flaws in Facebook’s systems.

III. Ethical Issues

Having determined that Kogan didn’t really break any laws or rules in collecting user data through his own app, we have to ask whether what he did (and what Cambridge Analytica eventually did with the data it bought from him) can be considered ethical.

The first consideration we have to make is that Kogan paid many of the users whose data he gathered through the app, hence in my opinion he was about as “ethical” as a data broker could be in a case like this, granted that the disclosure that information was being collected was buried in Facebook’s Terms of Service rather than surfaced in Kogan’s app.

Two things on his part, however, were unethical:

· Collecting the information of all the friends of the users who used his app, paid or not

· Reselling the whole dataset to another company, and arguably setting up his company and the app itself just for this purpose

Regarding the first point, one could argue, as we explored in the previous section, that he did nothing more than what standard tooling made available to him. The fact that he claimed the data would be collected for research purposes, but instead later sold it to another company for political advertising, is certainly unethical and shady; but at the same time, once someone gives up control of his or her personal data, it is clear that there is no stopping the spread of that information, for whatever purpose the seller or buyer deems appropriate.

This ties into the second issue, which is that Kogan broke the Terms of Service of both Facebook and Amazon by selling the data he was gathering through his quiz. This is the central issue the legal framework has focused on, since it is the only action that implies culpability on the part of both Kogan and Cambridge Analytica while absolving Facebook of any wrongdoing.

This being the crux of the problem, it is worth exploring in a little more depth. While it is true that Facebook set up its own tools and rules so that reselling user data was not permitted, as we have already seen, no actual audits were done on what developers were doing with the data, and I believe that, were it not for the public outcry over Cambridge Analytica, Kogan would have been just one of many developers who used Facebook’s systems to profit off the data and got away with it scot-free. In fact, the issue here is that even though the Terms of Service were written to prohibit this behavior, how are they supposed to stop a bad actor from accumulating and reselling data?

Once data leaves Facebook’s servers it is out of the company’s control, and the only purpose the Terms of Service can realistically serve is protection in case of legal trouble (such as this one), where Facebook can deny any fault on its part and instead shift the blame onto the developers and companies that broke the Terms of Service after the deed was done.

Even if Facebook took the harshest and biggest step it could against a developer, i.e. banning them from the platform, what would that accomplish? Surely the company itself would then be in a better legal position, having taken action against someone acting unethically (or illegally), but by that point the developer, app, or company in question could already have amassed huge amounts of data, which would have spread to other companies and be out in the wild, waiting to be exploited by the highest bidder.

It becomes clear, then, that even though Kogan has his own fair share of faults and has to be held accountable for his actions, the main problem is once again Facebook and its whole business model, which is the originator of these issues and the most unethical part of the whole story. And again, Facebook is not the only company doing this, just as Kogan is not the only developer reselling data and Cambridge Analytica is not the only company using data to profile people and target them for different purposes.

Many companies offer an OAuth service, most notably Google, Twitter, LinkedIn, GitHub and, more recently, Apple, all of which trade convenience for user data. Sure, you can get onboarded onto a new platform or app in seconds if you have an account with any of these companies, but what are you giving up for not wanting to fill out a form and remember a new password?

It is possible that different companies restrict the sharing of personal data more stringently than Facebook did, especially after the scandal we are examining, and I am unsure whether Apple provides access to any data other than what is strictly necessary for an account, i.e. an ID or email. But the principle on which OAuth works, and let’s remember that it is a standard that companies came together to create, is a trade between convenience and privacy, visible in the authorization request itself, as the sketch below shows.
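To make that trade concrete, here is a minimal sketch of a standard OAuth 2.0 authorization-code request (RFC 6749). The `scope` parameter is where convenience is traded for data; the scope names used here are illustrative assumptions, since every provider defines its own.

```python
from urllib.parse import urlencode

def authorize_url(auth_endpoint: str, client_id: str,
                  redirect_uri: str, scopes: list[str]) -> str:
    """Builds a standard OAuth 2.0 authorization-code request (RFC 6749).
    Nothing in the protocol itself limits how much an app may ask for."""
    return auth_endpoint + "?" + urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),   # the convenience-for-data trade
        "state": "anti-csrf-token",  # placeholder; should be random
    })

# A login that respects privacy needs almost nothing...
minimal_scopes = ["openid", "email"]
# ...while nothing stops an app from asking for far more
# (names below are illustrative; each provider defines its own).
broad_scopes = ["email", "user_birthday", "user_likes", "user_friends"]
```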

What starts to emerge from this analysis is that even though what happened between Facebook and Cambridge Analytica has been the spotlight and the catalyst for a bigger reckoning over how we handle data, the problem lies in the system itself rather than in these two companies and a researcher who took advantage of the situation. Every service on the internet that is given to you for free has to make money from something, and in all cases that something ends up being you and your data.

Whether that data is used only in-house to better target ads on the pages you are visiting is not important, because the whole economy of the internet now runs on the premise that whatever you do you are generating data, and that that data is being harvested by someone ready to either sell it or exploit it themselves.

As we have seen in this case, Kogan was just a cog in the system, serving as a middle-man between Facebook and Cambridge Analytica, even though he also ran his own experiments and profiling with an OCEAN (Big Five) personality quiz. We are not sure whether CA actually used the profiles Kogan was creating through his quiz or was just interested in the raw data, but that does not seem to be an important aspect of the story.

Something we have not focused on up until now is whether what Cambridge Analytica did, namely buying user data, was ethical, and I am inclined to say that, as far as its own business model goes, it did nothing more than what countless other companies do on a day-to-day basis: collect data from data brokers and accumulate it in-house so as to run their own profiling algorithms and better target segments of the populace.

At the time of the reporting, this seemed to be the biggest issue the story brought to light, but by examining the scandal in an analytical manner we can safely say that Cambridge Analytica was not doing anything new, nor anything different from what all the other companies involved in political campaigning all over the world were (and still are) doing.

It is telling that just a few years prior to the whole Cambridge Analytica scandal, the social media campaign run by Obama’s digital team was praised in technology circles because it was the first occurrence of a politician leveraging the power of the internet and social media to sway public opinion and win an election.

In hindsight we can safely say that all those compliments were quite short-sighted, since it is now clear that wherever there is an opportunity to leverage the power of social media for “good”, or at the very least to sway the public in a socially accepted direction, the same opportunity exists for any bad actor who wants to exploit it. In this case, were Kogan, Cambridge Analytica, or Donald Trump’s digital team bad actors? It is hard to say, but public opinion seems to have given them this label simply because this is perceived to have been a case in which the power afforded by social media was leveraged for “evil”, as opposed to Obama’s case.

Now, it is still important to underline that Cambridge Analytica was already in the news because it was being investigated for alleged Russian interference in that same election, so calling its people bad actors is not completely unfounded; but since nothing came of the investigation into the alleged interference, the label seems more a matter of public opinion about the candidate who leveraged social media than of actual guilt on the part of the people involved.

When talking about the ethics of what Cambridge Analytica did, then, it is important to differentiate between how it acquired (some of) the data it had on US voters and what it was doing with that data. A lot of reporting went into the ramifications of using data science and machine learning to target users and sway their votes in the 2016 US presidential election, but we now understand that this is nothing new: there are countless companies doing the same all over the world, and even the old team behind Cambridge Analytica is now working again for Donald Trump’s 2020 campaign under the name Data Propria [8].

In fact, searching for the keywords “Political Data Science” on the job board Indeed returns more than 15,000 hits at the time of writing.

This is not something that is confined to politics either, but the whole field of consumer marketing has shifted towards Data Science (and more recently Machine Learning) in order to profile people and better target different segments of the market.

It is not uncommon to see targeted ads for any kind of service or product when surfing the internet; in fact, it is almost expected that the ads on any page we visit will be tied to our interests. And since the advent of assistant integration across smartphones, PCs and every internet search we perform, it is not uncommon to get notifications from our devices recommending news stories we might be interested in, or traffic updates for places we might want to visit during the day based on our searches.

This is to say nothing of the case in which we book something online with an account tied to more than one service, where contextual data is given to us at every step of the way to make the process more personalized, help us get where we want to be on time, or remind us of additional actions we might otherwise forget to perform.

It is clear, then, that targeting users based on their personal data is nowadays common, and in some cases even expected by the users themselves, so why should political campaigning be any different? Why should a political candidate or party not take advantage of the same troves of data that every other company on the market makes use of in order to better target its potential voters? Why shouldn’t they be able to, for example, show personalized media content to a segment of the population about an issue that segment might care about, like an information piece about pension reform to older folks, or about funding for higher education to younger voters?

It is hard to argue that this kind of targeting would be out of place in this day and age where our data is exploited by every single entity in order to better target us, but at the same time we need to be wary of the fact that where there is an opportunity to leverage data for good there is the same opportunity to leverage it for evil. What if, instead of a political candidate campaigning about his or her own proposals, there is a state-backed smear campaign against one of the candidates targeting users who might be more gullible or more prone to see that candidate in a negative light?

This is not a simple hypothetical question, since it was one of the main points of contention in the Cambridge Analytica scandal, where alleged Russian interference was said to have played a role in Hillary Clinton’s selection as the Democratic candidate over Bernie Sanders, in Donald Trump’s selection as the Republican candidate, and ultimately in discrediting Clinton with a wave of “fake news” targeted through Facebook’s ad platform in order to elect Donald Trump as president of the United States.

The fact that such a sequence of events, which sounds like the plot of a movie about the dystopian future we might live in, is an actual possibility and might have really happened, is in itself something that should make us wary of the power of data in this day and age, especially when it comes to a sphere that affects a whole nation, if not the whole world.

We will probably never know if Russian interference played a critical role in the election of Donald Trump, just as we will never know if Cambridge Analytica’s profiling of US voters was as powerful as they wanted their potential buyers to believe or if it was snake oil hyped up to be the holy grail of political data science, but at the very least we should be able to learn the lesson that our data could have a huge impact in systems and processes that govern our daily lives, and that we should be the first stewards of its protection.

Unfortunately, it seems like nothing has changed after this debacle; even though there has been demand for Facebook to do better, little or nothing seems to have been implemented. In fact, the only changes Facebook made were forced onto it by the European Union and its GDPR legislation.

They may have been fined, most notably $5 billion by the FTC in the summer of 2019 [9], but we are still talking about a company with a market cap of $575 billion at the time of writing.

They may even have had a bad quarter after the reports came out, but on closer inspection we notice that it was the quarter after the holiday period, which is historically weaker than the one before it, and in the context of Facebook’s history it was the second-best such quarter.

Kogan may have been disgraced in the public eye, and may have left Cambridge University, but he is now the CEO of a new big-data analytics firm, Philometrics, which aims to combine machine learning with survey responses in order to infer the distribution of answers across a larger population than the one that originally took the survey. And, as we have seen, Cambridge Analytica may have shut down, but it is now once again working for Donald Trump’s campaign, simply under a different name, Data Propria.

Collecting all this information in a single spot paints a bleak picture of our society and how much it values its own data, but we might see a light at the end of the tunnel. This small sliver of hope is given to us by the European Union and the GDPR, a legislative framework dealing with data ownership, with the explicit purpose of protecting users’ rights and putting them above profit-seeking companies. Unfortunately, it came into effect after the scandal took place, meaning that it cannot be used to make any of the people involved pay back what they earned in this whole situation.

However, in a what-if scenario, we are going to explore what would have happened if the GDPR had already existed at the time of the data scandal and how it might have affected it, both beforehand as a prevention mechanism and afterwards as a framework to rely on in order to punish bad actors who seek to exploit our personal data.

IV. Potential GDPR Impact

When looking at the hypothetical scenario of GDPR already in effect before or during the Facebook/Cambridge Analytica data scandal, we must consider two different points in time for a complete analysis of what could have happened.

The first option would be that the GDPR was already in effect before 2013/2014, when Kogan started collecting data from Facebook through his app, and hence explore what effect the GDPR could’ve had as a prevention mechanism to what followed.

The second scenario which is interesting to explore is instead the case in which GDPR came into effect during the period in which Kogan was already collecting data and selling it, and thus looking at the legislation as a punitive framework rather than as a safeguard for user data, which was already being collected and exploited.

In our first scenario, we need to consider whether Facebook’s business model, using access to user data to entice developers to put apps on its platform, could have stayed the same. Under the GDPR, Facebook the company would be regarded as the Data Controller for all the pieces of information Kogan was able to collect; more importantly, Facebook would have had to specify in its own Terms of Service the purposes for which it was collecting this data, and use it only for the purposes written therein.

Articles 5(1)(a) and (b), 6 and 14 are especially relevant here. They state that:

· personal data must be processed lawfully, fairly and in a transparent manner (Article 5(1)(a));

· personal data must be collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes (Article 5(1)(b));

· personal data may only be processed where the data subject has given consent to the processing of their data for one or more specific purposes (Article 6(1)(a)), or where one of the other grounds of lawful processing set out in Article 6(1) applies;

· detailed information must be provided to the data subject where personal data is collected from someone other than the data subject, including details of the controller, the Data Protection Officer’s contact details, the purposes of processing, the recipients of the data, details of international transfers, and the rights to be forgotten and to restrict processing (Article 14(1)).

Facebook could not have “sold”, or otherwise granted access to, user data to third-party developers for processing that the users themselves had not consented to.

What’s more, the GDPR sets a high standard for user consent: it requires a positive opt-in (meaning that consent by default is not allowed) and a clear, specific statement of consent for each kind of processing the data might undergo. This means that the Facebook Login feature, which Kogan exploited to collect data on almost 90 million users, could not have been engineered by Facebook the way it actually was, with users automatically opted in to sharing data through their friend-list connections at a high level of granularity; a minimal sketch of what opt-in, per-purpose consent looks like follows below.
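As an illustration of what that standard implies in practice, here is a minimal sketch, in Python, of a consent record where every purpose defaults to not granted and must be opted into one by one. The structure and field names are my own assumptions, not a schema prescribed by the GDPR.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ConsentRecord:
    """GDPR-style consent sketch: no pre-ticked boxes, one explicit
    grant per purpose, timestamped so it can be demonstrated later."""
    subject_id: str
    purposes: dict[str, bool] = field(default_factory=dict)
    granted_at: dict[str, datetime] = field(default_factory=dict)

    def grant(self, purpose: str) -> None:
        # Consent must name a specific purpose; sharing friends' data
        # with third-party developers could never be an implicit default.
        self.purposes[purpose] = True
        self.granted_at[purpose] = datetime.utcnow()

    def allows(self, purpose: str) -> bool:
        return self.purposes.get(purpose, False)  # absent means denied

consent = ConsentRecord(subject_id="user-123")
consent.grant("login")  # the one thing the user actually asked for
assert not consent.allows("share_with_third_party_apps")
```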

The GDPR also requires that, when giving consent to any kind of data processing, the Data Subject be informed of which controller(s) will get access to the data and for which purposes. This would have put a dent in Facebook’s strategy of giving third-party developers access to user data, since it would have needed to specify in its consent forms to whom it would give access to the data it was collecting.

Assuming that this information has to be given at the moment consent is granted, it seems improbable that Facebook could have shared user data with any third-party developer on its platform; rather, it would have needed to specify beforehand who could get access to it. Assuming, further, that this list could not change retroactively, it seems impossible for new developers to get the data of users who gave Facebook their consent to data processing before the developer was added to the platform.

In this same scenario, we also need to consider Kogan’s line of action when it came to his own app, “thisisyourdigitallife”.

Assuming that he could legally acquire users’ data from Facebook, both because the consent form stated that user data could be shared with Kogan and because all the users involved gave explicit consent to their data being shared with him, there is still the problem of data minimization and of a clear purpose for the processing.

While it is true that Kogan paid most of his app’s users, and that even those who were not paid agreed to share their quiz responses by using the app, the purpose for which this data was supposedly being collected was “research”, which is clearly not the same as targeted political advertising.

Furthermore, even if Kogan really did collect the data for himself and his research at Cambridge University, he would’ve needed to inform users that the data would’ve then been sold to Cambridge Analytica or he would have been in clear violation of GDPR.

At the end of this chain of data exploitation we find Cambridge Analytica itself, which bought the data from Kogan and used it for its political campaigning efforts. It is hard to say whether the GDPR would have had any impact on CA with respect to Trump’s 2016 election campaign, since in this specific case the company was set up in the US and was specifically targeting US voters. However, there are also reports of Cambridge Analytica being involved in the Brexit Leave campaign in the UK, which would have made it subject to GDPR rules under Article 3, which states:

· This Regulation applies to the processing of personal data in the context of the activities of an establishment of a controller or a processor in the Union, regardless of whether the processing takes place in the Union or not.

Whether or not any of the data collected by Kogan was used for the purpose of the Brexit campaign is unknown, hence it is impossible with the information available to say if GDPR could’ve been enforced against Cambridge Analytica.

Something else to consider is that the GDPR does not seem to place any explicit limitations on buyers of personal data, whether or not the legislation applies to them. One could argue that consent would need to be acquired again after the transaction between the original data controller and the buyer was completed, but the GDPR itself does not seem to give explicit rules in this regard.

Moving on to the second scenario we envisioned at the start of the paragraph, we need to consider GDPR as a purely punitive framework with regards to this data scandal.

First, it is important to clarify what measures the GDPR envisions against non-compliant Data Controllers. According to Articles 58 and 83, there are four levels of sanctions:

· a warning

· a reprimand

· the suspension of data processing

· a fine

In the case of the fine, there are two different levels:

· A fine of €10 million or 2% of the annual global turnover of the preceding financial year (whichever figure is higher).

· A fine of €20 million or 4% of annual global turnover of the preceding financial year (whichever figure is higher).

Furthermore, as stated in article 82:

· Any person who has suffered material or non-material damage as a result of an infringement of this Regulation shall have the right to receive compensation from the controller or processor for the damage suffered.

· Any controller involved in processing shall be liable for the damage caused by processing which infringes this Regulation. A processor shall be liable for the damage caused by processing only where it has not complied with obligations of this Regulation specifically directed to processors or where it has acted outside or contrary to lawful instructions of the controller.

Now, if the highest level of sanctions had been applied to each of the controllers involved (i.e., Facebook, Kogan and Cambridge Analytica), we would have seen the second level of fines levied against all of them: in Facebook’s case a fine amounting to 4% of its global turnover for 2017, roughly €1.6 billion, and the €20 million fine for both Kogan and Cambridge Analytica.

Furthermore, all the Data Subjects could have sought both material and immaterial damages from all three companies. The GDPR does not specify the monetary amounts Data Subjects can sue for, but we can see how these penalties could easily grow much bigger than the fines laid out in the GDPR itself.

In Facebook’s case, for example, we can assume that each user could have asked for damages at least equal to the actual worth of their data according to Facebook’s financial statements. Looking at Facebook’s user base in Q4 2017, about 2.13 billion monthly active users, and its revenue from advertising in that same fiscal year, roughly $40 billion, each user’s data is worth around $19 per year. Multiplying this number by the 87 million users impacted, Facebook would have had to pay its users roughly another $1.6 billion in material damages alone.
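To make the arithmetic explicit, the short script below reproduces these numbers using the figures cited above. It deliberately ignores the dollar/euro conversion, so the results should be read as order-of-magnitude estimates.

```python
# Reproducing the essay's arithmetic with the 2017 figures cited above.
turnover_2017 = 40e9    # Facebook's ~$40B advertising revenue
monthly_users = 2.13e9  # Q4 2017 monthly active users
affected = 87e6         # Facebook's own estimate of impacted users

# Top GDPR fine tier (Art. 83): the greater of EUR 20M or 4% of turnover.
fine = max(20e6, 0.04 * turnover_2017)    # ~1.6 billion

# Naive per-user "worth": total ad revenue spread over all users.
per_user = turnover_2017 / monthly_users  # ~$19 per user per year
damages = per_user * affected             # ~1.6 billion

print(f"fine: ~{fine / 1e9:.1f}B; per user: ~{per_user:.0f}; "
      f"material damages: ~{damages / 1e9:.2f}B")
```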

While this might seem like harsh punishment for the mishandling of user data and just compensation for everyone involved, the sad reality is that even if Facebook had been made to pay all this money to both the EU and the Data Subjects, the total would still be lower than the fine the FTC imposed on the company in the US ($5 billion); and in the case of Kogan and Cambridge Analytica, the companies would simply have filed for bankruptcy straight after being implicated in the scandal, making the fines themselves and the damages sought by the Data Subjects virtually worthless.

Where does this leave us? Is the GDPR as effective as one would have hoped? Surely, in this made-up scenario, the legislation might have been more useful as a prevention mechanism, enforcing stricter rules around the handling of user data and preventing some of the problems that came up during the scandal; as a punishment tool, however, it appears ineffective at best, especially against multinational corporations that make massive amounts of money by exploiting user data.

Moving to the real world that we live in, we can see the effects that GDPR is having on both Cambridge Analytica and Facebook, and the results do not bode well for Data Subjects.

Specifically, Cambridge Analytica (or rather its parent company, SCL Elections) “pleaded guilty to ignoring a data enforcement notice after it refused to hand over data it held on US academic Professor David Carroll” [10], and was fined only £21,000, which is telling of how much more companies value their control over data, or their non-compliance with the GDPR, when they prefer paying a fine to allowing users access to their own data.

In Facebook’s case, instead, we have seen a push to make the public believe that the company is now GDPR-compliant, by giving users the ability to download, from their control panel, the data they have explicitly uploaded to Facebook. However, this is not all the data the company possesses on each user: there is a lot more accessory data that Facebook collects and links to each profile, whether through device fingerprinting, through Facebook Login across the web and numerous apps, or through the “Like” widgets present on almost every web page over the last ten years.

What’s more, users who want access to the full set of data Facebook has on them are being completely ignored by the company and are stuck in limbo, as is the case for Ruben Verborgh, who is keeping a blog of his misadventures with Facebook and GDPR compliance [11].

V. Conclusion

As we have seen throughout this small analysis of the data scandal between Facebook and Cambridge Analytica (with the help of Kogan and his quiz app), there are numerous pain points all around the story that made this situation not only possible but a reality.

We know that Facebook’s systems weren’t secure in the broader sense of the word, since by design they allowed third-party developers who put their apps and games on the platform to access user data. What’s more, developers were encouraged and enticed to build for Facebook’s platform through the promise of access to user data, which is also how the company justified its 30% cut of in-app purchases. What this tells us is that this whole ecosystem was built from the ground up with the exploitation of user data front and center.

Kogan, who may or may not have acted in good faith when collecting the data and reselling it to Cambridge Analytica, simply took advantage of the possibilities offered to him, something countless other researchers and companies do (or wish they could do) on a daily basis. It is important to remember that his actions are not a unicum, and that to this day the personal data of all of us is being used as a bargaining chip among different companies and exploited for all kinds of purposes, among them consumer targeting for marketing and political campaigning across the whole spectrum of political leanings.

Cambridge Analytica itself kept amassing data for its own operations not because it dreamt of world domination, but simply because it is the most sensible option for a Data Science company in this day and age, where the hype surrounding Machine Learning and personalization of services is fueled by huge troves of data and the potential value attached to it by the market.

Whether they are the data wizards they claim to be, or a sham operation running on hype, as some in US politics have hinted [12], is not really the point of the whole discussion. The fact remains that they were able to gather data on almost 90 million users and possibly link it to other caches of data they already owned or otherwise acquired, creating one of the biggest publicly known databases of specific information about these users’ political leanings. The simple existence of such a dataset, and the possibility of rebuilding it for other companies (or for nations willing to interfere in the democratic processes of other countries), is enough of a threat to our everyday lives to warrant further examination and, more importantly, clear and sound legislation around data handling and ownership.

Even in a world where everything these companies have done could be framed as legal, there still remains an ethical question to be pondered, one that leaves us with a bad aftertaste when we realize that this whole situation is not the product of specific companies or people misbehaving, but rather a direct consequence of how the system itself is built to operate.

However much we examine the actions of all the actors involved in this scandal, a huge number of people work behind the scenes in exactly the same way, and, what’s worse, are praised for it when public opinion deems it acceptable.

The main issue, then, seems to be the system in which we are all operating, namely the capitalistic one we have been led to believe is the only one able to sustain our lifestyles, the way it attaches monetary value to every facet of our lives, and the way it then seeks to profit off that value in a game of make-believe among companies and investors.

In this whole mess the political bodies of the world have started taking notice that something is not working and that it could short-circuit our whole society, and have started proposing different ways of dealing with such a problem.

Most notably, we have the European Union enforcing the GDPR on all companies operating in the EU or targeting their operations at EU nationals, but even here we have seen that, when used as a punitive framework against misbehaving companies, it is ineffective at best. It is true that in the hypothetical scenario we explored it could have served its purpose as a prevention mechanism against all of this happening, but that runs on the assumption that companies would willingly start to enforce the GDPR on themselves before getting caught red-handed by legislators.

This assumption is quite strong, and in fact we have seen how these companies prefer being fined after the fact rather than complying with the legislation, or, in Facebook’s case, seem to ignore it completely and warp its meaning to whatever suits them best, in what looks like a pure PR exercise backed by huge, well-paid legal teams ready to defend the company’s interests.

In conclusion, my opinion is that this scandal is a logical consequence of the economic system we force ourselves to live in, and that it will be just the first of many (which will soon become normalized, just as every other abuse of our data has been normalized over the past 10 to 20 years).

The GDPR is a first step in the right direction, but it seems like it won’t be enough to really safeguard users’ rights and privacy. What would need to be done, in my opinion, is to consider the whole context in which data is abused and to frame its ownership in a way that is coherent with the system at hand; that is, users, or Data Subjects in GDPR speak, should be regarded as the sole owners of all the data they generate when using any computing device. They would then be able to assign a monetary value themselves to the data they legally own and license it to companies in exchange for goods and services, with strong legislation ensuring that this data cannot be passed from one company to another, but remains tied to the explicit decisions of its owners and of the companies they choose to do business with.

Unfortunately, this does not seem to be a solution that any political figure is currently pushing for, and I believe the lobbying and spending power of ad networks and data brokers would stop any such attempt to put power back in users’ hands. Personally, I think that anything short of either a complete overhaul of the economic system we live in or a massive global push to shift power between users and companies will not be enough to safeguard our rights over our data going forward.

References

[1] H. Davies, “Ted Cruz using firm that harvested data on millions of unwitting Facebook users,” The Guardian. [Online]. Available: https://www.theguardian.com/us-news/2015/dec/11/senator-ted-cruz-president-campaign-facebook-user-data

[2] H. Grassegger and M. Krogerus, Das Magazin. [Online]. Available: https://www.dasmagazin.ch/2016/12/03/ich-habe-nur-gezeigt-dass-es-die-bombe-gibt/

[3] H. Grassegger and M. Krogerus, Vice. [Online]. Available: https://www.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win

[4] M. Schwartz, The Intercept. [Online]. Available: https://theintercept.com/2017/03/30/facebook-failed-to-protect-30-million-users-from-having-their-data-harvested-by-trump-campaign-affiliate/

[5] M. Rosenberg, N. Confessore and C. Cadwalladr, The New York Times. [Online]. Available: https://web.archive.org/web/20180317131953/https://www.nytimes.com/2018/03/17/us/politics/cambridge-analytica-trump-campaign.html

[6] C. Cadwalladr and E. Graham-Harrison, The Guardian. [Online]. Available: https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election

[7] P. Lewis, The Guardian. [Online]. Available: https://www.theguardian.com/news/2018/mar/20/facebook-data-cambridge-analytica-sandy-parakilas

[8] J. Horwitz, Associated Press. [Online]. Available: https://apnews.com/96928216bdc341ada659447973a688e4/AP:-Trump-2020-working-with-ex-Cambridge-Analytica-staffers

[9] J. Carrie Wong, The Guardian. [Online]. Available: https://www.theguardian.com/technology/2019/jul/12/facebook-fine-ftc-privacy-violations

[10] M. Field, The Telegraph. [Online]. Available: https://www.telegraph.co.uk/technology/2019/01/09/cambridge-analytica-owner-fined-21000-failing-obey-ico-ruling/

[11] R. Verborgh. [Online]. Available: https://ruben.verborgh.org/facebook/

[12] K. Kaye, AdAge. [Online]. Available: https://adage.com/article/campaign-trail/cambridge-analytica-toast/305439

[13] https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/

[14] https://gdpr-info.eu/

[15] https://investor.fb.com/investor-news/press-release-details/2018/facebook-reports-fourth-quarter-and-full-year-2017-results/default.aspx

[16] https://www.vox.com/policy-and-politics/2018/3/23/17151916/facebook-cambridge-analytica-trump-diagram

[17] https://www.vox.com/policy-and-politics/2018/3/21/17141428/cambridge-analytica-trump-russia-mueller

[18] https://www.vox.com/2018/3/20/17138756/facebook-data-breach-cambridge-analytica-explained

[19] https://www.vox.com/2018/3/17/17134072/facebook-cambridge-analytica-trump-explained-user-data

[20] https://en.wikipedia.org/wiki/Facebook%E2%80%93Cambridge_Analytica_data_scandal

[21] https://www.theguardian.com/technology/2019/mar/17/the-cambridge-analytica-scandal-changed-the-world-but-it-didnt-change-facebook

[22] https://en.wikipedia.org/wiki/SCL_Group

[23] https://en.wikipedia.org/wiki/Cambridge_Analytica

[24] https://www.bbc.com/news/technology-46822439
