How Big Data Won the Election
The following article has been officially translated from an article in Das Magazin (Germany’s New Yorker). Antinote Zine posted the original translation due to the poignancy of the topic. However, as the story gained traction, the original authors provided an official translation (below). It’s an interesting take on how big data played a role in the 2016 presidential election, specifically how #Twitler #punintentionally #Trumped overall using a medium that he commonly denounces: science. I have copied the official translation below to allow for simple consumption of some interesting content (the original site was getting cluttered, although it is hot linked above).
Should the original authors wish me to remove the content, I’d be happy to do so upon request. Happy reading.
“I did not build the bomb. I only showed that it exists.”
The strange connection between psychologist Michal Kosinski and Donald Trump’s victory
By Hannes Grassegger and Mikael Krogerus (*)
On 9 November at around 8.30 a.m., Michal Kosinski woke up in the Hotel Sunnehus in Zurich. The 34-year-old researcher had come to give a lecture at the Swiss Federal Institute of Technology (ETH) about the dangers of Big Data and the digital revolution. Kosinski gives regular lectures on this topic all over the world. He is a leading expert in psychometrics, a data-driven sub-branch of psychology. When he turned on the TV that morning, he saw that the bombshell had exploded: contrary to forecasts by all leading statisticians, Donald J. Trump had been elected President of the United States.
For a long time, Kosinski watched the Trump victory celebrations and the results coming in from each state. He had a hunch that the outcome of the election might have something to do with his research. Finally, he took a deep breath and turned off the TV.
On the same day, a then little known British company based in London sent out a press release: “We are thrilled that our revolutionary approach to data-driven communication has played such an integral part in President-elect Trump’s extraordinary win,” a certain Alexander James Ashburner Nix was quoted as saying. Nix is British, 41 years old, and CEO of Cambridge Analytica. He is always immaculately turned out in tailor-made suits and designer glasses, with his wavy blonde hair combed back from his forehead.
Of these three players — reflective Kosinski, carefully groomed Nix and grinning Trump — one of them enabled the digital revolution, one of them executed it and one of them benefited from it.
How dangerous is Big Data?
Anyone who has not spent the last five years living on another planet will be familiar with the term Big Data. Big Data means that everything we do, both online and offline, leaves digital traces. Every purchase we make with our cards, every search we type into Google, every movement we make when our mobile phone is in our pocket, every “like” is stored. Especially every “like”. For a long time, it was not entirely clear what use this data could have, except, perhaps, that we might find adverts for blood pressure remedies on our monitors just after we’ve Googled “reduce blood pressure”. It was also unclear whether Big Data would endanger or benefit humanity. On 9 November the answer became clear. The company behind Trump’s online campaign, as well as behind the Brexit campaign, was a Big Data company: Cambridge Analytica, whose CEO is Alexander Nix.
To understand the outcome of the election — and what might hit Europe in the coming months — we have to begin with a strange incident at Cambridge University in 2014, at Kosinski’s Psychometrics Centre.
Psychometrics, sometimes also called psychographics, focuses on measuring psychological traits, such as personality. In the 1980s, two teams of psychologists proved that every trait of a human being can be assessed based on five dimensions of personality, known as the “Big Five”. These are: openness (how open are you to new experiences?), conscientiousness (how much of a perfectionist are you?), extraversion (how sociable are you?), agreeableness (how considerate and cooperative are you?) and neuroticism (are you easily upset?). Based on these dimensions (also known as OCEAN, an acronym for openness, conscientiousness, extraversion, agreeableness, neuroticism), we can make a relatively accurate assessment of the kind of person in front of us. This includes his or her needs and fears, and how he or she is likely to behave. The “Big Five” has become the standard technique of psychometrics. But for a long time, the problem with this approach was data collection, because it involved filling out a complicated, highly personal questionnaire. Then came the Internet. And Facebook. And Kosinski.
Michal Kosinski was a student in Warsaw when his life took a new direction in 2008. He was accepted by Cambridge University to do his PhD at the Psychometrics Centre, one of the oldest institutions of this kind worldwide. Kosinski joined fellow student David Stillwell (now a lecturer at Judge Business School at the University of Cambridge) about a year after Stillwell had launched a little Facebook application in the days when the platform had not yet become the behemoth it is today. Their myPersonality app enabled users to fill out different psychometric questionnaires, including a handful of psychological questions from the Big Five personality questionnaire (“I panic easily”, “I contradict others”). Based on the evaluation, users received a “personality profile” (individual Big Five values) and could opt in to share their Facebook profile data with the researchers. Kosinski had expected a few dozen college friends to fill in the questionnaire, but before long, hundreds, thousands, then millions of people had revealed their innermost convictions. Suddenly, the two doctoral candidates owned the largest dataset combining psychometric scores with Facebook profiles ever to be collected.
The approach that Kosinski and his colleagues developed over the next few years was actually quite simple. First, they provided test subjects with a questionnaire in the form of an online quiz. From their responses, the psychologists calculated the personal Big Five values of respondents. Kosinski’s team then compared the results with all sorts of other online data from the subjects: what they “liked”, shared or posted on Facebook, or what gender, age, place of residence they specified, for example. This enabled the researchers to join the dots and make correlations. Remarkably reliable deductions could be drawn from simple online actions. For example, men who “liked” the cosmetics brand MAC were slightly more likely to be gay; one of the best indicators for heterosexuality was “liking” Wu-Tang Clan. Followers of Lady Gaga were most probably extroverts, while those who “liked” philosophy tended to be introverts. While each piece of such information is too weak to produce a reliable prediction, when tens, hundreds, or thousands of individual datapoints are combined, the resulting predictions become really accurate.
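To illustrate why many weak signals can add up to a strong prediction, here is a minimal sketch in Python. It is not the researchers’ actual model, which the article does not describe in this detail. It assumes, purely for illustration, that every “like” is an independent clue (a naive Bayes style simplification), and the likelihood ratio of 1.2 per “like” is an invented number.

```python
import math


def combine_weak_signals(prior, likelihood_ratios):
    """Combine independent clues into one probability by adding log-odds.

    prior: probability of the trait before seeing any clue (0 < prior < 1).
    likelihood_ratios: for each clue, how many times more likely the clue
    is among people who have the trait than among people who do not.
    """
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    # Convert the accumulated log-odds back into a probability.
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical numbers: each "like" is only a weak hint (ratio 1.2).
PRIOR = 0.5
one_like = combine_weak_signals(PRIOR, [1.2])
thirty_likes = combine_weak_signals(PRIOR, [1.2] * 30)

print(f"one like:     {one_like:.3f}")
print(f"thirty likes: {thirty_likes:.3f}")
```

Under these assumptions a single “like” moves the estimate only from 0.5 to roughly 0.55, while thirty such “likes” push it above 0.99. Real “likes” are correlated with one another, so a real model would gain less from each additional one.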
Kosinski and his team tirelessly refined their models. In 2012, Kosinski proved that on the basis of an average of 68 Facebook “likes” by a user, it was possible to predict their skin colour (95% accuracy), their sexual orientation (88% accuracy), and their affiliation to the Democratic or Republican party (85%). But it didn’t stop there. Intelligence, religion, as well as alcohol, cigarette and drug use, could all be determined. From the data it was even possible to deduce whether someone’s parents were divorced. The strength of a model was illustrated by how well it could predict a subject’s answers. Kosinski continued to work on the model incessantly: before long, his model was able to evaluate a person better than the average work colleague, merely on the basis of ten Facebook “likes”. Seventy “likes” were enough to outdo what a person’s friends knew, 150 what their parents knew, and 300 “likes” what their partner knew. More “likes” could even surpass what a person thought they knew about themselves. On the day that Kosinski published these findings, he received two phone calls. The threat of a lawsuit and a job offer. Both from Facebook.
Only weeks later Facebook “likes” became private by default. Before, the default setting was that anyone on the internet could see your “likes”. But this was no obstacle to data collectors: while Kosinski always asked for the consent of Facebook users, many apps and online quizzes today require access to private data as a precondition for personality tests. (Anybody who wants to evaluate themselves based on their Facebook “likes” can do so on Kosinski’s website applymagicsauce.com, and then compare their results to those of a classic OCEAN questionnaire: discovermyprofile.com/personality.html.)
But it was not just about “likes” or even Facebook: Kosinski and his team could now ascribe Big Five values based purely on how many profile pictures a person has on Facebook, or on how many contacts they have (a good indicator of extraversion). But we also reveal something about ourselves when we’re offline. For example, the motion sensor on our phone reveals how quickly we move and how far we travel (this correlates with emotional instability). Our smartphone, Kosinski concluded, is a vast psychological questionnaire that we are constantly filling out, both consciously and unconsciously. Above all, however, and this is key, it also works in reverse: not only can psychological profiles be created from your data, but your data can also be used the other way round to search for specific profiles: all anxious fathers, all angry introverts, for example, or maybe even all undecided Democrats? Essentially, what Kosinski had invented was a sort of people search engine.
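The “reverse” direction described above can be sketched in a few lines of Python. This is only an illustration of the idea of searching by profile. The record layout, the field names, the thresholds and the three example people are all invented for this sketch and do not come from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical record: a name plus a few predicted traits."""
    name: str
    is_father: bool
    extraversion: float  # 0.0 (very introverted) to 1.0 (very extraverted)
    neuroticism: float   # 0.0 (very calm) to 1.0 (very anxious)


def search_people(profiles, predicate):
    """Return the names of all profiles for which predicate(profile) is true."""
    return [p.name for p in profiles if predicate(p)]


# Invented example data.
PROFILES = [
    Profile("Alex", is_father=True, extraversion=0.7, neuroticism=0.9),
    Profile("Blake", is_father=False, extraversion=0.2, neuroticism=0.8),
    Profile("Casey", is_father=True, extraversion=0.4, neuroticism=0.1),
]

# Instead of asking "what is this person like?", ask "who fits this profile?"
anxious_fathers = search_people(
    PROFILES, lambda p: p.is_father and p.neuroticism > 0.7
)
introverts = search_people(PROFILES, lambda p: p.extraversion < 0.3)

print(anxious_fathers)
print(introverts)
```

The point of the sketch is the inversion: once every person has trait scores attached, selecting a target group is an ordinary filter over the dataset.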
Kosinski started to recognise the potential — but also the inherent danger — of his work. To him, the Internet had always seemed like a gift from heaven. What he really wanted was to give something back, to share. Data can be copied, so why shouldn’t everyone benefit from it? It was the spirit of a whole generation, the beginning of a new era that transcended the limitations of the physical world. But what would happen, wondered Kosinski, if someone abused his people search engine to manipulate people? He began to add warnings to most of his scientific work. His approach, he warned, “could pose a threat to an individual’s well-being, freedom, or even life”. But no one seemed to grasp what he meant.
Around this time, in early 2014, Kosinski was approached by a young assistant professor called Aleksandr Kogan. He was inquiring on behalf of a company that was interested in Kosinski’s method. The company wanted to access the myPersonality database, Kosinski remembers. Kogan wasn’t at liberty to reveal for what purpose; he was bound to secrecy. At first, Kosinski and his team considered this offer, as it would mean a great deal of money for the institute — but then he hesitated. Finally, Kosinski remembers, Kogan came out with the name of the company: SCL — Strategic Communication Laboratories. Kosinski Googled the company: “[We are] the premier election management agency”, it says on the company’s website. SCL provides marketing based on psychological modelling. One of its core focuses: Influencing elections. Influencing elections? Perturbed, Kosinski clicked through the pages. What kind of company was this? And what were these people planning?
What Kosinski did not know at the time: SCL is the front for a group of companies. Who exactly owns SCL and its diverse branches is unclear, thanks to convoluted corporate structures, as can be seen in the UK Companies House, the Panama Papers and the Delaware company registry. Some of the SCL offshoots have been involved in overthrowing governments in developing countries, whereas others have developed methods for psychologically manipulating Afghan citizens for NATO. And meanwhile, SCL is also the parent company of Cambridge Analytica, that ominous Big Data outfit that later worked for Trump’s online campaign and Brexit.
Kosinski knew nothing about all this, but he had a bad feeling. “The whole thing started to stink,” he recalls. On further investigation, he discovered that Aleksandr Kogan had secretly registered a company doing business with SCL. As the Guardian later revealed in December 2015, and as emerges from documents seen by Das Magazin, SCL found out about Kosinski’s method from Kogan. Suddenly it dawned on Kosinski that Kogan’s company might have reproduced (or copied?) the Facebook Likes-based Big Five measurement tool in order to sell it to this election-influencing company. He immediately broke off contact with Kogan and informed the director of the institute, sparking off a complicated conflict within the university. The institute was worried about its reputation. Aleksandr Kogan then moved to Singapore, married, and changed his name to Dr. Spectre. Michal Kosinski finished his PhD, got a job offer from Stanford and moved to the USA.
All was quiet for about a year. Then, in November 2015, the more radical of the two Brexit campaigns, “Leave.EU”, supported by Nigel Farage, announced that it had commissioned a Big Data company to support its online campaign: Cambridge Analytica. The company’s core strength: innovative political marketing — microtargeting — by measuring people’s personality from their digital footprints, based on the OCEAN model.
Now Kosinski received emails asking what he had to do with it — the words Cambridge, personality, and analytics immediately made many people think of Kosinski. It was the first time he had heard of the company. Horrified, he looked at the website. Was his methodology being used on a grand scale for political purposes?
After the Brexit result, friends and acquaintances wrote to him: Just look at what you’ve done. Everywhere he went, Kosinski had to explain that he had nothing to do with this company.
Months passed. 19 September 2016 comes around; the U.S. elections are fast approaching. Guitar riffs fill the dark-blue hall of the New York Grand Hyatt hotel; Creedence Clearwater Revival’s “Bad Moon Rising”. The Concordia Summit is a kind of World Economic Forum in miniature. Decision-makers from all over the world have been invited, among them Swiss President Schneider-Ammann. “Please welcome to the stage Alexander Nix, Chief Executive Officer of Cambridge Analytica,” a smooth female voice announces. A slim man in a dark suit walks onto the stage. A hush falls. (The video is on YouTube.) Many people present know that this is Trump’s new digital strategy man. “Soon you’ll be calling me Mr. Brexit,” Trump had tweeted somewhat cryptically a few weeks earlier. Political observers had indeed noticed some striking similarities between Trump’s agenda and that of the right-wing Brexit movement. But few had noticed the connection with Trump’s recent hiring of a marketing company named Cambridge Analytica.
Up to this point, Trump’s digital campaign had consisted of more or less one person: Brad Parscale, a marketing entrepreneur and failed start-up founder who created a rudimentary website for Trump for 1,500 dollars. The 70-year-old Trump is not digitally savvy — there isn’t even a computer on his office desk. Trump doesn’t do emails, his personal assistant once revealed. She herself talked him into having a smartphone — from which he now tweets incessantly.
Hillary Clinton, on the other hand, relied heavily on the legacy of the first “social-media President”, Barack Obama. She had the address lists of the Democratic Party, worked with cutting-edge big data analysts from “BlueLabs” and received support from Google and DreamWorks. When it was announced in June 2016 that Trump had hired Cambridge Analytica, the establishment in Washington just turned up their noses. Foreign dudes in tailor-made suits who don’t understand the country and its people? Seriously?
“It is my privilege to speak to you today about the power of Big Data and psychographics in the electoral process.” The logo of Cambridge Analytica, a brain composed of network nodes, like a map, appears behind Alexander Nix. “Only eighteen months ago, Senator Cruz was one of the less popular candidates,” explains the blonde man in a cut-glass British accent, which puts Americans on edge the same way that a standard German accent can unsettle Swiss people. “Less than 40 per cent of the population had heard of him”, another slide says. At the end of 2014, Cambridge Analytica had become involved in the U.S. election campaign, initially as a consultant for Republican Ted Cruz, funded by the secretive U.S. software billionaire Robert Mercer. Everyone in the room knows about the meteoric rise of the conservative Senator Cruz. It was one of the strangest events of the election campaign: How had Senator Cruz become the last serious challenger to Trump in the Republican primaries, rising from 5 to 35 percent? “So how did he do this?” Up to now, explains Nix, election campaigns have been organised based on demographic concepts. “A really ridiculous idea. The idea that all women should receive the same message because of their gender, or all African Americans because of their race.” What Nix means is that other campaigners have so far relied on demographics, whereas Cambridge Analytica is using psychometrics.
Though this might be true, Cambridge Analytica’s role within Cruz’s campaign isn’t undisputed. In December 2015 the Cruz team credited their rising success to psychological use of data and analytics. In “Advertising Age” a political client called the embedded Cambridge staff “like an extra wheel”, but found their core product, Cambridge’s voter data modeling, still “excellent”. Similarly, it remains unclear how deeply CA was involved in the “Leave” campaign. Cambridge Analytica will not discuss such questions.
Nix clicks to the next slide: five different faces, each face corresponding to a personality profile. It is the Big Five or OCEAN Model. “At Cambridge,” says Nix, “we were able to form a model to predict the personality of every single adult in the United States of America.” The hall is now captivated. According to Nix, the success of Cambridge Analytica’s marketing is based on a combination of three elements: behavioural science using the OCEAN Model, Big Data analysis, and ad targeting. Ad targeting is personalised advertising, in other words, advertising that is aligned as accurately as possible to the personality of an individual consumer.
Nix candidly explains how his company does this. First, Cambridge Analytica buys personal data from a range of different sources, such as land registries, automotive data, shopping data, bonus cards, club memberships, what magazines you read, what churches you attend. Nix displays the logos of globally active data brokers like Acxiom and Experian; in the U.S., almost all personal data is for sale. For example, if you want to know where Jewish women live, you can simply buy this information, phone numbers included. Now Cambridge Analytica aggregates this data with the electoral rolls of the Republican party and online data such as Facebook “likes” (today the company claims not to have used Facebook data) and calculates a Big Five personality profile. Digital footprints suddenly become real people with fears, needs, interests, and residential addresses.
The methodology looks quite similar to the models that Michal Kosinski once developed. According to Nix, Cambridge Analytica also uses “surveys on Social Media” and Facebook data. And Cambridge Analytica does exactly what Kosinski warned of: “We have profiled the personality of every adult in the United States of America, 220 million people,” Nix boasts. He opens a screenshot. “This is a data dashboard that we prepared for the Cruz Campaign.” A digital control center appears. On the left are diagrams; on the right, a map of Iowa, where Cruz won a surprisingly large number of votes in the primary. And on the map, there are hundreds of thousands of small red and blue dots. Nix narrows down the criteria: “Republicans” (the blue dots disappear); “not yet convinced” (more dots disappear); “male”, and so on. Finally, only one name remains, including age, address, interests, personality and political inclination. How does Cambridge Analytica now target this person with an appropriate political message?
Nix shows how psychographically categorised voters can be differently addressed, based on the example of gun rights, the 2nd Amendment: “For a highly neurotic and conscientious audience the threat of a burglary — and the insurance policy of a gun.” An image on the left shows the hand of an intruder smashing a window. The right side shows a man and a child standing in a field at sunset, both holding guns, clearly shooting ducks: “Conversely, for a closed and agreeable audience. People who care about tradition, and habits, and family.”
How to keep Clinton voters away from the ballot box
Trump’s striking inconsistencies, his much-criticized fickleness, and the resulting array of contradictory messages suddenly turned out to be his great asset: a different message for every voter. That Trump acted like a perfectly opportunistic algorithm, purely following audience reactions, is something the mathematician Cathy O’Neil had already remarked on in August 2016. “Pretty much every message that Trump put out was data-driven,” Alexander Nix remembers. On the day of the third presidential debate between Trump and Clinton, Trump’s team tested 175,000 different ad variations for his arguments, in order to find the right versions, above all via Facebook. The messages differed for the most part only in microscopic details, in order to target the recipients in the optimal psychological way: different headings, colours, captions, with a photo or video. This fine-tuning reaches all the way down to the smallest groups, Nix explains in an interview with Das Magazin. “We can address villages or apartment blocks in a targeted way. Even individuals.”
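A figure like 175,000 variations is easier to picture once you see how quickly small choices multiply. The following Python sketch is only an illustration of that arithmetic, not a description of the campaign’s tooling. The option names and values are invented, and `build_variants` is a hypothetical helper written for this example.

```python
from itertools import product


def build_variants(options):
    """Return every combination of the given ad options as a list of dicts.

    options maps an attribute name (e.g. "heading") to its possible values.
    """
    names = list(options)
    value_lists = [options[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Invented options: a handful of choices per attribute.
AD_OPTIONS = {
    "heading": ["Heading A", "Heading B", "Heading C"],
    "colour": ["red", "blue", "white"],
    "caption": ["Caption A", "Caption B"],
    "media": ["photo", "video"],
}

variants = build_variants(AD_OPTIONS)
print(len(variants))  # 3 * 3 * 2 * 2 = 36 variants
print(variants[0])
```

With only four attributes and two or three choices each, this already yields 36 distinct ads. Adding more attributes or more choices per attribute multiplies the total each time, which is how a team can arrive at a six-digit number of variants to test.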
In the Miami district of Little Haiti, Trump’s campaign provided inhabitants with news about the failure of the Clinton Foundation following the earthquake in Haiti, in order to stop them voting for Hillary Clinton. This was one of the goals: to keep potential Clinton voters (which include wavering left-wingers, African-Americans and young women) away from the ballot box, to “suppress” their vote, as one Trump employee puts it. These “dark posts” (sponsored news-feed-style ads in Facebook timelines that can only be seen by users with specific profiles) included videos aimed at African-Americans in which Hillary Clinton refers to black men as predators, for example.
Nix finishes his lecture at the Concordia Summit by stating that blanket advertising is dead. “My children will certainly never, ever understand this concept of mass communication.” And before leaving the stage, he announces that one of the remaining presidential candidates is using this new technology.
Just how precisely the American population was being targeted by Trump’s digital troops at that moment was not visible — because they attacked less on mainstream TV, and more with personalised messages on social media or digital TV. And while the Clinton team thought it was in the lead, based on demographic projections, Bloomberg journalist Sasha Issenberg was surprised to note on a visit to San Antonio — where Trump’s digital campaign is based — that a “second headquarters” was being created. The embedded Cambridge Analytica team, apparently only a dozen people, received 100,000 dollars from Trump in July, 250,000 in August, and five million in September. According to our conversation with Mr. Nix, it earned over 15 million dollars overall.
And the measures were radical: from July 2016, Trump’s canvassers were provided with an app with which they could identify the political views and personality types of the inhabitants of a house. The app came from the same provider used during Brexit. Trump’s people only rang at the doors of houses that the app rated as receptive to his messages. The canvassers came prepared with guidelines for conversations tailored to the personality type of the resident. In turn, the canvassers fed the reactions into the app, and the new data flowed back to the dashboards of the Trump Campaign.
Again, this is nothing new. The Clinton team did similar things, but as far as we know they did not use psychometric profiling. CA, however, divided the U.S. population into 32 personality types, and focused on just 17 states. And just as Kosinski had established that men who like MAC cosmetics are slightly more likely to be gay, Cambridge Analytica discovered that a preference for cars made in the US was a great indication of a potential Trump voter. Among other things, these findings now showed Trump which messages worked best and where. The decision to focus on Michigan and Wisconsin in the final weeks of the campaign was made on the basis of data analysis. The candidate became the instrument for implementing a model.
What is Cambridge Analytica doing in Europe?
But to what extent did psychometric methods influence the outcome of the election? When asked, Cambridge Analytica is unwilling to provide any proof of the effectiveness of its campaign. And it is quite possible that the question of how important psychometric targeting was to the outcome of the 2016 election is impossible to answer. And yet there are clues: there is the surprising rise of Ted Cruz during the primaries. There was also an increased number of voters in rural areas. There was the decline in the number of African-American early votes. The fact that Trump spent so little money may also be explained by the effectiveness of personality-based advertising, as may the fact that he invested far more in digital than TV campaigning compared to Hillary Clinton. Facebook proved to be the ultimate weapon and the best election campaigner, as tweets of several Trump employees show.
Many voices have claimed that the statisticians lost the election because their predictions were so off the mark. But what if the opposite is true: statisticians in fact helped win the election — but only those who were using the new method? It is an irony of history that Trump often grumbled about scientific research, but used a highly scientific approach in his campaign.
Another big winner is Cambridge Analytica. Its widely reported board member Steve Bannon, former executive chair of the right-wing online newspaper Breitbart News, has been appointed as Donald Trump’s Senior Counselor and chief strategist. Marion Maréchal-Le Pen, the aspiring Front National activist and niece of France’s presidential candidate, has already tweeted that she would accept his invitation to collaborate, and in an internal company video of Cambridge Analytica, a recording of a meeting is entitled “Italy”. SCL Elections was already active in Italian politics in 2012. Whilst Cambridge Analytica is not willing to comment on alleged ongoing talks with UK Prime Minister Theresa May, Alexander Nix claims that he is building up his client base worldwide, and that he has received inquiries from Switzerland and Germany.
Kosinski has observed all of this from his office at Stanford. Following the U.S. election, the university is in turmoil. Kosinski is responding to developments with the sharpest weapon available to a researcher: a scientific analysis. Together with his research colleague Sandra Matz, he has conducted a series of tests, which will soon be published. The initial results, which have been seen by Das Magazin, are alarming: The study shows the effectiveness of personality targeting by showing that marketers can attract up to 63% more clicks and up to 1,400% more conversions in real-life advertising campaigns on Facebook when matching products and marketing messages to consumers’ personality characteristics. They further demonstrate the scalability of personality targeting by showing that the majority of Facebook Pages promoting products or brands are affected by personality and that large numbers of consumers can be accurately targeted based on a single Facebook Page.
The world has been turned upside down. Great Britain is leaving the EU, Donald Trump is President-elect of the United States of America. And in Stanford the Polish researcher Michal Kosinski, who wanted to warn against the danger of using psychological targeting in a political setting, is once again receiving accusatory emails. “No,” says Kosinski quietly, shaking his head, “this is not my fault. I did not build the bomb. I only showed that it exists.”
* * *
After the publication of the German version of this article a Cambridge Analytica spokesman gave the following statement:
Cambridge Analytica does not use data from Facebook.
It has had no dealings with Dr Michal Kosinski.
Cambridge Analytica did not engage in efforts to discourage any Americans from casting their vote in the presidential election. Its efforts were solely directed towards increasing the number of voters in the election.