Cambridge Analytica / Facebook, privacy and the future of data-driven marketing
Or how AI can prevent personal data misuse in the Big Data era
Everybody has something to say about the Cambridge Analytica / Facebook personal data misuse scandal. We at Demografy are no exception, because privacy-enabled consumer data marketing (this is not an oxymoron) is our primary business focus.
There is an ongoing debate about the balance between privacy and the better user experience, efficiency, relevant ads, etc. achieved through the use of private data. We think technology solutions already exist that could provide a much better balance than we currently have. So we’d like to share our opinion on the CA / FB scandal, as well as our vision of the future of data-driven marketing and of how AI can protect personal data from unsolicited use and identity theft.
- The Cambridge Analytica / Facebook case is unfortunately not unique. Far from it.
- Today’s data marketing industry is built on circulating large amounts of non-opt-in personal information, in a huge number of copies, between many unregulated companies. This is increasingly insane and risky.
- We still need data marketing as an integral part of our economy and life. But we should also solve its increasingly dangerous problems, especially as we become existentially reliant on data security.
- However, instead of relying on government overregulation, we need to eliminate these problems technologically, and this is already possible, at least partially (though some legal changes would make sense too).
- Demografy proposes a machine-learning-based solution that suits both businesses and consumers. It enables businesses to get actionable data while preventing the sharing, selling and theft of personal data.
Data-driven marketing is omnipresent
The era of traditional marketing and decision making is gone. Today, marketing and decision making in organizations have become increasingly data-driven, meaning that key decisions and activities are based on and backed by collected data and its interpretation. All major corporations and many smaller businesses now deploy data-driven analytics, marketing and other solutions in their processes.
Demographic and behavioral consumer data is one of the most crucial pieces of big data in the modern world. Knowing this information empowers businesses and organizations in many ways. Many everyday technologies and marketing methods rely on this data: ad targeting, customer and prospect segmentation, customer analytics and many more.
Cambridge Analytica / Facebook case
However, the current scandal with CA and Facebook is an example of how things can go wrong. First of all, what happened?
- Between 2007 and 2014, Facebook policies allowed third-party apps to collect private information about its users via the Facebook API. Facebook claims that such information could be collected only after a Facebook user authorized the third-party app. However, at that time information about the user’s friends could also be collected without their consent. Facebook did state in its policies that friends’ information could only be used to improve the user experience in the app, and it prohibited selling such information or using it for advertising.
- And something bad did happen. Cambridge University academic Aleksandr Kogan built an app called thisisyourdigitallife (separately from his work at Cambridge University). His company, Global Science Research, in collaboration with Cambridge Analytica, paid hundreds of thousands of users to take a personality test and agree to have their data collected for academic use. However, the app also collected private information about the test-takers’ Facebook friends. As a result, private information about tens of millions of people was collected.
- Later, CA used the harvested data in its marketing and political business activities. These activities included Trump’s controversial presidential campaign, local US elections, the Brexit referendum and many other election campaigns globally.
Facebook data was probably just one of CA’s sources of consumer data, and it was harvested in violation of Facebook’s terms of service. However, Facebook made this possible and thereby contributed significantly to CA and its clients. Facebook also didn’t take adequate measures in response to the violation of its API policies, even though it was aware of the breach at the time. Facebook probably didn’t react for fear of a PR backlash, though it got one anyway.
But is Cambridge Analytica data misuse really a unique case?
The reason this case got such huge media coverage is CA’s involvement in the controversial 2016 US presidential election and in political activities in general. If CA had limited its data usage to marketing purposes without getting involved in politics, there would probably have been no news.
Are its practices and methods novel? No. This use of third-party private data is called database marketing, data marketing, data append services or data brokerage. Or simply consumer databases. Hundreds or thousands of companies use these methods to sell private data. They have huge databases containing information about tens or even hundreds of millions of consumers. This information includes demographic and behavioral profiles of people as well as sensitive information like addresses, emails, phone numbers, relatives, credit history and much more.
Their history is also much older than the Internet. For example, the history of Acxiom, a market leader in consumer data marketing, can be traced as far back as 1969.
Where do they get personal information from?
- They simply buy the larger part of their data from other data brokers. You will never know where this data originally comes from.
- They use illicit or controversial methods, as in the CA / Facebook case. In some cases their methods were even legal and acceptable at the time the data was collected, because the Internet was a Wild West in its early days and there was no data regulation at all. This allowed many data companies to arise.
One of the best examples is RapLeaf, a data marketing company that was active during 2005–2013. It was involved in a controversy over collecting personally identifiable information, such as emails and Facebook and Myspace IDs, and selling this data to third-party companies. Its collection methods included practices such as scraping data from Facebook. Ultimately, different parts of the company were acquired during 2011–2013 by Acxiom and TowerData, two established database marketing companies that are still operating. This is the story of where the personal data held by today’s data market leaders comes from.
Database marketing, with its controversial practices, is today’s backbone of advertising, marketing and now even politics. Even tech giants like Facebook use plain old database marketing. For example, Facebook buys additional personal data about its users from its data partners to improve its ad targeting. As we have seen, this data may come from anywhere and end up anywhere. Ironically, Facebook may even buy its own data multiple times if, for example, one of its data partners bought data from another data marketing company, which in turn bought it from Cambridge Analytica or RapLeaf (or their current owners) or any other company caught (or not yet caught) in controversy.
Unsolicited use of private data, sensitive information sharing and risks of data breaches
Cases like Cambridge Analytica or RapLeaf are just the tip of the iceberg. The problem is that the current consumer data market consists almost entirely of third-party, non-opt-in data circulating among many unregulated companies. This personal information is either harvested illicitly or collected from unknown sources. Either way, you can hardly find a person who is happy that his or her personal data is used by marketers or politicians without consent.
Some companies try to seem more privacy-friendly by allowing people to opt out of their databases. But the problem is that such an opt-out assumes that selling and reselling a person’s private data without opt-in is completely legitimate and fine (it can be legal, but it is definitely not fine). Additionally, it is simply impossible to find all the databases you are in and opt out of all of them. Your data is already shared in a large number of copies. And, most ridiculous of all, many data marketing companies let you opt out only for money, meaning you are required to pay an opt-out fee in order to be removed from their databases. So this doesn’t work at all and probably is not intended to work.
The key problems of the current industry are:
- Unsolicited use of non-opt-in personal data collected by illicit or unknown means without consumer consent.
- Increasing sharing and disclosure of sensitive information, which leads to an accumulating risk of data misuse and identity theft. Today’s data marketing services that enable businesses to get more data about their customers or prospects work by exchanging this data for personally identifiable information. This means that if company A wants the income level and gender of its customer, it has to provide company B with the customer’s address, phone, email or other sensitive data so that B can look up this customer’s record in its database. By the way, in 80–90% of cases B doesn’t find the record, but the sensitive information has been disclosed anyway.
- Risk of huge data breaches. The fact that such a large amount of private data circulates in a large number of copies among a large number of operators increases the probability of data theft. And this happens from time to time. For example, in one of the largest data breaches in history, more than 1.6 billion customer records were stolen in 2003 during the transmission of data to and from Acxiom’s clients.
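The disclosure problem in the second point above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `CustomerRecord`, `append_lookup`, the sample emails and the broker database are hypothetical names invented for illustration and do not describe any real broker’s API. It shows that the client discloses personally identifiable information for every record it submits, whether or not the broker finds a match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical PII that company A sends to a data broker (company B)."""
    email: str
    phone: str


def append_lookup(client_records, broker_db):
    """Simulate a traditional data append request.

    Every submitted record counts as disclosed, because the PII has left
    company A before the broker even checks whether it has a match.
    """
    disclosed = 0
    matched = {}
    for record in client_records:
        disclosed += 1  # PII is shared regardless of the lookup result
        attributes = broker_db.get(record.email)
        if attributes is not None:
            matched[record.email] = attributes
    return disclosed, matched


# Hypothetical sample data: 10 customers, the broker only knows 2 of them.
client_records = [
    CustomerRecord(email=f"user{i}@example.com", phone=f"555-010{i}")
    for i in range(10)
]
broker_db = {
    "user3@example.com": {"gender": "F", "income": "50-75k"},
    "user7@example.com": {"gender": "M", "income": "75-100k"},
}

disclosed, matched = append_lookup(client_records, broker_db)
misses = disclosed - len(matched)
print(f"{disclosed} records disclosed, {len(matched)} matched, "
      f"{misses} disclosed with no match")
```

With this made-up sample, 8 of the 10 records are disclosed without yielding any data in return, which is in line with the 80–90% miss rate mentioned above.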
Future of data-driven marketing or how AI can prevent personal data misuse
Increasing data regulation and growing privacy concerns
The data marketing landscape is changing. Personal data regulation is increasing and privacy concerns are growing. The EU has introduced its General Data Protection Regulation (GDPR), which becomes enforceable on 25 May 2018. Under the GDPR, consumer consent for data use must be opt-in and clearly given. In practice this largely rules out current data marketing based on third-party non-opt-in personal data, at least within the EU and for companies that use personal data of EU residents.
While government regulation is less strict in the US, cases like CA / Facebook remind us that action should be taken and the industry needs to change. Such cases also accelerate changes in the industry that are inevitable anyway. At the same time, government overregulation is not the best solution. Besides safeguarding privacy, it often contains protectionist elements that are bad for businesses, competition and development in general. It would also put many companies and jobs at risk. And in some countries data regulation is just a pretext for censorship. Some moderate legal changes would make sense, though.
We need data marketing but we also need new technology platforms for it
While data-driven marketing has obvious and serious problems, we should accept the fact that we need it. We need all the technologies and business models that rely on consumer data. We need ads, we need competition fueled by marketing, we need personalized offers, we need the jobs it creates, and we need the many companies that rely on it. This is just how our economy functions right now. We can’t cripple or slow down parts of the economy without consequences.
Solution is technology
We need data marketing. But we don’t need its increasingly dangerous consequences, which now include influencing political opinions and causing far-reaching changes in governments. So what is the solution? In a technological era, the solution to a problem like this is technology. Data marketing needs a technological shift. It can no longer rely on current technologies that allow personal data to spread and that require personal data to be shared on a large and uncontrolled scale.
When we built Demografy, we had a vision of the future of data-driven marketing, and we still have it. Our goal is to shape this future. We believe that the interests and concerns of both businesses and consumers should be met. This means that the ideal solution enables businesses to get consumer insights and segment their audiences while keeping consumer privacy safe at all levels.
There are three types of data:
- First-party data. The data a business receives directly from its customers.
- Second-party data. The data which is simply someone else’s first-party data. For example, one business purchases first-party data from another business.
- Third-party data. Any purchased data without known origin.
The vast majority of circulating consumer data is third-party. Our mission is to reduce at least demographic consumer data to first-party data only, meaning that customers’ private information should not leave the business they have a direct relationship with and to which they have granted permission to use their data. How do we achieve this?
Demografy uses machine learning to provide demographic data about consumers without requiring their personally identifiable information in exchange. It means sensitive information always stays first-party. Businesses can unlock their data but keep private information in-house.
Demografy technology requires only first and last names. It can even process last names that are masked in the same way credit card numbers are masked.
e.g. Michael J*son
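A minimal sketch of what such masking could look like on the business side before any data leaves the company. The masking rule (keep the first letter and the last three letters, replace the middle with a single asterisk), the function names and the record fields are assumptions made for illustration, chosen so that the output matches the example above. They are not Demografy’s actual algorithm or API.

```python
def mask_last_name(last_name: str, keep_start: int = 1, keep_end: int = 3) -> str:
    """Mask the middle of a last name, similar to masking a card number.

    Assumed rule (illustration only): keep the first `keep_start` letters
    and the last `keep_end` letters, and replace everything in between
    with a single '*'. Names too short to mask this way keep only their
    first letters.
    """
    name = last_name.strip()
    if len(name) <= keep_start + keep_end:
        return name[:keep_start] + "*"
    return name[:keep_start] + "*" + name[-keep_end:]


def to_shareable_record(customer: dict) -> dict:
    """Keep only the first name and the masked last name.

    Email, phone, address and any other fields stay in-house.
    """
    return {
        "first_name": customer["first_name"],
        "last_name": mask_last_name(customer["last_name"]),
    }


# Hypothetical customer record held by the business (first-party data).
customer = {
    "first_name": "Michael",
    "last_name": "Jackson",
    "email": "michael@example.com",
    "phone": "555-0100",
}

shareable = to_shareable_record(customer)
print(shareable)  # {'first_name': 'Michael', 'last_name': 'J*son'}
```

Note that under this rule different last names, for example Jackson and Johnson, produce the same masked value, so the masked name alone no longer identifies a specific person.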
We believe this is the time to start processing a person’s private data with the same restrictions and caution we apply to credit card data.
This makes it possible to completely mask people’s identities and protect their sensitive information while still enabling ads, more relevant offers and data insights. It also encourages agencies and consumer data processing companies to request only non-personally identifiable information from their clients in order to provide them with demographic data.