How to analyse data while protecting privacy

Nick Halstead
Published in DataScan · Feb 26, 2018

Ads that don’t overstep. Attitudes to sharing personal data. Marketers are integrating data across their tech stacks. Why decentralisation matters.

All included in this week’s digest on the world of data. 👇🏼

How to analyse data while protecting privacy. Stuart Langridge explains how companies can use aggregated data to be data-driven while maintaining customer privacy.

Most people are “basically OK” with the use of their data; they “just don’t want to be victimised by it”. Using aggregated data still enables you to “draw conclusions about the general trends and the general distribution of your user base”, without targeting customers on an individual level. Furthermore:

Of course, this applies to any data you want to collect. Do you want analytics on how often your users open your app? What times of day they do that? Which OS version they’re on? How long do they spend using it? All your data still works in aggregate, but the things you’re collecting aren’t so personally invasive, because you don’t know if a user’s records are lies.

This needs careful thought — there has been plenty of research on deanonymising data and similar things, and the EFF’s Panopticlick project shows how a combination of data can be cross-referenced and that needs protecting against too, but that’s what data science is for; to tune the parameters used here so that individual privacy isn’t compromised while aggregate properties are preserved.
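The idea that “you don’t know if a user’s records are lies” is essentially randomised response: each client flips a biased coin before reporting, so any individual record is deniable while the aggregate rate can still be recovered. Below is a minimal sketch in Python; the function names, the 75% truth probability and the 30% example rate are illustrative assumptions, not details from the original article.

```python
import random

def randomized_response(true_value: bool, p_truth: float = 0.75) -> bool:
    """Report the real answer with probability p_truth, otherwise answer at random.
    Any individual record may be a 'lie', but the aggregate remains recoverable."""
    if random.random() < p_truth:
        return true_value
    return random.random() < 0.5

def estimate_true_rate(reports: list[bool], p_truth: float = 0.75) -> float:
    """Invert the noise: observed_rate = p_truth * true_rate + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth

# Example: 100,000 users, 30% of whom actually opened the app today.
population = [random.random() < 0.30 for _ in range(100_000)]
reports = [randomized_response(v) for v in population]
print(f"Observed rate: {sum(reports) / len(reports):.3f}")
print(f"Estimated true rate: {estimate_true_rate(reports):.3f}")  # close to 0.30
```

Tuning p_truth is exactly the parameter trade-off mentioned above: lower values give individuals stronger deniability but make the aggregate estimate noisier.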

Ads that don’t overstep. A recent Harvard Business Review experiment found that consumers find personalised ads “creepy”, but still want meaningful interactions.

Similarly, a recent Accenture study found that “58 percent of consumers would switch half or more of their spending to a provider that excels at personalising experiences without compromising trust”. The study, on how to achieve hyper-relevance, explains that companies need to rethink how they approach data analytics:

Hyper-relevant companies don’t rely solely on descriptive analytics or traditional sources of information. They invest in predictive analytics, collaborate with an ecosystem of stakeholders to capture real-time snapshots of every consumer, and mine data in new ways to understand the customer journey that extends beyond core products and services and across channels.

In addition, hyper-relevant companies redouble their data security efforts. They ensure customers have full control of their data across touch points. They eliminate duplicate requests for customer information and permissions. And they make sure all customer data is secure and visible to employees on a need-to-know basis.

ODI survey reveals attitudes to sharing personal data. Research undertaken by the Open Data Institute and YouGov shows the generational differences between British adults and their approaches to data sharing.

Crucially, “young adults were generally more comfortable sharing information” in comparison to their parents’ generation. Furthermore:

  • Consumers are more likely to share personal data with organisations they know and trust
  • Healthcare organisations are most trusted
  • Data skills need to be improved
  • Consumers are prepared to make worthy trade-offs and share data about themselves if it benefits them and others in society

Marketers are integrating data across their tech stacks. Ross Benes reports on the growing number of marketers integrating data from multiple sources. Benes notes the difficulty of combining multiple systems, as it requires “a fair amount of manual labour”. Data from varied sources is held in different formats, so it traditionally needs to be reformatted and integrated by a data science team:

“Data doesn’t come in a neat little consistent package,” said Adam Kleinberg, CEO of ad agency Traction. “It comes in all shapes and sizes and from all kinds of places. Getting that to all work together seamlessly is frequently labor intensive and sometimes impossible.”

Benes highlights that another key challenge is finding “accurate data at scale”. External data can be an incredibly valuable source of insight, but reliable, accurate and applicable data is difficult to find:

To increase the scope of their data pools, brands often lean on third-party data, which helps them reach more people. But the reach of third-party data often comes at the expense of accuracy. About 80% of advertisers in a recent Digiday Research survey said third-party data is unreliable.

Since third-party providers don’t have direct relationships with users, they make inferences to build data sets. But these inferences don’t always align across vendors. A ChoiceStream study found that third-party data companies disagreed about 30% of the time on an individual’s gender. About 40% of respondents in Ascend2’s survey said that getting reliable data is a real challenge.

→ Mark Burton, Head of Product Activation at Starcount, explains why companies should use data from multiple available sources to “truly understand their customers”, rather than relying on demographic factors. Will GDPR harm customer data platforms and the marketers using them?

Miscellaneous

Why decentralisation matters. 👏🏼

Validating leaked passwords with k-anonymity. 🔒

Stripe Atlas: Software as a Service, as a business. 💯

Estonia & Finland connect data exchange layers. 🌍

Why self-taught AI has trouble with the real world. 🤖

The quantum computing apocalypse is imminent. 🚀

Characterising social media messages by how they propagate. ✅

FiveThirtyEight are sharing the data behind some of their articles. 👍🏼

Data viz tool explains privacy policies for you. 😅

A day in the life of Americans. ❤️

Datawrapper now automates colourblind checking. 📈

Stay up to date by joining my weekly data digest. 🚀
