Big Flaws in Big Data Profiling

MIT’s Catherine Tucker explains how data deserts, inaccuracy, and inequality skew digital profiles

Credit: RB, Sammy Blindell

By Paula Klein

Much has been said about the pervasive use, and misuse, of digital consumer profiles by online marketers. While profiles offer shortcuts and efficiency, they raise privacy concerns by choosing products for us, recommending medical treatments, and impacting our credit scores and financial standing.

But privacy isn’t the only concern that’s surfacing about data profiles.

Catherine Tucker, MIT Sloan Distinguished Professor of Management and a Professor of Marketing, has studied online economic trends for many years. She offered some provocative new perspectives at a recent MIT IDE seminar on Data Deserts, Data Accuracy and Inequality. Not only does her research challenge the accuracy and effectiveness of some digital profile methodologies, it sheds light on related social issues that must become central to the conversation.

In particular, not everyone is treated equally in the big data universe.

As a result, Tucker asks: Do personal data profiles perpetuate economic inequality and widen the digital divide? What segments of the population are excluded when profiles are created? What are the effects of these practices?

A 2019 research paper that Tucker co-authored with Nico Neumann and Timothy Whitfield, “How Effective Is Third-Party Consumer Profiling and Audience Delivery?: Evidence from Field Studies,” explains how data brokers often use online browsing records to create digital consumer profiles that they sell to marketers as pre-defined audiences for ad targeting. However, the paper states, “this process is a ‘black box’: Little is known about the reliability of the digital profiles that are created or of the audience identification provided by buying platforms.”

The authors investigate the practice using three field tests measuring the accuracy of more than 90 third-party audiences across 19 data brokers.

Results show that audience segments vary greatly in quality and often yield inaccurate or unreliable data.

In comparison to random audience selection, the use of black-box data profiles, on average, increased identification of a user with a desired attribute by 0–77%.
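One way to read that range is as a relative lift in hit rate over a random audience. The sketch below is only an illustration under that assumption: the function name `relative_lift` and the 30% baseline are hypothetical, not figures or definitions taken from the paper.

```python
def relative_lift(targeted_rate: float, baseline_rate: float) -> float:
    """Percent improvement of a targeted audience's hit rate over a random one.

    Both rates are the share of users in the audience who actually have the
    desired attribute (for example, 0.30 means 30%).
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (targeted_rate - baseline_rate) / baseline_rate * 100.0


# Hypothetical baseline: 30% of a randomly chosen audience has the attribute.
baseline = 0.30

# A purchased segment that also hits 30% adds nothing over random selection.
print(round(relative_lift(0.30, baseline), 1))   # 0.0

# A segment that hits 53.1% corresponds to a 77% relative lift.
print(round(relative_lift(0.531, baseline), 1))  # 77.0
```

Under this reading, a 0% result means the purchased segment performed no better than picking users at random.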

The study showed that digital profiling can often identify a male consumer accurately only about half the time, and that the accuracy of both digital profiling and audience delivery varies by provider and user characteristic. For example, the demographic characteristics of younger people or those who live in smaller households are easier to predict.

Poor Cost Benefits

The findings suggest that third‐party digital profiles currently result in a poor cost‐benefit ratio for advertisers. Given the high extra costs of these targeting solutions and their relative inaccuracy, the researchers found “that third-party audiences are often economically unattractive, except for higher-priced media placements.”
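A rough break-even sketch shows why the exception for higher-priced media placements is plausible. It assumes a simple model that is not the paper's own analysis: the advertiser pays a media price plus a fixed data fee per thousand impressions, and cares about the cost of each impression that reaches a user with the desired attribute. All names and numbers are hypothetical.

```python
def cost_per_on_target(media_cpm: float, data_cpm: float, hit_rate: float) -> float:
    """Cost of one impression that reaches a user with the desired attribute.

    media_cpm and data_cpm are prices per 1,000 impressions; hit_rate is the
    share of impressions that reach the intended kind of user.
    """
    if hit_rate <= 0:
        raise ValueError("hit_rate must be positive")
    return (media_cpm + data_cpm) / (1000 * hit_rate)


def targeting_pays_off(media_cpm: float, data_cpm: float,
                       baseline_rate: float, targeted_rate: float) -> bool:
    """True if buying the audience data lowers the cost per on-target impression."""
    untargeted = cost_per_on_target(media_cpm, 0.0, baseline_rate)
    targeted = cost_per_on_target(media_cpm, data_cpm, targeted_rate)
    return targeted < untargeted


# Hypothetical inputs: a 1.00 data fee lifts the hit rate from 30% to 40%.
print(targeting_pays_off(media_cpm=2.0, data_cpm=1.0,
                         baseline_rate=0.30, targeted_rate=0.40))   # False
print(targeting_pays_off(media_cpm=10.0, data_cpm=1.0,
                         baseline_rate=0.30, targeted_rate=0.40))   # True
```

In this toy model the same data fee is a large share of a cheap placement and a small share of an expensive one, so the same accuracy gain pays off only when the media itself is costly.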

Tucker told IDE attendees that demographic data on age and gender is most coveted, but also most likely to be inaccurate. “I tell students not to use demographic data because that’s an outdated metric, yet that’s what brokers are buying.” Better targeting criteria are needed, she said.

To probe the topic of online marketing profiles further, Tucker and her research team have revisited the issue to determine why the data was often inaccurate and what could be improved. The new study found no correlation between the price of the data and its accuracy, nor did the size or location of the broker make a difference.

Data analysis did show that the biggest factor affecting accuracy was the socio-economics of the data pool.

For example, when they examined audience traits (cookie characteristics) like high levels of wealth, education, and home ownership, they found more accurate data and better ad targeting. Conversely, there is less data — and therefore, less accurate data — on poorer, less educated households.

Tucker said this raises questions about which consumers advertisers and data brokers seek out. In fact, people who frequently search and buy more online have more robust and stable digital footprints, while “poorer households have more fragmented digital identities.” More stable digital identities are easier to track and their purchases are easier to predict — and that perpetuates and widens the digital divide, she suggested.

Bigger Issues to Tackle

Based on the findings that black-box profiling falls short, Tucker asked,

“Why, in 2022, are we studying the accuracy of age and gender data? There are larger issues to study. Those narrow, binary definitions are outdated and need to be reconsidered.”

Similarly, “why are we so focused on privacy debates around data collection and marketing communications instead of discussing the contexts where privacy really matters… where there are large potential economic policy consequences” in areas such as healthcare, financial lending, and education?

Moreover, privacy debates in the U.S. and the EU are myopic, according to Tucker, and data privacy is really a conversation about privilege by the privileged. “For low-income families, there is a data desert problem,” she said. Data exclusion and the existence of data deserts need to be part of the big data conversation going forward.

To begin to find solutions, today’s data marketing assumptions need to be challenged, Tucker said. That may also mean revamping search algorithms and rethinking the value of data in order to level the data-profiling playing field for consumers as well as advertisers.

The IDE explores how people and businesses work, interact, and prosper in an era of profound digital transformation. We are leading the discussion on the digital economy.

MIT IDE Paula Klein, Editor

Addressing one of the most critical issues of our time: the impact of digital technology on businesses, the economy, and society.
