We’re running out of visual metaphors to conceptualize the amount of data that technology companies hold about us. Amazon is shipping our data in 18-wheelers, Google has built 15 small villages around the world to store our information, and Facebook just granted scholars access to 52 tons of pepperoni pizzas’ worth of data on misinformation. Zettabyte, a word that should be reserved for a cyberpunk villain, is now entering our lexicon. The numbers quite literally transcend human comprehension.

While big tech is seen as the data-hoarding bogeyman, advertisers of all sizes have chased data with reckless abandon. In their lust, too many marketers have lost sight of which data points truly hold value and are hoarding information for information’s sake. This has serious implications for both our privacy and the fundamental quality of advertising we receive.

In recent years, the advertising industry has sextupled down on finding out who consumers are on a personal level. This is powered largely by a plethora of companies that fundamentally exist to collect any morsel of data that users drop into the ether. Perhaps no example is more comical than MoviePass, a company that is essentially a giant subsidy given to consumers in order to mine their personal information. That information costs so much to obtain that MoviePass ran out of money last month. The “value” of the data obtained is still entirely unclear.

As you read this story in Chrome with a Gmail, Facebook, Instagram, and Google Maps tab open, see Dylan Curran’s incredible exposé of the creepy amount of data Google and Facebook store on us. But also ask yourself: how much of this is actually useful for providing a better service? Put another way, is this creepy amalgamation of your digital activity in any way more helpful to a travel advertiser than a search for “best hotels in Maui”?

Most of the conversations in a post-GDPR/Cambridge Analytica world have started with a basic assumption. While marketers mining even our most asinine data is creepy, this data delivers value to pitchmen, and theoretically to those exposed to their ads. But there’s increasing evidence quietly piling up to the contrary.

According to an impeccably thorough report by Nico Neumann at Melbourne Business School, the data that powers the bulk of programmatic ad spend correctly identifies whether a user is male or female only about 50 percent of the time, which is no better than a coin flip. In the eternal quest to figure out which “half of your ad budget is wasted,” you may want to start here. Neumann’s team estimates that erroneous data costs advertisers $7B annually.

Due mostly to Facebook, skepticism over the quality of advertising data has faded from the limelight as public discourse focuses exclusively on privacy. The Cambridge Analytica scandal has essentially created conventional wisdom amongst the public that advertisers have disturbingly accurate data on us that they wield with impunity. This reading is incomplete and overestimates the data side of the Trump-Facebook saga.

The Trump campaign’s masterful manipulation of Facebook runs far deeper than profiling and hyper-targeting. Trump’s digital director Brad Parscale ironically took the Obama 2008 playbook of meticulously optimizing creative (devised by Optimizely founder Dan Siroker) and applied 2018 machine learning alongside a blatant disregard for ethics. In total, the Trump campaign tested 5.9 million versions of ads in a single month to find the perfect variation that resonated with their audience. Testing and treachery, as much as targeting, are the story of the Trump campaign’s sinister success on Facebook.

Finally, for all of the profiles on Cambridge Analytica, their insights hardly seem shocking to anyone with a rudimentary understanding of U.S. political science. Speaking on CBS after election day, Cambridge Analytica product director Matt Oczkowski said that the average Trump supporter was “a bit older, a bit more male, a bit more white than the traditional Republican. A bit more rural.” I would have never guessed.

But remember, ethics aside, Cambridge Analytica was the standard of quality in their industry. As recently as March 2017, the company was giddily putting out press releases citing prestigious industry recognition. Many of the larger data brokers pale in comparison.

In 2014, Oracle paid roughly $400M for BlueKai, a platform that pegs me as a married homeowner with two children who is interested in subcompact cars, rap and hip-hop, hunting and golf. I’m single, rent a Brooklyn apartment and have never owned a car or shot a deer. And I hate golf.

Please don’t tell my fiance that I’m simultaneously already married and not married. Source: Oracle Data Cloud Registry page

At Narrativ, I had most of our employees check their digital identities. Our (male) VP of Product is a Spanish-speaking female in her 80s. Our Chief Technology Officer is apparently a pre-teen student in his 80s making $20–29K per year. If you’re hopelessly perplexed, that’s exactly the point.

After a cruelly ironic registration process that forced me to fork over my personal data in order to access my personal data, Acxiom’s aboutthedata.com fared slightly better. It correctly identified me as male and provided some correct generalities, such as the killer insight that I’ve purchased apparel and food. Does that really feel like it is worth the $2.3B IPG just shelled out for it?

So how did we come to accept the validity of flawed data? First, venture capitalists poured money into third-party data startups, “proving” that the data was legit. Then large companies allocated budgets to ad tech, “proving” it was effective. Finally, big marketing clouds like Oracle and Salesforce went shopping, “proving” that the obscure periphery of our internet history data is worth hundreds of millions of dollars. But what if large sectors of the data industry have grown without their theses being fundamentally validated?

Of course, there is one company that understands the superfluousness of all this data. And surprise, it’s Amazon. To Amazon, you are what you buy. Nothing more, nothing less. The kind of data collected by Cambridge Analytica is fundamentally meaningless to Amazon because it is less powerful than the shopper data Amazon can already provide to advertisers. As Amazon prepares to eclipse $10B in ad revenue this year, the direct correlation of Amazon’s data to what we buy is why Sorrell lost the most sleep over their market entry.

In a macro sense, the totality of this data mining fundamentally can’t match the simple power of the contextual targeting that powers Google and Amazon. And Amazon’s targeting is probably more powerful in the long term. At the end of the day, you telling an advertiser what you are looking for is still a hell of a lot more accurate than the best artificially intelligent guess.


The cruel irony here is that the current ineptitude of data providers actually protects the privacy of many internet users. Digitally savvy consumers overwhelm ad tech algorithms with so many data points that they become impossible for data brokers to distill in any meaningful way. This “unknown” audience becomes less valuable to advertisers and enjoys a superior internet experience with their privacy intact.

This is the hallmark of a broken market. Advertising is effectively supposed to be a tax that we all pay to enjoy free services that have no business being free. But currently, the externality of digital advertising is being disproportionately picked up by the subset of individuals that firms like Oracle and Acxiom can caricature.

This alone should inspire deeper soul-searching in the industry: consumers deserve a more evenly distributed digital future. Alongside a genuine respect for privacy, we need an unwavering commitment to accuracy from our data wardens.