A New Deal for Data

How to reinvent the internet’s broken data ecosystem and put users back in control

Published in The Startup · 12 min read · Nov 14, 2019

Written by Katarzyna Szymielewicz
Edited by Kevin Zawacki
Illustrations by Kamil Śliwowski

Something is fundamentally wrong with our data ecosystem. This is not something we feel — it is something we know for a fact. We know it from personal experience and scientific research. Over the past two decades online, we’ve experienced constant surveillance, targeting, and manipulation. We need no further evidence that the business model of leading internet companies — a business model built on commodifying our attention and exploiting our data — is broken and unsustainable.

It all started with a misleading transaction

Two decades ago, many of us accepted the “data for a service” model because we did not realise the steep costs. We could not imagine how much data would be taken from us and generated about us — and how much power that data would give to big internet companies and platforms. Back then, many believed we still remained in control of our data. And few people were raising alarms about hidden tracking mechanisms and the potential of big data analytics. Indeed, it took a lot of research, journalistic investigations, and whistleblowing to reveal the full scale of data exploitation that happens behind our screens every second of every day.

But now we know. Even if we cannot wrap our heads around the amount of data being collected and generated — it’s beyond comprehension — we experience first-hand how this information is weaponized. On our deeply commercialized internet, every personalised service is influencing us. In fact, quite often a service is just a smokescreen for influence.

This illustration comes from an earlier essay on the mechanics of profiling and targeting

The game of influence

The accepted wisdom is that “free” online services come at the price of our personal data. And, as a result, that targeted advertising is a necessary component of the equation. But this business model has serious deferred consequences: It’s created a web where engagement, and not quality, is of the utmost importance. Further, targeted advertising has given enormous power and influence to certain actors. It benefits the influencers — the tech platforms and their business clients — but not us.

Now, after two decades of putting this “wisdom” into practice, leading internet companies have developed predictive models that are stunningly effective and powerful. Meanwhile, there are no ethical limits: Micro-targeting can be based on sensitive characteristics like psychometric profiles and vulnerabilities. Advertisers can catch us completely unaware, exploiting weaknesses and biases we may not even know we have.

Targeted advertising alone is a compelling reason to feel uncomfortable online, but it’s hardly the only reason. Vast influence is also wielded by the curators of our personalised news feeds, which are frequently optimised for engagement. These services want to keep us engaged at any price, even if it means recommending misleading, outrageous, offensive, or disturbing content. And users can’t shape these algorithms, can’t tame them or tweak them. Instead, the algorithms shape us.

In short: It’s clear that the current business model is a boon for platforms. It optimizes profit above everything else. As long as this system remains unchecked, companies simply will not grant users control or a say in the equation. Why not? Because we wouldn’t allow this to continue.

We would not accept algorithmic discrimination, which places higher value on men than women when AI searches for job candidates. We would not choose to be over-stimulated and addicted to platforms that feed off our fears and insecurities. We would not be okay with manipulative political advertising, or with sharing our intimate behavioural profiles with the highest bidder. We never gave permission for this in the first place; these abuses were an insidious consequence of a business model we “accepted” after being promised free, personalised services. There wasn’t even fine print.

It’s not just about you. It’s about us.

The business model developed by online platforms does not only harm individuals. It also has serious, negative effects on a societal level:

* Platforms control access to markets, determining winners and losers, and challenging existing economic structures.
* Gig economy employment models undermine traditional worker protections, thus increasing economic inequality.
* Platforms prioritize clickbait and siphon advertising revenue that might otherwise have supported traditional news media, thus undercutting quality journalism.
* Their interfaces are designed to maximize the amount of time spent by users, in order to capture and monetize data from user interactions. This model of media consumption has a long-term impact on public health and well-being, especially among children and teens.
* Targeted advertising has public health implications for vulnerable communities, for example populations that are bombarded with advertisements for unhealthy food products.
* Social media platforms have become the new public square for information and discourse. As arbiters of content, platforms encounter challenges on both ends of the free speech spectrum: both amplifying dangerous and discriminatory speech, and silencing legitimate democratic voices.

These claims are based on the report Modern Platform Companies and the Public Interest: A Landscape of Harms and Accountability Efforts (August 2018), prepared by Freedman Consulting, LLC.

This data ecosystem cannot be fixed. It must be reinvented

This ecosystem is a threat to both individuals and to society, and a system of that nature is beyond repair. Half measures (a couple of new obligations or sanctions for the worst offenders) won’t suffice. Nor will advocating for self-regulation or nebulous concepts like “trustworthy AI.” In Europe, we’ve already tried all this: regulatory interventions like GDPR, antitrust financial penalties, codes of practice for disinformation, and ethical councils like the High-Level Expert Group on Artificial Intelligence. Some money has changed pockets along the way, but the broken data ecosystem remains very much intact.

In 2019, we must acknowledge that we cannot fix this ecosystem. Instead, we need to upend the power balance. A more sustainable, human-centric ecosystem begins only when we reclaim control over our data. We have heard or made this call so many times. But what exactly does it mean to reclaim control over data in an ecosystem where data is produced out of our sight and beyond our awareness, and often based on information provided by other people?

The most powerful online companies control the storage of our data: all the content that we upload, our profile information, and the metadata that we generate by using their services. But this is not everything. These companies also control the story of our data: the algorithms and statistical models which transform our digital traces into meaningful marketing profiles. And that story is highly, highly valuable.

The “storage” and “story” metaphor was coined by Amir Baradaran, an artist and storyteller

Controlling raw data is not enough

In order to reinvent the data ecosystem and make it work for us, rather than against us, we must consider the story and not just the storage. We must move beyond thinking about raw data stored in commercial servers as the main privacy problem. It is far more important to control the inferred data and predictions made by algorithms — because this is the knowledge used to influence us. Even if we somehow managed to delete all the raw data that was collected about us, the powerful statistical models that have already been developed won’t disappear. Unless we question these models and their use against us, the game of influence will go on.

It’s no surprise that Google continues to track the location of its users, even when users explicitly say “don’t” (by choosing relevant settings in Android). It’s no surprise that Facebook only shows users the innocent, non-controversial set of inferred interests (you probably won’t see more than 50 categories that were attributed to you), while offering a far more sophisticated and far more intrusive profile via its advertising interface (paying clients can play with tens of thousands of categories that describe you and other users).

It’s no surprise that in the existing, profit-driven online ecosystem, full access to our data is available to those who can pay, and not to those who have rights. While advertisers can exploit our vulnerabilities and target us with clickbait, users cannot even get access to their own full marketing profiles.

The same commercial logic creates a barrier for nonprofits and public service providers: there is no interface that would cater for their legitimate needs while at the same time protecting users’ privacy. Healthcare providers cannot use aggregate data generated by millions of Fitbits or search queries, because they do not pay for targeting and they do not develop profitable apps. It has become second nature to use data for profit, but almost inconceivable to demand access to data for the common good. And this is wrong.

We have to reclaim the power of our story

Not all the data that is collected today should be collected in the future. There is a real danger in making unfair predictions about us and detecting our hidden, sensitive characteristics. But there is also real value in understanding patterns of human behaviour, so nonprofits and trusted public entities can use them to help solve societal problems. We don’t have to destroy all the data created by commercial companies in order to protect ourselves from ill-intentioned influence.

The real question is: Who should have power over our data and its story? If it is users and our trusted institutions, data won’t be weaponized, and its collection will be justifiable. Our data does not have to tell the story that advertisers pay to hear. It can also be used by us to solve our own problems; to tell others who we are and what we need. It can be used to deliver social value and ethical services, free from commercial ends and hidden influence. These options cannot emerge in a privatised, profit-driven ecosystem, but they could emerge in a new one, reinvented to serve societal purposes.

Let’s protect people, not the market

We must demand that policymakers regulate online platforms in exactly the same way they regulate pharmaceuticals, cars, and telecom providers. We need regulation that will address not only individual harms, but also societal harms caused by business models that exploit data and aggressively target us. The gravity and scale of these harms is documented well enough for us to stop debating them and finally move towards policy solutions.

When seeking solutions for digital markets, we can draw inspiration from health and environmental regulations. Imagine labels for algorithms akin to labels for potentially dangerous, but also useful, medical drugs. Think of privacy by design executed in the same way we managed to introduce seatbelts in cars. Think about enabling interconnection between online platforms that would protect consumer interests the same way it was done in telecommunications or energy markets.

As long as so few companies have so much power over our data, and as long as the same companies control our communication networks, then ethical alternatives have no space to grow. Positioned as “users,” we can only rely on the benevolence of these digital emperors when asking them to respect our rights. We don’t have tools that empower us to do otherwise.

We need solutions that will transition power over data from centralised platforms back to their users. We need to move beyond the language of the GDPR and start talking about breaking the power of online platforms — breaking the monopoly that controls billions of people’s storage and stories.

New roles and uses of data will become possible if we manage to break online platforms’ power to control our data and its interpretation (done by algorithms):

users could order data management services that help them control their marketing profile and execute their rights under the GDPR;
users could choose non-curated newsfeeds, if they don’t want the platform to decide for them what is important;
competing companies could offer more transparent and ethical targeting interfaces for advertisers;
specialised non-profits or news agencies could offer users different tools to curate news, based on transparent algorithms;
public service providers could use aggregated data from maps or search queries to solve health, environmental, or transportation problems.

The first step? A standardised API

Rethinking storage and story (that is, transitioning from “gated” online platforms to interconnected, open infrastructure) won’t happen in one step. It’s one thing to say we need to end walled gardens; it’s another thing to imagine a replacement. It’s a tough road: We will face security challenges. We will find problems that don’t have immediate solutions, such as communication between members of closed groups located in different networks. And we will need real political power to break the resistance of the dominant players. It will be a long march, but we have to start somewhere.

A good place to start is to demand that closed, commercial platforms, or at least those that have gained dominant positions, develop a standardised API (application programming interface), so that users can easily see and verify their data; can move it to other places (for example, a competing social network); or can leave it on the platform but control the story that is told about them (i.e. modify their marketing profile). The structure of such an API would need to be defined and controlled by an independent regulator, like a consumer protection authority.
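To make the three capabilities concrete (see and verify, move, control the story), here is a minimal sketch in Python of what such an interface might cover. Every name in it (`StandardisedDataAPI`, `UserData`, `export_data`, `import_data`, `remove_profile_category`, `InMemoryPlatform`, `move_user`) is hypothetical: no regulator has defined such an API, and a real one would also need authentication, consent records, and security safeguards that are left out here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class UserData:
    """Hypothetical shape of what a platform holds about one user."""
    uploads: list[str] = field(default_factory=list)             # the "storage"
    inferred_categories: set[str] = field(default_factory=set)   # the "story"


class StandardisedDataAPI(Protocol):
    """Illustrative interface only; not an existing standard."""

    def export_data(self, user_id: str) -> UserData: ...
    def import_data(self, user_id: str, data: UserData) -> None: ...
    def remove_profile_category(self, user_id: str, category: str) -> None: ...


class InMemoryPlatform:
    """Toy platform that satisfies the interface, for demonstration."""

    def __init__(self) -> None:
        self._users: dict[str, UserData] = {}

    def export_data(self, user_id: str) -> UserData:
        # See and verify: return a copy of everything held about the user.
        data = self._users.get(user_id, UserData())
        return UserData(list(data.uploads), set(data.inferred_categories))

    def import_data(self, user_id: str, data: UserData) -> None:
        # Accept data that the user brings from another platform.
        self._users[user_id] = UserData(
            list(data.uploads), set(data.inferred_categories)
        )

    def remove_profile_category(self, user_id: str, category: str) -> None:
        # Control the story: drop one inferred category from the profile.
        user = self._users.setdefault(user_id, UserData())
        user.inferred_categories.discard(category)


def move_user(
    user_id: str, source: StandardisedDataAPI, target: StandardisedDataAPI
) -> None:
    """Portability: copy a user's data from one platform to another."""
    target.import_data(user_id, source.export_data(user_id))
```

Because both platforms expose the same interface, `move_user` needs no knowledge of either company's internals, which is the point of having the structure set by a regulator and not by each platform.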

How would this work in practice? Imagine you are a user of a global social network. You have several grievances: One, you are increasingly irritated by targeted advertising that exploits your vulnerabilities. It’s so accurate, it seems like they must be listening to your most intimate conversations. Two, you don’t like the way your newsfeed is curated: It’s filled with clickbait. Three, some of your friends publish updates on different networks, but you don’t want another app on your phone and another source of distraction.

There is not much you can do about these things today. But with a standardised API in place — defined and controlled by a regulator — the following options would emerge:

You could verify and limit the marketing profile about you that is stored by the social network, without asking their permission and without relying on the limited settings they provide in their own user interface. If you don’t like a category in your profile, you could delete it with one click. As a result, advertisers and commercial developers would only have access to inferred data and predictions about you that you authorised for such use.
And if you prefer not to deal with this on your own (yes, managing data can be time consuming), you could join a data management service or a data trust of your choice and ask them to do this on your behalf.
If you don’t trust the social network to curate your newsfeed for you, you could either request a non-curated (chronological) stream of updates, or delegate its curation to another entity, like a news agency or a non-profit of your choice. Via the standardised API, such an entity would be able to connect to the social network, gain access to your data, and offer an alternative version of your newsfeed.
If you are curious about updates from your friends who use other social networks, you could connect your network’s API to theirs to exchange posts, reactions, and comments across platforms.
Finally, the standardised API could use a simple RSS function to enable users from outside the social network to follow all public updates (in read-only format).
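The options above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a design: the `Post` fields, the `Curator` type, and the functions `chronological`, `build_feed`, `visible_to_advertisers`, and `public_rss` are all hypothetical names, and the RSS output is deliberately minimal.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Post:
    """Hypothetical post record exposed through the API."""
    author: str
    text: str
    timestamp: int       # seconds since epoch
    public: bool = True


# A curator is any function that orders posts. The platform, a news
# agency, or a non-profit could each supply their own.
Curator = Callable[[list[Post]], list[Post]]


def chronological(posts: list[Post]) -> list[Post]:
    """Non-curated feed: newest first, nothing else considered."""
    return sorted(posts, key=lambda p: p.timestamp, reverse=True)


def build_feed(
    posts: Iterable[Post], curator: Curator = chronological
) -> list[Post]:
    """Return the feed as ordered by whichever curator the user chose."""
    return curator(list(posts))


def visible_to_advertisers(inferred: set[str], authorised: set[str]) -> set[str]:
    """Only inferred categories the user authorised are exposed."""
    return inferred & authorised


def public_rss(posts: Iterable[Post], title: str, link: str) -> str:
    """Read-only RSS 2.0 document containing public posts only."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for post in chronological([p for p in posts if p.public]):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "description").text = f"{post.author}: {post.text}"
    return ET.tostring(rss, encoding="unicode")
```

Delegating curation then amounts to passing a different function, for example `build_feed(posts, curator=my_news_agency_ranking)`, where `my_news_agency_ranking` stands for whatever ordering a trusted third party provides.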

This API would be the first step toward an interconnected infrastructure that is controlled by citizen-users, not by advertisers.

If we can imagine it, it’s possible

I first began exploring this idea, this “New Deal for Data,” on stage at MozFest in October 2019, and will continue to think about possible solutions over the next 12 months. By the end of 2020, we may have answers to many of the problems that seem too difficult today.

More than anything else, this is an exercise in political imagination. I hope we can do this exercise together. If you are working on a related project, if you want to share your views on how to build interconnected and open infrastructure for digital society, if you know prototypes that fit in this model — get in touch!