Trust, Not Data, as the New Oil: Designing for Data Trusts.

Andrew Hoppin
6 min read · Feb 4, 2020


“Data is the new oil” became a common adage in 2019, and with reason: some 2.5 quintillion bytes of data are recorded, stored, processed, and analyzed every day, and worldwide stored data is projected to grow 61% to 175 zettabytes by 2025. This growth seems inexorable, because organizations that make more data accessible to artificial intelligence will have the best machine learning models, and will ultimately make the best decisions. In a paradigm of competition, then, the companies (or countries) with access to the most data will win, whether the game is streaming video, targeted advertising, or even military intelligence. It’s no surprise, therefore, that we’re all designing to optimize for hoovering up as much data as possible, feeding the voracious appetite of ever more powerful machine learning models.

Over the past ten years I’ve been part of a global open government data movement that helped lay much of the policy, software, and standards groundwork for collecting and sharing massive public data sets. Those data sets now drive machine learning models that improve everything from city planning to disaster response to healthcare delivery, unlocking billions of dollars in public-sector savings and new economic development, and in some cases saving lives.

But when it comes to personal data, this data revolution is often happening at our expense. Today any of you could buy 400+ data points about me (and 250+ million other Americans) from any of dozens of data brokers, yet I have negligible true agency over what data can be collected about me and how it can be used. Who does have agency over my data? Usually the answer is a corporation whose directors have a fiduciary responsibility to maximize profits, not to ensure my privacy, let alone to maximize the public good my data could contribute to. The pernicious results are myriad, from discriminatory financial lending based on biased models, to targeted advertising memes that destroy democracies, to deep fakes that could imperil judicial systems themselves. We have an existential trust problem when it comes to the collection and use of our personal data. Like petrochemical oil, data as the new oil creates power; but data without trust is dirty coal.

We need to re-design the data economy to optimize for trust, so that we can benefit from the benevolent power of big data unleashed by machine learning, while protecting ourselves from its ills.

The GDPR, and as of January the CCPA in California, are a good start in that they mandate individual consent for data collection and processing. But as long as we rely on only a handful of platforms for our digital lives, a remedy limited to opting out or deleting our accounts presents a false choice; as my friend Anouk Ruhaak at the Mozilla Foundation says, “the importance of our ability to freely choose how and when we share our data breaks down when the ‘choice’ is between surrendering data about ourselves and social exclusion, or even unemployment.” Data protection laws are foundational, but insufficient.

Some have proposed breaking up big network-effect tech businesses, or even nationalizing them, but doing so could undermine the very value of these services. I want to be where my friends are; I want the effect of the network *because* it has more data, thus better models, and thus a better user experience for me. And I certainly don’t want my government to *be* the network, let alone someone else’s government.

More promising is the advent of the personal data store. Protocols such as Solid and Digi.me, and to a large degree Apple’s iOS itself, let us keep the cryptographic key to our own data under our direct control while leaving that data accessible to the networked services we wish to use, for as long as we wish. Control of your data is thus separated from the companies that wish to make use of it. This is a helpful step, but still insufficient, because it leaves the heavy burden of deciding when and how to share which data, with whom, for how long, for what purpose, and in exchange for what, squarely on us. Even if I were sophisticated enough to make rational choices about contextual consent all the time, it’s not what I wake up in the morning wanting to deal with, and I’m still ultimately on my own to choose between participating in the network that gives me value and opting out and hindering my life.
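To make that separation concrete, here is a minimal sketch in Python of the pattern these protocols share. Every name in it is mine, not Solid’s or Digi.me’s API; the point is only the shape of the idea: data rests encrypted, the owner holds the key, and access is granted per record and is revocable.

```python
# Minimal sketch of the personal-data-store idea: the user, not the
# platform, holds the key that unlocks their data. All names here are
# illustrative, not part of any real protocol.
from cryptography.fernet import Fernet

class PersonalDataStore:
    """Data rests encrypted; only the owner's key can open it."""

    def __init__(self):
        self._key = Fernet.generate_key()   # stays on the owner's device
        self._records: dict[str, bytes] = {}

    def put(self, name: str, plaintext: bytes) -> None:
        self._records[name] = Fernet(self._key).encrypt(plaintext)

    def grant(self, name: str) -> bytes:
        # The owner decrypts one record for one consumer, one request
        # at a time; nothing is handed over wholesale.
        return Fernet(self._key).decrypt(self._records[name])

    def revoke_all(self) -> None:
        # Rotating the key renders stale copies of the old key useless
        # against everything stored from here on.
        old = Fernet(self._key)
        self._key = Fernet.generate_key()
        new = Fernet(self._key)
        self._records = {n: new.encrypt(old.decrypt(c))
                         for n, c in self._records.items()}

store = PersonalDataStore()
store.put("steps_2020_02", b'{"daily_steps": 8412}')
print(store.grant("steps_2020_02"))  # shared on the owner's terms
store.revoke_all()                   # and withdrawn on them, too
```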

To build a new data economy that we can trust, I believe we need to invest in two key innovations:

First, for data producers (you and I), we need to build and support data trusts. A data trust (or ‘proxy’) builds on your new legal right to consent to the usage of your data, and on the emergent technological ability to wield the cryptographic key to your own data. It gives individuals (and communities) the pragmatic ability to delegate responsibility for managing these complicated choices, and the enabling technologies behind them, to a third party legally charged with watching out for you (and millions of people like you), just as a doctor’s responsibility is to take care of you rather than to maximize profit. Network-effect businesses will now need to negotiate with your data trust in order to maintain their data access. But they will be negotiating with a limited set of sophisticated actors empowered to make rational choices for large groups of people, choices that best balance the value you get from the platforms you patronize against the privacy protections you deserve. And the threat of millions of accounts leaving a platform en masse will carry a great deal more negotiating weight than you or I clicking “no” on a GDPR-inspired pop-up window.
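As a thought experiment, a trust’s negotiating posture might look something like the sketch below. The types, the policy thresholds, and the member count are all hypothetical, standing in for whatever fiduciary charter a real data trust would adopt.

```python
# Illustrative sketch of a data trust as a fiduciary gatekeeper; this
# is not any real trust's charter.
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessTerms:
    requester: str
    purpose: str
    fields: frozenset[str]
    retention_days: int

class DataTrust:
    """Negotiates data access for many members under one fiduciary policy."""

    SENSITIVE = frozenset({"location_history", "health_records"})

    def __init__(self, member_count: int):
        self.member_count = member_count

    def evaluate(self, terms: AccessTerms) -> bool:
        # A stand-in for the trust's duty of care: no sensitive fields
        # for ad targeting, and no open-ended retention.
        if terms.purpose == "ad_targeting" and terms.fields & self.SENSITIVE:
            return False
        return terms.retention_days <= 90

    def negotiate(self, terms: AccessTerms) -> str:
        if self.evaluate(terms):
            return f"{self.member_count:,} members consent to {terms.requester}"
        # The credible counter-threat: one refusal speaks for millions.
        return f"{self.member_count:,} members decline; revise the terms"

trust = DataTrust(member_count=2_000_000)
print(trust.negotiate(AccessTerms(
    requester="AdPlatformX", purpose="ad_targeting",
    fields=frozenset({"location_history"}), retention_days=365)))
```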

Second, for application designers, we need to develop a complete “Trust Stack.” Like a software stack, this is a collection of tools that work together to help software applications respect the data governance of our Privacy Rights, our Personal Storage, and our Proxy Power over our data. The stack would consist of framework support for at least the following (a sketch of how the layers might compose appears after the list):

  • RegTech tools that facilitate compliance with, and even harmonization of, the GDPR, the CCPA, and a raft of other upcoming data protection laws. SaaS services like Aptible and Tugboat Logic in the US are well along in automating this onerous process.
  • Decentralized Storage protocols that give literal keys to personal data to the people that the data is about. Tim Berners-Lee’s Solid, Digi.me, and a number of other would-be personal data wallet innovators are close to making such tools mass-market ready, but in meaningful ways, Apple already has.
  • Contextual Consent protocols that support nuanced consent choices such as what subsets of data may be used, by whom, for what purpose, for how long, and in exchange for what. Spain’s SalusCoop, America’s Sage Bionetworks, and my own startup CoverUS are working actively in this arena.
  • Data Trusts that allow individuals and communities to delegate responsibility for managing these choices and the corollary enabling technologies. The new Global Center for the Digital Commons is playing a leading role in getting pilots off the ground.
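To suggest how these layers might fit together in an application, here is a hedged sketch in Python. Every interface in it is hypothetical (there is no canonical Trust Stack API yet); it shows only the shape of the composition: a compliance check, then a contextual consent check covering which data, for whom, for what purpose, and until when, before any field is released.

```python
# Hypothetical Trust Stack composition: the application only sees data
# after every layer says yes. All interfaces here are illustrative.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ConsentGrant:
    # Contextual consent: which data, to whom, why, until when, for what.
    fields: frozenset[str]
    grantee: str
    purpose: str
    expires: date
    compensation: str

def regtech_compliant(purpose: str) -> bool:
    # Stand-in for the compliance layer (e.g., lawful-basis checks).
    return purpose in {"research", "service_improvement"}

def consent_allows(grant: ConsentGrant, grantee: str, purpose: str,
                   field: str, today: date) -> bool:
    return (grant.grantee == grantee
            and grant.purpose == purpose
            and field in grant.fields
            and today <= grant.expires)

def read_field(record: dict, field: str, grant: ConsentGrant,
               grantee: str, purpose: str, today: date):
    if not regtech_compliant(purpose):
        raise PermissionError("purpose fails the compliance layer")
    if not consent_allows(grant, grantee, purpose, field, today):
        raise PermissionError("no valid contextual consent")
    return record[field]

grant = ConsentGrant(fields=frozenset({"resting_heart_rate"}),
                     grantee="SleepStudyApp", purpose="research",
                     expires=date(2020, 12, 31),
                     compensation="aggregate results shared back")
record = {"resting_heart_rate": 61, "location_history": ["..."]}
print(read_field(record, "resting_heart_rate", grant,
                 "SleepStudyApp", "research", date(2020, 2, 4)))
```

The design choice worth noting is that the application never reads data directly: every read is mediated by the stack, so revoked consent or a new data protection law changes behavior without changing the application.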

Returning power over your data to you isn’t just the right thing to do — it could also be the key to democratic societies being able to compete with more dictatorial surveillance regimes in the era of artificial intelligence. If you can trust what is being done with your data, and perhaps even be compensated for it, you are more likely to be willing to collect and share more data in the first place. We have the potential to couple such enriched — and contextually consented — personal data with the existing extensive public open data infrastructure that numerous democratic governments have built up over the past decade. Thus the richness of combined public and personal data available to train our models could even conceivably outstrip that available to centralized surveillance states.

How do we design for such a future? How can we as a community take action? My invitation is to:

  1. Design technology platforms and experiences to be trust-ready: presume that your software will ultimately need to be governed by a Trust Stack composed of regtech and contextual consent tools, and fed by external decentralized data storage.
  2. Invest in Data Trusts: powerful as this nascent idea is, it is at a stage where applied examples are both possible and needed. You can fund standing up a Data Trust for your use case by getting in touch with the Global Center for the Digital Commons, and you can encourage your own vendors and customers to do the same.

Many thanks to Anouk Ruhaak for the education, inspiration and guidance relating in particular to the data trusts content in this post.

See also the keynote talk I gave at Interaction Week on this topic.
