Today’s Rembrandts in the Attic: Unlocking the Hidden Value of Data
A shorter version of this piece appeared in the Harvard Business Review on May 15, 2020
Twenty years ago, Kevin Rivette and David Kline wrote a book about the hidden value contained within companies’ underutilized patents. These patents, Rivette and Kline argued, represented “Rembrandts in the Attic” (the title of their book*). Patents, the authors suggested, shouldn’t be seen merely as defensive tools but also as monetizable assets that could be deployed in the quest for profits and competitive dominance. In an interview, the authors referred to patents as “the new currency of the knowledge economy.”
We are still living in the knowledge economy, and organizations are still trying to figure out how to unlock under-utilized assets. But the currency has shifted: today’s Rembrandts in the Attic are data. The means of unlocking that value, however, are quite different from those for patentable innovations. Unlike patents, the key to harnessing data’s value is unlikely to be found in restrictive licensing. Unlocking the value of data requires a new approach, one that recognizes that this value ultimately lies in access, collaboration, and the application of data to a wide range of problems using tools like machine learning.
The vast amounts of data being generated today represent an enormous repository of potential value (and danger), and not just monetary value: there is tremendous social good to be unlocked as well. But do organizations — and more importantly, do we as a society — know how to unlock this value? Do we know how to find the insights hidden in our digital attics and use them to improve people’s lives? And how do we unlock these treasures in a manner that is societally beneficial?
At The GovLab, an action-oriented think tank located within NYU, we are dedicated to unleashing the societal value of data to improve decision making. Our work with corporations, governments, and non-profit organizations has convinced us of the tremendous potential (and risks) of this data, and also that the potential remains largely unfulfilled.
In what follows, we outline four steps that we think could help data stewards (we resist using the term “data owners”) to maximize that potential.
If there is an overarching theme that emerges, it is the value of re-use. In recent years, several countries have witnessed the rise of an open data movement, and a growing number of organizations have taken steps to release or make accessible previously siloed data sets. Despite occasional trepidation on the part of data holders, our research has repeatedly shown that such efforts can be value-enhancing — both for data holders and for society at large. Better and more transparent re-use of data is arguably the single most important measure we can take to unleash the full possibilities of data.
1. Develop new methodologies to identify and measure the value of data
The first step required to fulfill this potential is for all stakeholders to arrive at a better understanding of just what we mean by value. Today there is widespread consensus that data is valuable. Despite such agreement, however, there is no equally accepted method for calculating or estimating the value of data. Such a consensus must be arrived at through a broad process of consultation that involves data holders and users from all sectors, as well as policymakers, researchers and academics, and civil society or other groups representing the public interest.
One important consideration is what variables or indices to use. While data may have monetary worth, it can also have what a recent report by the Bennett Institute for Public Policy refers to as “social welfare” value, meaning that unlocking or sharing data could contribute to “the wellbeing of all society.” These two forms of value — societal and monetary — may not always coincide. For example, in re-using data, an organization may surrender a certain amount of financial advantage (or incur an opportunity cost) even while contributing to the broader social welfare. To guide such difficult decisions, policymakers and society at large must determine broader metrics of valuation and consider how various metrics interact and sometimes clash. In addition to social and monetary value, other metrics to consider include the potential harms that may result from releasing or sharing data, and the opportunity costs of not re-using data. All these metrics co-exist in a delicate balance. Considered together, they can help organizations determine the true value of data.
2. Develop enabling ecosystems and collaborative frameworks to move from extraction to co-creation of value
Unlike physical assets, data goods are non-rivalrous and intangible, which means that they can be shared without depriving their original holders of benefit. The process of maximizing under-utilized data assets will therefore often involve arriving at new institutions and frameworks to enable data collaboration and what we call “co-creation of value.” This concept of co-creation is not new and various experts have called for the creation of new institutions to facilitate it in different sectors. In her book, The Entrepreneurial State, University College London Professor Mariana Mazzucato argues that such a framework is necessary to bring the public and private sectors together to spur innovation. She writes:
“Creating a symbiotic (more mutualistic) public–private innovation ecosystem thus requires new methods, metrics and indicators to evaluate public investments and their results. Without the right tools for evaluating investments, governments have a hard time knowing when they are merely operating in existing spaces and when they are making things happen that would not have happened otherwise. The result: investments that are too narrow, constrained by the prevailing path-dependent, techno-economic paradigm.”
Data use can operate the same way, bringing together different institutions from the public and private sector to find new, innovative approaches through what we call “data collaboratives.” We outline some specifics of these institutions and frameworks below. The broader goal is to create new ecosystems of sharing that move beyond legacy models of value extraction and asset hoarding.
Drawing on the analogy with patents (those earlier “Rembrandts in the attic”), it is worth pointing out in this context the dangers and risks of not sharing. While patents can be competitive assets for companies, they also often block innovation and prevent true competition from emerging. In much the same way, data hoarding can result in broader societal and monetary losses. These losses may ultimately rebound on the data holders themselves, who miss out on the innovations and breakthroughs that sharing might have enabled.
3. Innovate with new data collaborations and re-use conditions
In order to enable sharing, we need new structures that foster partnerships and more collaborative approaches. The old model of single-ownership is outdated and no longer conducive to maximizing the value of data assets. Data holders are also unlikely to foster public value creation by simply auctioning off their “Rembrandts” to the highest, most well-resourced bidder. Several structures have been proposed, including data co-ops, data commons and (our preferred term at The GovLab) data collaboratives.
Data collaboration can take many forms. In our typology, we generally focus on two defining variables: engagement and accessibility. The first variable, engagement, refers to the degree to which the data supply and demand actors co-design the use of corporate data assets. Collaboration can be independent, in that the private-sector holder has little to no involvement in data re-use; cooperative, in that data suppliers and data users work together; or directed, in that the data holder seeks a specific product. The second variable, accessibility, refers to the extent to which external parties can access private data. Here, data is either open access, in that there are few restrictions on who can see it, or restricted, in that only pre-selected partners receive unfettered access.
From these variables, a variety of models emerge. For data collaboratives combining independent data usage with open access, we see public interfaces such as APIs, which the startup Numina uses in its Street Intelligence initiative, and data platforms, such as Uber Movement. Data collaboratives with open access and directed use often take the form of prizes and challenges, in which companies make data available to individuals who compete to develop apps. These competitions can be open challenges that allow participation from the public, such as the LinkedIn Economic Graph Challenge, or more selective challenges directed at a few trusted partners, such as the Orange Telecom Data for Development Challenge. Other models include data pools, trusted intermediaries, research and analysis partnerships, and intelligence generation ventures.
While each of these frameworks has its distinguishing characteristics, they all share a commitment to developing fresh forms of data management and re-use conditions, including new sharing agreements and licensing provisions. They begin from a recognition that data is in many respects unlike conventional assets, and that more sophisticated forms of governance and value-maximization are required in order to work through some of the tradeoffs and competing valuation metrics. In particular, these new partnerships and holding entities can help balance the private (often monetary) value of data holders and the wider societal benefits of sharing.
A variety of barriers stand in the way of opening up and re-using data, including the absence of an enabling and scalable policy and legislative agenda, a lack of internal capacity, and limited access to external expertise and resources. At a time of increased data asymmetries and a growing need for data to develop artificial intelligence, we need new intermediaries that can help lower these barriers, build a center of expertise available to all, and democratize data availability.
Those focused on enabling data collaboration should also take a page from the open source software movement. Open source has become a default model in software thanks to the existence of flexible yet trusted policy and legal instruments, concepts of good stewardship, and an ethos of collaboration and peer-learning.
4. Identify and nurture data stewards
As data collaboratives and other similar structures gain traction, it is becoming clear that new human and institutional roles will be required to foster them (and, more generally, to encourage a culture of sharing). In our work at The GovLab, we have identified a key role within data-holding organizations for what we call data stewards. As the European Commission’s High-Level Expert Group on Business-to-Government Data Sharing recognizes, these individuals or teams — empowered to proactively initiate, facilitate, and coordinate data sharing — are essential to using cross-organizational and cross-sector data toward the public interest.
Data stewards can be seen as the curators of the “Rembrandts” held by a business or institution. They are individuals or groups who manage data within their organizations, and whose specific remit is to maximize both the societal and monetary value of data assets by fostering collaboration and sharing. Among other responsibilities, data stewards can identify under-utilized data that may have potential value; locate and foster partnerships to help unlock that value; and ensure a responsible framework that balances the potential benefits of sharing against possible risks, such as harms to privacy or security.
*Venkatesh Hariharan reminded me of the book during a fascinating discussion at The GovLab on governance and data collaboratives, which was in part the inspiration for this blog.
** Thanks also to Andrew Young, Dave Green, Akash Kapur, Michelle Winowatan and Andrew Zahuranec for their input.