Turning to Purpose-Driven, Collaborative Open Data Sharing and Standards in the Public and Private Sectors: Takeaways from the State of Open Data Policy Summit

By Uma Kalkar and Marine Ragnet

From improving disease response to enabling more equitable public transportation to promoting child well-being, responsible access to data has become essential. But what policies are needed to facilitate and incentivize the opening of data for public interest purposes? What recent legal and policy developments are starting to shape the space of data collaboration? These and other questions focused on the state of open data policy were addressed on Tuesday, May 17, 2022, at the first “State of Open Data Policy Summit” hosted by the Open Data Policy Lab — a collaboration between The GovLab and Microsoft.

Attendees at the State of Open Data Policy Summit had the opportunity to listen to and engage with senior leaders in government, civil society, and business about the policies that can support impactful (re)use of data for the public good. The Summit featured a public sector-oriented panel and a private sector-oriented panel, as well as a keynote address from Juan Lavista Ferres, Vice President and Chief Data Scientist at Microsoft.

Watch the full Summit on YouTube.

The GovLab’s Co-Founder and Chief Research and Development Officer Stefaan Verhulst opened the Summit with a discussion of the Third Wave of Open Data. First explored by the Open Data Policy Lab during the 2020 Summer of Open Data web series, the Third Wave of Open Data seeks to unlock data in a purpose-driven manner across stakeholders and through social, economic, and political lenses.

The four pillars of the Third Wave of Open Data.

However, understanding the attributes of the Third Wave of Open Data is only part of the puzzle — unlocking public-private partnerships to share data meaningfully, responsibly, and widely is the next step. Stefaan emphasized the need to examine and update the policies around open data to promote third-wave principles of intersectoral collaboration, incentivization, and demand-driven data use. He spoke briefly on the Open Data Policy Lab’s own work to track policy developments surrounding open data, data reuse, and data collaboration: the State of Open Data Repository of Recent Developments, a living resource of over 50 legislative acts, proposed directives, and other actions from around the world aimed at opening datasets.

Panel #1: Strengthening Open Data Demand: Policies to Foster Open Data at the Global, National, Regional, and Local Level

The Summit’s first panel featured Catherine Stihler, CEO of Creative Commons; Charlie Martial Ngounou, Executive President of AfroLeadership; and Jiri Pilar, Legal and Policy Officer at the European Commission. Stefaan served as the panel’s moderator.

Panel 1 participants: Stefaan, Charlie, Catherine, and Jiri.

Catherine reflected on how “open data plays such a central part about how we hold [others] to account but also how we use that information,” and recognized the frustrations felt by policymakers and advocates around creating open data regulations that are informed both by hard evidence and by the stories of those collecting and using the data. She made the case for more government open data policies shaped by non-profit, private sector, and public sector data users, so that information can be leveraged for good while navigating the intricacies of subnational and national politics. She underscored the need to justify open data practices in order to build trust in, and legitimacy for, open projects in areas such as misinformation.

Jiri then detailed some of the European Commission’s work, explaining the main trends and shifts within the institution. While his focus has been on creating a single market for data in Europe, he acknowledged that this initiative cannot exist without specific attention to protecting democracy, promoting human rights and the rule of law, and providing institutional transparency. He noted that over the past decade, data opening efforts were concentrated on the public sector; only now has attention turned to the private sector. To this end, the European Commission has introduced the Data Governance Act, which makes it easier for researchers and companies to access data to generate insights and products, and the Data Act, which seeks to create a framework for data sharing across the European Union to improve business-to-government (B2G) and business-to-business (B2B) dealings.

“We are looking to create a place,” Jiri said, “where data can flow between countries and data holders.” In this “common European data space,” businesses, governments, and individuals will have access to large volumes of high-quality data with clear usage allowances and restrictions.

Catherine jumped in and added that a Europe-wide space can only go so far in addressing global problems, such as climate change. Currently, Creative Commons is investigating how to open data to address the escalating climate crisis. Much of the detailed data remains hidden behind paywalls, rendering it inaccessible for tackling these hot-button issues. Alongside the data legislation advanced by the European Union, Catherine emphasized the need to create global data sharing standards to mount an efficient response to global problems.

Looking to the African open data landscape, Charlie lamented that the continent does not have a comprehensive, supranational data strategy because of heterogeneity in national open data standards and practices, which have not been addressed due to capacity constraints, a lack of inter-state communication, and issues of putting open data on the policy-making agenda.

“I think that policymaking is quite a daily exercise,” Charlie said. However, for African policymakers, digital and data-related issues are not permanent, consistent agenda items. As well, the rapid evolution of technology makes it difficult for policymakers to stay ahead of the curve and use data in an effective, purpose-driven manner. Some countries — notably South Africa, followed closely by Kenya, Burkina Faso, and Côte d’Ivoire — are leading data strategy charges, but these countries are the exception, not the rule. Recognizing open data as a tool for problem-solving requires data-driven policies that introduce data value into the public sector.

Next, panelists and audience members took part in a Q&A session that explored the significance of the language used around open data to explain its public importance, as well as the perennial question of how to open data without undermining the protection of data subjects. The panelists reinforced that the way we talk about open data’s aims, together with cultural norms, is a vital consideration for effective data sharing and collaboration.

Stefaan closed the panel by asking each participant to name the one topic or area where progress would make the biggest difference to the open data conversation.

  • Catherine pointed to opening data to address climate change as an important area of focus and a way to bring new people into the conversation.
  • Charlie emphasized the need to create global data policies that account for risk at a global level so as to protect each region from the common risks of the data-driven economy.
  • Jiri chose putting humans at the center of open data policy so as to live up to demand-driven data values.

Keynote Address — Microsoft Vice President and Chief Data Scientist Juan Lavista Ferres

Following the public sector panel, Microsoft Vice President and Chief Data Scientist Juan Lavista Ferres delivered a keynote speech on the need for open data for problem-solving.

Juan first addressed the hype and jargon around data-driven initiatives, touching on the human tendency to create complicated- and sophisticated-looking solutions. While complex solutions appear impressive, simple solutions are what deliver impact; the latter, however, are much harder for policymakers and laypeople to believe in and buy into.

Data-driven problem solving is not new — Juan brought up the famous example of John Snow, the father of modern epidemiology, and his data collection methodology, which traced cholera outbreaks to the Broad Street water pump in London’s Soho district. However, the challenge modern-day John Snows face is choosing which problems to concentrate on and how to direct useful data toward addressing them. Today, the speed with which data are generated, along with significantly shrinking storage and processing costs, opens the door to numerous opportunities to work with data.

The use of artificial intelligence and machine learning allows us to train models that harness these data to solve problems. “This is the power,” Juan emphasized, “because, you know, to solve a problem all we need [are] data and then success criteria.” He proclaimed that data is not the new oil but rather the new code. For algorithms, data serves as the backbone for programming and running efficient and effective tech solutions.

Yet although data collection and preparation are crucial prerequisites for useful datasets, these endeavors are not well recognized. For instance, the ImageNet project, led by Fei-Fei Li, is not widely known, but it has built a database of millions of labeled images that has served as the basis for numerous algorithms’ training sets. Juan stressed that the value of open data and of its assemblers needs to be recognized to incentivize continued dataset creation.

He then discussed use cases where open data is the only option for addressing issues, such as predicting rare diseases like diabetic retinopathy. Juan also presented Microsoft’s Differential Privacy Platform, which allows researchers to use data in a privacy-compliant manner to allay access and anonymization concerns. He detailed some open data projects Microsoft is working on: using open CDC data on SIDS to understand the impact of smoking on infant deaths, harnessing Sentinel-2’s low-resolution satellite data to track solar farms in India and help conservation efforts in Colombia, and leveraging Landsat remote sensing data to track glacial melting in the Arctic.

Juan ended his presentation by reiterating that, from a data science perspective, ‘for good’ ventures and revenue-seeking practices, such as improving digital ad clicks, use different data in the same manner to solve problems. The impact of the two, however, differs greatly — and he reinforced the need for better recognition of open data creators and usage to encourage researchers to focus on social impact.

Panel #2: Institutionalizing Open Data in the Private Sector: Policy Options and Innovative Practices

This second panel included Ioana Stoenescu, Government Affairs Manager at Roche; Fiona Greig, Managing Director and Co-President at the JPMorgan Chase Institute; Caroline Louveaux, Chief Privacy Officer at Mastercard; and Brennan Lake, Vice President of Social Impact at Spectus.ai. Andrew Young, The GovLab’s Knowledge Director, moderated the conversation.

Panel 2 participants: Andrew, Ioana, Caroline, Fiona, and Brennan.

Andrew first turned to Ioana to discuss her work using data and collaborating with different stakeholders within the healthcare industry. Ioana noted that open data is crucial for prioritizing where to direct healthcare resources, and that it provides secondary benefits, such as personalized medicine, reduced healthcare inequalities, and better resource allocation. She noted that “[h]ealth data reuse was estimated to be worth about 25 or 30 billion euros annually,” and added that new European data legislation is specifically focused on opening data for health research.

Yet currently, the use of secondary data to optimize healthcare is limited. The new ecosystem of data requires “an integrated level of coordination” between the healthcare industry and technology companies, as well as encouragement from the public sector to drive data sharing and innovative use. Without the collective efforts in the public and private sectors, an open data revolution in the healthcare industry will not be effective.

Ioana also outlined three avenues to incentivize this data transformation. First, she reiterated the need for policies that anticipate needs and reduce fragmentation by drawing on a brain trust from across sectors; second, these policies must support those involved in data exchange and use; and third, we need to measure and collect outcomes to establish what helps (and hinders) open data use in healthcare. She stressed that these practices need to account for responsible data use, improve layperson data literacy, and build trust around the notion of using data as a tool.

Fiona highlighted that the pandemic brought many private sector data sources to the fore, providing valuable insights for decision-makers. Noting the history of the Institute, which was created to use JPMorgan Chase’s administrative data to conduct economic research for the public good on behalf of policymakers and business leaders, she added that the business case is to be not just a data provider, but an insight provider.

Fiona noted that insight creation does not necessarily mean providing wholly open data on financial information, but rather working as “a part of the insight creation process” with academic and research collaborations.

Caroline jumped in, emphasizing that data has demonstrably shown its power in solving pressing problems. She talked about Mastercard’s long-standing history of using data for social impact initiatives. For example, the company works with cities to improve urban planning and prepare responses to natural disasters. To overcome privacy concerns when using open data, Caroline mentioned how Mastercard has established plain-language data responsibility principles to instill a privacy-focused culture in the organization, along with investing in secure technology and data encryption methods. Moreover, she emphasized how Mastercard is helping set industry-wide data security standards. Yet in addition to this company-level progress, she pointed out that the existing patchwork of laws and regulations around the world is overly complex and disincentivizes data sharing. For sustainable and useful data openness, Caroline looked toward developing “common standards on privacy, data, and technology” that foster trust and nurture data collaboration within the private sector.

Looking at ways to guide data sharing standards that inspire other companies to take on ‘for good’ causes, Brennan discussed the data governance strategies Spectus.ai uses to drive its social impact programs. First, data is collected with upfront, informed consent, creating an opt-in environment for data use. Second, project parameters are strictly limited to adhere to data subjects’ agreed-upon data use scopes; additionally, Spectus.ai clearly outlines what cannot be done with its data, such as use by law enforcement to surveil vulnerable communities or protests. Third, data access is only available for a set period of time. Fourth, the company uses differential privacy — a concept picked up from Microsoft’s SmartNoise program — and employs a “data clean room” that gives researchers access only to aggregated data.
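Differential privacy, mentioned both here and in the keynote, works by adding calibrated random noise to query results so that no single person's presence in a dataset can be inferred. The sketch below is a generic, minimal illustration of the Laplace mechanism for a counting query; it is not Spectus.ai's or Microsoft's actual implementation, and the `visits` dataset and function names are hypothetical examples.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a zero-mean Laplace distribution via inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    """Differentially private count: the true count plus Laplace noise.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    Smaller epsilon means stronger privacy and noisier answers.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(scale=1.0 / epsilon)

# Hypothetical example: privately release how many users visited a location.
visits = [{"user": i, "visited": i % 3 == 0} for i in range(1000)]
private_total = dp_count(visits, lambda r: r["visited"], epsilon=1.0)
```

A "data clean room" typically layers access controls on top of mechanisms like this, so researchers only ever see aggregated, noised outputs rather than row-level records.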

Brennan also made the case that data created and shared by a company needs to generate value in order to last. “There’s a really fantastic argument for this, because if a social impact program can generate value for the corporation that [is] sponsoring it, it is likely to survive, it is likely to go through in the long run,” he noted. He detailed Spectus.ai’s ‘4M’ methodology for assessing the value creation of a data use project (and noted its similarity to The GovLab’s ‘9Rs’ framework): the mission of the project to generate social impact, the methods used by the project, the monetization opportunities of the work to attract new commercial leads, and the messaging around the initiative to produce good data journalism.

Further, drawing from a UN Global Pulse study, Brennan reiterated that there is a dichotomy between the misuse of data and the missed use of data, and that missed opportunities to share data carry an opportunity cost. He advocated for standardizable tools such as social licenses to make social impact data use more attractive to the private sector. As well, he closed by raising the idea of regulatory safe harbors to protect researchers who are using data for good.

Afterward, participants engaged in a Q&A about sensitive data sharing, such as providing mobility data around abortion clinics, which brought up the need for socially responsible data sharing by private companies. As well, questions on the challenges of auditing data sharing raised ideas about the feasibility and usefulness of conducting both internal and external audits in the fast-paced data regulatory environment.

Lastly, Andrew closed the panel by asking participants what they considered the most important policy intervention that could be implemented today if the right decision-makers were mobilized.

  • Ioana noted the need for further data governance and infrastructure to mitigate fragmentation, observing that existing restrictions and barriers block the cross-border sharing of data. “These are the challenging pieces that can be addressed in Europe now with the European data space,” she said.
  • Fiona talked about a possible give-get framework to incentivize the practice of data sharing in the private sector.
  • Caroline put forward the idea of regulatory sandboxes that could also build trust in the open data ecosystem. She noted these spaces could have multiple benefits, such as creating new opportunities for innovation or identifying new challenges.
  • Brennan seconded Caroline’s idea of regulatory sandboxes and highlighted its potential to “de-risk companies” and encourage more private data collaboration and use.

Overarching Takeaways

Across participants, six central themes emerged.

  1. Purpose-driven open data requires an understanding of data use objectives and framing. Showcasing open data as a public good can incentivize uptake and policy creation by governments and encourage social impact-oriented ventures to contribute to the global data commons.
  2. Collaboration at a global level can help address fragmentation in data use and policies. Open data requires public sector collaboration to harness international knowledge and capabilities to build cross-cutting policies to address intractable problems, such as climate change. Further, by recognizing the data efficiency and data resources of the private sector, multisectoral cooperation can help extract value from data to inform policy creation.
  3. Data is not the new oil, but rather, the new code. Data is the backbone of artificial intelligence and machine learning algorithm training. It is not just a resource, like oil; it provides the structure for more sophisticated data-driven models that, in turn, can solve problems when further equipped with success criteria.
  4. Incentivization can help build the availability and applicability of open data for pressing problems. Despite the primacy of publicly available data to build systems that can address wicked problems, those who create and contribute to open datasets are not given adequate recognition or incentivization to continue these efforts. Celebrating those who build open datasets is just as important as celebrating those who use these data to train models.
  5. Human-centered policy design is crucial for demand-driven open data initiatives. What results do people seek from open data initiatives? Both public and private open data projects can achieve better impact and outcomes if they understand the needs of their target demographics.
  6. Robust certification and data standards can help build trust in data collection and use practices. Standards-setting around the procurement, use, and (re)use of data helps build data subject trust in data use by both the public and private sectors. In addition, carving out ‘safe harbors’ for research and innovation amid regulations can incentivize companies to contribute to open data projects for the public good.

***

Interested in learning more about the State of Open Data Policy Summit and the Open Data Policy Lab? Get in touch with the project team at opendatapolicylab@thegovlab.org.

About the Open Data Policy Lab. The Open Data Policy Lab is a resource hub supporting decision-makers at the local, state, and national levels as they work toward accelerating the responsible reuse and sharing of open data for the benefit of society and the equitable spread of economic opportunity. The Lab is an initiative of The GovLab generously supported by Microsoft. Learn more at https://opendatapolicylab.org.

About The Governance Lab at the NYU Tandon School of Engineering. The Governance Lab’s mission is to improve people’s lives by changing the way we govern. Our goal at The GovLab is to strengthen the ability of institutions — including but not limited to governments — and people to work more openly, collaboratively, effectively, and legitimately to make better decisions and solve public problems. We believe that increased availability and use of data, new ways to leverage the capacity, intelligence, and expertise of people in the problem-solving process, combined with new advances in technology and science, can transform governance. We approach each challenge and opportunity in an interdisciplinary, collaborative way, irrespective of the problem, sector, geography, and level of government. For more information, visit www.thegovlab.org.

About the New York University Tandon School of Engineering. The NYU Tandon School of Engineering dates to 1854, the founding date for both the New York University School of Civil Engineering and Architecture and the Brooklyn Collegiate and Polytechnic Institute (widely known as Brooklyn Poly). A January 2014 merger created a comprehensive school of education and research in engineering and applied sciences, rooted in a tradition of invention and entrepreneurship and dedicated to furthering technology in service to society. In addition to its main location in Brooklyn, NYU Tandon collaborates with other schools within NYU, one of the country’s foremost private research universities, and is closely connected to engineering programs at NYU Abu Dhabi and NYU Shanghai. It operates Future Labs focused on start-up businesses in downtown Manhattan and Brooklyn and an award-winning online graduate program. For more information, visit www.engineering.nyu.edu.
