By Stephen Burley Tubman, Eve Marenghi, and Andrew J. Zahuranec
On Sunday, September 15th, The GovLab and Mastercard hosted a workshop on establishing responsible data for social good practices as part of the Bloomberg Data for Good Exchange [see JoAnn’s blog post about the workshop here]. The workshop was facilitated by Stefaan Verhulst, co-founder and chief research and development officer at The GovLab, and JoAnn Stonier, Chief Data Officer at Mastercard. More than 100 data scientists, social engineers, lawyers, consultants and other activists attended, all interested in using data to solve societal problems and advance the United Nations Sustainable Development Goals (SDGs).
The workshop centered on data ecosystems: the needs of the various parties that contribute to them and the related best practices, tools, and methodologies. Participants focused on identifying paths forward that would enable projects and initiatives to move from individual, ad-hoc pilots to sustainable, responsible and systemic practices.
As The GovLab has established throughout its work, data collaboratives are integral to using data for good. In addition, Mastercard believes that data innovation must be balanced by responsible data practices.
But data collaboratives operate inside a larger ecosystem, with actors whose roles need to be better understood moving forward, among them:
- Data-Demand Actors, organizations that identify the problem to be solved and demand access to data. These actors include NGOs, charities, academic researchers and other beneficiaries;
- Data Suppliers, organizations that possess the data to help solve the problem — such as enterprises that collect data and can provide information to the ecosystem, data scientists, and data engineers; and
- Ecosystem Enablers, actors who can help scale the responsible and sustainable use of data for social good, such as philanthropies, policymakers, technologists and civil service organizations.
Asymmetries and challenges exist among the different types of actors in the data ecosystem. For instance, actors who need data often lack the capacity or the toolkit to articulate relevant data questions or use cases (academics are key in closing this gap). Supply actors, meanwhile, often lack the ability to handle the regulatory, security and other requirements needed to address the risks of providing data to solve social problems.
A key objective of the workshop was to identify ways to address these challenges. Toward that end, participants broke into groups and discussed a set of guiding questions. Each group contained data-demand actors, data suppliers, and ecosystem enablers. In what follows, we share key takeaways from the workshop as they relate to these three central personas.
Among the participants, there were several major takeaways as to the role of data-demand actors.
Need for Well-Defined Demand
First, most groups suggested that data-demand actors (data users) clearly determine the scope of what will happen with the data once they obtain it. Formulating good, targeted questions was seen as integral to good data use.
“We need to be transparent about why we are requesting data,” said one participant. “We are often so sure why we are asking for data but sometimes we forget how other people see the situation or how others might use the data.”
A People-Centric Approach
This concern connected with another common thread: citizen engagement. Several groups spoke about identifying the individuals affected by data collection and soliciting their views. Others spoke about developing data literacy skills among the general public and encouraging participation in data-driven projects, with participatory budgeting highlighted as one such model to achieve this goal.
Many emphasized the need to be clear about intent and either to use data as intended or to find some way to seek approval for new research questions retrospectively.
“We cannot always understand how we want to use the data at the outset,” said JoAnn. “But when considering when to use data, it can be helpful to ask if it is congruent. If the initial question was about public health, it can probably be used for another public health question.”
Participants also discussed how actors can improve data ecosystems. One such approach is making sure data-demand actors understand what the data represents and acknowledge its limitations.
“We need to communicate the strengths and weaknesses of the data, to come up with guidelines to help us understand how good our data is,” said one participant. “We need a nutrition label for data.”
The participants also spoke at length about data suppliers, offering a multitude of suggestions for how these actors, and in particular data stewards within companies, could guarantee rights for data subjects, promote responsible innovation, and scale up success. These comments focused on end-to-end approaches to data responsibility and building trust across the data ecosystem.
End-to-End Approach to Data Responsibility
One group spoke about the importance of classifying datasets by sensitivity, creating different levels to govern data use. While acknowledging the need to consider local circumstances, participants called for global standards on privacy and security. The discussion noted different kinds of privacy-preserving work, such as anonymization and pseudonymization.
Other risk mitigation techniques also received attention. Groups said data suppliers should remain cognizant of purpose limitation and data minimization to reduce the opportunities for malicious use.
Building Trust Between Actors
Still others encouraged suppliers to develop a “code of conduct” or defined methodology for data sharing, one that recognized the importance of clear standards, metrics, and contracts for data use. Actors at all stages of data collection, preparation, and use needed to be involved to build trust and coordination. Communication with other actors needed to become a policy of first resort.
In reporting her thoughts, one participant said, “We need to come to an understanding of what can be shared and what cannot be shared. What kind of standards do we want to have on cybersecurity, legal obligations, or subject notification?”
Lastly, several comments emerged from the discussion on ecosystem enablers. These organizations, which can take the form of data clinics such as DataKind, play an important role in the data pipeline. However, enablers face rules and access structures that vary by industry, geography, or use case.
Bridging Gaps in Resources and Expertise
Aside from developing international standards, one group recommended creating advisory boards to develop best practices on accountability and feedback. Participants in the workshop noted that representatives of different actors and sectors could create a global standard, vocabulary, and framework around data collaboratives.
A consortium of these actors could build equity into every step of the collaborative process by involving the community and investing the time and resources necessary to create an equitable process around data and generate equitable outcomes. One participant pointed to the model provided by Institutional Review Boards, bodies that authorize human-subject research at universities to mitigate risk.
“Ethical review boards need to be explored further, but we need to take off our biases in regards to the United States and the rest of the world,” noted JoAnn. “We do not want to put people in other countries in jeopardy. Ethics and cultural norms need to be taken into account.”
Multiple groups raised the possibility of a third-party enabler for matching supply and demand. Advisory boards, data stewards, and data trusts can serve as independent groups that review data initiatives and take a proactive role in them. Later on, they can help to establish accountability and provide feedback.
“We need an independent, neutral space where we can encrypt and share data,” said the representative of one group. “We need a place to facilitate data sharing and pair interested parties together, whether that be a physical or digital space.”
As workshop participants from across sectors mapped out the ecosystem of data actors needed to facilitate responsible data collaboration, five takeaways emerged.
First, groups saw a need for data-demand actors to define good, targeted questions to solve problems while data suppliers needed to know their data and keep privacy and security top-of-mind. Partners need to understand what public problem they intend to address.
Second, the participants recognized that data collaboratives make use of data that represents or is generated by people. Adopting a people-centric approach provides useful insights and keeps ecosystem actors accountable to those they are trying to serve. More research about the use of data for social good can help public and private sector data actors maximize public value while launching meaningful efforts to mitigate risks.
Third, data for good initiatives might adopt an end-to-end approach, considering data responsibility at every step of the way. To make a positive impact, data actors should make data-to-knowledge pipelines more action-oriented. This process can draw on the expertise and capacities of data scientists and decision-makers.
Fourth, trust is a common barrier to data for social good initiatives. For initiatives to grow and scale, there must be substantial trust between all actors in the data ecosystem. Trust can be established through mechanisms such as data collaboratives, contracts, and dispute resolution methods.
Fifth, creating effective initiatives for social good calls for bridging the divide between data science and domain knowledge. Data actors from different practices, such as policy and research, and from different sectors are encouraged to make the most of their unique resources and expertise.
The workshop brought a diverse group of actors together to discuss the opportunities and challenges of addressing societal problems through data initiatives. We are excited about the excellent ideas raised by the participants, and we hope that the conversation around establishing a responsible data ecosystem continues. At The GovLab, through the 100 Questions initiative, we are working to harness the collective intelligence of “bilinguals,” individuals who possess both domain expertise and data science skills, to improve the types of questions asked and transform the way 21st-century problems are solved.