Ethical, Safe, and Effective Digital Data Use in Civil Society
By: Lucy Bernholz, Rob Reich, Emma Saunders-Hastings, and Emma Leeds Armstrong
How do we use digital data ethically, safely, and effectively in civil society? We have developed three early principles for consideration:
Default to person-centered consent.
Prioritize privacy and minimum viable data collection.
Plan from the beginning to open (share) your work.
This post provides a synthesis of a one-day workshop that informed these principles. It concludes with links to draft guidelines you can use to inform partnerships between data consultants/volunteers and nonprofit organizations.
Data scientists have a wide set of skills at their disposal, including statistical analysis, algorithms, mathematics, machine learning, visualization, API and large-dataset management, and experience with tools such as R, Hadoop, Java, and Python. Nonprofit professionals have a similarly wide-ranging skill set that can include background training in the social sciences, community organizing, or many other fields, as well as on-the-ground experience in psychology, counter-bias work, financial oversight, leadership development, advocacy, and constituent and program management. In a vastly oversimplified way, data scientists are good at manipulating and analyzing information, and nonprofit professionals are good at analyzing people and manipulating systems. How do we bring these two forces together in ways that ethically respect the different types of expertise, make good use of their complementary contributions, and improve services to the organization’s constituents?
At a conference hosted by the Digital Civil Society Lab at Stanford’s Center on Philanthropy and Civil Society (PACS) in June of 2015, two dozen data scientists, nonprofit executives, scholars, and philanthropists dove into these questions. The conference aimed to create a checklist that data scientists and nonprofits working together could use to guide their interactions. Our focus was on the burgeoning field of “data intermediaries” — the organizations helping connect data experts to nonprofits and government agencies with the goal of “doing good.”
Two observations made the subject matter timely: First, the number of organizations offering these opportunities is growing, as are the ways in which they are going about it. Second, from the 2014 Ethics of Data conference, it was clear that several useful primers and tools for implementing the use of digital data in humanitarian and disaster settings are coming online, but there are still no clearly stated “terms of engagement” for how these two sectors can best work together.
The group quickly identified numerous shared moments of conflict between nonprofit organizational “hosts” and data science volunteers (or consultants). We discovered that there is a finite universe of process maps that guided the intermediaries’ work and that common ethical dilemmas could be “pinned” to each point on these maps, and we generated a working list of strategies for addressing these dilemmas. The draft checklists attached at the end of this post were drawn up following the meeting and are intended as a working prototype for the participants and others in the field. Our goal is to put the checklists forward as a starting resource and have people use them and improve them over time.
How the Work Gets Done — Process Maps
Not surprisingly, different intermediaries interact with nonprofits in slightly different ways, although each engagement has several common steps. Overall, the group generated three different process maps, categorized by whether the work was initiated by the data intermediary (DI), by the nonprofit (NPO), or by some form of open challenge methodology (Challenge). Here, in brief, are the common stages of engagement for each of the three types.
DI-initiated projects:
· Identify problem for which data sets are accessible and an ecosystem of organizations exists
· Identify lead NPO partner, introduce problem, gauge interest (repeat until partnership is formed)
· DI and NPO craft “solvable” and “useful” data inquiries from the data sets
· DI builds models; DI and NPO test, iterate, and refine
· DI and NPO apply models to datasets, gather insights, share model and results
· Project concludes
NPO-initiated projects:
· Identify data science intermediary, discuss organizational and programmatic goals, agree to discovery period
· NPO shares internal data with DI; DI assesses it for quality and determines validity and broader insights
· DI and NPO craft “solvable” and “useful” data inquiries from available data
· DI and NPO clean data, seek other information, redefine solvable problems
· DI builds models; DI and NPO test, iterate, and refine
· DI and NPO apply models to datasets, gather insights, share results
· Sustainability and maintenance of models revisited and data collection practices discussed for future use
· Project concludes
Challenge projects:
· DI identifies and recruits data scientists
· DI identifies and recruits nonprofits
· DI works with NPO to craft problem statement, challenge amount, and available data
· DI hosts problem statements
· Data scientists submit models and approaches to addressing the problem
· NPO and DI review submissions
· NPO selects challenge winner, uses insights, and rewards data scientists
· Project concludes
The most important variation in these models revolves around the quality and breadth of data available, the potential implications of the work for communities beyond the individual nonprofits, the degree of constituent input, and the emphasis placed on nonprofit capacity to carry the work forward. In general, DI- and Challenge-initiated projects seemed to address the first two characteristics (reach and generalizability), while nonprofit-initiated projects tended to focus on the latter two (capacity and sustainability). All four issues were significant challenges — and decision points — for all participants.
In addition to these project-specific maps, the participants identified and briefly discussed an “ecosystem” view of the relationships between nonprofits, funders, data experts, and communities. A representation of this ecosystem, and the points at which all constituents might benefit from discussion and negotiation about these tradeoffs, is shown in Figure One. The checkmarks in Figure One show who is usually tasked with addressing issues that should be the responsibility of all parties.
What are the ethical tensions and where do they reveal themselves?
The most frequently identified points of tension, each requiring compromise or a balancing act, are listed below, clustered by the stakeholders (assumed) most likely to raise them:
Nonprofit organizations
· Data that may compromise individual privacy versus utility of aggregate data set.
· Transparency in the use of digital data. How can stakeholders interrogate the process?
· Emphasis on data collection and use versus community data literacy.
· Multi-party, sequential data sharing and its implications for ownership, consent and constituent privacy.
· Balancing expertise required to use data with broad participation in decision-making.
· Creating data consent processes that balance time and costs with the “persistence” and “reach” of the consent, including data, algorithmic use, and downstream models or information uses.
· The use of “invisible decision making spaces” (algorithms) in community and participatory processes.
· Emphasis on model building versus sustained use and programmatic change.
Data intermediary/data scientists
· Project delivery versus education about data bias, model maintenance, and general data practices.
· Who owns the models and insights that are developed from the projects?
· Collecting and sharing of personally identifiable information knowing data security limitations.
· Cost-benefit of one-off projects without investments in sustaining capacity.
This discussion also identified a number of stakeholders who had not been represented on the process maps. Figure Two shows a more complete set of stakeholders and some of the motivations that would inform both ethical choices and perceptions of success.
Many of the issues that these partnerships raise are familiar from other domains of work. Protecting the privacy of personally identifiable information, for example, is a responsibility felt by organizations of all kinds. We must place the lessons learned from this working group into the specific context of civil society as a source for the voluntary use of private resources toward public benefit. This allows us to derive specific recommendations and framing assumptions for the practical engagements between data intermediaries and nonprofits. These principles may well extend to other forms of digital data use in civil society.
Civil society is where we voluntarily use our private resources for public benefit. It is particularly important that participation in civil society activities is guided by opt-in defaults. To make this real, the consent practices that nonprofits and foundations use throughout their work should emphasize individual choice, control, and ability to withdraw. Digital data collected from people is a form of private resource contribution, and consent processes should focus on the person as the agent in control, not the organization or team collecting and using the data.
Using digital data ethically in civil society begins with robust, person-centered consent practices that extend across the lifecycle of data and algorithmic use.
Our second principle draws from the reality of nonprofit resources and the widespread challenges of data security. General observation of data security practices across industry, government, and nonprofits makes it clear that information stored on connected devices is vulnerable to attack. Generally speaking, nonprofits are likely to be the least well resourced when it comes to investing in advanced data security measures, so a valid starting point is to assume system vulnerability. To achieve the goal of consensual participation and individual agency, civil society organizations should minimize the risk to data contributors by collecting and storing as little (re)identifiable information as possible.
By defaulting to “minimum viable data” practices, civil society organizations and their data partners will both reduce the risk to data participants and push themselves to constantly weigh the risks and benefits of their data methods, always giving primacy to individual privacy protection.
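To make the “minimum viable data” default concrete, here is a minimal, illustrative sketch in Python. The field names, coarsening choices, and salted token are hypothetical examples, not a prescribed standard; each organization would need to decide which fields its analysis genuinely requires.

```python
import hashlib

# Hypothetical raw constituent record collected by a nonprofit.
record = {
    "name": "Jane Doe",
    "email": "jane@example.org",
    "zip_code": "94305",
    "birth_year": 1987,
    "services_used": ["food_bank", "job_training"],
}

# Direct identifiers the analysis does not need: drop them at collection time.
DIRECT_IDENTIFIERS = {"name", "email"}

def minimize(record, salt):
    """Return a reduced record: direct identifiers dropped,
    quasi-identifiers coarsened, and a salted one-way token kept
    only if re-linkage across datasets is genuinely required."""
    kept = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Coarsen quasi-identifiers (exact ZIP plus birth year can re-identify).
    kept["zip3"] = kept.pop("zip_code")[:3]
    kept["birth_decade"] = kept.pop("birth_year") // 10 * 10
    # Salted hash instead of raw email, so only the salt holder can re-link.
    kept["participant_token"] = hashlib.sha256(
        (salt + record["email"]).encode()
    ).hexdigest()[:16]
    return kept

minimal = minimize(record, salt="npo-secret-salt")
```

The safest default is to omit the token entirely; keeping a salted hash is a compromise for cases where records must be re-linked later, and the salt then becomes a secret the nonprofit must protect.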
Our third principle derives from the first two. Nonprofits should seek to share what they learn and what they do as part of serving their public purpose.
If you are preparing for open sharing from the beginning of a project, you should take extra care to get permission (consent) from the people involved and to protect (minimum viable collection) their data as well as possible.
Attending carefully to consent and protection enables you to share the information openly in an ethical fashion: you will have sought permission to do so and done everything possible to prevent harm.
These three values — consent, minimum viable data collection, and open sharing — comprise a basic framework for ethical, safe, and effective use of digital data by civil society organizations. They should be integrated into partnerships with data intermediaries and, perhaps, into general data practices in civil society.
We developed two tools to guide conversations between data volunteers and/or consultants and nonprofits. These are downloadable below. Please use them, share them, improve them, and share them again.
DRAFT — ITERATE AND IMPROVE — and share share share
See the Responsible Data Forum’s Primer.