Illustration by Alysheia Shaw-Dansby

Is Privacy at Odds with Racial Equity? Visualizing Implications for Communities of Color in Public Data Releases

Data@Urban
Urban Institute
Oct 26, 2023 · 7 min read


Publicly releasing data disaggregated by race or other demographic characteristics presents a double-edged sword in terms of equity and privacy. On one hand, it’s harder to protect the privacy of people of color and other vulnerable populations because they’re often underrepresented in the data, making them easier to reidentify. They are also especially vulnerable to risks that come with reidentification, such as employment and housing discrimination, perpetuation of socioeconomic inequalities, and even threats to personal safety. On the other hand, traditional techniques to enhance privacy in released data, like removing high-risk records from releases or aggregating to coarser levels of detail, often obscure the presence of these populations, excluding them from the well-documented benefits of publicly available data, such as government funding allocations.
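A minimal sketch can make this trade-off concrete. Small-cell suppression is one traditional disclosure-avoidance technique: any group with fewer people than a threshold is withheld from the release. The counts and threshold below are invented for illustration; they do not come from any real dataset.

```python
# Fictional counts of people served, disaggregated by race
counts = {"White": 412, "Black": 57, "Asian": 8, "Native American": 3}

THRESHOLD = 10  # hypothetical suppression threshold

# Suppress any cell below the threshold (None = withheld from release)
released = {race: (n if n >= THRESHOLD else None)
            for race, n in counts.items()}

# The smallest groups, often communities of color, disappear from the
# release: their privacy is protected, but so is their presence erased.
```

This is the double-edged sword in miniature: the suppressed cells are exactly the groups most at risk of reidentification, and also the groups the release was meant to make visible.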

Traditionally, privacy guidance is highly technical. Smaller organizations may lack privacy-specific expertise or the staff resources to dedicate to the issue, meaning they may inadvertently create reidentification risks, especially for communities of color, when attempting to fill data gaps.

To address this gap in knowledge and resources, our team created an educational prototype tool that allows community-based organizations and other interested users to explore the trade-offs for equity and privacy when these entities release data and statistics publicly. Readers interested in the structure and content of the tool itself are encouraged to explore the prototype, which we built in Tableau. This post explores how we used user experience (UX) and user-centered design principles to ensure the tool allows organization staff and decisionmakers to promote, target, and seek funding for their services while making informed decisions with communities of color in mind as they determine what data to release, obscure, protect, and share.

Privacy and vaccine equity

Public health policy, particularly the field of vaccine equity, provided a complex case study of the intersection of privacy and racial equity during the COVID-19 pandemic. Differences in exposure and disproportionate access to quality health care contributed to communities of color being at higher risk of exposure to COVID-19 and having less access to vaccines and boosters. Countering the misleading narrative that “vaccine hesitancy,” rather than disparate access, was driving COVID-19 vaccination disparities for communities of color required community-based organizations to collect and disseminate robust race-disaggregated data.

The release of race-disaggregated health data comes with its own significant privacy challenges. Federal guidance, such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA), protects sensitive patient health information from being disclosed without the patient’s consent or knowledge. But although HIPAA protects people’s specific medical records, federal regulations protecting people’s data once they are merged or aggregated are far less clear and do not state precisely how to prevent disclosure risk. Many public health–oriented organizations understand the importance of preventing reidentification and are vigilant about protecting their clients, but publishing race-disaggregated data to document disparities in access can still create significant reidentification risk for communities of color who are underrepresented in a dataset.

Community and user input in designing privacy

Following recommended principles for community-engaged methods and participatory quantitative methods, our team included community-based organization input and ideas in the creation and development of the educational prototype tool. Community collaboration has the added benefit of validating the tool with real potential end users, leveraging the benefits of the UX design interview model. Following the framework and best practices outlined in “Increasing the Rigor of Quantitative Research with Participatory and Community-Engaged Methods,” we sought to include end users in the process of designing and developing our tool.

To identify potential collaborators in vaccine equity, we sought a collaboration with our Urban Institute colleagues involved with the Partners for Vaccine Equity (P4VE) program. Our P4VE team colleagues connected us with six community-based organizations that provide health-related services directly to community members, particularly those focusing on closing the vaccine access gap for communities of color.

The first set of engagements with our partners included an exploratory set of interviews. These interviews focused on questions about their current data staff capacity, past and present data collection efforts, goals for using or releasing data, and existing data privacy knowledge. All partners were offered a gift card as a small way to thank them for taking the time to share their expertise.

Embedding community collaborator input in prototype design

Although staff from community-based organizations generally value the potential benefits of publicly sharing their data and respect the privacy concerns of their clients, they are unfamiliar with statistical methodology for protecting people’s privacy and the associated trade-offs for increasing data access and equity. We learned there was substantial interest among organization staff in becoming more fluent in the methodology. They were also willing to engage with an educational tool in service of that goal, provided such a tool would be easy to use. Many organization staff expressed plans or goals to expand their data collection efforts. All these points validated the creation of the prototype tool to enable education and adoption of data privacy best practices when the organization may lack the in-house expertise to do so.

To incorporate community collaborator feedback directly into our tool, we created wireframes (sets of images displaying the planned functionality of the tool) with three primary goals in mind. We include samples from these wireframes to illustrate how they addressed each goal.

1. Defining technical terminology

The tool prototype should function as a data privacy translator between technical and nontechnical audiences. As such, one of our primary goals in the design process was to avoid overwhelming users with unapproachable terminology. To do so, we provided definitions and explanations before applying new concepts and terms to a user’s analysis.

Defining Privacy-Preserving Techniques before Applying Them to a User’s Analysis

2. Centering real applications with fictional examples

For this pilot, we decided to create a fictional dataset to serve as the basis of the tool and to avoid immediate privacy concerns associated with using confidential data. Although we did not intend for this dataset to be representative of real populations, we wanted to ensure the dataset would be realistic and enable examples of relevant analyses. To do so, we included questions in the user interviews about the structure and type of data collected by the organizations. We used 2021 Public Use Microdata Sample files for Washington, DC, as the foundation for our fictional demographic information, and generated additional fields for service provision based on the user interview responses about their data. We then created examples of analyses that community-based organizations could conduct using this data and walked through the privacy and equity trade-offs of releasing the analysis results.
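The construction described above can be sketched in miniature. The sketch below generates a small fictional dataset: the demographic fields stand in for PUMS-derived attributes, and the service-provision fields are invented, as they were in the tool, to resemble the kinds of data the organizations described collecting. All field names, categories, and probabilities here are illustrative assumptions, not the tool’s actual schema.

```python
import random

random.seed(0)  # fixed seed so the fictional data are reproducible

# Hypothetical categories standing in for PUMS-derived demographics
RACES = ["Black", "White", "Asian", "Hispanic", "Other"]
AGE_GROUPS = ["18-29", "30-44", "45-64", "65+"]
ZIPS = ["20001", "20002", "20019"]  # example Washington, DC, ZIP codes

def make_fictional_record(rid):
    """One fictional client record: demographics plus invented
    service-provision fields (outreach contact, vaccination status)."""
    return {
        "id": rid,
        "race": random.choice(RACES),
        "age_group": random.choice(AGE_GROUPS),
        "zip": random.choice(ZIPS),
        "received_outreach": random.random() < 0.6,
        "vaccinated": random.random() < 0.7,
    }

dataset = [make_fictional_record(i) for i in range(500)]

# Example equity-oriented analysis the tool walks through:
# vaccination rate disaggregated by race
rates = {}
for race in RACES:
    group = [r for r in dataset if r["race"] == race]
    rates[race] = sum(r["vaccinated"] for r in group) / len(group)
```

A fictional dataset like this lets the tool demonstrate real analysis patterns, such as the disaggregated rates above, without any of the disclosure risks that come with confidential client data.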

Presenting the User with Options for Equity-Oriented Analysis of the Fictional Dataset

3. Demonstrating privacy risk beyond personally identifiable information

User interviews made it clear that staff understood that removing personally identifiable information (PII), or information that directly identifies a person, such as their name, Social Security number, or address, was necessary to protect privacy. We wanted to ensure users understood that removing PII is necessary but not sufficient to protect privacy. To do so, we included a separate step in the tool detailing the remaining risks staff need to be aware of, demonstrating that removing PII does not eliminate all privacy risks.
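One standard way to show this remaining risk is a k-anonymity check: even after names and other PII are stripped, a rare combination of quasi-identifiers (such as race, age group, and ZIP code) can single a person out. The sketch below is a minimal illustration with invented records; it is not the method implemented in the tool.

```python
from collections import Counter

# De-identified records: all PII removed, but quasi-identifiers remain
records = [
    {"race": "Black", "age_group": "65+",   "zip": "20019"},
    {"race": "White", "age_group": "30-44", "zip": "20001"},
    {"race": "White", "age_group": "30-44", "zip": "20001"},
    {"race": "Asian", "age_group": "18-29", "zip": "20002"},
]

def k_anonymity_violations(rows, quasi_ids, k=2):
    """Return quasi-identifier combinations shared by fewer than k
    records. Anyone with such a combination remains reidentifiable
    even though every direct identifier has been removed."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return {combo: n for combo, n in counts.items() if n < k}

risky = k_anonymity_violations(records, ["race", "age_group", "zip"])
```

Here the two records with unique race/age/ZIP combinations are flagged as risky, while the two identical records protect each other. Notice that the flagged records belong to the smallest demographic groups, which is exactly the dynamic that makes PII removal alone insufficient for underrepresented populations.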

Discussing Outstanding Risks after Deidentifying the Fictional Dataset

Iterative designing with community collaborators

To complete the design process, we reengaged with four of our initial six community collaborators to share our progress on the tool and to test assumptions made during the tool build.

The community-engaged user testing interviews included a live walkthrough of the tool with real-time feedback, which indicated that the content in the tool was relevant and at an appropriate technical level for our intended audience. Participants also found the general structure of the tool easy to follow, so we didn’t need to make any foundational design changes based on their feedback. We did take away a few insights that translated into small design optimizations, such as modifying part of the navigation so users can cycle through multiple analysis scenarios more easily.

Future work in exploring the intersection of privacy and equity

We hope to continue to collect feedback on the utility of this prototype for understanding privacy trade-offs in the context of equity-oriented data releases. If you have feedback to share or notice errors in the prototype, please email DataPrivacyTool@urban.org.

This work suggests two avenues for next steps: improving the existing prototype tool and continuing to investigate community-engaged methods as they apply more broadly to equity in data privacy. Improvements to the tool prototype, based on feedback from our user testing, should largely center on allowing additional customization, so users can more accurately understand the privacy and equity implications of their specific use case. Ideally, users could upload their own datasets or customize the combination of analysis and privacy steps they apply.

A broader investigation of community-engaged methods for privacy and equity would use data privacy as a test topic for better understanding the needs of community-based organizations and how that work intersects with racial equity–oriented data solutions. This continuing work will allow us to advance the original goal of the prototype tool: protecting the privacy of communities of color while ensuring they are represented in data-related decisions.

-Maddie Pickens

-Deena Tamaroff

-Sonia Torres Rodriguez

Want to learn more? Sign up for the Data@Urban newsletter.

Data@Urban is a place to explore the code, data, products, and processes that bring Urban Institute research to life.