Why do we trust researchers with sensitive data?

Emily Shaw
Exploring the Microdata Frontier
Feb 24, 2015

(Photo credit: UCL Institute of Education)

As Sunlight explores the public use of individual-level data, we are also examining the practical, legal and ethical challenges of working with it. One major reason individual-level data often cannot be shared publicly is the need to protect individual privacy.

Even as we investigate open data sharing — meaning data that anyone can download and analyze anytime, anywhere — we recognize that it can also be useful to consider how the public can benefit from data shared privately, not openly. Though average citizens cannot access a number of data sets maintained by government because of privacy concerns, they nonetheless benefit from the work of researchers — both inside and outside of government — who have been able to share that data with each other. For instance, many states have created integrated justice information systems that allow law enforcement, courts and corrections to coordinate more effectively. States like Washington and Oregon have also used interagency data sharing to evaluate the effectiveness of programs such as prisoner re-entry and rehabilitation for minors with mental health concerns. While these datasets cannot legally be shared in the open, they can be shared privately in valuable ways.

Fears of harm stemming from the release of identified data have led to many legal restrictions on publishing it openly. As we explored earlier, the 1970s brought a range of new personal legal protections and restrictions on sharing individuals’ personally identifiable information (PII). Major privacy laws — such as HIPAA and FERPA — prevent health, education and financial data collection initiatives from sharing PII-containing data sets in most contexts.

In other words, because of our desire to protect individual privacy, some datasets are not merely not “open”; they are actively “closed,” made available only under very specific circumstances. Under typical privacy law, data users must meet high standards and establish relationships with data providers to build credibility and trust. Privacy laws do include disclosure exceptions for research purposes, especially for research that will contribute to general knowledge.

While this might ordinarily seem like an achievable hurdle for any public-interest data user, the past 50 years of developments in norms and law to protect human subjects of medical and behavioral research have created a substantial legal architecture that defines and limits what we mean by the research use of protected data. This standard for acceptable institutional use of protected data is high enough that it prevents many unaffiliated or casual researchers from accessing information; at the same time, the high standard allows governments to feel comfortable releasing data to institutions that meet it.

The Birth of IRBs

Just as federal privacy law was born in response to fears of data misuse, research review processes emerged as people learned more about major ethical violations in medical research. In the mid-20th century, the Nuremberg Trials of Nazi war criminals included testimony about grotesque experiments conducted by doctors on concentration camp internees. Dr. Henry Beecher, an American anesthesiologist who served in World War II, identified a number of studies conducted in the United States that he felt also breached standards of ethics, and exposed them through talks and a New England Journal of Medicine article, “Ethics and Clinical Research.” Spurred by Beecher’s work, the National Institutes of Health and the U.S. Public Health Service (PHS) announced increased ethical review processes in 1966 for the work they sponsored: The Surgeon General’s Directives on Human Experimentation required that all new or ongoing work be approved by “a committee of [the investigator’s] institutional associates [to] assure an independent determination: (1) of the rights and welfare of the individual or individuals involved, (2) of the appropriateness of the methods used to secure informed consent, and (3) of the risks and potential medical benefits of the investigation.”

Unfortunately, the PHS apparently failed to use this directive to adequately vet its existing projects. A PHS study on the impact of untreated syphilis, conducted for over 40 years on a group of African-American test subjects in Tuskegee, Ala., who were falsely told they were receiving treatment, remained in place until a whistleblower helped the Washington Star and New York Times expose the study in 1972. The national response to this discovery led to congressional action. Through the 1974 National Research Act, Congress created a commission to determine the best practices for the protection of human subjects.

From this point on, protecting research subjects from risk and fully informing them about the nature of the studies that included them became mandatory in federally funded research, and all research institutions receiving federal funding had to develop Institutional Review Boards (IRBs) to evaluate and oversee research involving human subjects. The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research produced the Belmont Report, which articulated the principles that guide IRBs. Those principles center on protecting the people who are the subjects of experiment and analysis, and they led to a focus on reducing subjects’ risk as much as possible, including through strict protection of their privacy and confidentiality.

Federal adoption of these recommendations means that IRBs play an essential role in protecting individuals’ PII in behavioral and medical research performed across the nation. Federal funding supports around 60 percent of research performed at universities and around 30 percent of all research conducted nationally. The broad reach of federal funding means that even research conducted without direct federal support is still strongly shaped by the norm of IRB oversight. All major research-conducting academic institutions have set up their own IRBs, and for institutions without one, third-party commercial IRBs can be approved by federal agencies to satisfy IRB requirements.

Further Privacy Precautions in Research

In addition to federal guidelines, some states create their own authorities to regulate research methodology and access to sensitive information. California’s Committee on the Protection of Human Subjects (CPHS), a division of the Office of Statewide Health Planning and Development, conducts reviews subject to the state’s Information Practices Act before approving the use of personally identifiable data from state agencies for research. If the research is conducted using data from a state department, the agency will sometimes have its own regulatory board. For instance, the California Department of Corrections and Rehabilitation’s Research Advisory Committee must approve all research using inmate data independently of the state’s CPHS per its own guidelines.

This extensive network of laws and oversight means that researchers take the confidentiality of their data very seriously. Some kinds of data must be maintained in compliance with FISMA standards, which in many cases requires a substantial hardware investment. Data security is a major element of IRB review, and plans submitted by researchers must describe how they will physically secure the data, including by choosing appropriate hardware, locks and rooms where the data will be kept and by limiting the number of people permitted direct access. The guiding notion is to prevent the release of PII wherever possible, which means that even IRB-approved data is stripped of major identifiers before being given to researchers whenever it is feasible to do so.

In many research contexts, personal relationships also become an important intervening factor on the way to accessing sensitive data. First, many research grants are funded specifically to support interorganizational and interdisciplinary cooperation, and creating the relationships necessary for joint research is a social process that depends on mutual trust and confidence. Whether or not it is a primary factor, the researchers we have surveyed in our work make it evident that developing mutual respect and comfort improves the likelihood that researchers and data-holding institutions will find good ways to work together, striking a productive balance between the researcher’s project objectives and the data holder’s need for security.

Originally published at sunlightfoundation.com on February 9, 2015.
