Exploring the Microdata Frontier

Emily Shaw
Oct 31, 2014

As open data advocates, we seek to help the public access the best quality data we can get.

One of the critical dimensions of a dataset’s quality is its granularity: the number of individual observations aggregated together in each cell. Microdata, meaning data that is not aggregated at all but is available at the level of the individual observation, is the most granular data possible.

Why is microdata so powerful? For many of the most socially valuable uses of data, the more granular the data, the better. A useful analogy is high-resolution digital photography. Images with more pixels per inch are clearer and easier to interpret; they can be enlarged to reveal smaller details, and they represent the photographed object more precisely. More detailed data generally offers a similar set of advantages. Highly granular datasets in fields such as criminal justice, education, health, and social service delivery allow researchers to test more detailed causal theories and learn about more specific outcomes. For app developers, highly granular datasets enable more precisely tailored services, which provide greater value to end users. In both cases, microdata offers the highest possible granularity.

Because more granular data enables a broader variety of uses, open data advocates tend to seek access to more data at higher levels of granularity. So if the utility of microdata is so high, why is it often difficult to achieve open access to individual-level datasets?

The answer is that one person’s individual-level observation is another person’s private information. For many decades, U.S. laws have worked to define and defend a right to individual privacy that covers not just our immediate physical privacy but also the privacy of certain kinds of information. As a result of significant federal laws like the Privacy Act of 1974, the Health Insurance Portability and Accountability Act (HIPAA), and the Family Educational Rights and Privacy Act (FERPA), Americans enjoy the right to prevent the open disclosure of many kinds of data that would allow others to identify them and learn personal information about them.

Even beyond these legal restrictions, we have a broad ethical interest in making sure that private individuals are treated with respect as human beings, not just as generators of data. Our existing privacy protections are not the only way to accomplish this end, and they are not without significant downsides, but they are currently the major way we demonstrate our commitment to ensuring that data-sharing does not harm individuals.

The conflict between data users’ desire for individual-level data and our laws and norms of individual privacy protection leads to some critical questions for open data advocates. What exactly are the best current policies and practices for balancing our need for improved open data against the claims of individual privacy?

To help inform this important conversation, we have begun a more detailed exploration of the current legal, technical and social landscape of 21st-century microdata. Please follow our “Exploring the Microdata Frontier” collection to learn along with us; we hope you’ll share your own thoughts and comments along the way.

Post originally published on the Sunlight Foundation blog.