What makes data legitimate?

Learning from organization legitimacy theories

trialnerr0r
3 min read · Mar 21, 2024


Edited by me; Sources: X & X

How it all started

Once researchers have data, they usually clean it, analyze it, and hopefully publish the results somewhere. We do clean data, but mostly to remove noise before analysis, not because we question its legitimacy.

Data can come from many channels: system logs, web scraping, or APIs. So why do we simply take data as it is? Why do we treat it as objective material and rarely think about the politics of data? In short, what makes data legitimate?

I read about these questions the other day [1], and I have not been able to stop thinking about them since. Unsure of the answer, I did what every Ph.D. student does: I started with a small round of literature review*.

Context of literature review

After searching for “legitimacy of data” and various combinations of those words, I was repeatedly served similar batches of papers. Apart from danah boyd’s paper that raises the question [1], none of the results came from directly relevant fields such as information science.

Most of the results come from organizational legitimacy theory. Although this field is not directly related to information science, I think reading about how others define “legitimacy” still has merit. Understanding legitimacy, even in a different context, would at least give me some grounding and a lens for examining legitimacy in the context of data.

What is legitimacy?

Within organizational legitimacy theory, one of the classic definitions, proposed by Suchman in 1995 and later expanded in more modern contexts, defines legitimacy as “a generalized perception or assumption that the actions of an entity are desirable, proper, or appropriate, within some socially constructed system of norms, values, beliefs, and definitions” [2].

After reviewing several iterations of this definition, Deephouse and Suchman offered the following: “Organizational legitimacy is the perceived appropriateness of an organization to a social system in terms of rules, values, norms, and definitions” [3]. For a more detailed account of how the definition changed over the years, please refer to the full chapter [3].

Like most things in HCI, legitimacy is contextual: it is how appropriate something is perceived to be relative to a particular set of rules, values, norms, and definitions.

So perhaps the question now becomes: why is data considered appropriate, and what makes it so?

What makes data inherently legitimate, or, say, appropriate?

Wilson [4] offers one answer, identifying two major “transductive pathways” through which data practices are made legitimate: standardization and objectification.

Then the next question is why is it

Data is the new oil, or at least that is the common saying nowadays. We don’t really question oil because it is a natural resource. But is data really equivalent to oil, in the sense that it can be taken as inherently legitimate?

Footnotes

*Since this is just a personal project, I did not follow the PRISMA literature review protocol. I relied mostly on Google Scholar instead of exhaustively searching the relevant ACM and InfoSci libraries.

Citations

[1] boyd, d. (2020). Questioning the legitimacy of data. Information Services & Use, 40(3), 259–272.

[2] Suchman, M. C. (1995). Managing legitimacy: Strategic and institutional approaches. Academy of Management Review, 20(3), 571–610.

[3] Deephouse, D. L., & Suchman, M. (2008). Legitimacy in organizational institutionalism. In The Sage handbook of organizational institutionalism (pp. 49–77).

[4] Wilson, M. W. (2011). Data matter(s): Legitimacy, coding, and qualifications-of-life. Environment and Planning D: Society and Space, 29(5), 857–872.
