Week 2, 01/18/17
Data as archival records: literature review, citations, archival research

Topics:

Creating a literature review, managing references, citation formats, archival research, metadata

Readings for this Class:

Notes on Readings:

Gitelman, “Raw Data” Is an Oxymoron, Ch 1

Like last week’s reading by Johanna Drucker, which discussed a humanist’s more constructivist approach to data, Gitelman questions the idea that data are ‘given’ rather than constructed, and therefore subject to bias and interpretation. As the title of the book suggests, Gitelman argues that data are always already ‘cooked’ and never ‘raw’.

The general perception of data is that they come ‘before the fact’: they are the starting point for what we ‘know’ and how we communicate. A photograph is generally understood to be framed by the photographer’s choices of composition and point of view, but we rarely think of data as framed in the same way, by the often arbitrary choices of the people who collect and present them.

Gitelman presents three general precepts about data.

(1) Data are abstract

Although data are abstract, they require material expression in order to be retained and manipulated.

(2) Data are aggregate

The second general precept is that data are aggregative: “They pile up.” Data are so aggregative that English usage increasingly treats the word as a single mass noun (as in ‘the data is’ rather than ‘the data are’).

“…the plural (of datum) data is not universal, not generalizable from the singular; it is an aggregation”

(3) Data are mobilized graphically

“in order to be used as part of an explanation or as a basis for argument, data typically require graphical representation and often involve a cascade of representations.”

Loukissas, “Taking Big Data Apart: Local Readings of Composite Media Collections.”

In this paper Loukissas argues that we should be asking: “Where do Big Data come from and how do the local conditions of their creation shape subsequent research and practice?”

Loukissas uses ‘local’ as a relative term. Within the solar system, the earth is local; within the galaxy, the solar system is local; and within the universe, the galaxy is local. In the same sense, Big Data can be understood as a collection of local data, and in order to think critically about Big Data we must be able to take them apart.

Loukissas makes an interesting observation about how data visualization is complicit in agglomerating data and obscuring the heterogeneity they had before they were processed.

“In recent years, the popularity of data visualization has normalized both data and ways of looking at them. Local readings of data operate as a critical counterpoint to data visualization.”

Looking for the local

According to Loukissas, there is no such thing as universal data; data are situated within local conditions. Ways of inscribing data are always constrained by the technology available at a given point in history. Data are shaped by regional classifications, which are a product of geographical location. Data are also shaped by what is left out. Loukissas illustrates this with a search of the Smithsonian catalogue for the terms ‘black’ and ‘white’.

“The museum diligently documents work created by artists who identify racially as ‘black’. But the search term ‘white’ brings up little about race, other than the occasional piece linked to white supremacy. White, the racial identity of the vast majority of artists whose work is shown at the Smithsonian, is not critically examined by the institution.”

This absence reveals a bias.

One of the risks of the exponential increase in data is a decline in the rigor of the cataloging records used to access it, which can lower the quality of library data.

“From this perspective, the movement toward Big Data can paradoxically make information less accessible.”

Local conditions in Big Data

I know from my own experience creating a centralized platform for non-profit member organizations in Rhode Island that fulfilling the “promise to reveal new patterns across previously independent datasets” is enormously difficult, because each organization has its own motivations and goals. The differences between the individual organizations’ data and cultures become undeniably apparent once you try to normalize them collectively.

The list below outlines Loukissas’s speculative guidelines for adopting a local sensibility when working with (or against) Big Data:

(1) Get dirty. Ask: what might be lost by filtering out local features in data? Data that initially appear out of place might offer insights into past audiences or lost provenances.

(2) Be comparative. Ask: how are data created otherwise? The local dimensions of data are most evident in the context of Big Data, when different datasets are brought into productive dialog.

(3) Attend to context. Ask: what are the environments in which data evolve? Illuminate the infrastructure of data collection and interpretation.

(4) Stay connected. Ask: who lives with data and understands their importance? Data should facilitate communication among their authors and users, not serve as proxies for local knowledge.

(5) Collect counter-data. Convey the incompleteness of data. Moreover, invite and integrate annotation, challenges to interpretation, and alternate representations.
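Guidelines (1) and (3) have a practical counterpart in how composite datasets get assembled. The sketch below is a hypothetical illustration, not something drawn from Loukissas: the two catalogs, their field names, and the helpers `merge_with_provenance` and `take_apart` are all invented for this example. It assumes records from two local collections are mapped into a shared schema, and shows one way to avoid "filtering out local features": each merged record keeps a `source` field plus any fields the shared schema has no slot for, so the composite can later be taken apart again.

```python
# Hypothetical catalogs and field names, invented for illustration only.
CATALOG_A = [
    {"title": "Harbor at Dusk", "artist": "J. Doe", "region_term": "New England"},
]
CATALOG_B = [
    {"name": "Mill Interior", "maker": "A. Roe", "accession_note": "gift of local society"},
]

# Hypothetical mapping from each catalog's local field names to a shared schema.
FIELD_MAPS = {
    "catalog_a": {"title": "title", "artist": "creator"},
    "catalog_b": {"name": "title", "maker": "creator"},
}


def merge_with_provenance(sources):
    """Merge records into the shared schema without discarding local context.

    Each merged record keeps:
      - 'source': which local collection it came from
      - 'local_extras': fields the shared schema has no slot for
    """
    merged = []
    for source_name, records in sources.items():
        field_map = FIELD_MAPS[source_name]
        for record in records:
            row = {"source": source_name, "local_extras": {}}
            for key, value in record.items():
                if key in field_map:
                    row[field_map[key]] = value
                else:
                    # Retain unmapped local fields instead of dropping them.
                    row["local_extras"][key] = value
            merged.append(row)
    return merged


def take_apart(merged, source_name):
    """Recover the part of a composite dataset that came from one source."""
    return [row for row in merged if row["source"] == source_name]


if __name__ == "__main__":
    combined = merge_with_provenance({"catalog_a": CATALOG_A, "catalog_b": CATALOG_B})
    for row in combined:
        print(row)
    print(take_apart(combined, "catalog_b"))
```

The design choice here is simply to make provenance and leftover local fields part of the record rather than metadata that is thrown away during normalization; a real project would need its own schema decisions.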

Notes from class:

Validity: a central criterion for judging how appropriate a particular measure is

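  • bias = systematic error (the data look correct, but they contain an error that skews results in a consistent direction rather than at random)
    - selection bias = the people who choose to participate differ from a random sample of the population
  • external validity = do the findings generalize beyond the specific study (to other people, places, and times)?
  • internal validity = does the study design rule out alternative explanations, so that its conclusions hold within the study itself?
  • construct validity = are you actually measuring what you claim to measure?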
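A toy simulation can make selection bias concrete. In the sketch below every number is an invented assumption: a hypothetical population has satisfaction scores spread evenly between 0 and 10, and the chance that a person volunteers for a survey is assumed to rise with their score. The volunteers' average overshoots the population average, while a random sample of the same size lands close to it.

```python
import random
from statistics import mean


def simulate_survey(seed=0, population_size=10_000, sample_size=500):
    """Compare a random sample with a self-selected sample (hypothetical data)."""
    rng = random.Random(seed)
    # Hypothetical population: satisfaction scores uniformly distributed on 0..10.
    population = [rng.uniform(0, 10) for _ in range(population_size)]

    # Random sample: every person is equally likely to be chosen.
    random_sample = rng.sample(population, sample_size)

    # Self-selected sample: the chance of volunteering is proportional to the
    # score (assumption: more satisfied people are more willing to respond).
    volunteers = [x for x in population if rng.random() < x / 10]
    self_selected = rng.sample(volunteers, sample_size)

    return mean(population), mean(random_sample), mean(self_selected)


if __name__ == "__main__":
    pop_mean, random_mean, volunteer_mean = simulate_survey()
    print(f"population: {pop_mean:.2f}")
    print(f"random sample: {random_mean:.2f}")
    print(f"volunteers: {volunteer_mean:.2f}")
```

Collecting more volunteers would not fix the problem: a larger self-selected sample only estimates the wrong quantity more precisely.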

Causality and Association

  • direction of causality => does a cause b, or does b cause a?
    - causality = a causes b
    - correlation = a and b tend to occur (or vary) together
  • hidden variable = a third variable c is left out of the explanation even though it affects the outcome (it may drive both a and b, making them look related)
  • chance = some correlations are pure coincidence: http://www.tylervigen.com/spurious-correlations
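The hidden-variable case can be illustrated with a small simulation. In this sketch c is a made-up common cause that drives both a and b; neither a nor b appears in the other's formula, yet the two end up strongly correlated. When the comparison is restricted to cases where c is nearly constant ("holding c fixed"), the association largely disappears. The noise levels, sample size, and helper names are assumptions chosen for illustration, and the Pearson correlation is computed from its definition so only the standard library is needed.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_confounder(n=5_000, seed=1):
    """Hypothetical data in which c causes both a and b; a and b never affect each other."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]
    a = [ci + rng.gauss(0, 0.5) for ci in c]  # a depends only on c plus noise
    b = [ci + rng.gauss(0, 0.5) for ci in c]  # b depends only on c plus noise
    return c, a, b


def correlation_with_c_held_fixed(c, a, b, width=0.1):
    """Correlate a and b only among cases where c is close to 0."""
    pairs = [(ai, bi) for ci, ai, bi in zip(c, a, b) if abs(ci) < width]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


if __name__ == "__main__":
    c, a, b = simulate_confounder()
    print("corr(a, b), all cases:", round(pearson(a, b), 2))
    print("corr(a, b), c held near 0:", round(correlation_with_c_held_fixed(c, a, b), 2))
```

Looking only at a and b, nothing in the data distinguishes this situation from "a causes b"; the explanation depends on knowing that c exists and measuring it.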

Reliability

  • repeatability = does the measure give similar results when it is repeated under the same conditions?
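One common way to check repeatability is to measure the same subjects twice and correlate the two rounds (often called test-retest reliability). The sketch below assumes each hypothetical subject has a fixed true value and that every reading adds fresh random noise plus an optional constant bias; `noise_sd`, `bias`, and `simulate_two_rounds` are invented for illustration. A noisier instrument gives a lower correlation between rounds. The bias parameter also shows that a measure can be reliable without being valid: a consistently miscalibrated instrument repeats itself well but is off by the same amount every time.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_two_rounds(noise_sd, bias=0.0, n=1_000, seed=2):
    """Measure the same hypothetical subjects twice with a noisy, possibly biased instrument."""
    rng = random.Random(seed)
    true_values = [rng.gauss(50, 10) for _ in range(n)]

    def measure():
        # Each reading = true value + constant bias + fresh random noise.
        return [t + bias + rng.gauss(0, noise_sd) for t in true_values]

    return true_values, measure(), measure()


if __name__ == "__main__":
    for noise_sd in (2, 10):
        _, round1, round2 = simulate_two_rounds(noise_sd)
        print(f"noise_sd={noise_sd}: test-retest r = {pearson(round1, round2):.2f}")
```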

Citations

Citation practices among scientists differ from those among historians

For this class use:

--

--