Data, Desert Islands, and Digital Dark Ages: Richard Marciano on Records and Data Management

Erin O'Rourke
5 min read · Nov 18, 2019


On November 1, Dr. Richard Marciano, a professor at the University of Maryland, asked Sawyer Seminar participants, “If you were on an academic desert island, what data would you bring with you?” After hearing about his career, which spans training in electrical engineering, work as a computational environmental scientist and at a supercomputing center, and most recently a role as professor and director of data curation initiatives at UMD, it was clear that Dr. Marciano has made decisions like this one numerous times. He described moving between jobs and even universities, bringing relevant data sets and sources with him into each new role. As a result, he brings a fascinating perspective to data curation and records management, as well as to pedagogy in these fields.

Dr. Marciano first came to UMD when the university was seeking professors to transform its Master's in Library and Information Science program and change the way students were trained in digital and computational methods. To balance his own science background, he intentionally built teams with members from archival and library backgrounds. One of the courses he introduced was an eight-week, cross-disciplinary intensive that uses digital methods to work through data problems. In his teaching he uses tools like Jupyter notebooks to create readable, hands-on, interactive learning environments that others can build upon. He also suggests that universities create certificate programs for continuing education in digital methods, so that professionals in the humanities and archival fields can keep up with current trends.

Those trends are moving quickly: major curators of data like the National Archives are shifting away from analog records, and within the next year, all new records submitted to the archives by federal agencies will have to be digital. For this to work, however, there need to be frameworks in place for using and organizing digital records. Without them, we end up in what Marciano called the “digital dark ages,” in which it is easier to view and understand maps created during the Civil War than ones created during the wars in Afghanistan and Iraq. Because the former are analog and the latter digital, far more effort must go into maintaining the long-term usability of digital records, given rapidly changing technologies and the lack of backward compatibility.

I encountered this “digital dark age” phenomenon myself while working in my hometown's archives as a summer job last year: many town committees had kept important documents from almost a hundred years ago on paper, but much of the information from the late 1990s and early 2000s was stored on diskettes and even older floppy disks, and it was never migrated when those formats became obsolete. As part of my work, I recovered files from the diskettes but was unable to do the same for the floppies, because that would have required expensive hardware we couldn't obtain on our grant-restricted budget. I can only imagine the concerns this would raise if the data being lost were not merely the musings of one town's public safety committee, but records of national importance, or records that tell the stories of underrepresented groups, as is often the case when the teams curating records are not appropriately diverse and well trained.

Dr. Marciano went on to discuss a case study showing how computational thinking can be incorporated into records management. One paper he provided as suggested reading for participants described a computational model for marking restricted personally identifiable information in records from Japanese internment programs in the US during World War II. The process began with a flowchart specifying the circumstances under which a person's data would be restricted, depending on whether the individual was a minor when the incident occurred and how many years ago it took place. From there, the flowchart could be translated into pseudocode, and the pseudocode into Python scripts. These scripts could then be applied to data from index cards that had been transcribed with optical character recognition and run through a program to extract the relevant metadata. The process supplements the work of researchers and makes more collections available to the public, as appropriate. With this same information, Marciano said, one could use the records to make a human again out of the data. While many of these individuals' stories have been lost, reconstructing them from data provides an alternative to complete erasure.
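The talk didn't walk through the flowchart itself, but a minimal sketch of the flowchart-to-pseudocode-to-Python translation might look like the following. The field names and thresholds here (an age of majority of 18, a 75-year release rule) are illustrative assumptions of mine, not the project's actual restriction rules:

```python
from datetime import date

# Illustrative thresholds -- the real flowchart encodes the actual
# legal rules, which may well differ from these assumed values.
AGE_OF_MAJORITY = 18      # assumed cutoff for treating a person as a minor
RELEASE_AFTER_YEARS = 75  # assumed years before PII may be released

def is_restricted(birth_year, incident_year, current_year=None):
    """Answer the flowchart's two questions for one record:
    was the person a minor at the time, and how long ago was it?"""
    if current_year is None:
        current_year = date.today().year
    years_elapsed = current_year - incident_year
    was_minor = (incident_year - birth_year) < AGE_OF_MAJORITY

    if was_minor:
        # Assumed rule: records concerning minors stay restricted longer.
        return years_elapsed < RELEASE_AFTER_YEARS + AGE_OF_MAJORITY
    return years_elapsed < RELEASE_AFTER_YEARS

# Applied to one hypothetical record parsed from an index card:
record = {"name": "REDACTED", "birth_year": 1930, "incident_year": 1942}
print(is_restricted(record["birth_year"], record["incident_year"]))
```

Expressed as a function, the same decision logic can be mapped over every record extracted from the index cards, flagging the entries that must be withheld before a collection is opened to the public.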

A recurring theme in Marciano's work is the datafication of people, or, conversely, the “peoplefication” of data. For example, one of his eight-week intensive programs worked with death and cemetery records from African American communities. From these records, one can create “portraits” of the people the data describe, allowing researchers to recover stories and voices they would otherwise be unable to identify. Some people, however, were intentionally left out of the records, and questions linger about whether reconstructing people's stories from data about them is appropriate at all. Gaps also remain when people are seen only through the lens of records, and these absences, while sometimes unintentional, often reflect shortcomings in the diversity and training of the teams doing records management. One notable example Dr. Marciano mentioned was discovered by Dr. Lyneise Williams of UNC-Chapel Hill, whose research involved a magazine that published photos of Black subjects. When the magazine issues were converted to microfilm, the subjects' faces showed up only as flat black squares. To add insult to injury, the originals were discarded after the conversion, resulting in the literal erasure of a whole group of people. Clearly, curating and managing data is a significant responsibility: it determines whose stories are told, how their importance is weighted, and who uses them, and for what.

Marciano also spoke about combining data from multiple sources, none of which alone would be considered “big data,” but which, taken together, can yield insights across collections. Archives are often not designed to cultivate this cross-collection thinking, so Marciano advocates restructuring them into what he calls a data observatory, borrowing patterns of thought from computation to organize collections into levels. Data gathered from many distinct sources typically follow different curation practices, and some sources lack metadata altogether. Dr. Marciano's work uses technologies like optical character recognition and digital mapping to generate this metadata so that researchers can begin classifying records.
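As a rough illustration of that kind of metadata generation, the sketch below runs optical character recognition over a scanned record and pulls out candidate fields with simple patterns. It uses the open-source Tesseract engine via the pytesseract wrapper; the patterns, fields, and file path are my own hypothetical choices, not necessarily the toolchain Marciano's team used:

```python
import re

from PIL import Image
import pytesseract  # Python wrapper around the Tesseract OCR engine

def extract_metadata(image_path):
    """OCR one scanned record and pull out candidate metadata fields.
    The patterns are illustrative; a real pipeline would need rules
    tuned to each collection's card or form layout."""
    text = pytesseract.image_to_string(Image.open(image_path))

    # Assumed patterns: four-digit years, and "Surname, Given" names.
    years = re.findall(r"\b(?:18|19|20)\d{2}\b", text)
    names = re.findall(r"^[A-Z][a-z]+,\s+[A-Z][a-z]+", text,
                       flags=re.MULTILINE)
    return {
        "source": image_path,
        "raw_text": text,
        "candidate_years": years,
        "candidate_names": names,
    }

# Each scan becomes a uniform metadata record that can be classified
# and compared across otherwise incompatible collections.
print(extract_metadata("scans/index_card_001.png"))
```

However crude the patterns, the point is the output shape: every source, whatever its original curation practice, is reduced to the same metadata structure, which is what makes cross-collection classification possible.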

Through techniques like these, researchers gain more power to derive meaning from data and to use them to tell stories about history and the present. Dr. Marciano's talk impressed upon listeners how important this role is: as more and more records are created in digital form, we must ensure that the people handling them have access to today's powerful tools, and that they use them wisely.
