Sustainable Digital Scholarship
Shrinking our Footprint, Broadening our Impact
The following is the text of my keynote speech given at the Digital Frontiers conference at the University of North Texas, Denton, on September 21, 2017. I made some slight edits to the annotations, and included some of the slides for context. Innumerable thanks to Spencer Keralis, Head of UNT-Denton’s Digital Scholarship Program, for inviting me to speak and share space with fellow digital scholars Jacqueline Wernimont (Arizona State University) and Laura Braunstein (Dartmouth College).
Good morning, everyone! Thank you, Dr. Cowley, Dr. Keralis, and Dr. Barrix-Moore, for such a wonderful introduction to the conference. I want to start by thanking the Digital Frontiers organizing committee for inviting me here today. I also want to thank the employees here at the university whose labor makes this conference possible: staff at the hotel, staff at UNT, the people who did the cleaning and set up the rooms, the people who prepared the food nourishing us, the people organizing us, and the people who helped us travel to this conference. I see you and am grateful for your labor. It's a practice of mine to acknowledge the places in which I've been asked to speak and to engage with their histories. Texas is an interesting location to discuss this idea of exploring edges or pushing boundaries, given that the state's historical genesis lies in European colonists pushing violently into the borders of Mexico and onto tribal lands. I'm using language that is as strong and intentional as I possibly can, because I understand that we as information professionals have played a role over the years in how this history is documented and made accessible to others, through our classification schemas and keywords, our finding aids, and the research we create and disseminate. There is no question that language and selective facts used to elide or suppress the brutal nature of that frontierism played a role in creating the current climate, hundreds of years later: one of fear, of racist scapegoating, and of human rights abuses perpetrated in the name of boundaries and borders.
It is a grimly relevant time for this year's conference theme of exploring edges and pushing boundaries. The ubiquity of big data has a lot of scientists, researchers, and librarians rubbing their hands together as hungrily as Wile E. Coyote spying the Road Runner, eyeing the many possibilities for research and discovery in special collections and other library resources. However, some current issues within the practice of digital scholarship illustrate that our problem with boundaries is that digital humanities needs some in order to move forward in a manner that is ethical, inclusive of and accessible to our user communities, and not exploitative of those who perform the labor.
If we’re at the edge, let’s take a moment and a deep breath and pull back.
Language is key to how we conceptualize the idea of a "frontier" for digital humanities. A frontier is defined as a border or line separating two countries. Webster's also characterizes it as the extreme limit of civilization and wilderness. I would ask everyone here: which side of the line do you think digital scholarship is on? Is it civilized? Or is it wild? What makes it so? And from whose vantage point? Who is allowed to define the line? And how can we define the upper limits of "civilization" in this discipline when so many of us cannot even agree on what digital scholarship is, or define it for a broad swath of users, either its standards or its point?
So what do we know about the current state of digital humanities or digital scholarship? We've got more tools than ever before, both open source and proprietary, to amplify our special collections and scholarly research. We have mapping tools like ArcGIS or Leaflet, exhibit tools like Omeka or Mukurtu, metadata tools like the Oral History Metadata Synchronizer, narrative tools like Scalar, Creatavist, or WordPress, as well as documentation tools, corpus-building tools, and content streaming tools. And we have more content and collections than we know what to do with: our existing special collections or university archives, community archives, historical societies, and government repositories. All are generating collections through regular acquisitions, in addition to whatever was backlogged and not yet processed or digitized. We have big data generated by our government, our reporting agencies, our cellphones, our social media accounts, our browsers, and in some cases, our bodies. Never-ending data streams are reporting on everything. And there's scholarly output as well, including course curricula, theses, dissertations, and open access journals. This proliferation has created some compelling digital humanities projects. Todd Presner at UCLA, for instance, used social media data from Twitter to document the Arab Spring uprising as it spread through various countries in the Middle East. And the University of Richmond partnered with the University of Maryland, Virginia Tech, scholars at Johns Hopkins, and staff at the National Archives and Records Administration (NARA) to create Mapping Inequality, which analyzed historical federal real estate data to document the history of redlining in America.
While we have a rich collection of historical data to draw on for a variety of projects, we're also generating considerable born-digital data, billions of bytes, by the minute and the second. There is a Niagara Falls at the edge of the digital frontier, and we (the digital humanists, librarians, archivists, researchers, developers, and content management specialists) are standing at the bottom of this waterfall with the equivalent of a small plastic cup, hoping to catch it all. In 2016, Cisco estimated that global data traffic as a whole (including everything hosted in cloud storage and data centers) had already entered the zettabyte era. We're even gathering data in the act of destruction, as we recently saw with the purposeful plunge of Cassini into Saturn after it ran low on fuel exploring the planet and its moons.
Even if we could, even if the effort didn’t feel so Sisyphean, where would we store it all? Where could all of these collections and all of this data — analog and born digital — go? I would argue that as our works become even more explicitly digital from point of creation to preservation to access point, we’ve created an unsustainable problem for the future of digital scholarship, especially in terms of what we plan to do with it all and the implications of long-term storage.
How many of us in the room have heard the term “storage is cheap?” It’s ok to raise your hands.
But no. Storage isn't cheap. Everything has a cost. The idea that we could simply create digital storage out of thin air (and "the cloud" does nothing to dispel that notion) is false. Everything has a cost. Even digital storage. Especially digital storage, for all of these collections and documents and data that we hope to hold. And it's not just a monetary cost, but a human cost. Our exploration of boundaries in digital humanities should pull back and look to solutions that plainly and intentionally evince our values, showing how much we value the information we're trying to share by respecting the people and communities at risk of being harmed by our processes.
Historically, the way we've explored boundaries has always exacted a devastating human toll; it's no surprise that our current practice of digital accumulation in the academy mirrors our physical acquisition of land over the years. Here in Texas, border-making was marked by Spanish and American colonists fighting over land populated by First Nations tribes from the early 1800s through 1845, when the annexation of Texas marked a hard border and triggered the Mexican-American War (1846–1848). The war left Mexican nationals on the U.S. side of the border separated from their home government and also created new issues for Indigenous communities on both sides of the border that were still fighting for their own sovereignty. After Reconstruction, these fights became the more localized "range wars," with farmers and ranchers marking territory and expansion through barbed wire and racialized armed conflict.
It would appear that almost 200 years later we haven't really gone beyond the idea of marking territory, planting flags, trying to build walls, all while telling ourselves that we're breaking new ground, moving into the future, and securing our legacy. Hundreds of years after Europeans landed on the eastern seaboard of this already populated continent, aspects of our digital gathering have mimicked a colonialist mentality. I recall, for instance, the ways in which researchers and repositories rushed to take and then donate the various Occupy materials with an eye toward prestige, and not necessarily toward a realistic understanding of what long-term preservation and stewardship of those materials would look like. Everyone wanted to be the first, to plant the flag, to "civilize" through the academy the wilderness of protest and raw emotion. Day to day, we conceptualize our projects while trying to make tallies for end-of-year department reports and strategic plans. We're doing things. We're productive. We're trying new things and, in some cases, leaving room for failure and learning, but moving forward. Collecting, creating, and storing.
In order to talk about progress in the discipline of digital scholarship or digital initiatives in libraries, we as information professionals should first situate ourselves within an ethics of caregiving or radical empathy, as conceptualized by archival scholars Michelle Caswell and Marika Cifor. We have to understand what they described as our affective responsibilities to each other: how we gather and manage information is less a transaction than a matter of building relationships with each other and with the materials we are making accessible. At last year's DLF Forum, I talked about care ethics as envisioned by political science professor Daniel Engster, who, in his article "Rethinking Care Theory," describes care ethics as a gender-neutral moral and political philosophy that not only engages our community through basic acts of caring but also provides justice to communities through that care.
According to Engster, we care for our communities by 1) helping meet basic needs like food, water, and shelter; 2) sustaining people's capabilities for basic functioning in society, such as sense, movement, imagination, and reason; and 3) helping people avoid or alleviate pain and suffering. Using a minimalist standard of caregiving, he says, allows the theory to be applied and performed as broadly as possible.
Engster's theory would suggest that we have to set boundaries on our collection of digital data and resources because we care. We care about the people whose job it is to organize and store information. We care about the people creating these records, this data, and these historical perspectives. We care about the people being documented. We care about the people being affected by how digital objects are stored. And we care about how that storage affects the sustainability of our natural and built environments. Digital space isn't infinite. As our works and the tools to create them become more explicitly digital in terms of content, preservation, and repositories, we're generating an unsustainable amount of material. So the question is: how can we express care ethics in the digital library setting or in our digital scholarship work in a way that's sustainable with respect to our affective responsibilities to each other? How do we express the act of preservation in a way that cares for people?
I'd like to explore some of the challenges we face with regard to storage and stewardship of our collections. The first example, and the reason I am so deeply interested in issues of e-waste, is very personal. I manage a digital scholarship program that includes a physical space for faculty and student collaboration. I am also responsible for managing the technological infrastructure of the library. In that room, and in the library as a whole, there is considerable hardware and software that needs regular maintenance. Upgrades and obsolescence are a regular part of our daily workflows. The tools we use to create digital scholarship projects generate their own physical storage needs. We had a surplus of older machines: VCRs, DVD players, USB cords, monitors, hard drives, flash drives, Zip drives, disks, cassettes, and, inexplicably, an electric keyboard. The detritus of decades gone by. There was more than 20 years of stuff in our storage room. When I opened the door, I could not find a clear path from one end of the floor to the other.
Because of the overflow in the storage room, we couldn't order printing supplies for the digital scholarship center in bulk; there was no room to store the paper and ink, and not being able to buy in bulk was costing us thousands of dollars a year. Our storage problem was expensive and physically unsustainable, as our supplies would occasionally overflow into a visible service area. Printing materials drive production and demand in my department, so I was tasked with figuring out how to responsibly get rid of what we didn't need.
My instinct for solving the issue was well-intentioned but shot down. "Let's see what is usable and donate it to local public schools or other education programs," I suggested. As it turns out, the university's rules around e-waste mean you cannot donate tech hardware or parts to organizations that do not have their own sustainability plans, lest you become responsible for generating landfill trash secondhand.
So what could we do to clean this room out and create space for the printing materials our patrons needed? The solution turned out to be much simpler: talking and tagging. I physically met with the tech members of my team and our student workers to walk them through each item, tagging things with colored Post-its. Red meant clear it out immediately. Blue meant keep; many of the digitization tools used for digital scholarship, like the multiple VCRs, were keepers. Yellow meant the item would eventually go, but we would go through the boxes one by one first. The university has a contract with an e-waste company that responsibly recycles parts. But the biggest lesson was that this was a task that couldn't be rushed. I had to talk to people. I had to find out why staff did certain things. And I had to gain their trust. Sometimes, even as we're navigating a world of digital this and tech that, it's the analog, soft skills that have the most meaning and movement. My e-waste issues could only be resolved by caring enough to have the conversation with staff who felt a personal sense of ownership over certain items, and by respecting them enough to talk them through letting go and scaling back, so that we can all move forward into the future of our department together. So that's one example of what our storage issues can look like at the micro level in the digital library or digital scholarship setting, especially when we care enough to slow down and scale down.
The macro issue behind care in our digital storage decisions, however, is the one that affects us all deeply, and the one we should be talking about the most. Our obsession with collection and the "throw it in the cloud"/"storage is cheap" mentality is in danger of destroying our built environment and our communities. Every digital object, every record generates data that has to go somewhere. The "cloud," while not sitting on the desk in your office, exists in a structure somewhere. Is it in your community? Maybe not, or you'd know about it. These server farms take up an incredible amount of space. In this way, the digital frontier has generated its own land rush, reshaping borders by creating data towns, propped up or subsidized by various local governments through eminent domain. In Northern Virginia, for instance, a community originally deeded to freed slaves in 1866 is being threatened by an Amazon subsidiary that wants to build a 38-acre data center. High-voltage towers would be strung through the heart of the community.
Our work, when executed so casually that we speak of just adding another terabyte to our storage plan, says that the collections and data we're gathering are more important than communities and cultures that existed long before our institutional repositories were a thing. A 2014 report by Greenpeace showed that Amazon Web Services (AWS), a primary vendor for numerous universities' digital object and special collections storage, was using a considerable amount of fossil fuel energy, including coal, to power its data centers. A 2016 article in The Atlantic noted that while many data centers and cloud computing centers are using renewable energy more frequently, the high demand for their services, through increased electricity use and the need for climate-controlled buildings to protect the hardware, cancels out whatever good the renewable energy decisions might have done. The U.S. Department of Energy released a 2016 study estimating that in 2014, data centers in the U.S. collectively consumed 70 billion kilowatt-hours of electricity and used 626 billion gallons of water to cool and dehumidify their facilities. Doubtless this consumption of energy and natural resources has a tangible effect on the environment. (1) We can't elide climate change in this discussion of storage. Not when, two weeks ago, there were three Category 4 hurricanes in the Atlantic Ocean at the same time; not when ash and haze from drought-sparked wildfires recently blanketed Portland, Oregon, and Los Angeles; not when icebergs the size of Delaware are breaking off Antarctica's ice shelves due to warming ocean currents.
The implications of our work don't stop at the effects of digital storage on the natural environment. Data centers' large ecological footprints also tend to be incredibly disruptive to the built environment. Environmental justice advocates note that poor, marginalized communities bear the brunt of this type of industrial pollution, and have done so for decades. If we go back to Engster's care ethics theory, our process is anti-care. By engaging with unsustainable storage or data collection methods, we might be making it harder for people to meet basic needs like food, water, and shelter, and we are definitely not helping people avoid or alleviate pain and suffering.
The irony abounds, because there are currently researchers and activists using big data and digital scholarship tools to seek justice for their communities. Citizen scientists are text-mining government reports or corporate data to analyze trends over time in disciplines from the hard sciences to the humanities, and using participatory mapping or data visualization tools to conduct their own experiments and document environmental injustice, as in the case of the tainted water in Flint, Michigan. Flint has now gone more than 1,200 days without clean water, by the way.
Librarians, archivists, and other technologists spearheaded the Data Refuge effort, which looked to archive datasets addressing climate change that the Trump administration removed from the EPA and other government websites as a means of intentionally denying the effects of climate change.
We are navigating a complex system as we look to the future of this discipline. We (librarians, archivists, researchers, and scholars) are complicit in negatively impacting the environment and even harming communities with the digital data we're collecting and preserving, but our collection of that information is also helping people and, more importantly, allowing people to help themselves, even when those same communities rely on data or computing models that use or waste as much electricity as our own data storage processes.
And that definitely doesn’t sound like progress, right? Digital librarianship or information management can’t purport to be innovative and future-oriented if we’re playing into such a problematic paradigm. There has to be a better way for us to express how much we value our culture. But what?
What are our solutions if we say that, at the edge of the digital wilderness, we are going to retreat? If radical empathy and care for the environment and for other individuals mean that we take in fewer collections and shrink our cloud storage footprints, what does this mean for preservation? What does it mean for our efforts to make information widely available across the expanse of the Web?
For those of us who work in digital preservation, research data management, or institutional repository management, the obvious answer is digital curation, but what does that really mean? How can we express our affective responsibilities to one another through responsible digital curation?
One way is by asking questions, taking the time to talk and tag as I did at the micro level with my team, but scaled up: What are we collecting and making available in our institutional repositories? How are we making it available? Are we communicating the environmental or ethical implications of this massive data gathering to the faculty or other researchers interested in working with our collections? Have we engaged the community on our collecting activities and the ways in which they might find the information useful for making their worlds better or easier? Global migration professor Alison Mountz and several coauthors suggest an organic approach in their 2015 article "For Slow Scholarship: A Feminist Politics of Resistance through Collective Action in the Neoliberal University." Instead of forging ahead with new ideas, projects, and datasets, they suggest, we should have conversations about these scientific methods, examine the voices included in or excluded from this data collection or preservation, and think critically about the longer-term environmental and personal implications of engaging with big data and big storage.
What are some models where this is already being done? The post-custodial model of archives already encourages individuals and community organizations to steward their cultural histories outside of academia, reducing the need for large-scale computing within bigger organizations. But what about accessibility? Can we curate according to community or localized needs? As agents of our repositories, we could use digital curation to foster working relationships with groups that could use access to our scholarly outputs and digital collections to advocate for policy changes or improvements. For instance, one of the digital scholarship projects we are collaborating with researchers on this year visualizes spatial relationships between sexual assault offenders and survivors. The researchers have more than 20 years of data culled from backlogged rape kits in Cuyahoga County. This data is highly sensitive: access must be restricted, or the data shared only in the aggregate. It is also being housed within the university's high-performance computing department to help the researchers quickly analyze such a large body of data. That computing power is necessary to bring this critical information to communities who could use it immediately to seek justice from the legal system and to make changes to their environment that may help mitigate sexual assaults in their neighborhoods.
But slow scholarship or community-driven digital curation means that we don't need to rush to put the entire dataset into our repository immediately. That's a boundary we don't need to cross. We can take the time to ask our researchers questions, and to ask affected communities how they would hope to engage with the data and which parts of it they might want to engage with. We can determine which portions of more than 20 years of data are most relevant for survivor organizations or other stakeholders to use in advocating for their safety. I've asked a considerable number of questions during this speech, because I don't necessarily have all of the answers. I look to this gathered body, and to other professionals and colleagues, for help, so that through our individual or collective actions we might work to mitigate the real harm that our work can create.
Answering those questions will take time. Time is okay. Care is okay. Care is the point. Because when everything is all done, we want people to engage with this material: to use the tools, to link to the research, to share the stories. To use them to care about their worlds and histories and experiences.
Exploring the edges or pushing boundaries may mean that it’s ok to pull back, it’s ok to not collect everything, to be intentional about what we preserve, even more so than we already think that we are. At this moment maybe the biggest exploration is internal. If we pursued preservation or storage at the expense of our communities and our planet, who would be left to share all of that information? And isn’t that the point of it all?
(1) See also “Data Centers and Hidden Water Use,” Drew FitzGerald. Wall Street Journal. June 24, 2015. https://www.wsj.com/articles/SB10007111583511843695404581067903126039290. Accessed September 30, 2017. Paywalled.
Engster, D. 2005. "Rethinking Care Theory: The Practice of Caring and the Obligation to Care." Hypatia: 50–75.
Mountz, A., A. Bonds, B. Mansfield, J. Loyd, J. Hyndman, M. Walton-Roberts, R. Basu, et al. 2015. “For Slow Scholarship: A Feminist Politics of Resistance through Collective Action in the Neoliberal University.” ACME: An International E-Journal for Critical Geographies. 14 (4): 1235–1259.
Mah, A. 2016. "Environmental Justice in the Age of Big Data." Environmental Sociology. 3: 122–133.