Teaching Intensive Web Archiving at the University of Wisconsin-Madison

Bertram Lyons
On Archivy
Apr 9, 2015

Last summer (2014), I had the privilege of leading a one-week intensive course on web and social media archiving for the School of Library and Information Studies (SLIS) at the University of Wisconsin-Madison.

What interests me most about this topic is that websites and social media are inherently digital. Archiving web and social media requires a basic understanding of digital information writ large (e.g., formats, processes, and information structures). To collect content from social media websites and applications, archivists must understand the information architectures that support their use and development, as well as the data systems that store and manage the underlying information. Websites and social media are also inherently about change and about time. There is a temporal element to the preservation of such content. We can look at the changing nature of such content in the way that we might document the changing landscape of a street corner and the surrounding businesses or buildings. Documenting the corner today may yield a different result than it will tomorrow, or next week, or next year. A primary consideration, then, is the frequency of collecting, which requires planning and a clear effort to document the context and provenance of each capture.

In the class, I made a concerted effort to teach practical skills: command-line scripting, working with web crawlers, quantifying data at the bit and file levels, understanding WARC as a file format, using and understanding APIs, creating checksums by hand, and using packaging tools to create checksums in batch and to perform automated verification. I combined those skills with coverage of general digital preservation issues, especially those related to the acquisition and preservation of web and social media data. I coupled selected readings with in-person discussions, and students worked alone and in small groups to complete daily projects that introduced technical skills and reinforced intellectual concepts.
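
To make the batch checksum and verification step mentioned above concrete, here is a minimal Python sketch. It is an illustration only, not the tooling or exercises used in the course: the function names (`sha256_of`, `build_manifest`, `verify_manifest`) and the choice of SHA-256 are assumptions made for the example. It records a checksum for every file under a directory, then later reports any file that is missing or whose contents have changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return relative paths that are missing or no longer match."""
    root = Path(root)
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

In use, one would call `build_manifest("captures/")` (a hypothetical directory of crawl output) at acquisition time, store the result alongside the files, and run `verify_manifest` on a schedule; an empty list means every recorded file is still present and unchanged.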

Being an archivist, I also thought it was important to establish an ontological framework for this intensive. Because the course was based in a university setting, we used the SAA Guidelines for College and University Archives as a reference point for core archival functions and expectations. Such guidelines and institutional frameworks separate the work of archives from that of academic digital humanities, records management, software development, or commerce, all areas where web and social media data are of great interest. Using core archival functions as guideposts, we worked through the process of developing a comprehensive archival approach to collecting web and social media content.

Thanks to Jefferson Bailey, I was able to peruse an aggregation of over 50 digital preservation syllabi from the past 5 years in preparation for the reading list. I culled the lists down to what I thought were the most important digital preservation resources and the most salient web archiving and social media archiving publications or presentations. I’m including a link here to a download of the entire syllabus, for those who have interest: http://bit.ly/1sp7m0K.

The course is about to start up again this summer. I’m looking forward to improving it and reporting back soon!

archivist | memphian, richmondite, new orleanian, brooklynite, lawrencian, luverdense, washingtonian, madisonian | avpreserve.com