Brian E. Davis
Jul 6, 2017

Digital Preservation at OSULP: Why Brian?

I’ve spent the last 20+ years working in the digitization and digital preservation field. Digital preservation was a large part of my responsibilities in two of my previous positions at large academic research libraries, so I began applying digital preservation actions to files produced by the Digital Production Unit soon after I began at OSULP in 2012. However, digital preservation was not seriously discussed by the library until 2015. As the primary staff member actively doing digital preservation work, I was asked to lead the digital preservation work assigned to the Digital Collections Planning Group, a group that I also chaired. Our current Digital Preservation Policy remains very similar to the draft version that I wrote in the summer of 2015.

Although I was forced to step back from my leadership role with digital preservation due to classification issues with my position in 2018, I have kept up with my day-to-day digital preservation activities. I believe the key processes from the Levels of Digital Preservation are too important to let lapse. My current digital preservation activities are highlighted in the DPU@SCARC: Current Digital Preservation section below.

Digital preservation work that I’ve done during my time at OSULP includes building a description service for digital files that creates PREMIS-compliant METS metadata files, building and testing Archivematica — an open-source digital preservation platform, and configuring a ZFS storage appliance with built-in file fixity and data integrity safeguards.

I also developed a series of digital preservation scripts. These scripts use an assortment of open-source command-line tools to apply specific actions to digital files and/or directories of files through the macOS Services menu. I’ve shared them publicly through my GitHub page, and many people working in digital preservation have downloaded and used them over the last couple of years. In a recent LFA presentation, I demonstrated how these scripts have become invaluable as I remote into DPU workstations from home to do core digital preservation work after hours and throughout the weekend.

The next two examples are more service-oriented than part of the work I do for OSULP, but they illustrate the scope of my technical skills when it comes to digital preservation and how those skills are viewed outside of the OSULP bubble. As part of the Digital Preservation Working Group for the Orbis Cascade Alliance, I helped build the Digital Preservation Step By Step tool. My section was File Fixity & Data Integrity, and I assisted with other sections. In June of 2017, I led two hands-on digital preservation webinar/workshops for the alliance. Nationally, I’ve been part of a few different NDSA working groups, including the effort to revise the Levels of Digital Preservation in 2019 (my name appears on page 10 as one of the contributors). I’m currently on the Standards and Best Practices Interest Group.

I’ve been doing bit-level digital preservation and digitization work for many years and I’m often contacted by other universities to consult or provide guidance as they look to improve this work at their institutions. Some of my consultations in recent years include: Cornell University (AV digitization), Duke University (digitization), Arizona State University (digital preservation), University of Arizona (digitization), University of Oregon (digitization), University of Idaho (digital preservation), Oregon Historical Foundation (digitization and digital preservation), and Kansas State University (digitization).

DPU@SCARC: Current Digital Preservation

Although the term preservation-level digitization is reserved for cases where digitization is itself a preservation strategy for the object(s), due to active degradation or issues related to obsolescence, the workflows for all the digitization we do in the DPU are generally the same. That means we use well-maintained and calibrated equipment. I calibrate our scanners with a ColorChecker target and SilverFast. Our secondary displays are IPS LED panels, which offer consistent color accuracy across wide viewing angles. The displays are calibrated with X-Rite's i1Studio calibration and profiling system, and ColorSync shares those calibration profiles with our scanning software, thereby ensuring color accuracy. Full-spectrum lighting is used when soft proofing our reflective materials scanning. With all of our digitization, I try to adhere to the FADGI Technical Guidelines for Digitizing Cultural Heritage Materials.

Current Digital Preservation

PREMIS view of Photoshop edit history

My digital preservation work actually begins before any object has been scanned and blurs the line between digitization quality control and digital preservation. According to PREMIS, change history for an object should be recorded as Event information. Unfortunately, this leaves out the entire life of the object during quality control and file movement, the period when intentional (and accidental) changes are most likely to occur. Beyond typical cropping and de-skewing, a substantial edit history is a red flag from a digital preservation standpoint and needs to be documented. I save all editing actions to an embedded history log to help ensure the integrity of the files. I have been successful in pulling this embedded edit history out of the files with Archivematica and other PREMIS-aware digital preservation tools.
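As a quick spot check, that embedded history can also be inspected from the command line before a file ever reaches a preservation system. The snippet below is illustrative rather than my exact process; it assumes ExifTool is installed, that Photoshop's history log was saved to the file's metadata, and that scan_0001.tif is a stand-in filename.

```bash
# list any embedded history-related metadata (Photoshop/XMP edit history)
# the wildcard lets ExifTool match whichever history tags were actually written
exiftool -a -G1 -s "-*History*" scan_0001.tif
```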

XML templates in Adobe Bridge

One goal of digital preservation is to save as much scanning session information as possible. This, however, has to be done in a way that does not impede digital production workflows. For still image materials, I put together an XML template that embeds capture-related info into the image file. This information can be extracted with a variety of tools. The information being written to the file includes the following:

ImageProducer (student technician)

HostComputer (model identifier)

WorkType (AAT Source)

Mimetype (file type)

With the volume of materials that the DPU works with, it did not make sense to apply the templates file by file. Appending the template to a directory of files through a batch process allowed us to maintain our digital production throughput without getting bogged down with item-level preservation metadata.
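Adobe Bridge handles the batch append, but the same idea can be expressed on the command line. The sketch below is an illustration rather than our exact workflow; it assumes ExifTool is available and that the template has been exported as an XMP file (capture_template.xmp and the directory path are hypothetical names).

```bash
# copy every field from the XMP template into each file in the batch directory
# -r recurses into subfolders; omit -overwrite_original to keep ExifTool's backup copies
exiftool -r -overwrite_original -tagsFromFile capture_template.xmp -all:all /path/to/batch/
```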

Temporary folder-level fixity

Files are moved frequently throughout the quality control process. Initially created on a student workstation, files are moved onto a temporary storage server and then onto the supervisor’s workstation for review and processing. Moving files is precisely when the chance of corruption and read/write errors is highest. Files are run through a checksum tool to create temporary folder-level fixity lists that can be easily verified during this process.
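The specific checksum tool matters less than the pattern: hash everything before the move, verify everything after. A minimal sketch using the stock shasum utility (the paths are hypothetical):

```bash
# before the move: write a SHA-256 manifest for everything in the batch folder
cd /path/to/batch
find . -type f ! -name "manifest-sha256.txt" -exec shasum -a 256 {} + > manifest-sha256.txt

# after the move: re-hash the files and compare against the manifest; mismatches are flagged
cd /path/to/destination/batch
shasum -a 256 -c manifest-sha256.txt
```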

DPF Manager

DPF Manager is an open-source conformance checker for TIFF files designed to help archivists and digital content producers ensure that files are fit for long-term preservation. DPF Manager can parse directories of files and determine whether each file is valid and whether our digitization specifications have been followed. Verifying this information as a batch process eliminates the extra steps of manually checking with Photoshop or Bridge.

File name verification

After the files have been verified, the filenames are checked to ensure that numbering is consistent and that leading zeros have been used for accurate sorting.
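A check like this is easy to script. The sketch below is illustrative rather than my exact procedure; it assumes a hypothetical naming pattern of an identifier followed by a zero-padded, four-digit sequence number (e.g., mss_box01_0001.tif), and the regex would be adjusted to match a given collection's scheme.

```bash
# flag any TIFF whose name breaks the expected pattern or drops its leading zeros
for f in *.tif; do
  [[ "$f" =~ ^[A-Za-z0-9_-]+_[0-9]{4}\.tif$ ]] || echo "check filename: $f"
done
```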

Visual review in Photoshop

At this point a quick visual review of each file is done to ensure that the cropping and orientation are consistent and that the appropriate color settings have been used.

veraPDF

For text-based materials I batch process individual image files into PDF/A-1b files and verify conformity with veraPDF.
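veraPDF ships with a command-line interface, which makes the conformance check easy to fold into a batch step. A minimal sketch, assuming the verapdf launcher is on the PATH and using hypothetical paths; the exact flags may vary between releases:

```bash
# validate every PDF under the directory against the PDF/A-1b profile
# the default report is machine-readable XML and can be kept alongside the files
verapdf --flavour 1b --recurse /path/to/pdfa/ > verapdf-report.xml
```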

BagIt is a widely used packaging format developed as part of the National Digital Information Infrastructure and Preservation Program to accurately transfer digital content across networks and file systems. The BagIt specification is organized around the notion of a "bag": a named file system directory containing a "data" directory with the digital content being preserved, along with a fixity manifest file. BagIt tools can examine the manifest to make sure that the files are present and that their checksums are correct, which allows accidentally removed or corrupted files to be identified. I use verified bags for final AIP transfers to preservation storage.

BagIt.py via macOS Services script

Rather than using the decidedly clunky Java version of BagIt, I wrapped key BagIt Python commands inside shell scripts that can be executed via macOS right-clicks. This way, many files and folders can be bagged and verified with a simple right-click. These scripts are shared on GitHub.
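The wrapper itself is tiny. A minimal sketch of the approach, assuming bagit-python is installed (its command-line entry point is bagit.py) and that the Service passes the selected folders to the script as arguments:

```bash
#!/bin/bash
# bag each selected folder in place with SHA-256 manifests, then validate the new bag
for dir in "$@"; do
  bagit.py --sha256 "$dir"
  bagit.py --validate "$dir"
done
```

In Automator, a loop like this runs as a Run Shell Script action with input passed as arguments.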

Videotape Materials

vrecord

With most videotape, substantial degradation is actively occurring, and we will be relying on the newly created digital surrogate as the representation of the artifact as time goes on and the source further degrades. All magnetic media materials require preservation-level digitization. At this stage, digitization is itself the preservation strategy for the materials.

The digitization process for videotape materials is complex when compared to traditional scanning of visual materials. There are a number of playback, correction, and signal processing components that need to be included as part of the digital production metadata. I keep a spreadsheet that contains information on the physical tape, the transfer process, and the resulting digital file. Things like the condition of the source tape, the specific tape stock, the generation of the source (original or copy), and a risk-assessment ranking are documented.

AV digitization spreadsheet

As with most digital objects, the generation and verification of checksums can help confirm (or deny) digital authenticity over time. A mismatch is an alert that a file has changed from a prior state, potentially triggering retrieval of backups, review of hardware, or migration of content. It is for this reason that I run a file-level checksum on each video file. I chose SHA-256 checksums over MD5 because of the 256-bit output hash value. Because this is digitization as preservation, the higher level of detail and added assurance of SHA-256 are certainly warranted. To speed up the process, I have installed a shell script with the checksum terminal command as a macOS Service, which means I can right-click on a file to run the checksum rather than keying in the terminal commands manually.
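The script behind that Service is equally small. A minimal sketch, assuming the Service hands the selected files to the script as arguments:

```bash
#!/bin/bash
# write a sidecar .sha256 file next to each selected video file
for f in "$@"; do
  shasum -a 256 "$f" > "$f.sha256"
done
```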

macOS Services Menu

Much of PREMIS relates to the individual bitstreams contained within a file. With video files, there are individual tracks for the video stream, one or more audio streams, and a timecode stream, all of which have frame-level values that can be parsed. Since these individual essence tracks can each incur some type of error separate from the larger file, I run smaller frame-level checksums for each video through FFmpeg, an open-source encoding library. Framemd5 creates an MD5 hash value for each audio and video stream in each frame. NTSC video runs at 29.97 frames per second, so there are lots of places for errors and flipped bits to hide. By producing checksums at a more granular level, it becomes feasible to assess the extent of a change, or pinpoint its location, in the event of a checksum mismatch. Framecrc creates per-frame transmission checksums that allow me to verify that each frame of video was received correctly.
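Both reports come straight from FFmpeg's framemd5 and framecrc muxers. A minimal sketch, with input.mov standing in for the preservation master:

```bash
# per-frame MD5 hashes for every audio and video stream in the file
ffmpeg -i input.mov -map 0 -f framemd5 input.framemd5

# per-frame CRC (Adler-32) checksums for verifying that each frame arrived intact
ffmpeg -i input.mov -map 0 -f framecrc input.framecrc
```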

PBCore AppleScript

MediaInfo supplies technical and tag information about a video or audio file. As with FFmpeg and the checksums, I have this process installed as a macOS Service. I also generate a PBCore XML document.
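The same information can be pulled with MediaInfo's command-line interface. The sketch below is illustrative rather than my exact Service and AppleScript; it assumes a MediaInfo build that includes the PBCore2 output template, and the filenames are hypothetical.

```bash
# full technical and tag report for the file
mediainfo input.mov > input_mediainfo.txt

# PBCore 2.0 XML via MediaInfo's built-in output template
mediainfo --Output=PBCore2 input.mov > input_pbcore.xml
```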

Due to inconsistencies with file creation dates on our preservation storage server, DFXML output is generated to accurately record file creation and modification dates, among other data. This XML is included as part of the AIP.

Portion of DFXML output

I use a local ZFS filesystem for temporary production-level storage; it provides routine block-level fixity checks and self-healing for damaged files. Although it was initially used exclusively for video files, I have set up the local instance of Archivematica to use this filesystem both as a staging area for materials transferring into Archivematica and as the AIPStore.

ZFS Filesystem (Ubuntu)
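At the filesystem level, the relevant pieces are per-block checksums and routine scrubs. A minimal sketch of that configuration, with hypothetical pool and dataset names:

```bash
# dataset used for Archivematica staging and AIP storage, with SHA-256 block checksums
zfs create -o checksum=sha256 tank/aipstore

# a scrub walks every block, verifies its checksum, and repairs from redundancy when possible
zpool scrub tank
zpool status -v tank
```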



📼 🎧 #AVPreservation #Digitization #DigitalPreservation