(I mean archived websites)

I was recently tasked with setting policy for description of Web collections at the University of Miami Libraries.

This task spawned a great deal of mental masturbation: Web collections force us to re-think our assumptions about the nature of a collection and its use cases. They force us to be more conscientious about description.

In the interest of professional discourse, here’s a recap of my (and my colleagues’) thought process re: description of Web collections.

#1. What are we describing?

The collections? The websites? Individual pages or publications on the sites we archive? All of these were suggested as potentially appropriate points of access.

I had two questions: 1) How can we best serve users? 2) How can we align description of Web collections with intellectual control for our other collections?

After informally polling our user-base and professional contacts, we determined that known-item searches (e.g., I want to access an archived version of XYZ site) would be facilitated by the Wayback Machine. We can help users by facilitating exploratory searches and browsing. Okay, so!

  • Subject and genre information support exploratory methods
  • We are collecting around topical themes that lend themselves to subject and genre description
  • We control other collections (archives and digital collections) at the collection-level, with a top-down approach to browsing

Therefore, we shall describe the collections.

#2. Where are folks gonna access this description?

The library catalog? Archives finding aids? The Archive-It website? WorldCat? ArchiveGrid? A homegrown tool?

My questions, again: What do our users want? And how can we align this with our other collections?

Which led to a third question: Who are users of Web collections?

What’s interesting about users of Web collections is that most of them don’t exist yet. If I want to access a site that my archives is capturing today, I can still visit that site. The user of the archived version will exist two months, two years, 200 years down the road.

Who are these futuretime users? We know some: researchers (serious academic types and folks trying to look up content that has disappeared); the original creators of said content (especially if you’re archiving your own institution’s output!); archivists and librarians.

Where do these futuretime users want to find stuff? We have no idea. We’re not psychics. The question we can answer is “how can we align description of Web collections with our other collections, so that in the futuretimes, we remember where we put this metadata, and we can convert it more easily?”

The answer, for us, is (gasp!) collection-level MARC records. We describe all of our holding silos (archives, digital collections, special collections, general collections) with records in the library catalog and OCLC Connexion. The one metadata silo I’m sure no one will forget about is the library catalog. It’s our catch-all spot.

So, how do we describe Web collections, and encode these descriptions in the MARC bibliographic format? What content standards can we turn to, and what do we need to innovate?

Choose Your Own OCLC Template Adventure:

I need to code this as a collection and an online resource, under archival control. The “books” workform provided the MARC control fields I needed.

But, it sounds like a CYA situation:


Is that us? We’re capturing these sites, we’re creating the collection. Our relationship to the resource is “collector.”

Another option might be the creators of individual websites — but there are multiples of them, and we’re describing the collection, not the sites. So I settled on:

110: University of Miami. Library, collector.

710: [Individual site creators, if important], contributor.


How do we let users quickly grasp what this collection consists of, and what it is intended for? Does it make sense to devise a title just as one would for any collection?

That would give us something like:

The University of Miami Cuban Theatre Web Collection

or

The University of Miami Collection of Cuban Theatre Websites

But the University of Miami doesn’t have much to do with these resources; what we’re doing is preserving/stewarding them. Putting our name prominently in the title is misleading, so how about a work-around:

245: Cuban Theatre Web Collection / University of Miami.

Win 1 for the statement of responsibility! (This was the first time I ever appreciated the MARC formats…)


We’re cataloging at the time we start capturing sites, but we intend to keep taking periodic captures. So let’s treat the date like we do for serials:

260/264: [year]-

Since a more precise date is important, let’s include the month and/or day of first capture, and let’s normalize the date according to DACS:

260/264: [year] [month] [day]-

Then, let’s note the frequency of capture. People can do math.

310: [Monthly]
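
For anyone scripting these records, here is a minimal Python sketch of the date string and the "people can do math" part. It assumes the DACS habit of writing exact dates as year, spelled-out month, day (e.g., "1906 March 17") and a strictly monthly capture schedule. The function names and the sample dates are made up for illustration; they are not part of any cataloging tool.

```python
from datetime import date

# English month names spelled out, as in DACS-style dates like "1906 March 17".
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def open_capture_date(first_capture: date) -> str:
    """Format the first capture as an open-ended '[year] [month] [day]-' string."""
    month_name = MONTHS[first_capture.month - 1]
    return f"{first_capture.year} {month_name} {first_capture.day}-"


def monthly_captures_so_far(first_capture: date, today: date) -> int:
    """Estimate how many monthly captures exist, counting the first one.

    Assumption: one capture per month on the same day of the month as the
    first capture. Months shorter than that day are only approximated.
    """
    if today < first_capture:
        return 0
    months = (today.year - first_capture.year) * 12 + (today.month - first_capture.month)
    # This month's capture has not happened yet if we are before the capture day.
    if today.day < first_capture.day:
        months -= 1
    return months + 1


# Hypothetical first capture date, used only to show the output format.
first = date(2014, 3, 5)
print(open_capture_date(first))                          # 2014 March 5-
print(monthly_captures_so_far(first, date(2014, 6, 4)))  # 3
```

The count is only as good as the frequency note: if the capture schedule changes, the 310 and the math both need updating.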


How do we measure Web collections? By number of sites? Pages? Bytes? And how do we account for accruals? Will we periodically update the statement of extent? Cop out with a plus sign?

Who are the users of the statement of extent — Human “end” users? Machines? Archivists?

Tell you the truth, guys, these are hard questions. I decided to avoid them by going with:

300: 1 Web collection.

…which is consistent with the statement of extent for other materials (e.g., “2 diaries,” “23 linear feet”). But I admit, this statement of extent serves zero use cases, and I’d like it to be more useful. Tell me your thoughts on the statement of extent for Web collections.

Now, let’s talk terminology. When I presented the first draft template for description of Web collections to our curatorial staff, there was much hemming and hawing over the label, “Web collections.” Will users know what the heck a “Web collection” is? Probably not. So, we decided to go with the Harvard Library’s language, instead:

300: 1 collection of archived websites.


First off, let’s provide an abstract. Abstracts are our friends! They help us understand what this stuff is and why it was preserved.

520: [Abstract]

Next, let’s do some subject indexing, from the usual controlled vocabularies.

6xx: [Subject]

But wait! We usually subdivide subjects for archives collections with the form subdivision “archives.” If we do that for Web collections, will the user think they’re getting old paper records? (I can see you guffawing.) Well, we talked it over, and we decided to go with:

6xx: [Subject] |v Archived websites.

And then, of course, ya know, cos library catalogs LOVE duplication (especially ours):

655: Web sites. (from LCSH)

655: Archived websites. (local term)

… I take no responsibility for the insanity of standard Libraryland indexing practice.

The Collection!

How do I go to there? Link to the Archive-It collection landing page:

856: [linky link]
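
Pulling the pieces together, the template can be sketched as plain Python data. This is an illustration only: it reuses this post's "tag: value" shorthand rather than real MARC encoding (no indicators, no subfield codes beyond the "|v" notation used above), and the function names, the sample subject heading, and the bracketed placeholder values are all assumptions made for the example.

```python
def build_collection_record(
    title,
    steward,
    collector,
    subjects,
    first_capture="[year] [month] [day]-",
    frequency="Monthly",
    abstract="[Abstract]",
    url="[Archive-It collection URL]",
    contributors=(),
):
    """Return the collection-level template as a list of (tag, value) pairs."""
    fields = [
        ("110", f"{collector}, collector."),
        ("245", f"{title} / {steward}."),
        ("264", first_capture),
        ("300", "1 collection of archived websites."),
        ("310", frequency),
        ("520", abstract),
    ]
    # Every subject gets the form subdivision chosen for archived websites.
    fields += [("6xx", f"{subject} |v Archived websites.") for subject in subjects]
    # Genre terms: one from LCSH, one local term.
    fields += [("655", "Web sites."), ("655", "Archived websites.")]
    # Individual site creators, only if important.
    fields += [("710", f"{name}, contributor.") for name in contributors]
    fields.append(("856", url))
    return fields


def render(fields):
    """Print-friendly version, one 'tag: value' line per field."""
    return "\n".join(f"{tag}: {value}" for tag, value in fields)


record = build_collection_record(
    title="Cuban Theatre Web Collection",
    steward="University of Miami",
    collector="University of Miami. Library",
    subjects=["Theater |z Cuba"],  # illustrative heading, not from our records
)
print(render(record))
```

Keeping the template as a list of pairs makes it easy to spot which fields are fixed boilerplate (300, 655) and which have to be filled in for each collection.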

So far, all the other bits of description and encoding seem straightforward (or, shall I say, as crazy as usual).

I’d love to hear your thoughts on describing Web collections (archived websites). E-mail me or find me on Twitter.

Written by

Special Collections Librarian, Data Enthusiast
