This week will be my last at the New York Public Library. I wanted to reflect a little on my time here through the lens of the work I was fortunate enough to be part of while it is still fresh in my mind.
I started at NYPL Labs in May 2013 to work with Trevor Thornton on building a new Archives portal, one facet of a larger Polonsky Foundation digitization grant. In graduate school I saw tweets from the Labs account and knew about the group in my periphery. I also visited the group while in school with Cristina Pattuelli to discuss possible intersections with our work on the Linked Jazz Project. It was serendipitous that this new developer position became available right as I was finishing up my graduate degrees. I even had my final interview with the team the same day I deposited my master’s thesis.
The team at the time was Ben Vershbow, David Riordan, Mauricio Giraldo, Brian Foo, Paul Beaudoin and Trevor. I worked for a year and a half on the Archives project, which went from a directory of PDF finding aids to a fully indexed public portal and corresponding backend data management system in a pre-ArchivesSpace world. I focused on the front-end display of the system and chose to present each finding aid as a whole document in the browser, as opposed to the traditional approach of a single page for each component. I even had the latitude to propose some experimental interactions with the data and interface.
I feel this project epitomized agile development: two domain-specific developers (with MLS degrees) working together and meeting every two weeks with archivists and the stakeholder, Bill Stingone. We quickly finished an MVP and shipped the portal in under a year.
In my “spare time” (basically after work) I started getting interested in the library data ecosystem and thinking about it from a holistic perspective. I began trying to get copies of the ILS data dump and experimenting with it, for example by thinking of the catalog as a subject network. I found working with larger and larger library metadata datasets fascinating.
The list could surely go on, and there is nothing more wonderful than the catalogue, an instrument of wondrous hypotyposis. - Umberto Eco
In June 2014 Labs was reorganized to include the existing developer team along with the existing Digital Imaging Unit, Metadata Service Unit and Rights and Permissions groups in a larger org headed by Ben Vershbow, with Josh Hadro as deputy director over these new Labs units. In addition, a new Public Programs & Outreach unit was added, managed by Shana Kimball. The new org made sense: the digital lifecycle, from digitization and description to discovery and use, was now contained in a single group. To pursue a new library strategic priority of making our collections more discoverable, I moved over to Josh’s org as Head of Semantic Applications and Data Research.
Now colleagues, Shawn Averkamp and I started hatching plans, with Ben and Josh’s support, to implement linked data methodologies to facilitate discovery. This was not linked data for linked data’s sake. It was the application of emerging practices to ameliorate historic local library problems. The major problem was that many separate datasets of bibliographic data each described only a part or a facet of the same resource. For example, a resource might be described in archives data as EAD, have its digital surrogates described in MODS in another system and have a stub MARC record in our catalog. Aggregating those systems to get the full picture of a resource could provide a single URI for each resource with the complete data. We called the project the NYPL Registry.
Enrichment was also a major opportunity. Using VIAF to normalize names across the systems could provide a single Agent URI for a number of literal labels. We could also use OCLC Classify to introduce LCC call numbers to our data, a system that NYPL did not historically use. Local classification was also an area we wanted to better control via LOD, so we launched a tool to help expose our legacy classmark systems as URIs. The backend system was built out and, as expected, became quite complex. Paul joined us to build it out even more, which resulted in a prototype:
http://data.nypl.org/, for example the Agent page for Virginia Woolf.
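As a rough illustration of the aggregation and name normalization ideas described above, and not the Registry's actual code, here is a minimal Python sketch. Everything in it is an assumption made for the example: the record fields, the shared `local_id` key, the `example.org` URIs, and the small in-memory lookup table that stands in for a name authority service such as VIAF.

```python
import re
from collections import defaultdict

# Hypothetical records from three systems that describe the same resource.
# Field names, identifiers and values are invented for this sketch.
RECORDS = [
    {"source": "archives-ead", "local_id": "mss-0001",
     "title": "Example papers", "creator": "Woolf, Virginia, 1882-1941"},
    {"source": "digital-mods", "local_id": "mss-0001",
     "digital_surrogates": 42, "creator": "Virginia Woolf"},
    {"source": "catalog-marc", "local_id": "mss-0001",
     "call_number": "Example 123", "creator": "Woolf, Virginia"},
]

# Hypothetical stand-in for an authority lookup (placeholder URI, not a real one).
AGENT_URIS = {"virginia woolf": "http://example.org/agents/woolf-virginia"}


def normalize_name(label):
    """Reduce a literal name label to a comparison key.

    Lowercases, drops dates and punctuation, and sorts the name parts so that
    "Woolf, Virginia, 1882-1941" and "Virginia Woolf" yield the same key.
    """
    tokens = re.findall(r"[a-z]+", label.lower())
    return " ".join(sorted(tokens))


def build_registry(records, agent_uris, base_uri="http://example.org/resources"):
    """Group records by shared identifier and emit one entry per resource."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["local_id"]].append(record)

    registry = {}
    for local_id, group in grouped.items():
        fields = {}
        agents = set()
        for record in group:
            for key, value in record.items():
                if key in ("source", "local_id", "creator"):
                    continue
                # Keep the first value seen for each descriptive field.
                fields.setdefault(key, value)
            agent_uri = agent_uris.get(normalize_name(record["creator"]))
            if agent_uri:
                agents.add(agent_uri)
        registry[local_id] = {
            "uri": f"{base_uri}/{local_id}",
            "sources": [r["source"] for r in group],
            "fields": fields,
            "agents": sorted(agents),
        }
    return registry


registry = build_registry(RECORDS, AGENT_URIS)
print(registry["mss-0001"])
```

In this sketch the three source records collapse into one entry with a single resource URI, and the three different literal name labels resolve to one agent URI. A real system would need a far more careful matching strategy than a shared key and a sorted-token name comparison.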
2017 update: The servers running the application were taken down by NYPL, so the above links will not work any longer. Here is a video demo:
The project had a lot of potential for future use cases, which Shawn and I presented to the larger community.
After more institutional change in early 2016, effort was shifted to building a new research OPAC. I worked as Product Manager with Ara Kim, Shawn, Paul, Brian, Mauricio, Willa Armstrong, Kevin Friedman, Edwin Guzman, Jobin Thomas & others. A lot of the knowledge gained from researching the Registry project was repurposed. One of the most important pieces was the data model Shawn developed, which now used both PCDM and BIBFRAME components since we were more focused on the predicates necessary for fulfillment. We shipped an Alpha version of the site in three months and the project is ongoing.
To be honest, I’m not sure! I will always want to work in the cultural heritage sector, especially in libraries, focusing on data and LOD specifically. For now I will be working on my own projects and on the lookout for new opportunities. If you will be at Code4Lib next week and see me wandering around, please say hi. Likewise, if you would like to reach out to me in NYC or online, please drop me a line.