One year, 300,000 images, ten questions
Next year the Paul Mellon Centre will hit the big 5–0, and as part of our anniversary celebrations we are digitising the entire photographic archive. That’s right, all… well approximately 150,000 photographs. Of course by digitisation standards this number is not extreme. The Natural History Museum has set a target of 80 million specimens, albeit with no deadline in mind. But now that we are six months into our project we thought it was a good time to share what we’ve learnt so far.
You may be wondering why we have a photographic archive, and what the photos are of. Back in 1964 there was the Paul Mellon Foundation, whose director employed an in-house photographer to provide a photographic service for scholars. When the PMC was established in 1970, the archive and its photographer were transferred. The focus was on photographing artworks in the salerooms, temporary exhibitions and private collections that might disappear from public view. So if there’s an artwork you’ve never been able to find a photo of, you may find it in the PMC photographic archive.
It will be much easier to find what you are looking for once we’ve digitised it all! To make this happen as quickly as possible we are working with Picturae, a professional digitisation service based in the Netherlands, though our collection has been sent to their studio in New Jersey, USA.
We’re not the only ones digitising our photo archive: we are part of PHAROS, a consortium with 14 other organisations that hold similar collections, which is working towards a digital research platform providing aggregated access to photo archive images and data. Picturae has been appointed to undertake the digitisation of several other PHAROS collections, including those held at the Yale Center for British Art, Frick Art Reference Library, Courtauld Institute of Art, and RKD.
Anyway, enough with the background… here are the questions you need to think about before undertaking a mass digitisation project in a relatively short amount of time.
1. How flat and light are the objects you want to digitise?
Getting your equipment ready and set up is time consuming. Unlike the NHM’s, our collection is relatively simple to digitise. We are able to complete this project in one year partly because we are digitising flat photographs that are generally no larger than A4, some stuck to mounts with handwritten text, in clearly labelled folders. Consider the following:
- What are your objects made from? How will they respond to flash? How old are they? How fragile are they? Are they ok to be moved?
- How large are your objects? Will they need to be photographed in parts? What will those parts be defined as? Will you require specialist equipment?
- Make a list of exceptions, with rules, for the out of house digitisation team. For example, if they find a folded photograph, should they unfold it? Agreeing these rules in advance saves the team from stopping to ask each time, which keeps the workflow moving.
We opted to capture the collection in its entirety, meaning that we will photograph the front and back of every photograph and every folder. This means we’ll have almost 150,000 images of blank pieces of card, but we can work out what to do with those later with the assurance that we haven’t missed anything in the process.
2. Are you happy to pack your valuables off to off site studios?
Let’s face it: unless you are a major museum or gallery, you won’t have the space to digitise a large volume of images in house in one year. Finding a digitisation partner that can meet your expected standard of care is complicated: you need to ensure objects will be packed, shipped, stored, insured, handled and digitised to the same high standards you would apply were you undertaking the work yourself.
We started by asking an independent archives consultant to take a look at our collection. She came on site to assess its value and reported:
“There are images of paintings and works of art held in private collections, clubs, livery companies or institutions not generally open to the public, images from the auction houses and from art dealers of paintings being offered for sale which might disappear from view. In some cases there are multiple images of a single work; here it may be possible to see alterations to the work itself where it has been cut down, restored, touched up or reframed. […] [It is] an historic research collection of outstanding importance. […] Works that have disappeared into private collections and are not available through other sources can often be found here.”
Wow! We knew it was a special collection, but it was great to have that external corroboration, which helped us to justify the project (and the hundreds of hours we are going to spend on it) to ourselves and our management team.
3. Have you got your privacy shields sorted?
We had anticipated that the collection would be going just a short ferry ride over the North Sea to the Netherlands for digitisation, but as our start date slipped later than expected, it made more sense to send the collection to Picturae’s studio in New Jersey. The team there were very well acquainted with the type of material in our collection as they have been working on a similar job for the Frick Art Reference Library, so the time needed to become familiar with the materials and to establish workflows was dramatically reduced. However, this really complicated the transfer of custodianship.
Our collection documents not just works of art but also the people who owned them. Our mounts were updated with notes about the exhibition, publication, and sales history of works of art, which in some instances include the names of buyers and sellers. When the GDPR came into effect in May 2018, new restrictions were put in place on how “personal data” could be transferred outside of the European Economic Area (EEA) and processed by contractors, meaning that we needed to ensure that our digitisation partner recognised and respected this. We found that an existing legal gateway, the EU-U.S. Privacy Shield Framework, could be used to enable the safe and judicious transfer of this data between the EEA and the US. Picturae came to the rescue here and made sure their certification came through before shipment to their studio in New Jersey, via a long journey across the North Atlantic!
4. What will be your methods of communication?
A company across the water has your precious valuables. How are you going to check in with them that it’s all still ok and that it’s not being damaged? Weekly Skype calls and shared Google Drive files are a good option.
Picturae put together a weekly report that details the rate of imaging and identifies any issues that have popped up. This helps us keep track of progress, and helps the team at the New Jersey studio learn how to deal with the idiosyncrasies of the collection as they appear.
Additionally, shared reports in Google Sheets have enabled us to visualise our progress in charts and tables, using Google Data Studio to create a dashboard of widgets using that live data.
5. What about the cataloguing?
Cataloguing and digitisation go hand-in-hand. It might be that you can use transcription tools to pick out all the relevant metadata.
We don’t have any additional staff resource for this project, so we’re trying to automate as much as possible by unlocking the valuable data that was catalogued in an analogue format by the Photo Librarian(s) of the past. As our collection has a hierarchical structure of boxes, folders, and prints, we started by transcribing the box and folder labels into a structured format. The idea is that the prints will then inherit all of that data, including information such as the artist’s name, dates, subject terms, and institutional context. This *should* mean that each photo’s basic record will be partially complete from the off.
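To make the inheritance idea concrete, here is a minimal sketch in Python. The field names and values are invented for illustration and are not our actual catalogue schema; the only thing it shows is a print picking up whatever its folder and box already record, with the most specific level winning if two levels disagree.

```python
from collections import ChainMap

# Hypothetical transcribed labels: field names and values are illustrative only.
box = {"artist": "Example Artist", "institution": "Example Collection"}
folder = {"subject": "Portraits", "dates": "1750-1760"}
print_record = {"print_id": "print-0001", "side": "front"}


def inherit(print_fields, folder_fields, box_fields):
    """Merge the three levels; on a clash the most specific level wins."""
    # ChainMap looks keys up left to right, so print beats folder beats box.
    return dict(ChainMap(print_fields, folder_fields, box_fields))


record = inherit(print_record, folder, box)
print(record["artist"], record["subject"], record["print_id"])
```

A print that carries its own, more precise date would simply override the folder's date range, which is why the lookup order matters.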
We’re also considering how reliable OCR might be with this type of material. The Fondazione Giorgio Cini have been experimenting with page segmentation and OCR of historical documents, with very promising results that we hope to learn from.
6. Who’s got the copyright?
Ahhh… copyright. The artwork has copyright, the photograph of the artwork has copyright, and the digitised image has copyright. There are consultants out there who can help you clarify these things, and we very much recommend that you speak to one.
7. Have you got a spreadsheet and willing colleagues to help you check that many images?
Even though we are outsourcing the digitisation of our photo archive we are still checking every single image that comes back to us.
The cropping and rotation are done automatically by Picturae’s own software, so it is important that a human checks each file for errors. Together we developed guidelines for quality control that outline what is expected from either side of the process. The guidelines define what we each perceive as quality (the file formats, image specifications, standards, etc.) and explain how quality will be achieved, maintained, and improved throughout the process.
Surprisingly, we have found the most significant issues to be with the original objects. Glare, misalignment, odd cropping, degradation of early digital prints, and the inclusion of the odd 10p paperweight have complicated the review process, particularly as we don’t have the original document to compare against.
We haven’t found too many errors with the software — the main ones have been images rotated incorrectly or cropping errors.
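Alongside the visual check, a small script can confirm that nothing is missing from a delivery. The sketch below assumes a made-up file naming pattern of `<print id>_front` and `<print id>_back`, which is not Picturae’s actual convention, and simply lists any print for which one side has not come back.

```python
from collections import defaultdict


def missing_sides(filenames):
    """Return {print_id: [missing sides]} for prints lacking a front or back."""
    seen = defaultdict(set)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]             # drop the file extension
        print_id, _, side = stem.rpartition("_")  # split on the last underscore
        seen[print_id].add(side)
    expected = {"front", "back"}
    return {
        print_id: sorted(expected - sides)
        for print_id, sides in seen.items()
        if expected - sides
    }


# Hypothetical delivery: print 0002 has no back image.
delivered = ["0001_front.tif", "0001_back.tif", "0002_front.tif"]
print(missing_sides(delivered))
```

A check like this only proves that a file exists for each side; whether the image is correctly rotated and cropped still needs a pair of human eyes.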
8. Now you’ve got all these files where are you going to put them?
Again there are companies out there that can help you with this, but do make sure to investigate prior to digitisation as they can be pretty expensive, so it’s wise to factor this into your budgets early on. Digital preservation pops up in many fundraising applications.
9. And how are you going to find them?
In a world of mass digitisation the last thing you want is to not be able to find what you want. We’re planning to catalogue the objects using a combination of approaches, as outlined previously. Finding these resources is going to depend on publishing our material in interoperable standards. We’re planning to publish our data in machine-readable XML conforming to the LIDO schema, and in RDF conforming to the linked.art recipe, which is built on CIDOC-CRM.
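To give a flavour of what a linked.art record looks like, here is a rough Python sketch that builds a very small JSON-LD description of an object and the person who made it. The URI, label and artist name are placeholders, the function name is our own invention, and the record is far from complete or validated; it only illustrates the general shape of the data.

```python
import json


def linked_art_object(uri, label, artist_name):
    """Build a minimal, illustrative Linked Art style description as a dict."""
    return {
        "@context": "https://linked.art/ns/v1/linked-art.json",
        "id": uri,
        "type": "HumanMadeObject",
        "_label": label,
        "produced_by": {
            "type": "Production",
            "carried_out_by": [{"type": "Person", "_label": artist_name}],
        },
    }


# Placeholder values, not a real record from the archive.
doc = linked_art_object(
    "https://example.org/object/0001",
    "Portrait of a Lady",
    "Example Artist",
)
print(json.dumps(doc, indent=2))
```

Because the output is plain JSON, it can be read by anything that understands JSON, while the `@context` line is what lets linked data tools interpret it as RDF.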
10. What is the end goal?
For us IIIF is going to be extremely important. IIIF is a series of conventions that allow for the exchange of images and their associated metadata using very neat software developed by other people. Great tools like Mirador and the Universal Viewer offer a super rich feature set that will enable our images to be turned, zoomed, layered, annotated, and compared. Even better, as long as we serve our images in this IIIF compliant format they can be dropped into any compatible viewing tool, allowing our users to choose what best suits their needs.
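Much of the turning and zooming comes down to how IIIF image addresses are put together: the region, size, rotation, quality and format are all spelled out in the URL. The sketch below assumes version 3 of the IIIF Image API (where `max` asks for the largest available size) and uses a placeholder server address and image identifier; the helper function is ours, not part of any IIIF library.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Assemble an IIIF Image API request URL from its parts."""
    # Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    safe_id = quote(identifier, safe="")  # identifiers must be URL-encoded
    return f"{base.rstrip('/')}/{safe_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Placeholder server and identifier.
base = "https://example.org/iiif"
whole_image = iiif_image_url(base, "0001_front")
turned = iiif_image_url(base, "0001_front", rotation=90)
detail = iiif_image_url(base, "0001_front", region="pct:25,25,50,50")
print(whole_image)
print(turned)
print(detail)
```

A viewer such as Mirador builds requests like these behind the scenes, which is why any compliant image server can sit behind any compliant viewer.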
Ultimately we want the photo archive to be accessible to all, easy to search, compare and share.
This article should have taken you nine minutes to read. In that time Picturae will have created 33 scans.
(As you might have noticed we’ve set up an entirely new blog here on Medium as well, so this won’t be the last of us gifting our knowledge to you all. If you’d like to know more about any digital projects PMC has done please let us know and we can craft a post especially for you.)