Zines vs. Google Vision API — Part 1: Process
Everyone likes zines. If you went to library school you probably really love zines. Even if you didn’t go to library school, you still probably like zines. Even Kanye likes zines. However much I like them, though, I don’t have much hands-on experience with them. While I worked at NYPL I knew about the library’s zine collection, and other institutions around NYC house large, important zine collections as well. But while reading about a large accession to the University of Kansas Libraries’ zine collection from Solidarity! Revolutionary Center and Radical Library, I saw that a number of them were already up on the Internet Archive.
While browsing them I thought about how complex they are from a digital surrogate point of view. Up there with digitized newspapers, zines are often a combination of text, varied fonts, images, and orientations, anything imaginable. I wondered what a digital discovery system for a collection of zines would look like. These zines also have very minimal metadata: a title, a creator, and sometimes a description and subject terms. At the same time I had been looking at the Google Vision API suite, and I wondered what a commodity computer vision API could do with this corpus. This is not deep learning model building, just very generic methods. But I thought they might be good enough to create some compelling use cases for a zine discovery system. Plus when you sign up you get a $300 API credit, so…of course, let’s do that.
The first step was to download the assets from IA to run through the API. I grabbed the JPEG2000 asset for each page of each zine: there are ~800 zines with around 13,320 images. The API needs each image to be smaller than 4 MB, so I resized them all to JPEGs under that limit. I then pushed them to AWS S3, since you can give the API either a URL or binary data, and I’d rather push them once to a hosted location than upload 4 MB × 13,000 images every time I wanted to send them through the API.
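I didn’t keep the exact resize script, but the idea is simple: JPEG file size grows roughly with pixel area, so you can shrink each dimension by the square root of the byte ratio to land under the limit. Here is a minimal sketch; the function names, the quality setting, and the use of Pillow (which needs JPEG2000 support for the IA assets) are my assumptions, not the original script.

```python
import math

MAX_BYTES = 4_000_000  # the Vision API rejects images over 4 MB


def shrink_ratio(current_bytes, limit_bytes=MAX_BYTES):
    """Linear scale factor to bring a file under the size limit.

    JPEG size grows roughly with pixel area, so scale each
    dimension by the square root of the byte ratio.
    """
    if current_bytes <= limit_bytes:
        return 1.0
    return math.sqrt(limit_bytes / current_bytes)


def resize_for_vision(src_path, dst_path, limit_bytes=MAX_BYTES):
    """Convert one JPEG2000 page image to a JPEG under the limit.

    Requires Pillow; imported lazily so the pure helper above
    stays stdlib-only.
    """
    import os
    from PIL import Image

    img = Image.open(src_path).convert("RGB")
    ratio = shrink_ratio(os.path.getsize(src_path), limit_bytes)
    if ratio < 1.0:
        img = img.resize((int(img.width * ratio), int(img.height * ratio)))
    img.save(dst_path, "JPEG", quality=85)


# e.g. a 16 MB scan: each side shrinks to half
print(round(shrink_ratio(16_000_000), 2))  # → 0.5
```

Because JPEG compression varies by content, a production version would re-check the output size and iterate, but one pass was close enough for this corpus.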
I picked four of the APIs: Label Detection, OCR (Text Detection), Facial Detection, and Image Properties. I wondered whether the API could label features like “Cat” or “Automobile.” OCR would be a good test of how well the Google OCR model handles a wide range of fonts, text, and orientations. Facial Detection because why not. And Image Properties to get the dominant colors found in each zine.
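One nice property of the Vision API is that all four features can be batched into a single request per image. The sketch below builds the JSON body the REST endpoint (`images:annotate`) expects; the feature type names are the real Vision API enums, while the S3 bucket URL is a made-up placeholder and API-key auth is just one of the supported options.

```python
import json

# The four feature types, in the JSON shape the REST endpoint expects.
FEATURES = [
    {"type": "LABEL_DETECTION", "maxResults": 20},
    {"type": "TEXT_DETECTION"},
    {"type": "FACE_DETECTION"},
    {"type": "IMAGE_PROPERTIES"},
]


def build_payload(image_urls):
    """One batched request body covering several hosted images."""
    return {
        "requests": [
            {"image": {"source": {"imageUri": url}}, "features": FEATURES}
            for url in image_urls
        ]
    }


def annotate(image_urls, api_key):
    """POST the batch to the Vision REST endpoint (stdlib only)."""
    from urllib.request import Request, urlopen

    endpoint = f"https://vision.googleapis.com/v1/images:annotate?key={api_key}"
    body = json.dumps(build_payload(image_urls)).encode("utf-8")
    req = Request(endpoint, data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())


# hypothetical bucket/key, for illustration only
payload = build_payload(
    ["https://my-bucket.s3.amazonaws.com/zines/page-0001.jpg"])
```

Since the images already live on S3, each request is just a few hundred bytes of JSON instead of a 4 MB upload, which is the whole point of hosting them first.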
The Document Text Detection API is compelling: it does the same thing as the plain OCR but returns more grouping information, organizing text into blocks, paragraphs, and so on. It also costs twice as much.
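What the extra money buys is a `fullTextAnnotation` tree (pages → blocks → paragraphs → words → symbols) rather than a flat text dump. As a sketch of why that grouping matters, this helper flattens such a tree back into per-paragraph strings; the sample response is a hand-made stand-in, not real API output.

```python
def paragraphs_from(full_text_annotation):
    """Yield each paragraph in a fullTextAnnotation as a plain string."""
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for para in block.get("paragraphs", []):
                words = [
                    "".join(s.get("text", "") for s in word.get("symbols", []))
                    for word in para.get("words", [])
                ]
                yield " ".join(words)


# tiny hand-made stand-in for an API response
sample = {
    "pages": [{"blocks": [{"paragraphs": [
        {"words": [
            {"symbols": [{"text": "D"}, {"text": "I"}, {"text": "Y"}]},
            {"symbols": [{"text": "o"}, {"text": "r"}]},
        ]},
    ]}]}]
}
print(list(paragraphs_from(sample)))  # → ['DIY or']
```

For zines, where text runs in columns, boxes, and margins, paragraph-level grouping could make search snippets far more readable than a flat stream of words.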
I ran them all through the API (it took a while), and each of the four cost about $18, for a total of $73.77 to process the entire collection of 13K images. That means I only have about $226 of my credit left for more API adventures 😿
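A back-of-envelope check lines up with that bill. The $1.50-per-1,000-images price and the 1,000 free units per feature per month are my reading of the Vision pricing tiers at the time, not figures from the invoice itself, which is why the estimate lands near, but not exactly on, $73.77.

```python
# Rough cost estimate; pricing tier values are assumptions.
IMAGES = 13_320
PRICE_PER_1000 = 1.50   # assumed price per 1,000 images per feature
FREE_UNITS = 1_000      # assumed free tier per feature per month


def feature_cost(images=IMAGES):
    """Dollar cost of running one feature over the corpus."""
    billable = max(images - FREE_UNITS, 0)
    return billable * PRICE_PER_1000 / 1000


per_api = feature_cost()   # ≈ $18.48 per feature
total = per_api * 4        # ≈ $73.92 for all four features
print(round(per_api, 2), round(total, 2))
```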
The next goal is to evaluate the results of the API. To do this I built a React interface that lets me browse the thumbnail images, view the high-res versions in an OpenSeadragon viewer, and inspect the API data returned for each page. I’ve made it public if you’d like to explore as well.
With this tool I can now do some spot checking and evaluate how successful the APIs were on the corpus, which I’ll report back on in the next post.