Jun 26, 2017

The IIIF Image API is a technology that lets smart clients manipulate, transform and convert portions of very large images, minimizing the bandwidth required to transfer them over the Internet and making it possible to work with images whose dimensions run to tens of thousands of pixels.

This post shows how to set up an open source IIIF server providing an Image API over an existing corpus of images. This implementation is used in eLife 2.0 for all images served on elifesciences.org, dynamically selecting the resolution that best fits a user’s browser.
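To make the idea of "selecting the resolution" concrete, here is a minimal sketch of how a client could build an IIIF Image API 2.x request URL, following the `{identifier}/{region}/{size}/{rotation}/{quality}.{format}` pattern defined by the specification. The helper names, the list of candidate widths, the server hostname and the image identifier are assumptions made for illustration, not the code running on elifesciences.org.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", width=None,
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API 2.x URL for one rendition of an image."""
    # A size of "w," scales to the given width and keeps the aspect ratio;
    # "full" asks for the image at its original size.
    size = f"{width}," if width is not None else "full"
    # The identifier is percent-encoded, including any slashes it contains.
    ident = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def pick_width(viewport_width, available=(320, 640, 1280, 2560)):
    """Return the smallest candidate width that covers the viewport.

    The candidate widths are hypothetical; falls back to the largest one.
    """
    for candidate in sorted(available):
        if candidate >= viewport_width:
            return candidate
    return max(available)


# Hypothetical server and identifier, for illustration only.
url = iiif_image_url("https://iiif.example.org", "articles/12345/fig1.tif",
                     width=pick_width(500))
```

Restricting clients to a small set of widths, rather than the exact viewport size, keeps the number of distinct URLs low, which matters once caches sit in front of the image server.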

Software

IIIF servers are not a commodity the way relational databases are, and the choice of software is going to constrain other parameters. Writing a server from scratch is not an option, as the work it performs is non-trivial: cutting, resizing, rotating and especially converting images between different formats. Not only is there a very large input space to test when verifying image-related software (all the possible images in the world), but it is also a cross-cutting problem that spans many different domains.

We chose Loris, an image server providing IIIF Image API Level 2. Loris is a small Python web application, backed by many Python and C libraries. We run Loris in uWSGI, but any WSGI-compliant server would do.

However, we expose Loris to the outside world through an Nginx server, which takes care of the standard problems of HTTP traffic such as setting cache headers and performing redirects. Nginx can scale to thousands of concurrent connections, so it is unlikely to become a bottleneck in front of the CPU-intensive work of image manipulation.

Storage

Having chosen an IIIF server, we evaluated a few storage solutions for providing it with the original, high-resolution images. The trade-off in storage is between a warmer medium that provides faster access and a colder medium that is less performant but offers a better cost per gigabyte.

In the context of our AWS infrastructure, the choice boils down to:

Elastic File System (a glorified NFS), shared external volumes attached to EC2 instances
Elastic Block Store (a glorified NAS), external volumes attached to a single EC2 instance
S3 as an object storage

EFS (Elastic File System) would be a poor fit: from the IIIF servers' side this storage is read-only, with no interaction between servers, so a shared writable filesystem buys us nothing.

SSD EBS volumes cost ~$0.10 per GB/month, although some of this cost could be cut by using slower magnetic disks, or throughput-optimized volumes if the quantity of data reaches some minimum threshold.

S3 costs ~$0.023 per GB/month, to which some API call costs must be added, but no data transfer charge is applied inside the same region (us-east-1). This makes it very economical for long-term, append-only data storage like a corpus of images. Moreover, Loris provides a resolver that can cache a copy of the original image on the local disk.
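As a back-of-the-envelope comparison, the per-gigabyte prices above can be applied to a corpus of about 110 GB, the size of the eLife corpus mentioned later in this post. This sketch assumes two servers that would each need a full copy on their own EBS volume, and it ignores S3 API call charges and the smaller cache volumes still needed in the S3 setup.

```python
# Prices quoted in the text above, in USD per GB per month.
EBS_SSD_USD_PER_GB_MONTH = 0.10
S3_USD_PER_GB_MONTH = 0.023


def monthly_storage_cost(size_gb, usd_per_gb_month, copies=1):
    """Monthly cost in USD of keeping `copies` full copies of the corpus."""
    return round(size_gb * usd_per_gb_month * copies, 2)


corpus_gb = 110  # approximate corpus size
servers = 2      # assumption: the minimum of "two or more" servers

# An EBS volume attaches to a single instance, so each server needs its own copy.
ebs_cost = monthly_storage_cost(corpus_gb, EBS_SSD_USD_PER_GB_MONTH, copies=servers)
# S3 holds one shared copy regardless of the number of servers.
s3_cost = monthly_storage_cost(corpus_gb, S3_USD_PER_GB_MONTH)
```

Under these assumptions the result is roughly $22 per month on EBS against about $2.53 on S3, and the gap widens with every additional server.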

Therefore, we started using S3 as a backend; it provided the IIIF servers with a single source of truth for the reference images and perfectly fit as the lower level of the storage hierarchy, with the local filesystem cache of Loris on top, and ultimately a Content Delivery Network.

Loris always copies the entire source image onto the local disk before it starts manipulating it. There is, however, no reason to tie ourselves to S3 as a proprietary protocol, so we used one of Loris’s HTTP resolvers to load images over HTTP through the s3-external-1.amazonaws.com hostname.
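The behaviour of a caching HTTP resolver can be sketched in a few lines. This is not Loris's actual resolver code, only an illustration of the technique under stated assumptions: the function name, its parameters and the cache layout are hypothetical, and the `opener` argument exists so the download step can be swapped out.

```python
import os
import shutil
import tempfile
import urllib.request
from urllib.parse import quote


def fetch_to_cache(identifier, base_url, cache_dir, opener=urllib.request.urlopen):
    """Return a local path for `identifier`, downloading it on a cache miss."""
    local_path = os.path.join(cache_dir, identifier)
    if os.path.exists(local_path):
        return local_path  # cache hit: no network traffic at all

    target_dir = os.path.dirname(local_path)
    os.makedirs(target_dir, exist_ok=True)
    url = f"{base_url.rstrip('/')}/{quote(identifier)}"

    # Download into a temporary file in the same directory, then rename it,
    # so a concurrent request never sees a half-written source image.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as tmp, opener(url) as response:
            shutil.copyfileobj(response, tmp)
        os.replace(tmp_path, local_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return local_path
```

A production resolver would also need to validate identifiers (for example rejecting `..` path segments) and decide when a cached copy is stale; both are left out here.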

Infrastructure

In the real world, infrastructure usually refers to the fundamental facilities of a country; in the computing world it consists of all the (now virtualized) hardware components that connect users with what they want to obtain. In the case of IIIF, the laundry list consists of:

Two or more virtual machines: `t2.medium` EC2 instances, inexpensive and good for CPU bursts due to peaks of traffic.
Their volumes: not-particularly-fast hard drives to use as a cache of original and generated images. The standard storage provided by EC2 instances is only ~7 GB, most of which is occupied by the operating system; additional volumes therefore fit as the second level of storage and avoid having to clean the cache every half an hour.
Load balancing: ELBs, from which servers can be detached one at a time without interrupting traffic, in order to perform maintenance or cleaning operations like cache pruning. ELBs can also perform HTTPS termination, making the individual IIIF servers easier to set up as they don’t need certificates.
Content Delivery Network: CloudFront, used for edge caching, stores cached versions of popular images near the users’ location to reduce latency. CloudFront is not a particularly good CDN: it doesn’t protect from cache stampedes, invalidation of content can take time, and configuration updates can take up to an hour. We stuck with it for the sake of simplicity, provisioning all components inside AWS for the time being.

Operations

Once the architecture has been set up, particular care is needed to keep entropy from breaking down the system and to ensure continuing availability through the weeks and months that follow the launch. Without safeguards in place, a small problem like a disk filled up by a log can degenerate into errors visible to the user, or into images not being accessible at all.

The first monumental task was to schedule a periodic cache cleaning mechanism to reclaim disk space. We tried to perform this on a live server, but there is no way to guarantee that traffic is not going to be affected; for example, an expired reference image may be removed while it is being used to generate a response. Moreover, detecting the least recently modified or accessed files to implement an LRU policy can be very intensive for the filesystem, due to the sheer number of files and directories being created. Therefore, cache cleaning is now performed while the instance being cleaned is taken off the load balancer, and in most cases it is a blunt cleaning that deletes the whole content of the cache folders.
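The shape of this procedure (detach, wipe, reattach) can be sketched as follows. It is an illustration rather than our actual maintenance script: `detach` and `attach` are hypothetical callables standing in for whatever deregisters and re-registers the instance with the load balancer, and the function names are made up for this example.

```python
import os
import shutil


def wipe_directory(path):
    """Delete everything inside `path`, keeping the directory itself."""
    with os.scandir(path) as entries:
        for entry in list(entries):
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.unlink(entry.path)


def clean_cache_offline(cache_dirs, detach, attach):
    """Blunt cache cleaning, performed while the server receives no traffic."""
    detach()  # take this instance off the load balancer first
    try:
        for cache_dir in cache_dirs:
            wipe_directory(cache_dir)
    finally:
        # Always put the instance back, even if the cleaning failed half-way.
        attach()
```

A real script would also wait for in-flight requests to drain after detaching, and run on one server at a time so that the others keep serving traffic.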

A second aspect is monitoring: we want to be alerted if a sizable percentage of requests is failing, if the latency for serving generated images becomes too high, or if a disk or the CPUs are approaching 100% utilization. We set up New Relic on both the application and server side, gaining non-intrusive insight into what is happening to live traffic.

Testing

The last concern, not to be ignored, is being able to upgrade without fear. We have a testing pipeline that takes a new version of Loris (usually just a commit SHA), sets it up on a cluster of four servers and runs the whole corpus through it, requesting a sample resize and the info.json file for each image. It helps to have a template for creating a testing environment for your IIIF implementation, in this case parameterizing the number of servers in order to get a high throughput, complete the test run in an acceptable time of ~3 hours, and then shut down the testing environment along with its operating expenses.
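The core loop of such a corpus run might look like the sketch below. It is a simplified illustration, not our pipeline: the function names and the sample width are assumptions, and `fetch` is a hypothetical callable that takes a URL and returns the HTTP status code, so any HTTP client can be plugged in.

```python
from urllib.parse import quote


def corpus_check_urls(base_url, identifier, sample_width=250):
    """Return the two URLs requested for each image: info.json and a sample resize."""
    root = f"{base_url.rstrip('/')}/{quote(identifier, safe='')}"
    return [
        f"{root}/info.json",
        f"{root}/full/{sample_width},/0/default.jpg",
    ]


def run_corpus_check(base_url, identifiers, fetch):
    """Request both URLs for every image and collect the failures.

    Returns a list of (url, reason) pairs, where reason is either a
    non-200 status code or the repr of the exception raised by `fetch`.
    """
    failures = []
    for identifier in identifiers:
        for url in corpus_check_urls(base_url, identifier):
            try:
                status = fetch(url)
            except Exception as exc:
                failures.append((url, repr(exc)))
                continue
            if status != 200:
                failures.append((url, status))
    return failures
```

Running the identifiers through a pool of workers, and spreading them across several servers, is what brings the total time down; the sequential loop above only shows what is being checked.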

This test suite was extremely helpful at the start of the IIIF project, when we discovered that about 10% of the original TIFF files were incompatible with the libraries used by Loris, and we introduced a fallback to the equivalent JPEG version. The eLife corpus was about 110 GB and growing at the time, and with this fallback we had to modify only 2 source images to a more standard color space.
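The fallback idea can be illustrated with a short sketch. It assumes that the equivalent JPEG shares the TIFF's path and differs only in its extension, which is an assumption made for this example; `open_image` is a hypothetical callable that loads a source image and raises an exception when the underlying libraries cannot read it.

```python
import os


def source_candidates(identifier, fallback_ext=".jpg"):
    """List the source files to try, in order: the original, then its JPEG twin."""
    stem, ext = os.path.splitext(identifier)
    candidates = [identifier]
    if ext.lower() in (".tif", ".tiff"):
        candidates.append(stem + fallback_ext)
    return candidates


def open_with_fallback(identifier, open_image):
    """Try each candidate in turn and return (candidate, image) for the first that loads."""
    last_error = None
    for candidate in source_candidates(identifier):
        try:
            return candidate, open_image(candidate)
        except Exception as exc:
            last_error = exc  # remember why this candidate failed
    raise last_error
```

Trying the TIFF first means the higher-quality source is used whenever the libraries can handle it, and the JPEG is only a safety net.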

Results

After the eLife 2.0 launch, the IIIF API is used by every single article and page on elifesciences.org to cut and serve the most suitable images and image portions.