Progressive Enhancement, Digital Objects and the Exploded Viewer

Tom Crane
Published in digirati-ch · Jan 31, 2019 · 21 min read

Progressive Enhancement is a good practice. When we try to apply it to digitised objects of cultural heritage, we run into some interesting problems. These in turn force us to think about what we want web addresses — the URLs of our content — to represent. In March my team will be joining The Royal Society at a workshop to consider content, user experience and technical next steps for a pilot project we (Digirati) have been developing. Here, I’ll explore a way for a collections site to present complex digital objects, which we can explore further in the workshop.

On the web, it is good practice to provide something that everyone can see or read, regardless of their browser capabilities, the device they are using, or the connection speed available to them.

Progressive Enhancement is one way of doing this. Deliver simple HTML that pretty much any browser will be capable of rendering, and then layer a more sophisticated experience on top of that through client-side JavaScript and CSS. Whether you are a user with JavaScript turned off, or a search engine scraping the text of the page, there’s something for you.
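As a minimal sketch of that layering, using nothing beyond standard browser APIs (the data-enhance attribute and class name are illustrative, not from any particular library):

```typescript
// Progressive enhancement sketch: the server has already rendered a plain,
// working list of links inside <nav data-enhance="toc">. If this script runs,
// it upgrades that markup; if it doesn't, the plain links still work.

function enhanceTableOfContents(): void {
  const nav = document.querySelector<HTMLElement>('nav[data-enhance="toc"]');
  if (!nav) return; // nothing to enhance: the plain HTML stands on its own

  nav.classList.add('toc--enhanced'); // richer styling applies only when the script runs

  // Intercept clicks so the enhanced version can load content without a full page reload.
  nav.addEventListener('click', (event) => {
    const link = (event.target as HTMLElement).closest('a');
    if (!link) return;
    event.preventDefault();
    // ...fetch and render the target content client-side here...
    console.log('Enhanced navigation to', link.href);
  });
}

document.addEventListener('DOMContentLoaded', enhanceTableOfContents);
```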

Graceful Degradation attempts the same outcome from the other direction: build your ideal UI, taking full advantage of the state of the art, but then carefully ensure that you don’t break older browsers.

Although they may sometimes appear to amount to the same thing, Progressive Enhancement is generally preferred. It’s quicker at producing working software that real users can try out. It’s a better fit for iterative development and testing. Graceful Degradation can descend into browser compatibility hell as you start trying to untangle complex features from their dependencies on specific browser capabilities.

Both of these approaches are a long way from the single page web app, where the payload delivered to the browser is mostly application code rather than a structured HTML page. The application reads in data and more resources to build the user experience. As the user interacts with the application, it loads further content via JavaScript and modifies the user’s view. This approach has benefits too, and not just for making slick interfaces. Much of the global population of web users has very capable web browsers on smart phones or tablets, but poor network infrastructure. JavaScript-heavy applications can squeeze every last bit of potential out of limited bandwidth, but must take care that the initial load of the JavaScript code itself is not a burden. If done carefully, a JavaScript single page app can be both more efficient and a better user experience.

Search engines don’t really like web applications that generate different views through client-side JavaScript. While they can sometimes be crawled successfully by indexers, developers must work hard to support the addressability of that content: providing unique URLs for different views, allowing users to save and share sensible bookmarks, and delivering meaningful page titles and descriptions. If your single page app offers many paths to new content by loading it directly in response to user actions, search engines may get lost, have nothing to follow, or be unable to create a meaningful index entry for new content loaded this way — you’re still on the web page you started on!

Sometimes this doesn’t matter. It may not be important to surface deep links into your application for search engines to provide to their users. You may just want to bring people to the front door, rather than direct to a particular room.

But sometimes, it really does matter. URLs are important: for accessibility, for findability, and for the significance of your content as distinct resources on the web, accessible through a browser at a stable address. These resource concerns can be a deal-breaker even if you don’t care how your content appears to older or unconventional web browsers and non-visual clients such as screen readers (although you should care).

Decisions about application style are often obvious. Wikipedia is not a good candidate for a single page web app. But a web site that offers a summary dashboard view of trending news stories might be.

Books, manuscripts and other compound objects

These concerns come to a head when it comes to the user experience of digitised content like paintings, maps, books and manuscripts. And it gets more complicated for complex digital objects: those with many distinct views, such as a book with many pages, each with potentially more content, like a text transcript of the page. These objects are most commonly experienced through JavaScript-powered viewers, like these:

A selection of JavaScript-powered viewers all showing the same item

Applications like the Universal Viewer deliver many and varied rich viewing experiences. You can get an overview of a complex object through thumbnails and structural navigation. Sometimes you can read the text, or even search within it. Some are interactive and allow annotation, sharing, embedding and more besides. Some will even plaster the pages of your object over the walls of a virtual gallery, in which you can wander from room to room, chapter to chapter.

The great advantage of JavaScript viewers is that you can invoke that user experience anywhere. They are portable powerhouses of application functionality, to be embedded in articles, blog posts and discovery platforms alike.

Now consider viewers as the means by which an organisation exposes its digitised books, manuscripts and archives. The discovery platform, or catalogue. Typically a viewer application lives on a web page about the single object the viewer is showing. Maybe it’s the library catalogue page for a book and the viewer lets you read it right there by exploring the pages. It’s rare to have an entire collection behind a single URL: individual objects usually get their own web pages. But once inside an object you are navigating from page to page, view to view in an experience driven by a JavaScript application — an island application, autonomous and self-contained on one web page. It has most of the traits of a single page web application; it’s just that there’s one single page application per object.

If the viewer has an API, it can communicate with the external page to notify it of events as the user navigates around. It can update the browser’s address bar to facilitate deep links into content. It can participate in more complex user experiences, built around the viewer component. It can show the text of each page alongside the image view. All this is very compelling, and viewers like those above are the de facto user experience of many of the world’s cultural heritage collections.

But what about progressive enhancement?

This doesn’t happen a lot for digital objects today, unfortunately. What do I get with no JavaScript? Usually you’ll still get the containing page with its catalogue metadata. If you’re lucky, the space reserved for the viewer will give you some alternative content, although that might just be a placeholder image of the first view.

If we wanted to support more of our potential users, through progressive enhancement of simple HTML content, how would we go about it? Self-contained JavaScript viewers need to work anywhere; you just have to feed them one data URL to start from. If you are in control of both ends, the client and the server — if it’s your discovery platform — you don’t have this constraint.

We could render an HTML-only presentation of the object. The viewer’s job on the page is to render navigation around the different views (e.g., book pages), and a large enough image of each view to be useful. We can try to do this with regular HTML.

But we run into a problem of scale.

If we only have HTML, we need to generate a lot of it, and the bigger the book, the more HTML we’ll need. More pages mean more thumbnails, more structural navigation, and more of those large images. Without client-side logic that allows us to ask only for what’s needed, we’re going to have to build the whole book in HTML — and if it has hundreds of pages, that means hundreds of repeated blocks of HTML. That might not be so bad, but we also need hundreds of fairly large images, one for each book page — they need to be big enough to see properly. We’re throwing an awful lot of content at our baseline, non-enhanced user, and they are the ones we’re supposed to be helping!

Our library catalogue web page that uses only HTML ends up with a titanic page weight (the total of all the file sizes of all the content the page has to load) if the job of this one page is to present a large book.

At this point the JavaScript approach looks a lot friendlier again, especially if we take our bandwidth optimisation to extremes. Consider this tiny viewer. If you look at the source, this viewer is only 1.6 KB with no dependencies. That’s absolutely tiny, and takes milliseconds to load:

http://tomcrane.github.io/wellcome-today/viewer-min.html

This viewer is an experiment in how-small-can-you-get rather than a serious candidate for a viewing experience. But adding a few more KB of code and CSS to it would make it prettier and even more efficient (e.g., only loading thumbnails visible in the left panel and deferring the others until they are scrolled into view). A much bigger jump in capability (but still no larger in its contribution to the page weight than a typical medium sized JPEG) would be to add deep-zoom support, and/or the ability to tailor the sizes of the requested images to the user’s screen. It’s then better for mobile users, much faster to load, much friendlier to low-bandwidth users (deep zoom is also a great bandwidth conserver, allowing hi-res access to images of enormous size by supplying only those image tiles required to populate the viewport).

If we dispensed with progressive enhancement and just went straight to JavaScript, we would save our users the huge amount of unnecessary bandwidth that the HTML version requires. Even a viewer as feature-rich as the Universal Viewer has a smaller overall initial page weight for a single work than a plain HTML version, past a certain number of pages, and that number of pages isn’t very high. What’s more, a JavaScript viewer only needs to be loaded once — the browser caches that code, so the next item you view is only asking for the new data.

A simplistic approach to progressive enhancement means our HTML version of a large object is just too big, and will result in a worse user experience for all users (because of the initial page load). The experience will likely be especially bad for those we’re trying to help most by adopting the practice!

So why not break it up into separate pages, one for each view?

HTML-only viewing experiences (aka, a web site)

So how did people solve this before JavaScript viewers became the norm? Sometimes, especially with archives digitised a while ago, the web site is delivered in plain HTML:

A manuscript page of Jane Austen’s Persuasion

As we would expect from simple HTML, this page of Persuasion is a first-class citizen of the web. For me, it’s the second hit in Google, and there’s the text I searched for, as a snippet in the result:

Google search result with a snippet of Austen’s text

That is, I was able to find a search result specifically for this page of Persuasion (distinct from the manuscript it belongs to). It’s very findable. To explore the issue a little more, the first hit in Google for this same query is also a distinct web address for a page at the British Library that contains, as plain HTML, the text of the entire 33 pages of the item; images of these pages are launched in a separate viewer, which I need to navigate to for the deep zoom experience. But I can’t get a search result that leads into that viewer.

A view of the same manuscript at the British Library

With the first Jane Austen manuscript example, we get something really significant that we didn’t get with either the JavaScript viewer or the HTML-only version of the whole book on one page. We get a single web page with its own URL, easily indexed by search engines as specifically that page of the manuscript, and therefore findable and shareable; the individual view within the work has acquired the status of a proper web resource.

It’s interesting to note that where an institution has adopted a self-contained JavaScript viewer in a newer discovery environment, but still maintains older page-per-view web pages for the same works, it’s those old pages that show up in search engines when your search queries target the content of the object. The newer dynamic views only generate hits for text that matches the catalogue record, or other text on the host page, which might not be any use at all.

If our progressive strategy involves breaking the work up into a web page per view, then the HTML version actually gains something that the JavaScript version doesn’t have, and what it gains may for some be a consideration that outweighs all the others. It’s all about the identity and addressability of individual views within a work, which in some contexts becomes all-important.

The trouble is, we don’t necessarily know what the user’s context is, and how important a page-centric web view might be to them. And if our enhanced version is a JavaScript viewer that stays on the one page, how do we reconcile these two world views? Some users (including robots and crawlers) see a work across multiple web pages. Other users, benefiting from the JavaScript-powered features of the viewer as it loads and traverses the digital object quickly and smoothly, experience it from the point of view of one page only. What happens when these users send links to each other to compare notes?

Page per View vs Page per Work

We started from a purely practical concern for reducing bandwidth consumed, and ended up with a different approach that spreads the “viewer” across many web pages. We now have a different user experience — we’ve added a web-page-centric experience that elevates each view within the work to its own page with an address on the World Wide Web.

I have written elsewhere about the user’s different focus of attention when looking at objects and their views. The problem of focus is that you don’t know whether a web page per view or a web page per work is going to feel more natural and useful to a user, because you don’t know what they are there for. Even the same user may have a different focus at different times. For one user, a single manuscript page of Newton’s Principia may involve months of scholarship. All of the transcriptions and annotations available for that page definitely warrant a distinct web address; it’s a rich island of web content on its own. It should turn up in search results as a resource in its own right; it should have the status of a page on the web, all to itself.

But that same publishing mechanism would result in every page of every digitised printed book getting its own web address too. This might be overwhelming, as search results, or as a navigation experience. For the user riffling through (digitally) or just reading, the web page belongs to the Persuasion manuscript as a whole, the work, and they are inside it, somewhere inside a viewer on the Persuasion manuscript’s web page. Separate web pages, image after image, could seem unnecessarily clunky, especially if there is little or no additional content (transcriptions, annotations). “Why didn’t they use a viewer?”

Is it possible to construct a user experience that allows both kinds of focus at the same time? To have all the benefits of separate web pages and all the benefits of a single page viewer, without the user having to think about that distinction at all, or suffer the drawbacks of each approach?

I think it is.

A partial experiment: one third of the answer

I work for Digirati. In a pilot project we developed for the Royal Society called Science in the Making, we decided that the archival material, and the interactions offered to users, warranted a web page per view rather than a web page per object. The site features archive material connected with published articles in the Philosophical Transactions of the Royal Society, the world’s oldest scientific journal. These items are original manuscripts, drawings, referee reports, photographs and correspondence. Although the pilot was aimed at the general public (leaning towards a viewer), users can transcribe and comment on individual images within a work (leaning towards one web page per image). How much of a viewer-like experience could we deliver for most site users, while keeping each view a distinct web page, with its own URL? We wanted to make sure any transcription text for an image was discoverable by search engines, as part of the HTML content of a specific page. If someone is working on transcribing a page of a manuscript, can it be a real web page? And if they are just browsing, skimming through images in a work, can it feel like a viewer?

We built two distinct views. A web page for the work looks like this:

This page carries catalogue data for the object, some tags from users that belong to the object, and a strip viewer that gives an overview of the work. You can scroll or swipe it.

If you click on an individual view, you go to another web page for that image. It’s a whole new page, but the default behaviour is to scroll that page down slightly, to the “viewer”, and expand this viewer to fill the viewport height:

The thumbnail strip and the back/forward arrows feel viewer-like, but any navigation within the work is a web page navigation; the browser takes you to a new URL. You skip straight past the page furniture on the new page, and the viewer-like parts expand to fill the vertical viewport. This trick depends on the site being fast, of course, otherwise you end up with what feels like a very sluggish viewer as it loads in whole new pages.
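A minimal sketch of that scroll-and-expand trick, assuming the viewer-like region is marked with a hypothetical .view-panel class (not the pilot’s actual markup):

```typescript
// Sketch of "skip straight past the page furniture": on load, make the viewer-like
// region fill the viewport height and scroll directly to it, so the new page opens
// looking like a viewer rather than a catalogue page.

function focusViewPanel(): void {
  const panel = document.querySelector<HTMLElement>('.view-panel');
  if (!panel) return; // no viewer-like region on this page

  panel.style.minHeight = `${window.innerHeight}px`; // expand to fill the vertical viewport
  panel.scrollIntoView({ block: 'start' });          // jump past the page furniture
}

window.addEventListener('DOMContentLoaded', focusViewPanel);
window.addEventListener('resize', () => focusViewPanel());
```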

If we view the transcription, we’re seeing text that’s basic HTML content of the page. Search engines can find it and index it:

For this pilot, time was limited and the experiment didn’t deliver a full progressive enhancement approach. But it’s mostly there. The basic HTML of the page provides a large JPEG and the thumbnail images; they are all simple HTML links. You can navigate around quite easily in this “viewer” with no JavaScript and fairly light page load times. We would like to do a lot more optimisation for the basic HTML layout, and it still suffers from the problem of scale mentioned above. We’re lucky in that the archival material for the pilot project is seldom bigger than a few dozen pages, so the total page weight doesn’t become too big. We never have unmanageable numbers of thumbnails. But the raw HTML for these pages, and the page weight that HTML causes, would be too large if this site were showing 500-page books.

The Exploded Viewer

If we rethink what we are progressively enhancing, we can be cleverer about server-side page composition to avoid the page weight problem. If we have control of the server end and can do more work there, we can eliminate some of the design and implementation constraints of a self-contained viewer. We can make both ends cooperate. Both server and client are capable of generating specific views of an object with the right amount of contextual information and navigation.

This means a server-side viewer, generating web pages per view within a work, with enough navigation to get to any other view (but not always in one step). It doesn’t have to provide the entire work, and it doesn’t have to provide every possible thumbnail. Just enough of an HTML window on the work, at that view: perhaps with the thumbnails around the current view, perhaps with the start and end thumbnails as well, but not necessarily all the thumbnails in between. And similarly for a table of contents: a tree opened to the current section, but not opened (or even open-able) to all sections. No matter what page you land on, you can see the page image, other relevant page content, commentary and editorial; you can navigate up, down and around. This HTML experience is just fine — it should be a good one, not an afterthought.
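As a sketch of that windowing idea, here is how a server might choose which thumbnails to include in the HTML for one view; the names, the window size and the 'gap' marker are assumptions for illustration, not a description of any existing implementation:

```typescript
// Server-side sketch: pick a bounded set of thumbnails to render for one view.
// Keep the first and last thumbnails as anchors, plus a small window around the
// current view, and mark the gaps so the template can render an ellipsis between them.

interface ThumbRef {
  index: number;
  url: string;
}

function thumbnailWindow(
  all: ThumbRef[],
  currentIndex: number,
  radius = 3,
): (ThumbRef | 'gap')[] {
  if (all.length === 0) return [];

  const keep = new Set<number>([0, all.length - 1]); // always show start and end
  for (let i = currentIndex - radius; i <= currentIndex + radius; i++) {
    if (i >= 0 && i < all.length) keep.add(i);
  }

  const result: (ThumbRef | 'gap')[] = [];
  let previous = -1;
  for (const i of [...keep].sort((a, b) => a - b)) {
    if (previous !== -1 && i - previous > 1) result.push('gap'); // rendered as "…"
    result.push(all[i]);
    previous = i;
  }
  return result;
}
```

The output size is bounded by the window radius rather than the length of the book, which is what keeps the HTML for a view deep inside a long work no heavier than the HTML for a view in a short one.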

Now the Exploded Viewer steps in, if the browser supports it (almost all will). It loads the source data — probably the very same source data the server used to generate the HTML window on the work at this page — and bootstraps itself as a client-side viewer, open at the current page.

This viewer then takes over the browser’s address bar. Each navigation action that would have taken the HTML-only user to a different web address rewrites the browser’s URL to that same address. This requires a browser that supports the HTML5 History API, which all modern browsers do. To the user it feels like an efficient JavaScript application, because it is.
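A minimal sketch of that takeover, assuming the server-rendered navigation links carry a data-view-url attribute the script can recognise (the attribute and the loadView function are illustrative, not any particular viewer’s API):

```typescript
// Sketch of the JavaScript takeover. It only runs when the History API is available,
// so non-supporting browsers keep the plain server-rendered page-per-view behaviour.

async function loadView(url: string): Promise<void> {
  // In a real viewer this would fetch the object data (or a fragment) and update the view.
  console.log('Client-side render of', url);
}

function takeOverNavigation(): void {
  if (!('pushState' in history)) return; // no History API: leave the plain HTML links alone

  document.addEventListener('click', (event) => {
    const link = (event.target as HTMLElement).closest<HTMLAnchorElement>('a[data-view-url]');
    if (!link) return;
    event.preventDefault();
    void loadView(link.href);
    // The address bar now shows exactly the URL the server would have rendered.
    history.pushState({ viewUrl: link.href }, '', link.href);
  });

  // The back and forward buttons replay the same client-side rendering.
  window.addEventListener('popstate', () => {
    void loadView(window.location.href);
  });
}

takeOverNavigation();
```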

A similar but incomplete approach to this would be for the server to provide page-addressability by handling any incoming page request within a work with the same page, the page for the work. This single page for the work gives the appearance of page-to-page navigation by modifying the address bar, but those pages don’t really exist on the web as distinct resources that would appear different as HTML. They are just a set of web address paths that the same single work page can handle. The source of the page would be the same each time, with the JavaScript viewer reading the address path and showing the correct corresponding view, then updating the address bar on further internal page navigation.
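For comparison, a sketch of that incomplete approach from the client’s side: the single work page reads the requested path from the address bar and opens the viewer at the corresponding view (the URL shape here is an assumption):

```typescript
// Sketch of the "one work page handles every view path" approach.
// Assumes URLs shaped like /works/{workId}/{viewId}; a real scheme may differ.

function initialViewFromPath(pathname: string): string | null {
  const match = pathname.match(/^\/works\/[^/]+\/([^/]+)\/?$/);
  return match ? match[1] : null; // null means: open the work at its default view
}

// On load, the viewer opens at whatever view the path names, even though the
// server returned the same HTML for every such path.
const startView = initialViewFromPath(window.location.pathname);
console.log('Bootstrapping viewer at view:', startView ?? '(default)');
```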

With the Exploded Viewer, and a server-side rendering of a tailored, contextual view of each page, each address is a real and distinct HTML resource on the web, delivering all the accessibility, addressability and findability benefits mentioned.

This approach combines several techniques introduced above. By no means does it replace self-contained JavaScript viewers — they are essential for portability and embedding. An Exploded Viewer implementation could look exactly the same as a self-contained embedded viewer to users with the right JavaScript capabilities, but it would only be possible to deliver the full capability, from basic HTML through progressive enhancement, on a site that can render individual views server-side, such as a collection’s “home” web application. The server needs the same understanding of the structure of a digital object as the client does, so it can generate the right views for the client to collaborate with.

There is more design flexibility in this approach, too. The viewer is more adaptable to the content, and there’s no need to confine application functionality to a box on the page. It can spread itself out over the web page… exploded on each page, as well as exploded across multiple web pages.

To restate, the principles of the exploded viewer are:

  • The server can provide one web page per view (e.g., each image of a book page). That is, each view has a distinct URL.
  • The server’s generated HTML for that view does not have to provide the means of accessing all the other possible views. It could maybe render a window view, a subset of all possible thumbnails. It can render links around the current view, but not necessarily links to all possible other views. Maybe it shows thumbnails like this, with gaps that would need an additional navigation step to land in:
    [] [] …. [] [] [] …. [] []
    And if structural content such as chapter information is available, the page HTML provides a partially expanded navigation tree, but not the whole tree if it’s going to be too big. A user can navigate upwards and downwards and around, but might need two page navigations to get to every possible other view, via additional aggregate views of the content. The server end of the exploded viewer is capable of rendering these supporting, aggregate views to aid navigation around the work.
  • There is then a natural upper limit to how much HTML needs to be delivered for a workable, server-side-rendered viewer, no matter how many different views the object has. The HTML for page 800 of War and Peace need not be significantly larger than the HTML for page 7 of The Tiger Who Came To Tea — beyond a certain point it ceases to rise in proportion to the whole object size.
  • Make this basic HTML version elegant and fast…
  • …but then, have JavaScript take over and load the data for the object. Let the page JavaScript manage its own resource handling, and be capable of bootstrapping a viewer for the whole object, no matter which view the user starts on. Any view page could be the entry point (e.g., from a bookmark or search result).
  • This JavaScript takeover should only happen if the browser supports manipulation of the address bar and history via the HTML5 History API: detection of this feature might be the trigger to support the JavaScript version. Once the JavaScript is running, user action doesn’t cause new web page requests, just data requests. But the effect on the browser’s apparent status (the address bar and history) is the same as if such a request had been made.
  • What in the basic HTML version would be a new page request becomes a JavaScript-driven request for more content, and an update of the view. The JavaScript modifies the address bar with the URL of whatever view a basic HTML navigation would have led to. The JavaScript is simulating page navigation, but always resulting in a URL that the server would be capable of rendering as a partial window on the work, around a particular page.
  • The server-side rendered viewer gets progressively enhanced into a client-side viewer that manages its own resource loading, just like a real viewer. The client-side viewer simulates the address-bar changes that would result from equivalent navigation in the server-side rendered viewer.
  • Actions such as Search Within have different outcomes depending on whether the JavaScript version has bootstrapped itself and taken over. If it has, search results are loaded, displayed and navigated dynamically. If not, it’s like a regular form submission to a page of search results, with links to more pages that can display those results meaningfully (a minimal sketch of this follows the list).
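As a minimal sketch of that last point, assuming a plain server-rendered search form (the selector, field name and result-handling function are illustrative):

```typescript
// Sketch: enhance Search Within only when the client-side viewer has taken over.
// Without JavaScript (or before takeover), the form submits normally and the server
// returns a results page with links the user can follow.

async function showResultsInViewer(query: string): Promise<void> {
  // In the enhanced viewer this would fetch results and highlight or navigate to them.
  console.log('Dynamic search within the work for:', query);
}

function enhanceSearchWithin(viewerHasBootstrapped: boolean): void {
  if (!viewerHasBootstrapped) return; // fall back to the ordinary form submission

  const form = document.querySelector<HTMLFormElement>('form[data-search-within]');
  if (!form) return;

  form.addEventListener('submit', (event) => {
    event.preventDefault(); // stop the full-page request...
    const query = new FormData(form).get('q');
    if (typeof query === 'string' && query.length > 0) {
      void showResultsInViewer(query); // ...and load results into the viewer instead
    }
  });
}
```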

What’s the start page?

There’s one problem here. What does the home page for the work look like? What is its URL, and does it have a different URL from the web page for the view of the first image (or whatever view in the work is deemed the starting, initialisation view, such as the title page)?

I think this is an implementation decision to be made, rather than a show-stopper. For example, while page URLs like /war-and-peace/cover and /war-and-peace/page-1 imply the existence of a homepage for War and Peace at /war-and-peace/, you can decide whether you actually want to provide such a page, distinct from, say, /war-and-peace/cover. You can do one of:

  • provide /war-and-peace/ as a distinct page, about the work as a whole in some way
  • make /war-and-peace/ redirect to /war-and-peace/cover as the canonical URL of the work: you have to start somewhere (a minimal routing sketch of this option follows the list).
  • make /war-and-peace/cover redirect to /war-and-peace/; the canonical URL for the page chosen to represent the initial view of the object is the object’s URL.
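A minimal routing sketch of the second option, using Express purely as an example server framework (nothing here is specific to any real collection site):

```typescript
// Option 2 sketched: the work URL is not a page of its own; it redirects permanently
// to the canonical starting view.
import express from 'express';

const app = express();

app.get('/war-and-peace/', (_req, res) => {
  res.redirect(301, '/war-and-peace/cover');
});

app.listen(3000);
```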

In the Royal Society example, we chose the first option. This works well for multi-image items, because (as seen in the examples above) the view on the work page is a strip of large thumbnails, rather than a full view of the first page. But it isn’t a perfect solution for items that only have one image (such as a photograph as a distinct archival item). In this case, we merged the functionality of the “work” page and the “view” page, and the “view” page never appears. If there is a transcript available, it appears on the same work page as the work metadata.

The Holy Grail

The exploded viewer is not the answer to every use case. It only works when the client and server are in close collaboration. It’s not a portable solution. But that’s the scenario in which progressive enhancement, accessibility, addressability and findability are most important — likely in an organisation’s primary discovery environment. Self-contained viewers are a better answer everywhere else. But we don’t have to sacrifice the UI sophistication of those JavaScript applications to get the URL schemes, accessibility and support for every single web user that we must provide in those discovery contexts. This, for me, is the holy grail: a viewing experience that:

  • Solves the problem of focus — it can be used for page focus and work focus
  • Works with no JavaScript, but…
  • …when it can, it feels like a viewer — really feels like a viewer, for search and other functions too
  • Is search-engine-friendly, at the page level
  • Makes sense for single-image items as well as multiple image items
  • Yields a viewing experience immediately without loading heavy JavaScript applications…
  • …but does not yield a heavy HTML page with markup for the whole object at once (unless the object is small enough).

The Royal Society example is a prototype that lets us see a path to this model. At the moment, it does neither the partial server-side composition (although it does support progressive enhancement) nor the JavaScript takeover to bootstrap a multi-page viewer from any entry point. But it’s a start, and a basis for further experimentation with this approach.
