Will an “Open Web” Liberate Reading Data?

Navigating books

At last week’s BEA, we received the news that the IDPF, the standards body for ebooks and the organisation responsible for the current EPUB specification, is considering merging with the W3C, the standards body for the web at large. This would mean that instead of having its own standards body for digital books, the publishing industry would be a smaller part of the much wider online publishing world, including magazines, newspapers, blogs, websites and more.

Some, like Peter Brantley, organizer of the annual “Books in Browsers” conference, greeted the news with enthusiasm, while others, such as ebook wizard Baldur Bjarnason, were more sceptical. The response on Medium from Hachette’s Dave Cramer is also worth reading.

One of the reasons for the merger, according to Sir Tim Berners-Lee, father of the web, was the prospect of making ebooks trackable.

“Along with interlinking, content should be trackable,” Berners-Lee said. “Publishers must have the ability to understand how books are being read and shared. We should live in a world of linked data.”

It is this issue that I would like to address in this post.

First off, what will change as a result of the merger, and will we suddenly have treasure troves of data at our fingertips? Well, probably very little will change in the near future. The merger has not yet even been agreed to, and any new framework emerging may take years to be proposed, drafted, discussed and refined.

In this context, it is worth noting that EPUB 3 was introduced more than five years ago and is still not used by all publishers. It is not even supported yet by some leading reading applications, such as Nook (Barnes & Noble), Tolino (the leading ebook app in Germany), Aldiko (owned by Feedbooks) and Bluefire Reader, though the latter two companies have announced plans to support EPUB 3 by the end of this year.

In other words, technological change in the book publishing industry does not happen fast. This will not come as a surprise to any insider. Even outsiders know that book publishing is a rather conservative industry.

But let us fast forward several years and imagine that books now live on the “open web.” Does that mean they will be much easier to track, because instead of reading books inside Kindle, Nook and Kobo, we will be reading them in “normal” web browsers, such as Chrome (Google), Firefox (Mozilla), Safari (Apple) or Internet Explorer (Microsoft)? You will already notice that we are exchanging one set of technological giants (Amazon and Apple) for another (Google and Microsoft).

Also, maybe books will not be read on the “open” web, but in the closed web that is Facebook. If that were to occur, Facebook would be the new gatekeeper. Today Google is already unable to track and analyse what happens inside Facebook, so as brave as the proposal by Berners-Lee is, it may not really make books more trackable if they merely move from one walled garden (Amazon) to another (Facebook).

But let us assume they really live on the genuinely open web of browsers like Chrome, Firefox, Safari and Internet Explorer and not inside the walled garden of Facebook. Now, you could surely use Google Analytics to understand the reading behaviour of users? Well, not so fast.

Other forms of content, like audio (SoundCloud) and video (YouTube), live on the open web, yet Google Analytics gives you only limited insights into them. The download statistics are still owned by those who host the content, i.e. SoundCloud and YouTube. Books are no different. Most readers get their books from booksellers like Amazon, B&N, Kobo and others, not from publishers (there are notable exceptions like tor.com, harlequin.com and lostmyname.com).

Tracking technologies such as candy.js by Jellybooks would be much easier to deploy in such an environment, but you will still need this sort of customized tracking technology to measure how media is consumed, unless the content consists of small snippets on webpages. Long-form content of more than 10,000 words (and that means books) will still require dedicated analytics tools.

It’s also worth pointing out that Google Analytics primarily deals with how users navigate from one webpage to another (see for example this lovely cartoon from xkcd). Google Analytics doesn’t deal with things like reading engagement, whether you scroll and flip pages, where and when you pause, if you finish the particular book and what the audience for that book looks like. Book reading is still very, very different from browsing the web. Google Analytics was developed for a totally different form of user engagement, which is searching for snippets of information, shopping for goods, navigating from one webpage to another, and will not help an author or publisher understand reader engagement. That’s why specialized tools are used in addition to Google Analytics by many who specialize in this area, and book publishing will be no different.

However, let me investigate this from another angle: what if the APIs and interfaces of the open web were available inside today’s EPUB standard? Many of these are crippled or unsupported in today’s EPUB 3.0.1 standard. This is something we could fix without a merger.

First, EPUB 3.0.1 removed the support for POST and now only allows GET. For Jellybooks users, this creates an inferior user experience. There are many cases in which a better experience can be created for readers when one can extract and transmit data in the browser using the POST command rather than the GET command. This is a feature of the open web that we should bring back to ebooks.
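To make the difference concrete, here is a minimal sketch of how the same reading data might be shipped with GET versus POST. The event fields, the function names and the endpoint are all hypothetical and exist only for illustration; they are not part of the EPUB specification or of candy.js.

```typescript
// Hypothetical shape of a reading event (assumption, not a real schema).
interface ReadingEvent {
  bookId: string;
  chapter: number;
  action: "open" | "pause" | "close";
  timestamp: number; // milliseconds since the Unix epoch
}

// A plain description of the HTTP request we would like to send.
interface RequestPlan {
  method: "GET" | "POST";
  url: string;
  body?: string;
}

// GET: the event has to be squeezed into the query string, so every event
// costs one request and the URL grows with the payload.
function planGet(endpoint: string, event: ReadingEvent): RequestPlan {
  const pairs: [string, string][] = [
    ["bookId", event.bookId],
    ["chapter", String(event.chapter)],
    ["action", event.action],
    ["timestamp", String(event.timestamp)],
  ];
  const query = pairs
    .map((pair) => encodeURIComponent(pair[0]) + "=" + encodeURIComponent(pair[1]))
    .join("&");
  return { method: "GET", url: endpoint + "?" + query };
}

// POST: a whole batch of events travels in the request body as JSON.
function planPost(endpoint: string, events: ReadingEvent[]): RequestPlan {
  return { method: "POST", url: endpoint, body: JSON.stringify(events) };
}
```

In a browser, a plan like this could be handed to `fetch(plan.url, { method: plan.method, body: plan.body })`. With POST, a reading session can be buffered and sent in one request instead of one request per event.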

Two other features that we sorely miss, and that would make books more trackable, are support for these two APIs:

An API to find out where a piece of the webpage (ebook chapters are webpages) is in relation to the viewport.

Note on why this is useful: You can establish what the reader (rather than the machine) is actually looking at, because you know what is actively displayed on the smartphone, tablet or e-reader screen.

The ‘visibilitychange’ event and ‘document.visibilityState’ property, to find out if a chapter/HTML file is visible to the reader.

Note on why this is useful: ebook reading apps like iBooks aggressively pre-load chapters, so that companies like Jellybooks have to deploy all sorts of “clean-up” algorithms to determine whether a reader has actually opened and started reading a chapter or not.
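As a rough illustration of the second point, the sketch below accumulates the time a chapter was actually visible, so that a chapter that was only pre-loaded never counts as opened. The class name, the method names and the threshold are my own assumptions, and the sketch presumes a reading system that reports visibility per chapter document, which is exactly the support that is missing today.

```typescript
// Minimal structural type so the sketch also runs outside a browser. In a
// reading system that supports the Page Visibility API, the global
// `document` object would be passed in here.
interface VisibilitySource {
  visibilityState: string; // "visible" or "hidden"
  addEventListener(type: "visibilitychange", listener: () => void): void;
}

// Hypothetical helper: tracks how long one chapter was really on screen.
class ChapterVisibilityTracker {
  private source: VisibilitySource;
  private now: () => number;
  private visibleSince: number | null = null;
  private totalVisibleMs = 0;

  constructor(source: VisibilitySource, now: () => number) {
    this.source = source;
    this.now = now;
    if (source.visibilityState === "visible") {
      this.visibleSince = now();
    }
    source.addEventListener("visibilitychange", () => this.onChange());
  }

  // Called on every visibility change: start or stop the stopwatch.
  private onChange(): void {
    const t = this.now();
    if (this.source.visibilityState === "visible") {
      if (this.visibleSince === null) this.visibleSince = t;
    } else if (this.visibleSince !== null) {
      this.totalVisibleMs += t - this.visibleSince;
      this.visibleSince = null;
    }
  }

  // Total visible time so far, including a still-running visible period.
  visibleMs(): number {
    const running = this.visibleSince === null ? 0 : this.now() - this.visibleSince;
    return this.totalVisibleMs + running;
  }

  // A pre-loaded chapter that never became visible stays at 0 ms.
  wasActuallyOpened(minVisibleMs: number): boolean {
    return this.visibleMs() >= minVisibleMs;
  }
}
```

In a browser the tracker might be created with `new ChapterVisibilityTracker(document, () => Date.now())`. Injecting the clock keeps the logic testable without a real reading app.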

These would allow us to track how readers page through a book rather than just analyze when they open, pause or close a chapter. The latter already tells us a lot about readers, but we could gain even more granular insights if we could reach down to the level of the individual page. These are APIs that are part of the open web but not supported in today’s EPUB standard. We could improve the trackability of books immensely with such small tweaks.
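Page-level tracking of this kind could be sketched as follows. I am assuming here that element positions come from something like the web’s `Element.getBoundingClientRect()` and that the viewport size comes from `window.innerWidth` and `window.innerHeight`; the function names and the threshold are hypothetical.

```typescript
// The subset of a bounding rectangle we need, in viewport coordinates.
// A DOMRect returned by getBoundingClientRect() has these four fields.
interface Rect {
  top: number;
  bottom: number;
  left: number;
  right: number;
}

// Fraction (0..1) of the rectangle's area that lies inside the viewport.
function visibleFraction(rect: Rect, viewportWidth: number, viewportHeight: number): number {
  const width = rect.right - rect.left;
  const height = rect.bottom - rect.top;
  if (width <= 0 || height <= 0) return 0;
  const visibleW = Math.max(0, Math.min(rect.right, viewportWidth) - Math.max(rect.left, 0));
  const visibleH = Math.max(0, Math.min(rect.bottom, viewportHeight) - Math.max(rect.top, 0));
  return (visibleW * visibleH) / (width * height);
}

// Indices of the paragraphs that are at least `threshold` visible, i.e. an
// approximation of what is on the reader's current "page".
function indicesOnScreen(
  rects: Rect[],
  viewportWidth: number,
  viewportHeight: number,
  threshold: number
): number[] {
  const result: number[] = [];
  for (let i = 0; i < rects.length; i++) {
    if (visibleFraction(rects[i], viewportWidth, viewportHeight) >= threshold) {
      result.push(i);
    }
  }
  return result;
}
```

Sampling these indices whenever the reader turns a page would tell us which paragraphs were displayed and in which order, rather than only which chapter file was loaded.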

Furthermore, one of the big problems with all this is that so many people in the publishing industry have limited bandwidth and resources (travel cost) and too little time to engage with standards bodies. A merger will not improve this situation. Thus, we in the publishing industry suffer from an underdeveloped ecosystem that we have abdicated to others and that is, as a result, now controlled by platforms like Amazon, Apple and others who don’t have that much economic or philosophical interest in books.

It’s a shame really, but let us make the best of what we have today!
