Taking Web Podcasts From Hack to Production

The highs and lows of building a new podcast interface.

Screenshots of the Strange Bird podcast running in the chat-app-like interface of the Guardian Mobile Innovation Lab’s new web player.

A little over a year ago I attended a Hack Day at the Guardian’s offices in London, and spent two days experimenting with some of the new APIs that Chrome provides for audio playback in the browser. Intrigued by what I found, I posted a quick write-up once I got back to New York and started thinking about how the lab might expand on the idea. The answer wasn’t immediately obvious, and events like the UK snap election took us down a different path for a while. But we kept coming back to it.

I played around with prototypes using the Presentation API to pair a podcast with videos and images playing on a TV. The underpowered Chromecast hardware meant it wasn’t viable, but there was something compelling about pairing the audio with timed multimedia. So we scaled the experiment back down to phone size, and sketched out what an “augmented” podcast would look like on a small screen.

The result was Strange Bird, an original audio series hosted by the Guardian US’s data editor, Mona Chalabi. The story of its conception and editorial development is told by the lab’s product manager, Sarah Schmalbach, in a separate write-up, but I also wanted to talk about how the technical implementation (a project I called “Podmod”) went.

The UI

This was an entirely new component unrelated to the original hack. We struggled to work out how a podcast should look until we listened to the first few takes that Mona and her producer, Josie Holtzman, brought back into the office. The three of us met with the lab’s designer, Dylan Grief, and as we listened we realised how casual and chatty the audio felt, and decided that a chat interface would be a very natural way to present the annotations. It reflected the rhythm and the tone of the audio, and most people know how chats work. Dylan mocked up some UI flows, and I put together a prototype interface based on them. But in the process I discovered that chat interfaces are surprisingly difficult to replicate on the web.

Why? Mainly because of scrolling. Chat logs tend to be giant, almost infinitely scrollable views. For example, if you scroll back in your iMessage history, you’ll notice very quickly that it doesn’t load all your messages. Instead, it batches them, rendering specific chunks as you get near to them. Browsers don’t have any built-in functionality that mirrors that type of batched rendering, so when attempting to display a chat history on a web page, you quickly hit performance issues as you try to render more and more content (see also: liveblogs).

Chat windows also typically slide new messages into the bottom of the window through a graceful animation, which isn’t just cosmetic. The smooth animation helps guide the user’s eye if they were in the middle of reading something else. But browsers don’t give you a lot of control over scrolling. You’re forced to manually set scroll position, causing all kinds of havoc when you try to implement the above-mentioned batching, or if the user tries to scroll in the middle of the animation, or if two new items arrive in quick succession, or, or…

To get around these problems, I created a custom scrolling React component called performance-scroll-view (the code is there for posterity; I need to refactor and document a lot before it’s usable). The general concept is one that we used in our Shifting Lenses experiment: rather than leave scrolling to the browser, we set the position of our elements manually using hardware-accelerated CSS transforms, optimising for performance by only adjusting the elements that are currently visible and leaving the rest off-screen.

The bonus with taking over rendering in this way is that we don’t have to deal with elements reflowing — e.g. removing items 1 and 2 from the stack doesn’t bump items 3–7 up vertically. This makes inserting and removing batches of elements much easier.
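The bookkeeping behind that approach can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the code from performance-scroll-view: the names placeItems and visibleRange are hypothetical, and item heights are assumed to be known in advance.

```typescript
// Hypothetical helpers for illustration; not the performance-scroll-view API.

interface PlacedItem {
  top: number;    // absolute offset from the start of the list, in pixels
  height: number; // item height in pixels (assumed to be known up front)
}

/** Give every item an absolute top offset by stacking the heights. */
function placeItems(heights: number[]): PlacedItem[] {
  let top = 0;
  return heights.map((height) => {
    const placed = { top, height };
    top += height;
    return placed;
  });
}

/** Indices of the first and last items that intersect the viewport, or null. */
function visibleRange(
  items: PlacedItem[],
  scrollTop: number,
  viewportHeight: number
): { first: number; last: number } | null {
  const viewBottom = scrollTop + viewportHeight;
  let first = -1;
  let last = -1;
  items.forEach((item, index) => {
    const intersects = item.top + item.height > scrollTop && item.top < viewBottom;
    if (intersects) {
      if (first === -1) first = index;
      last = index;
    }
  });
  return first === -1 ? null : { first, last };
}
```

Because each item carries its own absolute offset, dropping the first two entries from the array leaves the top value of every remaining item unchanged, which is the "no reflow" property described above.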

The CSS transform technique is also used by libraries like iScroll, but to me they’ve always had a fatal flaw: They don’t match the physics of the native UI. iOS, in particular, has a very complicated arrangement with its scrollable views — they “rubber band” past their top and bottom, then snap back when the user lifts their finger from the screen. They accelerate and decelerate from a finger flick in a very particular way that no-one seems to be able to replicate entirely.

But since the release of iOS 10, iOS Safari has accurately reported scroll events in the same way Android Chrome does by firing events repeatedly as the element scrolls, providing the current scroll position each time. So rather than try to replicate what the native OS does, I mirrored it by placing a dummy scrollable <div> tag on top of the scroll view and updating the CSS transforms with every scroll event.
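A rough sketch of the mirroring step, assuming each row already has an absolute offset and height. The names Row and mirrorScroll are hypothetical, and for brevity this version touches every row on each scroll event, whereas a tuned implementation would restrict itself to the rows near the viewport.

```typescript
// Structural stand-in for an HTMLElement, so the sketch needs no DOM typings.
interface Transformable {
  style: { transform: string };
}

interface Row {
  top: number;    // absolute offset within the list, in pixels
  height: number; // row height in pixels
  el: Transformable;
}

// Rows outside the viewport are parked well above the screen.
const OFFSCREEN = "translate3d(0,-10000px,0)";

/** Position rows for the given scroll offset; returns how many are visible. */
function mirrorScroll(rows: Row[], scrollTop: number, viewportHeight: number): number {
  let shown = 0;
  rows.forEach((row) => {
    const y = row.top - scrollTop; // where the row lands inside the viewport
    const visible = y + row.height > 0 && y < viewportHeight;
    row.el.style.transform = visible ? `translate3d(0,${y}px,0)` : OFFSCREEN;
    if (visible) shown++;
  });
  return shown;
}

// Browser wiring (assumption: `scroller` is the dummy scrollable <div>):
// scroller.addEventListener("scroll", () => {
//   mirrorScroll(rows, scroller.scrollTop, scroller.clientHeight);
// }, { passive: true });
```

The native scroller keeps its own physics (rubber banding, deceleration); the code only reads the resulting scroll position and repositions the rows to match.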

On both Android and iOS devices the result was indistinguishable from a native scroll, except that it allowed us to throw a nearly unlimited number of elements onto the screen. Hopefully you didn’t notice a thing.

Other hardware acceleration

The rest of the UI was fairly straightforward, but throughout the project I tried to keep a focus on performance, which meant doing some very unintuitive things. For example, the progress bar has a sliding control you can pick up to set playback position.

My initial implementation just updated the left style property of the bar as the user dragged. But when I turned on “paint flashing” (in the Rendering section of Developer Tools) I could see that it was causing a repaint every time it was moved.

This issue didn’t really have an effect on the desktop web experience, but on a lower-end mobile device it was noticeable how slowly the control was reacting to user input. (Tip: keep one of those around for testing. They’re very cheap.) So instead of changing the left property, I moved the element with a CSS transform: translate3d([value]%,0,0). Ta-da, no repaints.

This same logic applied in a number of different situations: using scaleX() instead of width for the progress bar, translateY() to open the bottom info bar, and so on. It’s not very intuitive, but it does dramatically improve performance on low-end devices.
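As a small illustration, the style values involved can be generated from a playback fraction between 0 and 1. The function names here are hypothetical. Two assumptions about the surrounding CSS are worth stating: percentages in translate3d() are relative to the transformed element's own size, so the translated element is assumed to span the full width of the track, and the scaled fill is assumed to use transform-origin: left.

```typescript
/** Clamp a playback fraction into the range 0..1. */
function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

/**
 * Transform for the draggable control. Replaces writes to `left`.
 * Assumes the translated element is as wide as the track, because
 * translate percentages refer to the element's own width.
 */
function handleTransform(fraction: number): string {
  return `translate3d(${clamp01(fraction) * 100}%,0,0)`;
}

/**
 * Transform for the filled portion of the bar. Replaces writes to `width`.
 * Assumes `transform-origin: left` so the fill grows from the left edge.
 */
function fillTransform(fraction: number): string {
  return `scaleX(${clamp01(fraction)})`;
}

// Usage in a drag handler (browser only, element names are assumptions):
// handleEl.style.transform = handleTransform(fraction);
// fillEl.style.transform = fillTransform(fraction);
```

Keeping the string building in pure functions makes the values easy to check without a browser; only the final style assignment needs the DOM.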

Offline functionality

As the original hack write-up indicated, there are a lot of possibilities with offline playback. Especially when we’re bundling images along with the audio podcast, it makes sense to try to cache these assets before they’re needed. So how did that process go?

What we tried to do

The concept was relatively straightforward. We included a button on the podcast player page that, when clicked, started downloading all the assets associated with the podcast into a local cache. Because cache operations don’t provide progress events, I created service-worker-download-manager to add them (via Chrome’s ReadableStream API), giving us a progress bar to inform the user when the download is complete. Then, when the page attempts to fetch a podcast resource, the service worker routes that request through caches.match() to use the local version if possible.
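The two moving parts described here, counting bytes as a stream is read and answering requests from the cache when possible, might look roughly like this. It is a sketch under assumptions rather than the service-worker-download-manager code: readWithProgress and cacheFirst are hypothetical names, and the small interfaces stand in for the browser's stream reader and cache objects so the example does not depend on DOM typings.

```typescript
// Structural stand-in for the reader returned by response.body.getReader().
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

/** Drain a stream chunk by chunk, reporting the fraction received so far. */
async function readWithProgress(
  reader: ChunkReader,
  totalBytes: number, // e.g. taken from the Content-Length header (assumption)
  onProgress: (fraction: number) => void
): Promise<Uint8Array[]> {
  const chunks: Uint8Array[] = [];
  let received = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value) {
      chunks.push(value);
      received += value.byteLength;
      onProgress(totalBytes > 0 ? received / totalBytes : 0);
    }
  }
  return chunks;
}

// Structural stand-in for anything with a match() method, such as `caches`.
interface CacheLike<Req, Res> {
  match(request: Req): Promise<Res | undefined>;
}

/** Use the cached response when there is one, otherwise go to the network. */
async function cacheFirst<Req, Res>(
  request: Req,
  cache: CacheLike<Req, Res>,
  network: (request: Req) => Promise<Res>
): Promise<Res> {
  const cached = await cache.match(request);
  return cached !== undefined ? cached : network(request);
}

// Service worker wiring (browser only):
// self.addEventListener("fetch", (event) => {
//   event.respondWith(cacheFirst(event.request, caches, fetch));
// });
```

The collected chunks can then be turned into a response and stored in the cache once the download completes.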

What went wrong #1: Caching large files

After the assets had successfully cached, you’d hit the play button on a phone and… nothing would happen. But on desktop it would play. After inspecting the <audio> element, I found that the phone browser would throw an error that the content was not decodable. Weird. This didn’t happen while developing the player, so I wondered what regression bug could have crept in… then I remembered that we were using a shorter, smaller MP3 file in development while the final audio file was being recorded and edited. I switched back to the test file and the problem went away.

My initial thought was storage limits: desktop browsers typically allow more storage than phone browsers. But everything I could find suggested that, even on a phone, Chrome let you use up to 50MB of data. I tried triggering Storage API requests, but the cached files still wouldn’t load — and the cache operations didn’t throw an error, either. Then I discovered something very confusing: caching one 20MB file failed, but caching two 10MB files didn’t. So it was something to do with the size of individual files. In a hurry I threw together a hacky script to split large cache files into 10MB chunks and reassemble them later, making a note to file a Chrome bug once we’d launched. Once that was added to the project, playing a cached MP3 file worked! Which is when I stumbled into…
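The split-and-reassemble workaround is simple enough to sketch. This is not the original hacky script, just an illustration of the idea with hypothetical function names; it works on raw bytes and leaves out how each chunk is written to and read back from the cache.

```typescript
// 10MB, the chunk size mentioned above.
const CHUNK_SIZE = 10 * 1024 * 1024;

/** Split a byte array into consecutive pieces of at most `chunkSize` bytes. */
function splitIntoChunks(data: Uint8Array, chunkSize: number = CHUNK_SIZE): Uint8Array[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const chunks: Uint8Array[] = [];
  for (let start = 0; start < data.byteLength; start += chunkSize) {
    const end = Math.min(start + chunkSize, data.byteLength);
    // subarray() returns a view on the same buffer, so nothing is copied here.
    chunks.push(data.subarray(start, end));
  }
  return chunks;
}

/** Join chunks back into one contiguous byte array, in order. */
function reassemble(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  chunks.forEach((chunk) => {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  });
  return out;
}
```

With this shape, a 20MB file becomes two 10MB cache entries, which matches the case that was observed to work.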

What went wrong #2: Switching from remote to local assets

Playing the locally cached MP3 file only worked if you hit the download button, waited for it to complete, then pressed play. If you hit play at any point before that, the audio would play fine until it reached the end of the current buffer. At that point it would send a request for the next chunk of the file… and immediately fail. I still haven’t been able to put a test case together for this because we have no manual control over how much the browser buffers and when. But I have a theory. Bear with me.

Chrome tries to optimise the amount of data it downloads when playing audio by using HTTP range requests. Rather than downloading the whole file, it downloads only small chunks on demand. The Cache API’s cache.match() function ignores any Range header when matching, which I assumed was a bug. It turns out I was wrong: the browser is supposed to be able to handle receiving a 200 (full download) response when it expects a 206 (partial download) response. And it does, as we saw when pressing play once the file was fully cached. What I suspect it doesn’t handle correctly is 206 responses that suddenly turn into 200 responses part way through playback (when the worker starts returning the cached copy). But I can’t be sure.
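If that theory holds, one possible workaround is for the worker to keep answering range requests with 206 responses, sliced out of the cached copy, so the response type never changes mid-playback. The sketch below shows only the slicing logic. It is an assumption-laden illustration, not the API of browser-range-response: the names are hypothetical, only single byte ranges are handled, and unparsable headers are treated the same as unsatisfiable ones for brevity.

```typescript
interface ByteRange {
  start: number; // first byte, inclusive
  end: number;   // last byte, inclusive
}

interface PartialResult {
  status: number;
  headers: Record<string, string>;
  body: Uint8Array;
}

/** Parse a single-range header such as "bytes=0-99", "bytes=100-" or "bytes=-50". */
function parseRange(header: string, size: number): ByteRange | null {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match || size <= 0) return null;
  const rawStart = match[1] || "";
  const rawEnd = match[2] || "";
  if (rawStart === "" && rawEnd === "") return null;
  let start: number;
  let end: number;
  if (rawStart === "") {
    // Suffix form: the last N bytes of the file.
    const suffix = parseInt(rawEnd, 10);
    if (suffix === 0) return null;
    start = Math.max(0, size - suffix);
    end = size - 1;
  } else {
    start = parseInt(rawStart, 10);
    end = rawEnd === "" ? size - 1 : Math.min(parseInt(rawEnd, 10), size - 1);
  }
  // Also rejects ranges that start beyond the end of the file.
  return start > end ? null : { start, end };
}

/** Build the pieces of a 206 response from a fully cached body. */
function partialFrom(body: Uint8Array, rangeHeader: string): PartialResult {
  const size = body.byteLength;
  const range = parseRange(rangeHeader, size);
  if (!range) {
    return { status: 416, headers: { "Content-Range": `bytes */${size}` }, body: new Uint8Array(0) };
  }
  return {
    status: 206,
    headers: {
      "Content-Range": `bytes ${range.start}-${range.end}/${size}`,
      "Content-Length": String(range.end - range.start + 1),
    },
    body: body.subarray(range.start, range.end + 1),
  };
}
```

In a service worker, the returned status, headers and body would be passed to the Response constructor before handing the result to event.respondWith().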

More importantly: I discovered all of this about 48 hours before we were due to launch, so our only option was to remove the offline playback functionality. Hopefully the first bug will be fixed soon, and I can either come up with a test case for the second or work around it, possibly through our browser-range-response library.

Next steps

There aren’t likely to be many next steps here in the lab, since our work concludes at the end of March. However, our experimentation has always been intended to be a jumping-off point, or a proving ground for innovation work that others in the publishing industry are interested in pursuing.

To support those follow-on efforts, the code I’ve written for this experiment is available on GitHub. However, the convoluted nature of this experiment’s development (I wouldn’t be surprised if there’s still some Presentation API code lurking in there) means that the project isn’t organised or refactored for immediate re-use on your site. That said, it would be a great resource to run locally and prototype adding annotations to your own podcast. There is an example script file and MP3 audio in the repo for you to model your own script from.

I hope to continue working on performance-scroll-view and a few of the other mini-libraries I’ve linked to here outside of the lab. Scroll performance in particular is an issue for a lot of mobile sites, and it would be great to come up with an all-purpose solution. To be continued!

The Guardian US Mobile Innovation Lab is a small multidisciplinary team housed within the Guardian’s New York newsroom, set up to explore storytelling and the delivery of news on small screens. It operates with the generous support of the John S and James L Knight Foundation.