Using Service Workers to Improve Audio Asset Load Times

Nicholai Main
Kustomer Engineering
4 min read · Apr 6, 2022

Service Workers are a very powerful capability in modern web browsers. They sit between your application and the network, where they can intercept every request your application makes and redirect or cache it with any logic you can dream of. But how useful is that? Web browsers are already pretty good at caching, right? We ran into a performance issue with file transfers that browsers weren’t handling well, but that a Service Worker was able to fix.

Our web application can play audio files to notify users of incoming events. Many of our users take advantage of this feature to help them respond quickly to new conversations and messages. Unfortunately, we were seeing issues with the response latency of the audio notifications; in the most extreme cases, audio could take more than ten seconds to start playing!

These notifications were implemented in a simple way: create an <audio> tag, and then play it. Ignoring some error-handling boilerplate, it was simply two lines of code.
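A minimal sketch of what such a two-line approach might look like, wrapped in a helper so the error handling is visible. The function name `playNotification` and any URL passed to it are hypothetical, not taken from our codebase:

```javascript
// Hypothetical helper: the name and the URL argument are illustrative only.
function playNotification(url) {
  // new Audio(url) creates an HTMLAudioElement with its src already set.
  const audio = new Audio(url);
  // play() returns a Promise; it rejects if loading fails or the browser
  // blocks playback (for example, under an autoplay policy).
  return audio.play().catch((err) => {
    console.warn('Audio notification failed to play:', err);
  });
}
```

Nothing here asks the browser to load the file ahead of time, so the first call pays the full network cost of fetching the audio.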

What was going wrong here that was taking more than ten seconds, and how could we improve two lines of code? Looking at some client-provided network captures, we discovered that the issue was occurring at the network layer. Web browsers were not caching our audio files well, so the audio files were cold-loaded much of the time. Additionally, we were loading audio at the worst possible time: users want audio notifications when new messages and new data come in, but that’s also when we have to make many other network requests of our own to resolve the incoming data and make sure our users have everything they need.

We use a service worker to manage client-side caching of resources for our web application. Initially, we used this worker only for JavaScript and CSS files. But upon seeing this issue, we wondered if our service worker could assist us with audio as well.

Caching isn’t a silver bullet to solve all problems. Media files can be very large, and browsers can start warning and evicting data at 50 MiB cached or less. But for our use case we saw the potential for real wins here. Our audio files are very short and small, and any one individual user won’t use more than a couple of audio files for the few notifications they are interested in.

Our first step was to make sure the audio files were as small as possible. Our service worker cache was packed full of JavaScript and CSS already, so the more bytes we could save, the better. Opus is a modern codec for audio storage that saves space over older codecs, while providing the same or better audio quality. Opus isn’t exactly recent news, having been first released in 2012, but our audio files were still encoded as MP3, which dates back to the prehistoric era (1993).

Browser support for Opus is good, with Safari the only exception. By using fallback sources for <audio> we can serve Opus to browsers that support it, and MP3 for everyone else.
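The fallback idea can be sketched as follows: an audio element is given two source children, and the browser uses the first one whose type it can play. The file layout (each sound available as both `<name>.opus` and `<name>.mp3`) and both function names are assumptions made for illustration:

```javascript
// Assumed layout: every sound exists as <basePath>.opus and <basePath>.mp3.
function notificationSources(basePath) {
  return [
    // Listed first so browsers that can decode Opus pick the smaller file.
    { src: `${basePath}.opus`, type: 'audio/ogg; codecs=opus' },
    // Fallback for browsers without Opus support.
    { src: `${basePath}.mp3`, type: 'audio/mpeg' },
  ];
}

// Builds an <audio> element with one <source> child per candidate format.
// `doc` is normally the global `document`.
function createNotificationAudio(doc, basePath) {
  const audio = doc.createElement('audio');
  for (const { src, type } of notificationSources(basePath)) {
    const source = doc.createElement('source');
    source.src = src;
    source.type = type;
    audio.appendChild(source);
  }
  return audio;
}
```

In a page this would be called as `createNotificationAudio(document, '/sounds/chime')`, where the path is again a placeholder.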

We saw a 60–70% reduction in file sizes for our audio by switching to Opus.

Next, we added the audio to our service worker’s caching list. We have a small service worker we built in-house that supports caching by a hash naming convention, and allows different users of our app to be on different versions of the app at the same time, while supporting caching unchanged assets across versions. We simply had to add these assets to our Webpack configuration, with the appropriate naming scheme, and then add the file extensions to a whitelist in our service worker.
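As an illustration only, a check of this kind could look like the sketch below. The naming scheme (`<name>.<hex hash>.<extension>`), the extension list, and the function name are all assumptions; the article does not show the real convention used by our service worker:

```javascript
// Assumed extension list; the real one is not shown in this article.
const CACHEABLE_EXTENSIONS = new Set(['js', 'css', 'opus', 'mp3']);

// Assumed naming scheme: <name>.<hex content hash>.<extension>
const HASHED_ASSET = /\.([0-9a-f]{8,})\.([a-z0-9]+)$/i;

// Returns true only for hash-named files whose extension is on the list.
function isCacheableAsset(url) {
  const { pathname } = new URL(url);
  const match = HASHED_ASSET.exec(pathname);
  return match !== null && CACHEABLE_EXTENSIONS.has(match[2].toLowerCase());
}
```

Because the hash changes whenever the file contents change, a cached entry never goes stale, and an asset that is unchanged between app versions keeps the same URL and can be reused from the cache.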

There was one wrinkle: browsers typically supply a Range header when requesting audio data, expecting to get back only the first part of the file with a 206 response, and then requesting more as needed later.

Service workers can’t cache partial responses, and any attempt to do so will throw an error. As we had reduced our file sizes to 10–15 KiB, we were confident in serving up the entire file and caching the whole thing. So we removed the Range header in the service worker.
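A rough sketch of that header removal, using only the standard `Request` and `Headers` APIs. The function name `withoutRangeHeader` and the decision to apply it only to requests whose destination is `audio` are assumptions for this example, not a copy of our worker:

```javascript
// Returns a copy of the request with the Range header removed, so the
// server is asked for the whole file instead of a byte range.
function withoutRangeHeader(request) {
  const headers = new Headers(request.headers);
  headers.delete('Range'); // header names are case-insensitive
  return new Request(request.url, {
    method: request.method,
    headers,
    // 'navigate' cannot be passed to the Request constructor, so fall back.
    mode: request.mode === 'navigate' ? 'same-origin' : request.mode,
    credentials: request.credentials,
    redirect: request.redirect,
  });
}

// Only register the handler when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    const { request } = event;
    if (request.destination === 'audio' && request.headers.has('range')) {
      event.respondWith(fetch(withoutRangeHeader(request)));
    }
  });
}
```

This sketch leaves out the caching step itself and only shows how the outgoing request is rewritten.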

This caused our CDN to see a request without a Range header, and to return a 200 response with the entire file. That response was fully cacheable, and browsers accepted it, since a server is not required to answer a Range request with a 206 response.
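To make the role of the 200 response concrete, here is a hedged cache-first sketch that stores only complete responses. The function names are hypothetical, and `cache` is expected to behave like a Cache object obtained from `caches.open()`:

```javascript
// A 206 (Partial Content) response cannot be stored with cache.put(),
// so only complete 200 responses are treated as cacheable here.
function isFullResponse(response) {
  return response.status === 200;
}

// Cache-first lookup: serve from the cache when possible, otherwise fetch
// and store the result if it is a complete response.
async function cacheFirst(request, cache, fetchFn = fetch) {
  const cached = await cache.match(request);
  if (cached) {
    return cached;
  }
  const response = await fetchFn(request);
  if (isFullResponse(response)) {
    // clone() because a response body can only be read once.
    await cache.put(request, response.clone());
  }
  return response;
}
```

Inside a fetch handler this could be combined with a Range-stripped request, so that the network response is a full 200 and passes the `isFullResponse` check.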

This little project worked out very well in the end. Average load times for audio files before this update were about 500–700 ms, with occasional outliers taking more than 30 s. Now, cold loads take about 300–400 ms, and cached loads (which almost all loads are) take 10 ms or less.

Median duration of all media asset loads, both before and after our changes.

We don’t have great information about the outliers, as users with network problems loading files also have network problems reporting telemetry, but no news is good news. Reports of missing audio notifications have dropped to zero since we rolled out this change.

To conclude, when looking at service workers, always consider the numbers behind your data. How big are the files? How often are they requested? How often do they change? If those numbers work out, then service workers will let you cache and reap benefits for any kind of request.
