How Service Workers Saved Our Web App

Evan Shaw
WE BUILD LIGHTSPEED
6 min read · Mar 25, 2019

All the way back in November 2017, one of my colleagues was working on our web Sell screen when he noticed something alarming: the updates we were pushing out weren’t…well…updating. The browser never fetched them unless the cache was cleared.

Vend’s Sell Screen

Let’s take a step back for a second. Vend is a cloud-based point of sale system. The Sell screen is the screen that lets our retailers add products to sales, calculate the sale total, and accept payment. It’s arguably the most important screen in Vend. (Full disclosure: I lead the team that works on the Sell screen.)

If the Sell screen stops working, it’s extremely frustrating both for our retailers and their customers. Because the Sell screen is so important to retailers’ businesses, we wanted to make sure that it would continue to work offline, in the event of ISP issues, flaky Wi-Fi, or (dare I say it) a Vend outage.

The Application Cache

At the time we were architecting our current Sell screen, back in 2014¹, there was really only one option if you wanted to work offline: the application cache. It’s since been deprecated, but is still supported by basically every browser.

The application cache lets you make certain resources available offline. Using it is fairly straightforward. First, add a manifest attribute to your <html> tag:

<html lang="en" manifest="/webregister/app.manifest">

This points to an application cache manifest file. That file contains a list of other files and looks something like this:

CACHE MANIFEST
/webregister/
//vendappcdn.global.ssl.fastly.net/css/app-af5b4cf69d.css
//vendappcdn.global.ssl.fastly.net/js/app-0023757202.js

The first time you visit /webregister/ in your Vend store, these resources are downloaded and cached, so that you can continue selling offline. Most of our assets are served from our CDN, cache-busted, and effectively immutable. The index (/webregister/), however, is not. This is important.
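By “cache-busted” I mean the file names embed a content hash (note the af5b4cf69d in the CSS file above), so every deploy produces brand-new URLs that can safely be cached forever. Headers along these lines are typical for assets like that, though I’m guessing at the exact CDN config:

Cache-Control: public, max-age=31536000, immutable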

The application cache is completely separate from the normal browser cache. So if a resource is served with a Cache-Control header that tells the browser to cache it a certain way, the browser cache honors that header independently of the application cache: the same resource can end up cached twice, in two different ways. This is also important.

¹ In fact, the roots of our Sell screen go further than that, all the way back to 2010. We were using the application cache back then, too, which was pretty cutting-edge stuff. And before that, there was Google Gears!

The Problem

Here’s what the relevant part of our Nginx configuration looked like:

location /webregister/ {
    expires max;
    try_files $uri /webregister/index.html;
}

This tells Nginx what to serve when you request anything under /webregister/, including that path itself. The expires max might look suspicious to you, and it should. It causes the server to send the header Cache-Control: max-age=315360000, which tells the browser that this resource remains fresh for 10 years. So in addition to being in the application cache, it’s in the browser cache, and the browser will be happy to use its cached version without re-requesting it for an entire decade.
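Concretely, expires max produces response headers like these (the Expires date is Nginx’s hard-coded “max” value):

Expires: Thu, 31 Dec 2037 23:55:55 GMT
Cache-Control: max-age=315360000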

Do you see the problem? When app.manifest updates, the browser re-requests /webregister/. That request is served from the browser cache, so the old index stays in the application cache. The browser also fetches all the CDN resources, and thanks to the cache-busting these are the correct new ones, but it’s all for naught: the old index still references the old CDN assets. So nothing really gets updated.

There are some details that remain a little unclear even to this day. The expires max line had been there basically since the beginning, but most retailers were “only” a few months behind in their cached resources. Somehow it seemed like they had occasionally been getting updated resources. This also only seemed to affect Chrome, which the vast majority of our retailers use. Perhaps there was a behavior change around the application cache in Chrome at some point?

Anyway, we needed to fix the problem.

The Fix…or Is It?

The fix seems pretty simple. We need to reconfigure Nginx so that the Sell screen’s index page is actually re-requested from the server when app.manifest updates.

location /webregister/ {
    try_files $uri /webregister/index.html;
    # no-cache doesn't mean "don't cache";
    # it means the browser must revalidate
    # with the server before using the cached version.
    add_header Cache-Control "no-cache";
}
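With no-cache, the browser still keeps a copy, but every use costs a quick revalidation round trip. If the index hasn’t changed, the server answers with a cheap 304; if it has, the browser gets the new version immediately. Illustratively (the ETag value here is made up):

GET /webregister/ HTTP/1.1
If-None-Match: "5c84f1a2-2b3c"

HTTP/1.1 304 Not Modified
ETag: "5c84f1a2-2b3c"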

We tested this out. It worked perfectly and we shipped it.

But there was still a problem: lots of retailers would never see this new Cache-Control header. Their browsers aren’t even requesting /webregister/ from our servers, remember? They’re storing it for 10 years. This “fix” only works for people who have never visited the Sell screen!

Stuck

We started thinking about how we might fix this, even going so far as to investigate old cross-site scripting vulnerabilities we might be able to take advantage of. Then I came up with a thought experiment: imagine I could run arbitrary JavaScript on every client. Is there any script I could run that would fix the situation? I did some research and concluded that there wasn’t, so I stopped this avenue of investigation.

We floated the problem around to other engineers, but couldn’t come up with a fix within the confines of our current offline architecture. After some time, we devised a solution that seems obvious in hindsight: build a new offline architecture. And luckily, the browser has a newer, better technology for working offline: service workers.

Service Workers

Remember how the MDN page for the application cache had scary deprecation notices all over it? The more modern way to build a web app that works offline is the Service Worker API. It allows developers to write JavaScript that acts as a proxy between the browser and the network. HTTP requests get proxied through the service worker, which can serve them out of a cache, direct them somewhere else, or do basically whatever it wants.
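As a minimal sketch of that proxy idea (this is illustrative, not our production worker):

// sw.js: intercept every request from pages under this worker's scope.
self.addEventListener('fetch', (event) => {
  event.respondWith(
    // Serve from the cache when we have a match,
    // otherwise fall through to the network.
    caches.match(event.request).then(
      (cached) => cached || fetch(event.request)
    )
  );
});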

The key realization was that service workers take precedence over the application cache! So all we had to do was write a service worker that did our caching and find a way to install it.
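The caching half is the standard precache pattern. A rough sketch, with made-up cache and asset names standing in for the real ones:

// sw.js: cache the app shell at install time.
const CACHE = 'sell-screen-v1'; // hypothetical cache name
const ASSETS = [
  '/webregister/',
  '/css/app.css', // stand-ins for the real hashed CDN assets
  '/js/app.js',
];

self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open(CACHE).then((cache) => cache.addAll(ASSETS))
  );
});

self.addEventListener('activate', (event) => {
  // Clean up caches left behind by older versions of the worker.
  event.waitUntil(
    caches.keys().then((keys) =>
      Promise.all(keys.filter((k) => k !== CACHE).map((k) => caches.delete(k)))
    )
  );
});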

Writing the service worker was relatively straightforward. There was one tricky part, though: we couldn’t ship code to install the service worker from the Sell screen. Code wasn’t updating, remember? We had to find some other commonly-visited page to deliver the solution.

The sign-in page seemed like a good candidate and had a nice property: we could actually force people to go there by returning a 401 from an API request. And lots of people visit that page all the time anyway, without being forced.
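The registration itself is tiny; the sign-in page just needs something like this (the script path and scope are assumptions about our setup):

// On the sign-in page: install the Sell screen's service worker.
if ('serviceWorker' in navigator) {
  navigator.serviceWorker
    .register('/webregister/sw.js', { scope: '/webregister/' })
    .catch((err) => console.error('service worker registration failed:', err));
}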

The Rollout

We deployed the service worker, but initially hid it behind a feature flag to be safe. We enabled the feature flag for an affected colleague’s store and had success! After that, we hand-picked a few retailers who we knew were affected so that we could monitor the version they were on, and after a few days, more success! Finally, after sending out some communications to affected retailers, we began a more general rollout.

Here, have a graph! Everyone loves graphs.

After a couple of months, the vast majority of our retailers had visited the sign-in page organically, picked up the service worker, and gotten the latest version of the Sell screen. (Those regular dips in the graph are weekends, when retailers aren’t as active.)

Not only has our service worker saved us from the application cache trap we fell into, it has also given us greater flexibility around what we cache and when, and it has enabled us to pass messages between different instances of our web app.
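That last part works because the service worker can see every tab under its scope. A hedged sketch of the relay pattern (the message shape here is invented, not our actual protocol):

// In the service worker: relay any message to every controlled tab.
self.addEventListener('message', (event) => {
  event.waitUntil(
    self.clients.matchAll().then((clients) => {
      clients.forEach((client) => client.postMessage(event.data));
    })
  );
});

// In a page: broadcast to the other instances and listen for relays.
if (navigator.serviceWorker.controller) {
  navigator.serviceWorker.controller.postMessage({ type: 'sale-updated' });
}
navigator.serviceWorker.addEventListener('message', (event) => {
  console.log('message from another instance:', event.data);
});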

We’re Hiring!

Vend is growing and that means we’re hiring! Our mission is to support independent retail, which is important for cities and communities. Amazon and big-box retail are great, but shouldn’t be the only option.

We’re expanding existing engineering teams and creating new ones, so we are looking for a variety of intermediate and senior software engineers, as well as leads. We’re currently hiring engineers only in Auckland, New Zealand. Curious about moving? Get in touch! We can probably help.
