A primer on web and service workers

If you’ve been following the evolution of web development in the last ten years, especially client-side applications, you’ve probably heard more and more about the concept of workers in articles and conversation — even more so in the last couple of years since Google (and Jake Archibald, with vigor!) started pushing for the introduction of Service Workers to web standards.

I personally became genuinely interested in workers after I attended Andrew Dunkman’s talk on the subject at the Abstraction conference in Pittsburgh this past August (you can watch his entire talk from another conference here).

The problem is that if you don’t have a specific background in technologies such as server-side languages or mobile native frameworks, there’s a good chance that a worker means nothing to you. So what are workers exactly, why do they matter and how do you make them… work? Let’s try and find out!


What is a worker?

(Disclaimer: I’m not a pro on the following subject, so take this as a general and summarized overview.)

Defining a worker requires a small detour to examine the concepts of single- and multi-threading. In programming, a thread is a unit of execution (in other words, a task) running inside a program. Each thread carries the context it needs to run independently from other threads, which makes it possible to run several of them concurrently and/or in parallel.

Since most CPUs now have multiple cores, a multi-threaded application can fairly easily take advantage of the spare processing power at its disposal by dividing the computation work between multiple threads. Most popular languages and frameworks support the concept of multiple threads, but may not implement it out of the box. Some languages, like Clojure, also make multi-threading easier to use than others.

Basically, the lowdown is that a worker is roughly equivalent to a thread. During its lifecycle, the main application (the main thread) can spawn workers on demand when it needs to run long or expensive tasks out of the way of its main processes. In addition, workers are also widely used to run tasks in the background and/or at a later time.

One obvious practical use of workers is to defer the resizing of an uploaded image to another time and/or to another thread. In the real world, this means that a user creating a blog post won’t have to wait for these long operations to finish before saving the post and jumping to something else on the platform. In the Rails universe, for example, one would achieve this by taking advantage of the Active Job framework in conjunction with something like Paperclip to resize the image, and a gem like delayed_paperclip as the glue to bind the two together. This implementation would effectively allow a user to upload an image to the server, which would hand the image manipulation off to background jobs (or workers). These workers would then be queued and would apply their magic on their own thread, when processing power is available to them or at a given time of the day, for example.


Web apps are single-threaded

Typically, both server and client-side JavaScript applications are single-threaded.

Indeed, while Node supports, integrates and internally uses multi-threading, it “operates on a single thread, using non-blocking I/O calls, allowing it to support tens of thousands of concurrent connections. [This] is intended for building highly concurrent applications, where any function performing I/O must use a callback. In order to accommodate the single-threaded event loop, Node.js utilizes the libuv library that in turn uses a fixed-sized threadpool that is responsible for some of the non-blocking asynchronous I/O operations” [Wikipedia].

In other words, Node is multi-threaded internally, but it only exposes a single thread to your code via the event loop. This means that the technique is effectively abstracted away from you, unless you explicitly want to use it. You can also spawn new processes using tools like the cluster module and pm2… but that’s an entirely different subject!

On the flip side, web applications have historically been single-threaded, but this has evolved since the first introduction of web workers in the early 2010s. The following sections in this post will go into detail on what these workers are, what the differences between them are, and how they work.


All workers are not created equal

There are mainly two kinds of workers in the client-side world: web workers and service workers. They may share the same label, but they are quite different in terms of their ultimate goal. I personally often got confused when I started reading about them, so let’s try to qualify them before we go deeper.

Web worker: the muscle

The web worker is in many ways what you would expect from such a tool: “[It is a standard] that allows web application authors to spawn background workers running scripts in parallel to their main page. This allows for thread-like operation with message-passing as the coordination mechanism”.

Since “as you may know, at 60 fps, you have around ~16ms (1000ms/60) per frame to do what you have to do […]. If you exceed that budget, you will alter the frame rate and potentially make your content feel sluggish or worse, unresponsive” [Typedarray].

In other words, web workers are tools that you can summon to crunch data or to process any other expensive task without impacting the main UI thread, which not only hosts your main application but also shares its processing power with the browser’s rendering of the DOM, animations, etc.

Conversely, it also means that a long-running script, one that needs a lot of computation power to complete, can benefit from running “in the background independently of any user interface scripts”. This allows calculations “that are not interrupted by scripts that respond to clicks or other user interactions, and allows long tasks to be executed without yielding to keep the page responsive” [Whatwg].
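To make that frame budget a bit more concrete, here is a minimal sketch of the kind of task that does not belong on the main thread. The names (FRAME_BUDGET_MS, countPrimes, measure) are made up for this illustration, and the prime counter is naive on purpose: the point is only to time a CPU-bound job and compare it with the ~16ms available per frame.

```javascript
// Hypothetical example: every name below is illustrative only.
const FRAME_BUDGET_MS = 1000 / 60 // roughly 16.67ms per frame at 60 fps

// A deliberately naive, CPU-bound task: count the primes up to `limit`.
function countPrimes (limit) {
  let count = 0
  for (let n = 2; n <= limit; n++) {
    let isPrime = true
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { isPrime = false; break }
    }
    if (isPrime) count++
  }
  return count
}

// Time a task and report whether it would have blown the frame budget
// had it run on the main UI thread.
function measure (task) {
  const start = Date.now()
  const result = task()
  const elapsedMs = Date.now() - start
  return { result, elapsedMs, overBudget: elapsedMs > FRAME_BUDGET_MS }
}

console.log(measure(function () { return countPrimes(1000000) }))
```

If overBudget comes back true on your machine, that task is a reasonable candidate for a web worker.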

The standard is really well supported with ~90% of relevant compatible browsers. You can see some examples of it in action here. I also invite you to check out the library called hamster.js for a useful implementation example.

(Note that you may encounter the sub-feature called Shared workers — a web worker accessible and shared between many scripts of the same origin, but I advise against using it, as its support is plummeting and it will likely be dropped by all the browsers at some point in the future.)

Service worker: the proxy

This flavour of worker is completely different from the more traditional worker in many ways. Yes, it does reside on its own thread, in parallel to the application that spawned it in the first place. After all, at its core, it is a type of web worker, right? But the service worker’s mission is completely different. A service worker is “a method that enables applications to take advantage of persistent background processing, including hooks to enable bootstrapping of web applications while offline” [W3C].

In the wise words of Jake Archibald, a “service worker is like a shared worker, but whereas pages control a shared worker, a service worker controls pages. Being able to run JavaScript before a page exists opens up many possibilities, and the first feature we’re adding is interception and modification of navigation and resource requests. This lets you tweak the serving of content, all the way up to treating network-connectivity as an enhancement” [Jake Archibald].

So this little guy acts as a kind of proxy between its parent application and the network requests it makes. It also has access to various APIs such as cache, push notifications, and background sync, with the main objective of helping developers make better offline-first, native-like apps. It is effectively the evolution of the now defunct AppCache, but packaged in a much more coherent, flexible and dev-friendly fashion.
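To give an idea of what this proxying can look like, here is a minimal sketch of a “cache-first” strategy: answer from the cache when possible, otherwise go to the network and keep a copy. The cache name and the helper names (isCacheable, cacheFirst) are assumptions of mine; only the fetch event, respondWith and the Cache API calls come from the platform.

```javascript
// Hypothetical service worker sketch; the cache name is an assumption.
const CACHE_NAME = 'demo-cache-v1'

// Only GET requests can be stored with the Cache API, so skip the rest.
function isCacheable (method) {
  return method === 'GET'
}

// Cache-first: answer from the cache when possible, otherwise hit the
// network and keep a copy of successful responses for next time.
async function cacheFirst (request, cache, fetchFn) {
  const cached = await cache.match(request)
  if (cached) return cached
  const response = await fetchFn(request)
  if (response.ok) await cache.put(request, response.clone())
  return response
}

// Only wire up the listener when actually running as a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', function (event) {
    if (!isCacheable(event.request.method)) return // let the browser handle it
    event.respondWith(
      caches.open(CACHE_NAME).then(function (cache) {
        return cacheFirst(event.request, cache, function (req) { return fetch(req) })
      })
    )
  })
}
```

Passing the cache and the fetch function in as arguments is just a design choice to keep the strategy easy to try outside of a browser.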

So, as you can see, we’re not talking about saving memory here, or about making sure that a page runs smoothly while our app is steaming. This kind of worker is another beast altogether, albeit just as useful. Fortunately, it’s also fairly well supported for such a new spec, with ~60% of browsers supporting most of its current features. You can see examples of the service worker in use here and here. Some libraries already exist to help you streamline the use of the standard, like sw-precache and UpUp.

(For more info on the differences between the two kinds of workers, please read this very good article by Aaron T. Grogg and this post from the team behind the service worker spec.)


How do they work?

Given the threaded and parallel nature of workers, it is only natural that their creation and management are different from the more traditional parts of the application. Let’s go over the specifics common to both web and service workers, as well as what makes each of them unique.

Features shared by both workers

External file

By definition, a worker has to live independently from the rest of the application to be able to be run in parallel, to be spawned on and off easily, etc. This means that the worker will reside in its own file, which will get imported by the main application to kick off the lifecycle of the worker.

//
// Web
//
if (window.Worker) {
  const myWebWorker = new Worker("web-worker.js")
}
//
// Service
//
if ('serviceWorker' in navigator) {
  navigator
    .serviceWorker
    .register('/service-worker.js')
    .then(...)
}

In a similar fashion, your worker’s scope will need to import any external dependencies at runtime. For example, if you intend to use an Ajax library to make your API calls, you will need to import it in the worker’s thread, even if it is already part of your main application.

To do so, the worker can call the importScripts method, which fetches and runs the given scripts synchronously inside the worker’s scope.

// 
// Inside the worker
// NOTE: `self` is not necessary
//
importScripts('foo.js')
self.importScripts('foo.js', 'bar.js')

No access to the DOM, global objects and some storage features

Since workers have no browsing context, they live in their own WorkerGlobalScope. They only have access to that specific scope (through the self global) and to a limited set of APIs that the browser exposes to them, a lot of which are read-only in nature.

You can see a more detailed list of the various attributes, events and methods it can plug into globally here. I also invite you to browse through this extensive list of what is supported and what isn’t, depending on the browser.

As such, what we are learning is that workers don’t have any access to the DOM of your application: window and document are simply not there, and other familiar globals are only partially available (console, for instance, works in modern browsers). This is a bit unsettling because these are vehicles that we use (and abuse) throughout our conventional applications to interact with the browser and its UI. Rest assured though: there is another way!
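Before moving on, here is a quick way to check this for yourself. The describeScope helper below is a hypothetical name of mine; it simply reports which of the usual globals exist on whatever scope you hand it. Run it once on a page and once inside a worker and compare the results.

```javascript
// Hypothetical helper: report which familiar globals exist in a scope.
function describeScope (scope) {
  return {
    hasWindow: typeof scope.window !== 'undefined',
    hasDocument: typeof scope.document !== 'undefined',
    hasConsole: typeof scope.console !== 'undefined',
    canImportScripts: typeof scope.importScripts === 'function'
  }
}

// `globalThis` points at `window` on a page and at the worker's own
// global scope (the same object as `self`) inside a worker.
console.log(describeScope(globalThis))
```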

Events and messages

If they can’t interact with the scope of your application directly, workers have a neat way to communicate back and forth with the main thread through events and messages.

The main thread

After instantiating the worker, the main application has access to the worker’s postMessage method, which it can use to send data to the worker.

const myWebWorker = new Worker("web-worker.js")
const sendToWorker = function () {
  myWebWorker.postMessage({ foo: 123, bar: 456 })
}

The data transferred to the worker can be of any type that is compatible with the structured clone algorithm, which covers most of the formats that you would expect to use, like arrays, objects, sets, etc. Functions and DOM nodes, on the other hand, cannot be cloned. Some developers also stringify objects before sending them back and forth between main and worker threads, although this is not required. The main application will also need to register an event listener on the worker, so that the latter can communicate messages back to it.

myWebWorker.addEventListener('message', function (event) {
  console.log("Data back from the worker", event.data)
})

The data attached to that message event is subject to the same requirements as the data passed to the worker in the first place.
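To get a feel for those requirements, here is a small sketch comparing a JSON round trip with a structured clone. It assumes a runtime that exposes the structuredClone() global (recent browsers and Node versions do), which applies the same algorithm as postMessage; the payload itself is made up.

```javascript
// Illustrative payload mixing plain values with richer built-in types.
const payload = { when: new Date(0), tags: new Set(['a', 'b']) }

// JSON round trip: the Date comes back as a string and the Set as an
// empty object, so type information is lost along the way.
const viaJson = JSON.parse(JSON.stringify(payload))
console.log(typeof viaJson.when, viaJson.tags)

// structuredClone() uses the structured clone algorithm, the same one
// applied to data sent through postMessage. Guarded because older
// runtimes do not expose it.
if (typeof structuredClone === 'function') {
  const viaClone = structuredClone(payload)
  console.log(viaClone.when instanceof Date, viaClone.tags.has('a'))

  // Functions are not cloneable: this throws instead of copying.
  try {
    structuredClone({ callback: function () {} })
  } catch (error) {
    console.log('Could not clone:', error.name)
  }
}
```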

The worker

Once instantiated, the spawned worker can receive the message event. By attaching a callback to that event, we can easily grab the data coming from the main application and process it.

self.addEventListener('message', processData)

Notice that we’re using the self global object to reference the worker itself and not the main application. In a similar fashion, we’re also able to send messages back by using the postMessage method.

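// Declared with `function` so that it is hoisted and already defined
// when it is passed to addEventListener
function processData ({ data }) {
  // `data` is destructured from the message event
  const newData = {
    foo: (data.foo * 2),
    bar: (data.bar / 2)
  }
  self.postMessage(newData)
}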

As you can see, communication between your application and its spawned workers is pretty simple to understand and integrate. It’s also very modular, which makes total sense and keeps the threads loosely coupled.

One caveat is that it’s much more common to use this postMessage vehicle with a web worker than a service worker, because of the nature of their objectives, as we saw earlier.

Not promise-based

Even though service workers use promises natively for some of their API interactions (see below), cross-thread communication is handled using events and callbacks. We all know how annoying and messy these can become when compared to promises, so it’s a bit of a bummer that workers don’t integrate this asynchronous paradigm out of the box. Of course, there are ways to add this pattern quite easily. I encourage you to check out this one by Nolan Lawson.
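For the curious, the general idea behind such a wrapper can be sketched in a few lines. To be clear, this is a hand-rolled illustration and not the API of the library mentioned above: promisify is a hypothetical name, and the sketch assumes the worker echoes back the id it received, e.g. self.postMessage({ id, result }).

```javascript
// Hypothetical helper, not a library API: wraps a worker so that each
// request returns a promise. Assumes the worker replies with the `id`
// it was given, as in self.postMessage({ id, result }).
function promisify (worker) {
  let nextId = 0
  const pending = new Map()

  worker.addEventListener('message', function (event) {
    const { id, result, error } = event.data
    const callbacks = pending.get(id)
    if (!callbacks) return // not a reply to one of our requests
    pending.delete(id)
    if (error) callbacks.reject(new Error(error))
    else callbacks.resolve(result)
  })

  return function request (payload) {
    return new Promise(function (resolve, reject) {
      const id = nextId++
      pending.set(id, { resolve, reject })
      worker.postMessage({ id, payload })
    })
  }
}

// Usage on a page (left commented out since it needs a real worker file):
// const request = promisify(new Worker('web-worker.js'))
// request({ foo: 123, bar: 456 }).then(console.log)
```

Tagging every message with an id is what lets several requests be in flight at the same time without their replies getting mixed up.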

Features particular to web workers

Heavy to create and keep around

In a nutshell, web workers are not cheap and you should be careful when spawning them. They are indeed “relatively heavy-weight, and are not intended to be used in large numbers. Generally, workers are expected to be long-lived, have a high start-up performance cost, and a high per-instance memory cost” [Whatwg].

Can spawn sub-workers

On the topic of multiple workers, the web kind have the ability to spawn sub-workers themselves, using the same flow as described above.

Needs to be terminated

While a worker is destroyed when your main application stops (e.g. when the tab is closed), it needs to be manually terminated when it is no longer useful to your running application. To do so from the main application, you can call the terminate method on the worker object, which will kill it instantly, even if it’s still processing. A worker can also shut itself down by calling the close method available on its scope.
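As a rough sketch of what that cleanup can look like for a one-off job, the hypothetical runOnce helper below sends a single payload to a worker and terminates it as soon as the answer comes back. The file name web-worker.js is the same placeholder used earlier in this post.

```javascript
// Hypothetical helper: send one job to a worker and terminate the
// worker as soon as it answers.
function runOnce (worker, payload, onDone) {
  worker.addEventListener('message', function (event) {
    worker.terminate() // main-thread side: stops the worker immediately
    onDone(event.data)
  })
  worker.postMessage(payload)
}

// Only try this where the Worker constructor actually exists.
if (typeof window !== 'undefined' && window.Worker) {
  runOnce(new Worker('web-worker.js'), { foo: 123, bar: 456 }, function (data) {
    console.log('Done, worker terminated', data)
  })
}

// Inside the worker itself, the equivalent call is self.close().
```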

Features particular to service workers

Needs to be registered

Contrary to the web worker, which the main application instantiates directly, this one has to be registered by the main application; the browser then takes care of installing and starting it.

//
// Main app
//
navigator
  .serviceWorker
  .register('service-worker.js')
  .then(function (registration) {
    if (registration.installing) { ... }
    else if (registration.waiting) { ... }
    else if (registration.active) { ... }
  })
  .catch(...)

Note that the register method returns a promise. Because it’s 2016.

Once the registration process has started in the main thread, the worker has the opportunity to listen to a couple of lifecycle events: install and activate. You can hook to the install event to “prepare your service worker for usage when this fires, for example by creating a cache using the built in storage API, and placing assets inside it that you’ll want for running your app offline” [Mozilla].

//
// Worker
//
self.addEventListener('install', function(event) {
  event.waitUntil(...)
})

Once the activate event is fired, your worker is in control of your application (or its scope) and can work its magic.
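Here is a minimal sketch of what these two lifecycle hooks are often used for: pre-caching a few assets on install and cleaning up older caches on activate. The cache name, the asset list and the staleCacheNames helper are assumptions for the sake of the example; the events and the Cache API calls are the standard ones.

```javascript
// Hypothetical names: the cache name and asset list are assumptions.
const CURRENT_CACHE = 'app-shell-v2'
const ASSETS = ['/', '/app.js', '/app.css']

// Every cache that is not the current one was left by an older version.
function staleCacheNames (allNames, currentName) {
  return allNames.filter(function (name) { return name !== currentName })
}

// Only wire up the listeners when actually running as a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  // install: pre-cache the assets. waitUntil() keeps the worker in the
  // installing state until the promise settles.
  self.addEventListener('install', function (event) {
    event.waitUntil(
      caches.open(CURRENT_CACHE).then(function (cache) {
        return cache.addAll(ASSETS)
      })
    )
  })

  // activate: delete the caches created by previous versions.
  self.addEventListener('activate', function (event) {
    event.waitUntil(
      caches.keys().then(function (names) {
        return Promise.all(
          staleCacheNames(names, CURRENT_CACHE).map(function (name) {
            return caches.delete(name)
          })
        )
      })
    )
  })
}
```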

Does not require a page at all

As I mentioned above, one of the most obvious uses of the service worker is to proxy network requests, which involves inserting itself between your application and the network. As such, it only makes sense that the worker, once registered, can live completely independently of any page or application.

In Chrome, for instance, you can view the list of registered service workers by using the web inspector (Application > Service Workers) or by going to chrome://serviceworker-internals.

Has no state and is event-driven

Once registered and active, the worker won’t be doing much on its own. It will remain in an idle state and wait for specific events to wake it up. The most common of these events would be fetch, push or notificationclick (and eventually sync).

Since the worker lives in a sort of dormant state until something happens, we cannot and should not rely on its state for our application’s needs: the service worker is stateless. It will be turned on and off by the browser and thus can’t be relied on to persist any kind of information in memory. If you really had to do so, you could use IndexedDB locally or store the information on a remote server (which would defeat the purpose of an offline-first application).

Can be updated

Note that a worker must be re-downloaded at least every 24 hours, mainly to prevent a broken or ineffective script from impacting the browser for too long.

“Whenever you navigate to a page within scope of your service worker, the browser checks for updates in the background. If the script is byte-different, it’s considered to be a new version, and installed” [Github].

This byte comparison does not include the imported scripts but solely the script of the worker itself, and “if there is even a byte’s difference in the service worker file compared to what it currently has, it considers it new” [Google].

If the browser considers the worker to be a new file, that new version will be installed, but the old one will still control your pages, while the most recent one will fall into a waiting state. Only when all of the tabs controlled by the worker are closed will the new worker version replace the previous one, emitting its activate event at the same time. If you want to trigger an update programmatically, you can do so by calling the update() method on the worker’s registration object. Finally, you can unregister a worker by calling unregister() on that same object.
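In code, both operations hang off the registration object. The sketch below is only illustrative: checkForUpdate and removeWorker are hypothetical wrapper names, while getRegistration(), update() and unregister() are the standard methods, each returning a promise.

```javascript
// Hypothetical helpers around a ServiceWorkerRegistration object.

// Ask the browser to re-fetch the worker script and compare it with the
// installed version.
function checkForUpdate (registration) {
  return registration.update()
}

// Remove the registration; the promise resolves to true on success.
function removeWorker (registration) {
  return registration.unregister()
}

if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker.getRegistration().then(function (registration) {
    if (!registration) return // no worker registered for this page
    return checkForUpdate(registration)
  })
}
```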

Requires HTTPS

For security reasons (a service worker can intercept and rewrite network requests), service workers will only work when served over HTTPS. You can read more about the rationale behind this choice here. Be aware, however, that localhost is considered a secure origin to ease development.
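If you want to guard against this in your own code, one option is to check the isSecureContext flag that browsers expose on the global object before attempting to register. The canRegisterServiceWorker helper is a hypothetical name of mine, and env simply stands in for window so that the check can be tried anywhere.

```javascript
// Hypothetical helper: decide whether registration is worth attempting.
// `env` stands in for the page's global object (window).
function canRegisterServiceWorker (env) {
  return Boolean(
    env.isSecureContext &&
    env.navigator &&
    'serviceWorker' in env.navigator
  )
}

if (typeof window !== 'undefined' && canRegisterServiceWorker(window)) {
  navigator.serviceWorker
    .register('/service-worker.js')
    .catch(function (error) {
      console.error('Service worker registration failed', error)
    })
}
```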

Conclusion

As you can see, without revolutionizing web applications, workers definitely give us JavaScript developers another string to our bow. Throughout this article, I’ve really only scratched the surface of what’s possible with these new tools. Efficient caching techniques for offline-first applications, data crunching for online games and push notifications to the browser are only the tip of the iceberg here. I can’t wait to see what the community will create with the web and service workers of tomorrow!

So there you go — I hope this article helped ease you into the world of workers. If you have any questions or comments, please don’t hesitate to drop me a line in the section below!

Peace