Yahoo.com PWA, Part 2: Service Worker
In the previous post, we discussed how we enhanced our client-side JavaScript by adding fetch strategies to our client library. In this post, we will discuss our service worker, the backbone of a Progressive Web App (PWA).
Yahoo.com was already using a service worker for notifications. But to make a PWA possible, we needed to add many more capabilities to our existing service worker.
As in the previous post, this post is broken into four sections: What, How, Challenges and Conclusion. We hope you find these posts informative.
What
Early on, we had to make a hard decision: whether to use Workbox or to enhance our service worker on our own. Some of our key requirements were:
- More control over what is cached.
- More control over the fetch event within the service worker, for debuggability.
- An offline queue shared between the service worker and our client JavaScript.
- The ability to feature flag everything (not possible with HTTP headers alone).
- Better control over purging the service worker cache.
- The ability to patch the service worker quickly, as service workers are notoriously slow to update.
- The ability to A/B test the service worker.
How
After much consideration, we decided to build all the features within the service worker on our own. Here are a few reasons why:
- Workbox had limited support for feature flagging: it relies on plugins, which only have access to HTTP headers and only support synchronous operations.
- Owning the core piece gives us a lot of control, along with additional responsibilities.
- We were not using the app shell model. Without an app shell, it is very important not to cache logged-in markup in the service worker cache, because the service worker cache can be accessed from the browser client. Workbox supports hashing; however, users can log out from any of our multiple properties, some of which do not have service workers, are on different domains and are managed by different teams. We cannot show the wrong experience if the user logs out and then goes offline.
The above considerations might not apply for everyone, but for us, they turned out to be the right fit.
Similar to our client library, we implemented the following fetch strategies in the service worker.
Service worker fetch
Network first
As part of the network first strategy, we needed to rethink what we cache. Since we did not follow the app shell model, it was important not to cache logged-in markup in the service worker cache. As a workaround, we make a second request with cookies stripped and cache that signed-out experience, so the user gets a usable offline experience without compromising security. We also update the cache every time the user accesses our start_url. We do this after a delay so that we do not congest the network.
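A minimal sketch of the signed-out cache refresh follows. It assumes small injectable stand-ins for `fetch` and the cache so the logic can run outside a browser; `refreshSignedOutCopy`, `CacheLike` and the 5-second default delay are hypothetical, not our production code. Cookie stripping is modelled with the standard fetch option `credentials: "omit"`, since page and worker scripts are not allowed to set the `Cookie` header themselves.

```typescript
// Minimal stand-ins for fetch and Cache so the sketch runs outside a browser.
interface SimpleResponse {
  ok: boolean;
}
type Credentials = "include" | "omit";
type FetchFn = (url: string, init: { credentials: Credentials }) => Promise<SimpleResponse>;
interface CacheLike {
  put(url: string, response: SimpleResponse): Promise<void>;
}

// Hypothetical helper: after the user's own (possibly signed-in) response has been
// served, wait a little, then fetch a second copy without cookies and cache only that.
async function refreshSignedOutCopy(
  url: string,
  fetchFn: FetchFn,
  cache: CacheLike,
  delay: (ms: number) => Promise<void>,
  delayMs = 5000,
): Promise<boolean> {
  await delay(delayMs); // let the page's own requests go first
  try {
    const signedOut = await fetchFn(url, { credentials: "omit" }); // no cookies sent
    if (!signedOut.ok) return false; // never cache error pages
    await cache.put(url, signedOut);
    return true;
  } catch {
    return false; // offline or failed: keep the existing cached copy
  }
}
```

In a real fetch handler for the start_url, the user's live response would go to `event.respondWith(fetch(event.request))`, while `event.waitUntil(refreshSignedOutCopy(url, fetch, cache, (ms) => new Promise((r) => setTimeout(r, ms))))` keeps the worker alive for the delayed refresh.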
As part of our network first strategy, we also implement retries; we strongly believe the user should never see a dinosaur (Chrome's offline error page) or a 5xx page.
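The retry loop can be sketched as follows, again with injected stand-ins for `fetch` and the cache. The attempt limit of 3, the rule that only network errors and 5xx responses are retried, and the function name are assumptions for illustration; a production version would likely also add a backoff between attempts.

```typescript
// Minimal stand-ins for Response and Cache so the logic runs outside a browser.
interface SimpleResponse {
  ok: boolean;
  status: number;
}
type FetchFn = (url: string) => Promise<SimpleResponse>;
interface CacheLike {
  match(url: string): Promise<SimpleResponse | undefined>;
}

// Network first with a bounded number of attempts. A network error or a 5xx counts
// as a failure and is retried; once every attempt fails we fall back to the cache.
async function networkFirstWithRetries(
  url: string,
  fetchFn: FetchFn,
  cache: CacheLike,
  maxAttempts = 3, // hypothetical limit
): Promise<SimpleResponse> {
  let lastServerError: SimpleResponse | undefined;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const res = await fetchFn(url);
      if (res.status < 500) return res; // success, or a 4xx the page should handle
      lastServerError = res; // 5xx: try again
    } catch {
      // Network error, e.g. the device is offline: try again.
    }
  }
  // Every attempt failed: serve the cached signed-out copy if we have one.
  const cached = await cache.match(url);
  if (cached) return cached;
  if (lastServerError) return lastServerError;
  throw new Error(`network and cache both failed for ${url}`);
}
```

The browser's `Cache.match()` also resolves to `undefined` on a miss, which is why `CacheLike.match` has that shape here.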
Cache first
Most of our versioned static assets go through a cache first strategy. During our initial implementation, we were asked many times why we would take over something the browser already does efficiently. It took us some time to articulate the answer:
- Browser caching is not always optimal: https://code.fb.com/web/web-performance-cache-efficiency-exercise/
- A service worker can cache the bytecode, reducing runtime parsing, which can be an advantage for big files.
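A cache first lookup for versioned assets might look like the sketch below. It assumes asset URLs are versioned (for example, with a content hash in the file name), so a cached copy never needs revalidation; the interfaces are minimal stand-ins for the browser's `Response` and `Cache`, and the function name is hypothetical. As with a real `Response`, the sketch stores a `clone()` because a response body can only be read once.

```typescript
// Minimal stand-ins for Response and Cache so the logic runs outside a browser.
interface SimpleResponse {
  ok: boolean;
  clone(): SimpleResponse;
}
type FetchFn = (url: string) => Promise<SimpleResponse>;
interface CacheLike {
  match(url: string): Promise<SimpleResponse | undefined>;
  put(url: string, res: SimpleResponse): Promise<void>;
}

// Cache first: versioned URLs never change content, so a hit is served as-is.
async function cacheFirst(url: string, fetchFn: FetchFn, cache: CacheLike): Promise<SimpleResponse> {
  const hit = await cache.match(url);
  if (hit) return hit; // no network request at all
  const res = await fetchFn(url);
  // Cache only successful responses; store a clone because the body is single-use.
  if (res.ok) await cache.put(url, res.clone());
  return res;
}
```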
Service worker offline instrumentation
We strongly believe in instrumentation, especially as we continue to improve and iterate on our offline implementation. In our current implementation, we maintain an offline queue of beacons in IndexedDB. We use the 'sync' event to send the queued beacons when the user is back online, with a bounded number of retries so that failed beacons are not kept forever.
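The offline beacon queue could be flushed as sketched below when the 'sync' event fires. The `BeaconQueue` interface stands in for an IndexedDB object store, and `flushBeacons`, `MAX_ATTEMPTS` and the `attempts` counter are hypothetical names chosen to show the bounded-retry idea.

```typescript
interface QueuedBeacon {
  url: string;
  attempts: number; // how many sync flushes have already failed for this beacon
}
// Stand-in for an IndexedDB-backed queue.
interface BeaconQueue {
  readAll(): Promise<QueuedBeacon[]>;
  replaceAll(items: QueuedBeacon[]): Promise<void>;
}
type SendFn = (url: string) => Promise<boolean>;

const MAX_ATTEMPTS = 3; // hypothetical limit so the queue stays temporary

// Sends every queued beacon once; failures are re-queued until they hit the limit.
// Returns the number of beacons sent.
async function flushBeacons(queue: BeaconQueue, send: SendFn): Promise<number> {
  const pending = await queue.readAll();
  const keep: QueuedBeacon[] = [];
  let sent = 0;
  for (const beacon of pending) {
    let ok = false;
    try {
      ok = await send(beacon.url);
    } catch {
      ok = false;
    }
    if (ok) {
      sent++;
      continue;
    }
    const attempts = beacon.attempts + 1;
    if (attempts < MAX_ATTEMPTS) keep.push({ ...beacon, attempts }); // otherwise drop it
  }
  await queue.replaceAll(keep);
  return sent;
}
```

In the worker this would be wired up roughly as `self.addEventListener("sync", (event) => { if (event.tag === "offline-beacons") event.waitUntil(flushBeacons(queue, send)); })`, with `registration.sync.register("offline-beacons")` called when a beacon is queued; the tag name is an assumption.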
Pre-caching
We are continuously improving our pre-caching strategies; currently, we pre-cache the start_url from our manifest. Pre-caching happens in the service worker's 'activate' event and when our client posts a message to the service worker.
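Both pre-caching triggers can share one helper. This sketch assumes a hypothetical message shape, `{ type: "PRECACHE", urls: [...] }`, posted by the client, and fetches with `credentials: "omit"` on the assumption that pre-cached pages follow the same signed-out rule as the network first cache.

```typescript
// Minimal stand-ins for fetch and Cache so the sketch runs outside a browser.
interface SimpleResponse {
  ok: boolean;
}
type FetchFn = (url: string, init: { credentials: "include" | "omit" }) => Promise<SimpleResponse>;
interface CacheLike {
  put(url: string, res: SimpleResponse): Promise<void>;
}

// Hypothetical message format posted by our client JavaScript.
interface PrecacheMessage {
  type: "PRECACHE";
  urls: string[];
}

// Message data is untyped, so validate it before acting on it.
function isPrecacheMessage(data: unknown): data is PrecacheMessage {
  if (typeof data !== "object" || data === null) return false;
  const d = data as { type?: unknown; urls?: unknown };
  return d.type === "PRECACHE" && Array.isArray(d.urls) && d.urls.every((u) => typeof u === "string");
}

// Best-effort pre-caching: fetch each URL signed-out and cache successful responses.
// Returns the URLs that were actually cached.
async function precache(urls: string[], fetchFn: FetchFn, cache: CacheLike): Promise<string[]> {
  const cached: string[] = [];
  for (const url of urls) {
    try {
      const res = await fetchFn(url, { credentials: "omit" });
      if (res.ok) {
        await cache.put(url, res);
        cached.push(url);
      }
    } catch {
      // Offline or failed: skip this URL; the next trigger will try again.
    }
  }
  return cached;
}
```

The 'activate' handler would call `event.waitUntil(precache([startUrl], fetch, cache))`, and a 'message' handler would check `isPrecacheMessage(event.data)` before doing the same; on the page, the trigger would be `navigator.serviceWorker.controller?.postMessage({ type: "PRECACHE", urls: ["/"] })`.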
Disk space usage
Similar to our client library's IndexedDB implementation, we also strive to keep disk usage in check from the service worker. We use the Storage API (navigator.storage) to measure usage and purge the service worker cache accordingly.
Our logs show at least 10,000 cache cleanup executions per day.
- On install, we remove all previous cache entries. Our cache is tied to the service worker version number (refer to the code snippet above).
- When writing to the cache, we check disk usage and purge non-critical cache entries.
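The two cleanup rules above can be sketched as small pure functions. The cache naming scheme (`sw-cache-<version>`), the 80% usage threshold and all function names are assumptions; `StorageEstimateLike` mirrors the `{ usage, quota }` result of `navigator.storage.estimate()`, where both fields may be missing.

```typescript
// Hypothetical naming scheme: one cache per service worker version.
const CACHE_PREFIX = "sw-cache-";
const SW_VERSION = "1.0.4"; // hypothetical current version

function cacheNameFor(version: string): string {
  return `${CACHE_PREFIX}${version}`;
}

// On install: every cache created by an older worker version is stale.
function staleCacheNames(allNames: string[], version: string): string[] {
  const current = cacheNameFor(version);
  return allNames.filter((name) => name.startsWith(CACHE_PREFIX) && name !== current);
}

// Mirrors the result of navigator.storage.estimate(); both fields may be missing.
interface StorageEstimateLike {
  usage?: number;
  quota?: number;
}

// Before a cache write: purge once usage crosses a fraction of the quota.
function shouldPurge(estimate: StorageEstimateLike, maxFraction = 0.8): boolean {
  const { usage, quota } = estimate;
  if (usage === undefined || !quota) return false; // unknown: leave the cache alone
  return usage / quota >= maxFraction;
}

// Non-critical entries are everything except the URLs we must keep (e.g. start_url).
function nonCriticalKeys(keys: string[], critical: ReadonlySet<string>): string[] {
  return keys.filter((key) => !critical.has(key));
}
```

In the worker, the install-time cleanup would be roughly `await Promise.all(staleCacheNames(await caches.keys(), SW_VERSION).map((name) => caches.delete(name)))`, and the pre-write check `shouldPurge(await navigator.storage.estimate())`.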
Feature flagging service worker functionalities
We cannot rely on the service worker's global state, because the browser controls the worker's lifecycle: when an idle worker is stopped and later restarted, its global state is lost.
Our best solution was to use IndexedDB as persistent storage. As previously mentioned, IndexedDB can be slow, and we cannot afford to add fetch latency. Checking IndexedDB at the start of a fetch can result in variable latency, so the easiest workaround was to move the IndexedDB check to a later stage, just before writing to the cache.
Writing to the cache can be delayed, as it does not block the request.
We use IndexedDB to enable and disable features in the service worker. Each feature has its own logic for how and when it reads from IndexedDB.
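The ordering described above (respond first, read the flag only before the cache write) might look like this. `readFlag` stands in for an IndexedDB lookup, `waitUntil` for `FetchEvent.waitUntil()`, and the flag name `"cache-write"` and the function name are hypothetical.

```typescript
// Minimal stand-ins so the sketch runs outside a browser.
interface SimpleResponse {
  ok: boolean;
  clone(): SimpleResponse;
}
type FetchFn = (url: string) => Promise<SimpleResponse>;
type FlagReader = (name: string) => Promise<boolean>; // e.g. an IndexedDB lookup
interface CacheLike {
  put(url: string, res: SimpleResponse): Promise<void>;
}

// Serves the network response without waiting on IndexedDB. The flag is read only
// in the background task that writes to the cache, which is handed to waitUntil.
async function respondThenMaybeCache(
  url: string,
  fetchFn: FetchFn,
  readFlag: FlagReader,
  cache: CacheLike,
  waitUntil: (task: Promise<unknown>) => void,
): Promise<SimpleResponse> {
  const res = await fetchFn(url);
  if (res.ok) {
    const copy = res.clone(); // take the copy before the page consumes the body
    waitUntil(
      (async () => {
        // The slow, variable-latency flag read happens here, off the response path.
        if (await readFlag("cache-write")) await cache.put(url, copy);
      })(),
    );
  }
  return res; // returned without waiting for the flag read
}
```

A fetch handler would call it as `event.respondWith(respondThenMaybeCache(event.request.url, fetch, readFlag, cache, (p) => event.waitUntil(p)))`.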
Service worker deployment
From our investigation, service worker updates take a very long time to roll out. We expected our service worker to be updated within 24 hours of deployment for all our active users. However, as our graph below shows, version 1.0.3 of our service worker is still serving traffic, more than two months after its release. Typically, it takes at least three days for a new service worker to reach a good percentage of our user base.
This can be a problem if you release a faulty service worker. To handle such cases, we use the following strategy:
- We release the service worker to a small set of users and let it bake for a few hours, which helps us analyze real-time logs for any issues. Once we are confident, we push the service worker to production.
- We put every functionality we can behind a feature flag.
- We do not add cache headers to our service worker.
- We keep our service worker release decoupled from the rest of our stack.
- We can roll back our service worker to a previous version at any time.
- If something catastrophic occurs, we have an empty service worker with minimal code for notifications. We use this service worker to roll back all the code while keeping our notification subscriptions working.
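One way to implement the small-audience bake and the emergency rollback is to choose the service worker script at registration time. This sketch assumes a stable per-browser identifier, a hypothetical rollout config served by a feature-flag service, and hypothetical script paths; the FNV-1a hash is just one convenient way to get a stable bucket from 0 to 99.

```typescript
interface ServiceWorkerScripts {
  stable: string;
  canary: string;
  minimal: string; // notifications-only worker used for emergency rollback
}
interface RolloutConfig {
  canaryPercent: number; // 0 to 100
  emergency: boolean;
}

// 32-bit FNV-1a hash mapped to a stable bucket in [0, 100).
function bucketOf(id: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    hash ^= id.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash % 100;
}

function chooseServiceWorkerScript(id: string, config: RolloutConfig, scripts: ServiceWorkerScripts): string {
  if (config.emergency) return scripts.minimal; // roll everyone back, keep notifications
  return bucketOf(id) < config.canaryPercent ? scripts.canary : scripts.stable;
}
```

On the page, the result would be passed to `navigator.serviceWorker.register(script, { updateViaCache: "none" })`; `updateViaCache: "none"` asks the browser to bypass the HTTP cache when checking the worker script for updates, in the same spirit as not sending cache headers.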
Challenges
Service worker boot up challenges
The boot-up time for the service worker depends on the device and conditions; it can vary from 50ms to 500ms (Google has a great article on this).
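This is the very reason we embraced the experimental Navigation Preload feature very early on: it lets the browser start the navigation request in parallel with service worker boot-up. Navigations include your first page request and subsequent page requests, and navigation preload only speeds up these navigation requests, not other fetches.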
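With navigation preload enabled, a navigation handler can use the response the browser started fetching during boot-up. In a real worker, preload is switched on in 'activate' with `self.registration.navigationPreload.enable()`, and the in-flight response is exposed as `FetchEvent.preloadResponse`; the interface below only mirrors that shape so the sketch runs anywhere, and the function name is hypothetical.

```typescript
interface SimpleResponse {
  ok: boolean;
}
// Mirrors the relevant part of FetchEvent: preloadResponse resolves to undefined
// when navigation preload is not enabled or not supported.
interface NavigationEventLike {
  url: string;
  preloadResponse: Promise<SimpleResponse | undefined>;
}

async function respondToNavigation(
  event: NavigationEventLike,
  fetchFn: (url: string) => Promise<SimpleResponse>,
): Promise<SimpleResponse> {
  try {
    // This request was started by the browser while the worker was booting.
    const preloaded = await event.preloadResponse;
    if (preloaded) return preloaded;
  } catch {
    // Preload failed (e.g. a network error): fall through to a normal fetch.
  }
  return fetchFn(event.url);
}
```

The handler would be used as `event.respondWith(respondToNavigation({ url: event.request.url, preloadResponse: event.preloadResponse }, fetch))` for requests where `event.request.mode === "navigate"`.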
As an optimization, we install our service worker from our AMP-powered pages, so that it is already installed by the time users visit Yahoo.com.
Conclusion
The service worker is the backbone of a Progressive Web App (PWA), giving you far more control and capabilities.
While you can always choose to use Workbox, it is important to understand the power of the service worker so that you can debug issues quickly.
Used wrongly, it can cause a lot of damage. Plan your fetch strategies well, only use what you need, and have your backup plans ready in case you release a faulty service worker. We have had more than two incidents to date.
In our next post, we will discuss standalone mode and the add-to-home-screen capability.
To try our Yahoo Lite experience, just visit https://www.yahoo.com/ on your Android phone and add it to your home screen. Depending on your Chrome version, you will be prompted differently. In Firefox and Samsung Internet, add to home screen is part of the navigation bar.