Upgrading Ele.me to Progressive Web App
Since the very first experiments that @Vue.js tweeted, we at Ele.me (the biggest food ordering and delivery company in China) have been working on upgrading our mobile website to a Progressive Web App. We’re proud to ship the world’s first PWA exclusively for the Chinese market, but even prouder to collaborate with Google, UC and Tencent to push the boundaries of web experience and browser support in China.
Multi-page, Vue, PWA?
There is a prevailing opinion that only by structuring a web app as a Single Page App can we build PWAs that deliver an app-like user experience. Popular reference examples, including Twitter Lite, Flipkart Lite, Housing Go and Polymer Shop, all use the SPA model.
However, at Ele.me we’ve come to appreciate many advantages of the Multi-Page App model, and more than a year ago we decided to refactor the mobile site from an Angular 1 SPA to a Multi-Page App. The most important advantage we see is the isolation and decoupling between pages, which allows us to build different parts of the mobile site as “micro-services”. These services can then be independently iterated, embedded into 3rd-party apps, and even maintained by different teams.
Meanwhile, we also leverage Vue.js to boost our productivity. You may have heard of Vue as a rival of React or Angular, but Vue’s light weight and performance also make it a perfect replacement for the traditional “jQuery/Zepto + template engine” stack when engineering a Multi-page app. We built every component as a Single File Component so that components can be easily shared between pages. The declarativeness and reactivity that Vue offers help us manage both code and data flow. Oh, did I mention that Vue is progressive? So things like Vuex or Vue-Router can be incrementally adopted if our site’s complexity scales up, like…migrating to SPA again? (Who knows…)
In 2017, PWA seems to be all the rage, so we embarked on exploring how far our Vue-based Multi-page PWA can actually go.
Implementing “PRPL” with MPA
I love the PRPL pattern because it gives you a high-level abstraction of how to structure and design your own PWA systems. Since we are not rebuilding everything from scratch, we decided to take implementing PRPL as our migration goal:
1. PUSH/PRELOAD critical resources for initial route.
The key to pushing/preloading is to prioritize resources hidden deep in the dependency graph and keep the browser’s network stack busy ASAP. Let’s say you have an SPA with code splitting by route: you can push/preload chunks for the current route before the “entry chunks” (e.g. webpack manifest, router) finish downloading and evaluating. So when the actual fetches happen, the resources might already be in caches.
Routes in MPAs naturally fetch code for that route only, and tend to have a flat dependency graph. Most scripts that Ele.me depends on are just <script> elements, so they can be found and fetched by the good old browser preloader in the early parsing phase without explicit
To benefit from HTTP2 multiplexing, we currently serve all critical resources under a single domain (no more domain sharding), and we are also experimenting with Server Push.
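For a resource that the preloader cannot discover by itself, a preload hint can be generated at build time. The following is a minimal sketch, not our actual build code: the `preloadTags` helper, the extension table, and the file names are assumptions made for illustration.

```javascript
// Map a file extension to the `as` attribute value of <link rel="preload">.
const AS_BY_EXTENSION = { js: 'script', css: 'style', woff2: 'font' };

// Build <link rel="preload"> tags for a list of asset URLs (hypothetical helper).
function preloadTags(urls) {
  return urls
    .map((url) => {
      const extension = url.split('?')[0].split('.').pop();
      const as = AS_BY_EXTENSION[extension];
      if (!as) return null; // skip unknown types rather than guess
      // Font preloads are fetched in CORS mode, hence `crossorigin`.
      const cors = as === 'font' ? ' crossorigin' : '';
      return `<link rel="preload" href="${url}" as="${as}"${cors}>`;
    })
    .filter(Boolean)
    .join('\n');
}

// Hypothetical versioned file names, as a build step might produce them.
console.log(preloadTags(['/static/vendor.3f2a.js', '/static/home.91bc.css']));
```

The resulting tags would go into the `<head>` of the route's HTML, ahead of anything that could delay discovery of those files.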
2. RENDER initial route & get it interactive ASAP
This one is essentially free in an MPA since there’s only one route at a time.
Straightforward rendering is critical for metrics such as First Meaningful Paint and Time-To-Interactive. MPAs gain it for free thanks to the simplicity of the traditional HTML navigation they use.
3. PRE-CACHE remaining routes using Service Worker
This is the part where Service Worker joins the show. Service Worker is known as a client-side proxy that enables developers to intercept requests and serve responses from cache, but it can also proactively fetch and precache future resources.
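The proxy behaviour can be sketched as a cache-first handler. This is a minimal illustration rather than our production Service Worker: `cacheFirst` is a hypothetical helper, and the cache and the network function are passed in as parameters so the logic can be read and run outside a browser. In a real Service Worker, `cache` would come from `caches.open()` and `fetchFn` would be the global `fetch`.

```javascript
// Serve from cache when possible; otherwise fetch, store a copy, and return it.
// `cache` needs match(request) and put(request, response), like the Cache API.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) return cached;

  const response = await fetchFn(request);
  if (response.ok) {
    // A response body can only be read once, so store a clone.
    await cache.put(request, response.clone());
  }
  return response;
}

// Inside a Service Worker this could be wired up roughly like this
// (the cache name 'runtime' is an assumption):
//   self.addEventListener('fetch', (event) => {
//     event.respondWith(
//       caches.open('runtime').then((cache) => cacheFirst(event.request, cache, fetch))
//     );
//   });
```

Precaching uses the same cache, only earlier: the files are fetched and stored when the Service Worker installs instead of on the first request.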
We already used webpack in the build process to do .vue compilation and asset versioning, so we created a webpack plugin to help us collect dependencies into a "precache manifest" and generate a new Service Worker file after each build. This is pretty much how SW-Precache works.
In fact, we only collect dependencies of routes we flagged as “Critical Routes”. You can think of them as the “App Shell” or the “Installation Package” of our app. Once they are cached/installed successfully, our web app can boot up directly from cache and is available offline. Routes that are “not critical” are incrementally cached at runtime during the first visit. Thanks to the LRU cache policies and TTL invalidation mechanisms provided by SW-Toolbox, we have no worries about hitting the quota in the long run.
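A precache manifest of this kind can be as simple as a de-duplicated list of versioned URLs belonging to the routes flagged as critical. The sketch below is illustrative only: the shape of the route objects, the `critical` flag, and the `buildPrecacheManifest` name are assumptions for this example, not the actual output format of our webpack plugin or of SW-Precache.

```javascript
// Collect the assets of critical routes into a de-duplicated, sorted precache list.
// Each route is { name, critical, assets: [versionedUrl, ...] } (hypothetical shape).
function buildPrecacheManifest(routes) {
  const urls = new Set();
  for (const route of routes) {
    if (!route.critical) continue; // non-critical routes are cached at runtime instead
    for (const asset of route.assets) urls.add(asset);
  }
  return Array.from(urls).sort();
}

const manifest = buildPrecacheManifest([
  { name: 'home', critical: true, assets: ['/vendor.3f2a.js', '/home.91bc.js'] },
  { name: 'profile', critical: false, assets: ['/vendor.3f2a.js', '/profile.7c1d.js'] },
]);
console.log(manifest);
```

Because the file names carry a content hash, a changed file shows up as a new URL in the list, which is what lets a freshly generated Service Worker know what to re-fetch.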
4. LAZY-load & instantiate remaining routes on demand
Lazy-loading and lazily instantiating the remaining parts of the app is relatively challenging for an SPA to achieve. It requires both code splitting and async importing. Fortunately, this is also a built-in feature of the MPA model, in which routes are naturally separated.
Note that lazy-loading can be done instantly if the requested route is already pre-cached in the Service Worker cache, no matter whether an SPA or an MPA is used. #ServiceWorkerAwesomeness
Surprisingly, we found that a Multi-page PWA is kinda naturally “PRPL”! The MPA model already provides built-in support for “PRL”, and the second “P”, which involves Service Worker, can be easily fulfilled in any PWA.
So what about the end result?
In Lighthouse benchmarking, we got Time-to-Interactive down to around 2 seconds, and this was benchmarked on our HTTP1 server.
The first visit is fast. The repeat visit with Service Worker is even faster. You can check out this video to see the huge difference between visits with and without Service Worker:
Did you see that? No, I mean the annoying blank screen. Even in the Service Worker one, the blank screen is still conspicuous during navigation. How can that be?
Multi-page Pitfall: Redo Everything!
So here is the profile (2x slower CPU simulated) of our entry page (the heaviest one). Even though we can get Time-To-Interactive down to around 1s on repeat visits, that can still feel too slow to our users for just “switching a tab”.
Could Browser Caches Help?
Yes and no.
Another browser cache you might have heard of is the “Back-Forward Cache”, or bfcache. The name varies, like Opera’s “Fast History Navigation” or WebKit’s “Page Cache”. The idea is that browsers can keep the previous page alive in memory, i.e. its DOM/JS state, instead of destroying everything. In fact, this idea works very well for MPAs. You can try any traditional Multi-page website in iOS Safari and observe instantaneous loading when going back/forward. (Navigating with the browser UI/gestures or with hyperlinks can make a slight difference though.)
Unfortunately, Chrome currently has no in-memory bfcache of this kind, due to concerns about memory consumption and its multi-process architecture. It just leverages the HTTP disk cache to simplify the loading pipeline, so almost everything still needs to be redone. More details and discussions can be seen here.
Striving for Perceived Performance
Although the reality is dark, we don’t want to give up so easily. One optimization we try is to render as few DOM nodes and create as few Virtual DOM nodes as possible to improve Time-To-Interactive. Another opportunity we see is to play tricks with perceived performance.
Owen Campbell-Moore has written a great post, “Reactive Web Design: The secret to building web apps that feel amazing”, covering both “Instant loads with skeleton screens” and “Stable loads via predefined sizes on elements” to improve perceived performance and user experience. Yes, we actually used both.
How about we show the end result of these optimizations first, before getting into the technical nitty-gritty? There you go!
So fast that you cannot see the pulsing Skeleton Screen clearly? Here is a version showing how it looks under a 10 times slower CPU.
This is a much better UX, right? Even if navigation is slow on slow devices, at least the UI is stable, consistent and always responding. So how did we get there?
Pre-rendering Skeleton Screen with Vue at Build-Time
As you might have guessed, the Skeleton Screen that consists of markups, styles, and images is inlined into
We don’t want to manually craft a Skeleton Screen for each route. It’s a tedious job, and we would have to manually sync every change between the Skeleton Screens and the actual UI components (yes, we treat every route as just a Vue component). But think about it: a Skeleton Screen is just a blank version of a page into which information is gradually loaded. What if we bake the Skeleton Screen into the actual UI component as just a loading state, so we can render the Skeleton Screen directly from it without the issue of syncing?
Thanks to the versatility of Vue, we can actually realize it with Vue.js Server-Side Rendering. Instead of using it on a real server, we use it at build time to pre-render Vue components to strings and inject them into HTML templates. You have to write “universal” code so that the Vue components can be executed in Node. For routes that depend heavily on DOM/BOM-specific 3rd-party modules, however, we have to make a separate *.shell.vue to temporarily work around it.
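The build step boils down to two parts: rendering a component to a string, and injecting that string into the page template. The second part can be sketched as below. The `<!-- skeleton -->` placeholder and the `injectSkeleton` helper are hypothetical names chosen for this example, and the `skeletonHtml` string stands in for whatever the Vue server renderer produces for a route component in its loading state.

```javascript
// Hypothetical marker that the HTML template of each route would contain.
const PLACEHOLDER = '<!-- skeleton -->';

// Inject pre-rendered skeleton markup into an HTML template at build time.
function injectSkeleton(template, skeletonHtml) {
  if (!template.includes(PLACEHOLDER)) {
    throw new Error('template has no skeleton placeholder');
  }
  // A replacer function keeps "$" sequences in the markup from being
  // interpreted as special replacement patterns by String.prototype.replace.
  return template.replace(PLACEHOLDER, () => skeletonHtml);
}

const template = '<body><div id="app"><!-- skeleton --></div></body>';
const skeletonHtml = '<ul class="skeleton"><li></li><li></li></ul>';
console.log(injectSkeleton(template, skeletonHtml));
```

Failing loudly when the placeholder is missing is a deliberate choice here: a page that silently ships without its skeleton would bring back the blank screen we are trying to remove.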
Fast Skeleton Painting…
Having markup in *.html doesn't mean that it will be painted fast; you have to make sure the Critical Rendering Path is optimized for that. Many developers believe that putting script tags at the end of the body is sufficient for getting content painted before the scripts execute. This might be true for browsers that support rendering an incomplete DOM tree (e.g. streaming render), but browsers might not do that on mobile because of concerns about slower hardware, battery, and heat. And although we are told that script tags with defer are not parser-blocking, it doesn't mean we can get content painted before executing scripts in reality.
First I want to clarify this a little. According to the Scripting section of HTML (WHATWG living standard; the W3C’s says the same), async scripts are evaluated as soon as they are available and thus could potentially block parsing. Only defer (on scripts that are not inlined) is specified to never block parsing. That’s why Steve Souders once posted "Prefer DEFER Over ASYNC". (defer has its own issue, and we will cover it later.)
More importantly, a script that does not block the parser could still block painting. So here is a reduced test I wrote named “Minimal Multi-page PWA”, or MMPWA, which basically renders 1000 list items within an async (and truly not parser-blocking) script to see if we can get the Skeleton Screen painted before scripts get executed. The profile below (over USB debugging on my real Nexus 5) shows my ignorance:
Yes, keep your mouth open. The first paint is blocked. I was also surprised here. My guess at the reason is that if we touch the DOM so quickly that the browser has still NOT finished its previous painting job, our dear browser has to throw away every pixel it has drawn, wait until the current DOM manipulation task finishes, and redo the rendering pipeline again. And this happens more often on a mobile device with a slower CPU/GPU.
Fast Skeleton Painting with setTimeout Hack
We indeed encountered this problem when testing our beautiful new Skeleton Screen. Perhaps Vue finishes its job and starts to mount nodes too fast ;). But anyway we have to make it slower, or rather lazier. So we tried putting the DOM manipulation inside setTimeout(callback, 0), and it works like a charm!
I think you may be curious about how this change performs in the wild, so I have refined MMPWA by rendering 5000 list items rather than 1000 to make the differences more obvious, and by designing it in an A/B testing manner. The code is on GitHub and the demo is live on huangxuan.me/mmpwa/. Here is also a video for loungers.
The setTimeout hack (a.k.a. Zero Delays) looks like magic, but it is science™. If you are familiar with the event loop, it just prevents this code from executing in the current loop by putting everything into the task queue as a timer callback, so the browser can breathe (update the rendering) on the main thread.
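The ordering can be seen in a few lines. In this sketch, `mountApp` is a hypothetical stand-in for the expensive DOM work, and `defer` is a small helper written for this example. The log shows that the deferred task runs only after the current script has finished, which is what gives the browser a chance to paint in between.

```javascript
const log = [];

// Hypothetical stand-in for expensive DOM work such as mounting the app.
function mountApp() {
  log.push('mount');
}

// Run `task` in a later turn of the event loop instead of right now.
function defer(task) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(task()), 0);
  });
}

log.push('script start');
const mounted = defer(mountApp);
log.push('script end'); // in a browser, a rendering update can happen after this point

mounted.then(() => console.log(log.join(' -> ')));
// prints "script start -> script end -> mount"
```

Note that a zero delay does not guarantee a paint before the callback; it only stops the DOM work from running in the same task as the script that scheduled it.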
So we applied what we learned from MMPWA by putting new Vue() inside setTimeout and BOOM! We have the Skeleton Screen painted consistently after every navigation! Here is the profile after all these optimizations.
Huge improvements, right? This time we hit First Paint (Skeleton Screen Paint) at 400ms and TTI at 600ms. You should really go back and compare before and after in detail.
One more thing that I deferred
But wait, why is there still a bunch of guiltily parser-blocking scripts? Are they all async? OK, OK. For historical reasons, we do keep some parser-blocking scripts, like lib-flexible; we couldn’t get rid of it without a huge refactoring. But most of these blocking scripts are in fact defer. We expected them to be executed after parsing and in order; however, the profile kinda slapped me in the face. :(
Remember I said I would talk about one issue of defer? Yes, that's it. I had a conversation with Jake Archibald, and it turns out it might be a Chrome bug that occurs when the deferred scripts are fully cached. Vote for it at crbug!
Similar improvements can be seen in Lighthouse (under the same network environment, but on an HTTP2 server). A pro tip: always use Lighthouse with controlled variables, changing one thing at a time.
Performance In the Real World
Alex Russell gave a very insightful talk on mobile web performance at Chrome Dev Summit 2016, talking about how hard it is to build performant web applications on mobile devices. Highly recommended.
Chinese users tend to have pretty powerful phones. The MI4 ships with a Snapdragon 801 (slightly outperforming the Nexus 5) but only costs $100. It’s affordable for at least 80% of our users, so we take it as a baseline.
Here is a video screen-recorded on my Nexus 5 showing switching between 4 tabs. The performance varies between tabs due to their different sizes. The heaviest one, the entry page, takes around 1s to hit real Time-To-Interactive on my Nexus 5.
FYI, this is surprisingly comparable to what I get from Chrome’s simulation with 2x CPU throttling. With 5x throttling, it can take a horrible 2–3s to reach TTI. (To be honest, I found that even under the same throttling, the results can vary drastically depending on my MacBook’s “mood”.)
This article is much longer than I expected. I really appreciate it if you have made it this far. So what can we learn from it?
MPA still has some way to go
Jake Archibald once said that “PWA !== SPA” at Chrome Dev Summit 2016. But the sad truth is that even though we have taken advantage of bleeding-edge technologies such as the “PRPL” pattern, Service Worker, App Shell, and Skeleton Screen, there is still a gap between us and many Single Page PWAs, just because we are Multi-page structured.
The web is extremely versatile. Static blogs, e-commerce websites, desktop-level software: things of all these different scales should be first-class citizens of the web family. MPAs might get things like a “bfcache API” or navigation transitions to catch up with SPAs in the future, but certainly not today.
PWA is awesome No Matter What
Hey, I am not overblowing it. Even though we, as a Multi-page PWA, can’t be as stunning and app-like as many Single Page PWAs are, the ideas and technologies behind PWA still help us deliver a much better experience to our users on the web, one that wasn’t possible before.
Finally, I’d love to thank:
- my colleagues YiSi Wang, GuangHui Ren, JiyinYiyong from Eleme
- Michael Yeung, Liam Spradlin, and other collaborators from Google
- collaborators from UC/Tencent
And special thanks to
Thank you all!
Appendix. Architecture Diagram
The PRPL Pattern | Web | Google Developers
For most real-world projects, it's frankly too early to realize the PRPL vision in its purest, most complete form - but…
Preload, Prefetch And Priorities in Chrome
Today we’ll dive into insights from Chrome’s networking stack to provide clarity on how web loading primitives (like…
As web developers, we know how easy it is to end up with web page bloat. But loading a webpage is much more than…
Tasks, microtasks, queues and schedules
When I told my colleague Matt Gaunt I was thinking of writing a piece on microtask queueing and execution within the…