Not-so-micro optimizations

People have had various opinions about the AMP Project, but one thing I haven’t heard is that it isn’t fast. We, however, aren’t even close to done and hope to make AMP significantly faster over the coming months. A few weeks into our developer preview launch, we have already landed some nice optimizations, and this post takes a closer look at a few of them. Even readers not using AMP may find some of them worth applying to their own projects.

Preconnect

While one of AMP’s core techniques is to delay fetching resources until they are needed, we preconnect to hosts as early as possible so that those fetches are fast once they happen.

Preconnecting performs the DNS lookup and the TCP and SSL handshakes up front. This saves 100+ ms even on Wi-Fi and can save seconds on a crappy mobile connection.

Unfortunately, only Chrome supports preconnect natively, but we developed a super simple polyfill for other browsers.
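A minimal sketch of such a polyfill, written to match the three steps described below. The function name preconnectPolyfill, the injected createXhr factory and the exact URL path are assumptions for illustration, not AMP’s actual code; injecting the factory only lets the logic run outside a browser.

```javascript
// Hypothetical sketch of an XHR-based preconnect polyfill.
// `createXhr` is injected so the logic can be exercised outside a browser;
// in a page you would pass () => new XMLHttpRequest().
function preconnectPolyfill(origin, createXhr) {
  // 1. Build a URL on the target host. The random number busts any cache,
  //    so the browser has to open a real connection (DNS + TCP + TLS).
  //    The path and query shape are assumptions.
  const url = origin + '/amp_preconnect_polyfill?' + Math.random();

  // 2. Use HEAD: we don't care about the response body and expect a 404.
  const xhr = createXhr();
  xhr.open('HEAD', url, true);

  // 3. Send it. withCredentials is left at its default (false), so no
  //    cookies go out with the request and the response cannot set any.
  xhr.send();
  return url;
}
```

In a page, a call would look like preconnectPolyfill('https://platform.twitter.com', () => new XMLHttpRequest()).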

What’s going on?

  1. We construct a URL on the host and add a random number to bust the cache, because we actually want to make a connection.
  2. We open an XHR. HEAD is a great request method here because we don’t care about the response body, and we expect to get a 404 anyway.
  3. Finally, we send the request. Note that this is always a cross-origin request (we never preconnect to ourselves) and that we don’t set “withCredentials” to true. This means no cookies are sent with the request and the response cannot set any cookies, which is exactly what we want in this case.

So, in case you see “amp_preconnect_polyfill” in your server logs: yep, that is us. In practice we only preconnect to big hosts like Twitter, YouTube or ad networks, where these extra requests hopefully go unnoticed :)

Prefetch

Even better than just preconnecting is prefetching a resource. One of the reasons we load ads very late, for example, is that running their JS uses CPU, which might make scrolling janky. Prefetching doesn’t have that problem: the JavaScript is downloaded but not run yet, so if we have bandwidth to spare but aren’t sure about CPU, prefetching is great.

Natively, prefetching is as easy as adding a link element with rel="prefetch" that points at the resource.
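As a sketch, assuming a hypothetical helper named prefetch that receives the document object (injected so the logic can be exercised outside a browser):

```javascript
// Hypothetical helper: ask the browser to prefetch `url` by appending
// <link rel="prefetch" href="..."> to the document head.
// Pass the real `document` in a page; any object with createElement()
// and head.appendChild() works for testing.
function prefetch(doc, url) {
  const link = doc.createElement('link');
  link.rel = 'prefetch';
  link.href = url;
  // The browser downloads the resource into its cache at low priority but
  // does not execute it. A later <script src> for the same URL can then be
  // served from the cache, subject to the usual HTTP caching rules.
  doc.head.appendChild(link);
  return link;
}
```

In a page this would be called as prefetch(document, url) for, say, an ad network script that is not needed until later.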

Unfortunately, Safari doesn’t support it either, and so far we haven’t been able to come up with a good polyfill for prefetch. A few people on the internet claim to have built one, but as far as we can tell none of them actually work.

This simple “polyfill attempt” gets us super close: it prefetches the URL, and when the resource is actually requested, it is served from the cache. However, on a later visit, when the resource is already cached but wasn’t an image, it is fetched again, which wastes bandwidth.
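Judging by the mention of images, the attempt presumably points an Image at the URL so that the browser downloads it into the HTTP cache. The following is a hypothetical sketch of that idea, not necessarily the exact code AMP used, and it shares the re-fetch drawback just described.

```javascript
// Hypothetical sketch of an image-based "prefetch" attempt.
// Assumption: pointing an Image at the URL makes the browser download it
// into the HTTP cache. The response is not a valid image, so decoding
// fails, but the downloaded bytes may still be cached.
// `ImageCtor` is injected so this can run outside a browser; pass `Image`.
function prefetchViaImage(url, ImageCtor) {
  const img = new ImageCtor();
  // Swallow the expected decode error; we only wanted the download.
  img.onerror = function () {};
  img.src = url;
  return img;
}
```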

If anyone has a better way to do this in Safari, we’d be super grateful. For now, iOS users get no prefetching.

Tuning Babel ES6 polyfills

We love ES6 and we love Babel, but unfortunately some of Babel’s polyfills turn out to be heavy in terms of JavaScript size. Quite reluctantly, we forked the core-js-shim (just one file) and kept only Array.from, Promise and Math.sign out of all the ES6 goodness. We also limited the ES6 syntax features we use in the project to those Babel can transpile efficiently, and created a custom Babel helpers file for that purpose. It’s tough to no longer be able to use things like string.endsWith, but the savings in JS size are worth it for a project like ours.

Our main JavaScript file went down from 142 KB (39 KB gzipped) to 134 KB (36 KB), which is not all that impressive percentage-wise. The binary we compile for our sandbox iframe, however, went down from 50 KB (17 KB gzipped) to 7 KB (4 KB), after we decided we could also live without Promises and one other non-Babel polyfill there, and that makes a big difference.
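To illustrate the trade-off, here is a sketch of the kind of tiny, hand-picked shim that can replace a full polyfill bundle: Math.sign is installed only when missing, and a small endsWith helper stands in for string.endsWith. The names are illustrative; this is not AMP’s actual shim file.

```javascript
// Illustrative mini-shim: only the features we actually use.

// Same results as Math.sign, including NaN, +0 and -0.
function mathSign(x) {
  x = Number(x);
  if (x === 0 || x !== x) {
    // Zero (either sign) and NaN are returned unchanged.
    return x;
  }
  return x > 0 ? 1 : -1;
}

// Install only where the browser lacks the native version.
if (!Math.sign) {
  Math.sign = mathSign;
}

// Used instead of str.endsWith(suffix), so no String polyfill ships.
function endsWith(str, suffix) {
  const index = str.length - suffix.length;
  return index >= 0 && str.indexOf(suffix, index) === index;
}
```

Each helper costs a few dozen bytes, whereas the generic polyfill layer it replaces has to cover every ES6 built-in whether or not the project uses it.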

We expect our JavaScript files to be cached most of the time (especially once we use a Service Worker to guarantee it), but cached JS still has to be parsed and executed. Shipping less of it means spending less precious time on the UI thread during initial load.

Optimizing style recalculations

When JavaScript measures dimensions in the DOM (for example: how tall is this element?), the browser is forced to immediately “materialize” the document, which may mean recalculating styles and layout if the document has changed since the last time that was done. This can be very slow (1–100 ms depending on what changed, the document size and the device speed), and it blocks the UI thread.
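The effect is easiest to see in a toy model. The ToyDocument class below is hypothetical and only mimics the browser’s behavior: writes mark styles as dirty, and a read while dirty counts as a forced synchronous recalculation, much like reading offsetHeight right after changing a style.

```javascript
// Toy model, not a browser API: counts forced style recalculations.
class ToyDocument {
  constructor() {
    this.dirty = false;
    this.recalcCount = 0;
    this.heights = new Map();
  }
  // Stands in for setting element.style.height.
  write(id, height) {
    this.heights.set(id, height);
    this.dirty = true;
  }
  // Stands in for reading element.offsetHeight.
  read(id) {
    if (this.dirty) {
      this.recalcCount++; // forced synchronous recalculation
      this.dirty = false;
    }
    return this.heights.get(id) || 0;
  }
}

// Read, write, read, write...: every read after a write forces a recalc.
function interleaved(doc, ids) {
  for (const id of ids) {
    doc.write(id, doc.read(id) + 10);
  }
}

// All reads first, then all writes: at most one forced recalc.
function batched(doc, ids) {
  const heights = ids.map(id => doc.read(id));
  ids.forEach((id, i) => doc.write(id, heights[i] + 10));
}

const thrashing = new ToyDocument();
thrashing.write('init', 0); // document starts out dirty
interleaved(thrashing, ['a', 'b', 'c']);
console.log(thrashing.recalcCount); // 3

const grouped = new ToyDocument();
grouped.write('init', 0);
batched(grouped, ['a', 'b', 'c']);
console.log(grouped.recalcCount); // 1
```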

With some simple rearranging, we reduced the number of style recalculations while loading a typical AMP document from 4 to 2. This may not sound like much, but when you prerender 3 documents, it brings the work down from 12 recalculations to 6, which can make a world of difference.

The 2 points at which we still need to recalculate styles are:

  1. When we measure, for each AMP element (and there may be many, of course), how large it can be based on its container size.
  2. After the initial layout, which applies the changes derived from step 1, we measure the height of the document.

It is not inconceivable that these 2 phases could be collapsed into one. That would be a nice rainy-afternoon project :)

The most common way to reduce style recalculations is to batch DOM reads and writes separately, using a library like fastdom. On top of that, one can do more “application-level” batching and reordering of operations. The result is very fragile, though: it is super easy to regress and add a style recalc back through very subtle changes. We are working with the Chrome team on APIs that unit tests can use to assert that a given number of style recalculations is not exceeded.
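A minimal sketch of the read/write batching idea, loosely in the spirit of fastdom. The class and method names are hypothetical and are not fastdom’s API: callers queue reads and writes, and flush runs every queued read before any queued write.

```javascript
// Hypothetical read/write scheduler (not fastdom's actual API).
class ReadWriteScheduler {
  constructor() {
    this.reads = [];
    this.writes = [];
  }
  queueRead(fn) {
    this.reads.push(fn);
  }
  queueWrite(fn) {
    this.writes.push(fn);
  }
  // In a browser this would run once per frame, e.g. from
  // requestAnimationFrame.
  flush() {
    // Reads first, so none of them sees a document dirtied by this batch.
    while (this.reads.length) this.reads.shift()();
    // Then writes, including any that a read just queued.
    while (this.writes.length) this.writes.shift()();
    // Reads queued by writes stay queued until the next flush.
  }
}
```

For example, if a write is queued first and two reads follow (one of which queues another write), a single flush runs both reads before either write. Deferring reads queued by writes to the next flush keeps a stray measurement from forcing a second recalculation within the same frame.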

Sharing JS across sandbox iframes

AMP sandboxes all third party JS in cross-origin iframes. This has several performance benefits but also comes with significant memory overhead. A future project is to keep that extra memory usage constant, but for now we’ve focused on keeping the individual frames efficient.

The first third-party sandbox iframe on a page declares itself the “master iframe”, and every subsequent frame tries to find that master through the parent.
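A sketch of how the selection could work, assuming all sandbox frames share one origin and agree on a well-known frame name. The name format and selectMaster are hypothetical, and whether a frame’s self-assigned name is visible through parent.frames depends on the browser’s rules for named frame access, so treat that lookup as an assumption.

```javascript
// Hypothetical master-iframe selection.
// `win` is the calling frame's window (pass `window`); it is injected so
// the logic can be exercised outside a browser.
function selectMaster(win, type) {
  const masterName = 'frame_' + type + '_master';
  let master = null;
  try {
    // Look for a sibling frame that already claimed the master name.
    master = win.parent.frames[masterName] || null;
  } catch (e) {
    // Treat any access error as "no master yet".
  }
  if (!master) {
    // Nobody has claimed the role: rename ourselves to become master.
    win.name = masterName;
    master = win;
  }
  return master;
}
```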

This way there is always exactly one special “master iframe”, which makes it easy to share resources across iframes: work can be done once and reused by all the iframes that need it. One such shared resource is the Twitter embedding script, which we load only once, in the master. (And yes, that code looked nicer before we got rid of Promises in our embed frame.)

Our loading function fetches the Twitter code and invokes a callback when it is done. Real work happens only in the master; all other frames just wait for the master to finish and then get called back.
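A callback-based sketch of that pattern, without Promises to match the embed frame’s constraints. computeInMaster, its parameters and the __sharedTasks property are hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch: run `work` once, in the master frame, and hand its
// result to every frame that asks for it.
//   master   - the master frame's window (from master selection)
//   isMaster - whether the calling frame is the master
//   taskId   - key identifying the shared work, e.g. 'twitter'
//   work     - function(done) that does the work and calls done(result)
//   cb       - called with the result in the requesting frame
function computeInMaster(master, isMaster, taskId, work, cb) {
  // The shared task registry lives on the master window.
  const tasks = master.__sharedTasks || (master.__sharedTasks = {});
  let task = tasks[taskId];
  if (!task) {
    task = tasks[taskId] = { started: false, done: false, result: undefined, callbacks: [] };
  }
  if (task.done) {
    // The work already finished: call back right away.
    cb(task.result);
    return;
  }
  task.callbacks.push(cb);
  if (!isMaster || task.started) {
    // Other frames only wait; the master starts the work exactly once.
    return;
  }
  task.started = true;
  work(function (result) {
    task.done = true;
    task.result = result;
    // Notify every waiting frame, including the master itself.
    task.callbacks.splice(0).forEach(function (fn) { fn(result); });
  });
}
```

For Twitter, work might inject https://platform.twitter.com/widgets.js into the master frame and call done(window.twttr) from the script’s load handler; that wiring is an assumption, not shown here.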


Unfortunately, this optimization only works when the embed code is designed for it, and Twitter is one of the rare exceptions that works fine in this context. Friends at Instagram, DoubleClick and elsewhere: please help us out; there is a lot of CPU & RAM to be saved :)

Future work

We have several projects under way to further reduce our JS size. On top of that, we will make font loading easier to control in AMP, and then start leveraging Service Workers for more predictable performance in supported browsers. And yeah, maybe we’ll even use App Cache in browsers that don’t yet support Service Workers, so you don’t have to.