Not so micro optimizations
People have had various opinions about the AMP Project, but the one thing I haven’t heard is that it wasn’t fast. We aren’t even close to done, however, and hope to make AMP significantly faster over the coming months. A few weeks into our developer preview launch, we have already landed a few nice optimizations. This post takes a closer look at some of them. Some may be interesting even for readers who don’t use AMP, since they can be applied to other projects.
While one of the core techniques of AMP is to delay fetching resources until they are needed, we preconnect to hosts as early as possible so that those fetches are fast once they happen.
Preconnecting performs the DNS lookup and the TCP and SSL handshakes ahead of time. This saves 100+ ms even on Wi-Fi and can save seconds on a crappy mobile connection.
Unfortunately only Chrome supports preconnect natively, but we developed a super simple polyfill for other browsers:
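A minimal sketch of what such a polyfill might look like, reconstructed from the description that follows rather than taken from the AMP source. The function name `preconnectPolyfill`, the exact URL layout, and the injectable `XhrCtor` parameter are assumptions for illustration.

```javascript
// Hypothetical helper: the name, URL layout, and injectable XHR constructor
// are assumptions so that the sketch can also run outside a browser.
function preconnectPolyfill(origin, XhrCtor = globalThis.XMLHttpRequest) {
  // A random query string busts the cache, so a real connection is opened.
  const url = origin + '/amp_preconnect_polyfill?' + Math.random();
  const xhr = new XhrCtor();
  // HEAD: only the connection matters, not the response body (likely a 404).
  xhr.open('HEAD', url, true);
  // withCredentials stays at its default (false), so no cookies are sent or set.
  xhr.send();
  return url;
}
```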
What’s going on?
- We construct a URL on the host and add a random number to bust the cache, because we actually want to make a connection.
- We make an XHR. ‘HEAD’ as a request method is great, because we don’t care about the return document. Also we expect to get a 404 anyway.
- Finally we send the request. Note that this is always a cross-origin request (we never preconnect to ourselves) and we don’t set “withCredentials” to true. This means we never send cookies with the request, nor a referrer, nor can the response set cookies, which is exactly what we want in this case.
So, in case you see “amp_preconnect_polyfill” in your server logs: Yep, that is us. In practice we only preconnect to big hosts like Twitter, YouTube or ad networks where these extra requests hopefully don’t get noticed :)
Prefetching is as easy as:
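For instance, adding a `<link rel="prefetch">` element to the document head asks a supporting browser to fetch a resource ahead of time. A minimal sketch; the helper name `prefetch` and the injectable `doc` parameter are assumptions, not AMP's code.

```javascript
// Hypothetical helper; `doc` is injectable so the sketch also runs outside a browser.
function prefetch(url, doc = globalThis.document) {
  const link = doc.createElement('link');
  link.setAttribute('rel', 'prefetch');
  link.setAttribute('href', url);
  // Appending the element is what triggers the prefetch in supporting browsers.
  doc.head.appendChild(link);
  return link;
}
```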
Unfortunately, Safari doesn’t support it either. So far we haven’t been able to come up with a good polyfill for prefetch. A few people on the internet claim to have built one, but none of them actually works as far as we can tell.
This simple “polyfill attempt” gets us super close. It prefetches the URL, and when the resource is actually fetched later, it is served from cache. However, on later visits, when the resource is already cached but is not an image, the resource is fetched again, which wastes bandwidth.
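One attempt of this kind, sketched under the assumption that it piggybacks on an image request to warm the cache. The function name `prefetchViaImage` and the injectable `ImageCtor` parameter are hypothetical.

```javascript
// Hypothetical helper; `ImageCtor` is injectable so the sketch runs outside a browser.
function prefetchViaImage(url, ImageCtor = globalThis.Image) {
  const img = new ImageCtor();
  // Assigning `src` starts the request, even though the image is never displayed.
  img.src = url;
  return img;
}
```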
If anyone has a better way to do it in Safari we’d be super grateful. For now we have no prefetching for iOS users.
Tuning Babel ES6 polyfills
Optimizing style recalculations
With some simple rearrangement we reduced the number of style recalculations while loading a typical AMP document from 4 to 2. This may not sound like much, but when you prerender 3 documents this brings the work down from 12 recalculations to 6, which can make a world of difference.
The 2 times where we need to recalculate styles are:
- When we measure, for each AMP element (and there might be many, of course), how large it can be based on its container size.
- After the initial layout, which takes the measurements from the first step into account, when we measure the height of the document.
It is not inconceivable that these 2 phases could be collapsed into a single one. Would be a nice rainy afternoon project :)
The most common way to reduce style recalculations is to batch DOM operations into reads and writes through a library like fastdom. On top of this one can do more “application level” batching and realignment of operations. It ends up being a very fragile state, though. It is super easy to regress and to add back a style recalc through super subtle changes. We are working with the Chrome team to add APIs for use in unit tests that can be used to assert that a certain number of style recalculations is not exceeded.
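The batching idea can be sketched as a tiny scheduler that runs all queued reads before all queued writes in one frame. This is written in the spirit of fastdom and is not its API; `createBatcher`, `measure`, `mutate`, and the injectable `schedule` function are names assumed for illustration.

```javascript
// Minimal read/write batcher (an illustration, not fastdom itself).
function createBatcher(schedule = (fn) => globalThis.requestAnimationFrame(fn)) {
  const reads = [];
  const writes = [];
  let scheduled = false;

  function flush() {
    scheduled = false;
    // Take the current queues; tasks queued during the flush run in the next one.
    const pendingReads = reads.splice(0);
    const pendingWrites = writes.splice(0);
    // All reads first, then all writes, so layout is not invalidated in between.
    pendingReads.forEach((fn) => fn());
    pendingWrites.forEach((fn) => fn());
  }

  function enqueue(queue, fn) {
    queue.push(fn);
    if (!scheduled) {
      scheduled = true;
      schedule(flush);
    }
  }

  return {
    measure: (fn) => enqueue(reads, fn),
    mutate: (fn) => enqueue(writes, fn),
  };
}
```

Callers would wrap every DOM read in `measure` and every DOM write in `mutate`; interleaved calls are then reordered so that no write sits between two reads.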
Sharing JS across sandbox iframes
AMP sandboxes all third party JS in cross-origin iframes. This has several performance benefits but also comes with significant memory overhead. A future project is to keep that extra memory usage constant, but for now we’ve focused on keeping the individual frames efficient.
Whichever third-party sandbox iframe comes first on a page declares itself the “master iframe”. The second and all later frames then try to find that master through the parent:
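A minimal sketch of such a lookup, assuming the sandbox frames share an origin with each other (so they can read each other's properties) and that the master marks itself with a property. The names `findMasterFrame` and `isMaster` are hypothetical; `self` and `parent` are passed in so that the logic can run outside a browser.

```javascript
// Hypothetical sketch: `isMaster` is an assumed marker property.
function findMasterFrame(self, parent) {
  const frames = parent.frames;
  for (let i = 0; i < frames.length; i++) {
    const candidate = frames[i];
    try {
      // Reading a property of a frame from a different origin throws,
      // so such frames are simply skipped.
      if (candidate.isMaster) {
        return candidate;
      }
    } catch (e) {
      // Not one of our sandbox frames; ignore it.
    }
  }
  // No master found: this frame is the first one, so it claims the role.
  self.isMaster = true;
  return self;
}
```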
This way we always have one special iframe that is the “master iframe”. This allows easy sharing of resources across iframes. Work can be done only once and reused by all the iframes that need it. One such thing is the Twitter embedding script. This is how we load it (and yes, this code looked nicer before we got rid of Promises in our embed frame):
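A callback-based sketch of this pattern, not the actual AMP code. The properties `isMaster`, `master`, `twttrCallbacks`, and `twttrLoading`, as well as the injected `loadScript(global, url, done)` helper, are assumed names; the sketch also assumes the loaded script defines `twttr` on the master's global object.

```javascript
// Hypothetical sketch: only the master frame loads the script; every frame
// queues a callback on the master and is called back once `twttr` exists.
function getTwttr(frame, loadScript, callback) {
  const master = frame.isMaster ? frame : frame.master;
  if (master.twttr) {
    // Script already loaded: call back right away.
    callback(master.twttr);
    return;
  }
  if (!master.twttrCallbacks) {
    master.twttrCallbacks = [];
  }
  master.twttrCallbacks.push(callback);
  // The real work (loading the script) happens only in the master, and only once.
  if (frame.isMaster && !master.twttrLoading) {
    master.twttrLoading = true;
    loadScript(master, 'https://platform.twitter.com/widgets.js', () => {
      const waiting = master.twttrCallbacks;
      master.twttrCallbacks = [];
      waiting.forEach((cb) => cb(master.twttr));
    });
  }
}
```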
This function loads the Twitter code and calls a callback when it is done. Real work is only performed in the master. All the other frames just wait for the primary work to get done and then get called back.
Unfortunately this optimization only works when the embed code is designed for it. Twitter is one of the rare exceptions that work fine in this context. Friends at Instagram, DoubleClick and elsewhere: Please help us out; there is a lot of CPU & RAM to not be used :)
We have several projects under way to further reduce our JS size. On top of this we will make font loading easier to control in AMP and will then start leveraging Service Workers for more predictable performance in supporting browsers. And yeah, maybe we’ll even utilize App Cache in browsers that don’t yet support Service Workers, so you don’t have to.