Browsers!

Alexey Migutsky
Full Stack Engineering
4 min read · Dec 19, 2014


You are caching it wrong!

I’ve developed a Chrome extension that enhances caching and puts the idea described below into practice.

The code of the extension is available on GitHub.

I believe this simple thing can be a game changer if it is implemented at the browser level.

I would pay money for a mobile browser that incorporates this idea, because it would load pages perceptibly faster and cut down on energy bills for the companies hosting those pages!

But let’s take everything in order…

What’s wrong with caching?

Do you know how the browser cache works?

I mean, do you really know how it works and how to control its behaviour?

To be honest, I doubt your answer will be “Yes! It’s pretty simple”.

I’ll give you a couple of points to ponder, but will not dive into details:

  1. What are the rules that make a browser send cache-validation requests?
  2. Do you know that each time a user refreshes the page, those nasty browsers send a blocking request to check the validity of each resource? (See the sketch below.)
  3. Do you know how to implement effective caching for your app?
  4. Are you sure that CDNs are doing it right?
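
To make point 2 concrete, here is roughly what that refresh-triggered round-trip looks like, as a minimal sketch in TypeScript (Node 18+, which ships a global fetch; the URL is just an example):

```typescript
// Sketch of the conditional "is it still fresh?" round-trip a browser
// performs on refresh. Run with Node 18+ (global fetch, top-level await).
const url = "http://code.jquery.com/jquery-2.1.3.min.js";

// First request: full download; the server labels the response with an ETag.
const first = await fetch(url);
const etag = first.headers.get("etag");

// On refresh, the browser sends a *blocking* conditional request.
// If the resource is unchanged, the server answers 304 Not Modified
// with an empty body, but the network round-trip is still paid in full.
const revalidation = await fetch(url, {
  headers: etag ? { "If-None-Match": etag } : {},
});
console.log(revalidation.status); // 304 when the cached copy is still valid
```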

What’s wrong with CDNs?

CDNs are great!

The idea of having a single resource cached forever is brilliant.

But it is spoiled by browsers sending those “let’s check it just in case” requests.

Each “304 request” can add hundreds of milliseconds to page load time! (Yes, enterprise apps, I am looking at you!)

And users are pretty damn good at fixing their problems by “simply refreshing the page” or “just deleting the cache”.
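
For reference, a versioned CDN artifact is usually already served with “cache forever” headers. The sketch below (Node’s built-in http module; header values are illustrative) shows the kind of response a CDN sends; the refresh-triggered revalidation happens despite the far-future max-age.

```typescript
// Sketch of the "cache forever" headers a CDN typically sends for a
// versioned artifact that will never change (values are illustrative).
import { createServer } from "node:http";

createServer((req, res) => {
  res.setHeader("Cache-Control", "public, max-age=31536000"); // one year
  res.setHeader("ETag", '"abc123"'); // what the 304 check validates against
  res.end("/* jquery-2.1.3.min.js contents */");
}).listen(8080);
```

(Years later, the Cache-Control “immutable” extension was standardized in RFC 8246 to suppress exactly this revalidate-on-refresh behaviour, which is close to what is argued for here.)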

Why do you hate those requests?

The idea of checking the resources on refresh is not insane. It has its purpose.

But I believe that resources should be treated differently by the browser.

Why would you need to re-check the jQuery library bundle, coming from a CDN, on each refresh? (Yes, this is the clue.)

The main problem I see is the scale.

Each request hitting a server costs real money and pollutes the atmosphere, because energy is spent on delivering the data and computing the response.

And energy is still mostly produced by burning things.

You may wonder: “Who cares? It is like a sand grain! The cost is negligible.”

But can you imagine a million sand grains?

And there are definitely a million users refreshing their browsers every second!

That’s how we lose money and burn things at a global scale!

Can we really do something about it?

Yes, we can cut down the number of requests!

The answer is “artifacts”.

And by artifacts I mean the building blocks we use to create websites: CSS libraries, JavaScript libraries, web fonts.

This is the same idea as CDNs, but put on steroids.

Why do we need to host those artifacts on servers and still compute the responses for “304 requests” if we can store those artifacts in browsers?

Won’t this idea break things?

Well, there is a definite chance, but it is pretty low.

To be considered an artifact, the resource should have these properties:

  1. It should be identifiable.
  2. It should be immutable.

Let’s see an example.

Here is a list of jQuery artifacts: http://code.jquery.com/jquery/

Each artifact is identified by its name and version, like jquery-2.1.3.min.js

Each artifact is immutable in the sense that it will never be changed after being released and tagged by the name+version pair.
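
To make that identity concrete, here is a minimal sketch of how such an artifact could be modeled (the types and names are my illustration, not the extension’s actual code):

```typescript
// Sketch of an artifact identity: name + version, optionally minified.
// These types and names are illustrative, not the extension's real code.
interface Artifact {
  name: string;      // e.g. "jquery"
  version: string;   // e.g. "2.1.3"
  minified: boolean; // ".min.js" vs ".js"
}

// The name+version pair is the immutable identifier: once released,
// the file behind this key never changes.
function artifactKey(a: Artifact): string {
  return `${a.name}-${a.version}${a.minified ? ".min" : ""}.js`;
}

artifactKey({ name: "jquery", version: "2.1.3", minified: true });
// => "jquery-2.1.3.min.js"
```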

And what does it give us?

I would say we have an invariant:

If the script called “jquery-2.1.3.min.js” is required for a web page, we can definitely say that it is the same script as “jquery-2.1.3.min.js” from the artifact storage.

The same holds true for CSS libraries and web font files.

If we use that invariant in the cache, we can skip checking and downloading the artifact from the server, because the local artifact is guaranteed to be the same!
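
A minimal sketch of that decision, assuming a simple filename-keyed local store (both the store and the key extraction are my simplifications):

```typescript
// Sketch of the cache decision once the invariant holds: a local hit
// means no network traffic at all, not even a 304 check.
async function resolveArtifact(
  url: string,
  store: Map<string, string>,
): Promise<string> {
  const key = url.split("/").pop() ?? ""; // e.g. "jquery-2.1.3.min.js"
  const cached = store.get(key);
  if (cached !== undefined) {
    return cached; // guaranteed identical: skip the request entirely
  }
  const body = await (await fetch(url)).text(); // first sighting: download once
  store.set(key, body);
  return body;
}
```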

Hey, but someone can modify the script!

Well, technically, yes.

But there are some known conventions and rules which give us the ability to specify 3 levels of safety (trading off efficiency); a code sketch of the classification follows the list:

  1. Safest level. We can use URLs to identify artifacts and rely on CDNs as a guaranteed source.
    In this case the artifact is called http://code.jquery.com/jquery-2.1.3.min.js
    Note: only the pages, where common CDN artifacts are used, will be enhanced.
  2. Safe level. We can use name + version + the .min.js suffix to identify the artifact.
    In this case the artifact is called jquery-2.1.3.min.js
    Note: only the pages, which follow best practices, will be enhanced.
  3. Not-so-safe level. We can use name + version to identify the artifact.
    In this case the artifact is called jquery-2.1.3
    Note: this case covers most pages, but it is the most risky one.
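
Here is that classification as a minimal sketch; the regular expressions are illustrative and cover only the jQuery naming convention:

```typescript
// Sketch of the three identification levels; the patterns are
// illustrative and only match jQuery-style names.
type Level = "safest" | "safe" | "not-so-safe";

const rules: Array<{ level: Level; test: RegExp }> = [
  { level: "safest", test: /^https?:\/\/code\.jquery\.com\/jquery-\d+\.\d+\.\d+\.min\.js$/ },
  { level: "safe", test: /jquery-\d+\.\d+\.\d+\.min\.js$/ },
  { level: "not-so-safe", test: /jquery-\d+\.\d+\.\d+(\.js)?$/ },
];

function classify(url: string): Level | undefined {
  return rules.find((r) => r.test.test(url))?.level;
}

classify("http://code.jquery.com/jquery-2.1.3.min.js"); // "safest"
classify("/assets/jquery-2.1.3.js");                    // "not-so-safe"
```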

OK, but there are millions of artifacts! Which shall we use?

To get the answer, we need to analyse artifact usage.

I have done some research, and the data from libscore is very useful.

It turns out that there are plenty of popular libraries in the wild, with jQuery being the most used.

I would say we should use all the artifacts one can find on popular CDNs, plus the most used jQuery plugins.

Sounds good. How can I use that?

If you are a browser developer, I would suggest considering implementing this idea natively in the browser.

If you are a Chrome user, I would suggest joining the 1500+ people using the extension and feeling the speed gain yourself.
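
For extension authors, the interception itself can be sketched with the manifest-v2-era blocking chrome.webRequest API. This is my illustration of the approach, not the extension’s actual source; it assumes the webRequest and webRequestBlocking permissions, plus the bundled file listed under web_accessible_resources:

```typescript
// Sketch of intercepting an artifact request and serving a bundled copy
// instead (manifest v2 blocking webRequest; lookupLocalCopy is hypothetical).
function lookupLocalCopy(url: string): string | undefined {
  const name = url.split("/").pop() ?? "";
  // A real implementation would consult a packaged artifact store here.
  return name === "jquery-2.1.3.min.js"
    ? chrome.runtime.getURL("artifacts/jquery-2.1.3.min.js")
    : undefined;
}

chrome.webRequest.onBeforeRequest.addListener(
  (details) => {
    const local = lookupLocalCopy(details.url);
    // Redirecting to a chrome-extension:// URL skips the network entirely.
    return local ? { redirectUrl: local } : {};
  },
  { urls: ["*://code.jquery.com/*"] },
  ["blocking"]
);
```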
