A Web Developer’s Guide to Browser Caching

Overview

Caching is a useful yet surprisingly complex feature of web browsers.

In this article, we’ll explain the how the browser uses its cache to load pages faster, which factors determine cache duration, and how we can bypass the cache when necessary.

Why is Caching Important?

All browsers attempt to keep local copies of static assets in an effort to reduce page load times and minimize network traffic.

Fetching a resource over a network will always be slower than retrieving it from local cache. This is true whether the server is on the same network or it’s located on the far side of the world.

How Browser Caching Works

Case 1: User has not visited the site before

The browser won’t have any files cached for the site so it will fetch everything from the server.

Below is a snapshot of the resources downloaded when visiting the Wikipedia home page for the first time. The status bar at the bottom shows that 265KB of data was transferred to the browser.

Case 2: User has visited the site before

The browser will retrieve the HTML page from the web server but consult its cache for the static assets (JavaScript, CSS, images).

We can see the difference cache makes when we refresh the Wikipedia page:

The data transferred went down to 928 bytes — that’s 0.3% the size of the initial page load. The Size column shows us that most of the content is pulled from cache.

Chrome will pull files from either memory cache or disk cache. Since we didn’t close our browser between Cases 1 & 2, the data was still in memory cache.

Show the Browser Cache

In Chrome, we can go to chrome://cache to view the contents of the cache. This will display a page of links to a detailed view for each cached file.

How Does the Browser Know What to Cache?

The browser inspects the headers of the HTTP response generated by the web server. There are four headers commonly used for caching:

  • ETag
  • Cache-Control
  • Expires
  • Last-Modified

ETag

The ETag (or Entity Tag) is a string that serves as a cache validation token. This is usually a hash of the file contents.

The server can include an ETag in its response, which the browser can then use this in a future request (after the file has expired) to determine if the cache contains a stale copy.

If the hash is the same, then the resource hasn’t changed and the server responds with a 304 response code (Not Modified) with an empty body. This lets the browser know it’s still safe to use the cached copy.

Note that ETag is only used in requests whenever the file has expired from cache.

Cache-Control

The Cache-Control header has a number of directives we can set to control cache behavior, expiration, and validation. These can be combined together as well.

Cache Behavior

Cache-Control: public

public means that the resource can be cached by any cache (browser, CDN, etc)

Cache-Control: private

private means that the resource can only be cached by the browser

Cache-Control: no-store

This tells the browser to always request the resource from the server

Cache-Control: no-cache

This one is actually a bit misleading. It doesn’t mean “do not cache”.

This tells the browser to cache the file but not to use it until it checks with the server to validate we have the latest version. This validation is done with the ETag header.

This is commonly used with HTML files since it makes sense for the browser to always check for the latest markup.

Expiration

Cache-Control: max-age=60

This specifies the length of time in seconds the resource should be cached.So a max-age=60 means that it should be cached for 1 minute. RFC 2616 recommends that the maximum value for should no longer than 1 year (max-age=31536000).

Cache-Control: s-max-age=60

This is only used by intermediate caches like a CDN.

Validation

Cache-Control: must-revalidate

This tells the cache it must verify the status of the stale resource before using it and expired ones should not be used.

Expires

The Expires header is from the older HTTP 1.0 days but is still used on many sites.

This header field provides an expiration date after which the asset is considered invalid.

Expires: Wed, 25 Jul 2018 21:00:00 GMT
The browser will ignore this field if there’s a max-age directive in Cache-Control

Last-Modified

The Last-Modified header is also from the HTTP 1.0 days.

Last-Modified: Mon, 12 Dec 2016 14:45:00 GMT

This field contains the date and time the resource was last modified.

HTML Meta Tag

Prior to HTML5, using meta tags inside HTML to specify cache-control was a valid approach:

<meta http-equiv="Cache-control" content="no-cache">

Using a meta tag like this is now discouraged and is not valid HTML5. Why? It’s not a good idea because only browsers will be able to parse the meta tag and understand it. Intermediate caches won’t.

So always send caching instructions via HTTP headers.

HTTP Response

Let’s take a look at an sample HTTP response:

Accept-Ranges: bytes
Cache-Control: max-age=3600
Connection: Keep-Alive
Content-Length: 4361
Content-Type: image/png
Date: Tue, 25 Jul 2017 17:26:16 GMT
ETag: "1109-554221c5c8540"
Expires: Tue, 25 Jul 2017 18:26:16 GMT
Keep-Alive: timeout=5, max=93
Last-Modified: Wed, 12 Jul 2017 17:26:05 GMT
Server: Apache
  • Line 2 tells us that the max-age is 1 hour
  • Line 5 tells us that this is a PNG image
  • Line 7 shows us the ETag value which will be used for validation after the 1 hour mark to verify that the resource hasn’t changed
  • Line 8 is the Expires header which will be ignored since max-age is set
  • Line 10 is the Last-Modified header which shows when the image was last modified

Caching Pitfalls

So we’ve established that browser caching is awesome, and we should take advantage of it.

But we also want users see the latest version of our page when we make updates. We can’t expect them to do a hard refresh (Ctrl-F5) every time they visit our site or clear their cache regularly.

These types of caching issues are often a source of frustration for both the developer and end-user. A user may see a broken page or a button that behaves strangely because they have an outdated stylesheet or JavaScript code.

Stale Files

Below is a Twitter exchange between Chase Support and a user having issues with a login form on the banking site.

The user likely had some old JavaScript cached in their browser which caused the form to reset instead of submit when the Logon button was clicked.

Let’s explore another situation where stale files could bite us.

Suppose we fix a bug in a JavaScript file called app.min.js and push the update to our production site.

This is what our HTML looks like:

<script src="assets/js/app.min.js">

Our web server sets the max-age of JavaScript files to 1 week (604,800 seconds).

Cache-Control: private, max-age=604800

After the update, some users report they are still having issues symptomatic of the bug.

What’s going on here?

  • Bob visited the site 2 weeks ago and has a cached copy of buggy app.min.js. Since his copy is older than max-age, the browser will retrieve the file from the server, and he gets the latest bug-free version.
  • Mary visited the site 2 days ago and also has a cached copy of buggy app.min.js. Her copy is newer than max-age so her browser is still happily using the cached copy.

In the next section, we’ll see how to prevent these issues with a technique called cache busting.

Cache Busting

Cache busting is where we invalidate a cached file and force the browser to retrieve the file from the server.

We can instruct the browser to bypass the cache by simply changing the filename. To the browser, this is a completely new resource so it will fetch the resource from the server.

Cache busting also allows us to keep long max-age values for resources that may change frequently. Google recommends that max-age be set to 1 year (source).

Versioning

We could add a version number to the filename:

assets/js/app-v2.min.js

Fingerprinting

We could add a fingerprint based on the file contents:

assets/js/app-d41d8cd98f00b204e9800998ecf8427e.min.js

Append Query String

We could append a query string to the end of the filename:

assets/js/app.min.js?version=2

The query string approach has known issues with proxy servers so this method is generally discouraged.

Best Practices

Do

  • Use the Cache-Control and ETag headers to control cache behavior for static assets
  • Set long max-age values to reap the benefits of browser cache
  • Use fingerprinting or versioning for cache busting

Don’t

  • Use HTML meta tags to specify cache behavior
  • Use query strings for cache busting

FAQ

How can I tell if a file was loaded from cache?

Check out the Developer Tools in your browser. In Chrome, this information is shown in the Network tab under the Size column.

How do I prevent caching for a file?

Use the following response header:

Cache-Control: no-cache, no-store, must-revalidate