Server Timing in the Wild

How we expose web performance in iPlayer

Understanding how long the BBC iPlayer website takes to load has always been a difficult challenge which has a real impact on the audience; luckily browsers now have a tremendous amount of features to help us figure out what’s going on. Developer tools are especially effective at profiling and monitoring browser performance but lack the means to adequately show where time is spent before a response is received from the server; that’s where Server Timing steps in to save the day.

For those who haven’t met Server Timing, it’s been in development for awhile but the official W3C version recently arrived in Chrome 65 which was released last week on the 6th of March. Server Timing is a set of headers which you can add to your application responses to produce timing data in the browser’s developer tools to help demystify the time to first byte (TTFB), such as the following example from the iPlayer homepage:

Server-Timing data for /iplayer in Chrome

The above screenshot is from the iPlayer website in Chrome Dev Tools when inspecting the /iplayer request in the Network tab and the request’s Timing tab. It shows that whatever request processing took place in our application, it accounted for 68.53ms out of a total of 101.52ms before the first byte occurred. This means only 32.99ms was unaccounted for and that amount can easily be attributed to the magic of the internet.

If you look at the spec, there are a lot of options as to how you implement this, instead of worrying about all of the options we just took the simplest variant to start with to begin surfacing the data. For that you just need a Server-Timing header per timing as seen on the iPlayer website:

Server-Timing headers from /iplayer

As shown, each Server-Timing header has two pieces of information, a name before the ; and an attribute of dur which is short for duration. Take care not to expose too much information about what these values actually mean and be wary of setting a description using the desc attribute for any values as this is all exposed in the response. For iPlayer, this means we only use the label which is first part of the header and the duration attribute which is formatted in milliseconds.

Performance Timing API Data

The other cool thing, is because Server-Timing is a supported W3C spec, browsers will start supporting it within the performance timing API so that it will be available for client side monitoring of applications. In the above output, you can see the three values we discussed previously being exposed within the navigation entries but it is also possible to add Server Timing to the resource entries which means you can get more information about things coming in after the initial load.

How we did it

In iPlayer, we use Express and Node.js (for more information about our tech you can read an excellent post by Richard Wong), so we implemented this as a middleware. To start with we used the high resolution timer available in Node.js:

const startTime = process.hrtime();
const timeDifference = process.hrtime(startTime);

This allows us to tell the time difference between two points in our code, this is also useful for creating benchmarks and other time measurement tools. The process.hrtime function takes the startTime passed in and produces the difference between that timestamp and the current timestamp to be stored in timeDifference in the format [seconds, nanoseconds] . To turn this into milliseconds we do a bit of Math:

function convertToMs(hrtime) {
const ms = hrtime[0] * 1e3 + hrtime[1] * 1e-6;
return ms.toFixed(3);
}

Taking the seconds and multiplying them by 1000 to give the millisecond value and the equivalent to convert the nanoseconds into milliseconds then limiting it to 3 decimal places. All that was left was to wire it altogether:

const onHeaders = require('on-headers'); 
function convertToMs(hrtime) {
const ms = hrtime[0] * 1e3 + hrtime[1] * 1e-6;
return ms.toFixed(3);
}
module.exports = function serverTiming(req, res, next) { 
const startTime = process.hrtime();
onHeaders(res, () => {
const timeDifference = process.hrtime(startTime);
const timeDifferenceMs = convertToMs(timeDifference);
res.append('server-timing', `total;dur=${timeDifferenceMs}`);
});
next();
};

To measure any kind of route in Express you want to measure from when the route is first called to when you finally respond to a request. To do this we couldn’t find any standard Express hook so instead used the on-headers npm module which allows you to intercept a response just before it’s sent. Importantly we used res.append rather than res.set which appends to existing values so if you set Server-Timing headers elsewhere they accumulate.

Taking this a bit further

We extended upon this idea a little bit by having two maps that contained our runningTimers and finishedTimers so that we could easily add more timings if necessary as well. These were accessible using two methods added to the res object as serverTimingStart and serverTimingEnd which meant we could measure any piece of code we wanted such as:

res.serverTimingStart('data');
const data = await dataProcessing();
res.serverTimingEnd('data');

This then gets appended to the response as an additional piece of Server-Timing data. And whilst we were there we wanted to add these stats into our existing monitoring systems, to do this we made the above code slightly more complicated by adding a hook function which you can optionally pass into the middleware and which is called alongside res.append and is passed req , key and value so it can send stats related to that request:

express.use(timing((req, key, value) => {
sendMyStats(req.url, key, value);
}));
If you’re interested in quickly adding Server-Timing to your express app, try out the express-simple-timing module!

But there are so many other fun uses

If you have a cached response it’s still useful to see how long it took for your server to prepare it so you know how long that poor person without a cached response had to wait.

Server-Timing with Cache information

By adding cache information to your responses you can help inform developers whether they need to be concerned with certain values. You could also filter out the values if you don’t want those values available. Below is an example for use in the Varnish cache server using the header vmod:

vcl 4.0;
import header;
sub vcl_deliver {
if (obj.hits > 0) {
header.append(resp.http.server-timing, "hit");
} else {
header.append(resp.http.server-timing, "miss");
}
}

Or even simpler using some of the default nginx server cache variables:

add_header server-timing $upstream_cache_status;

Another example from the W3C spec is to add information about which data centre the response was routed through. This is actually not as useful as you might think as it instead appears like this in Dev Tools:

Server Timing description in Chrome Dev Tools

The above was produced with a Server Timing header of dc;desc=gb which is represented as just the string gb . But you can still access the information in full using the performance API.

Final Thoughts

Taking some time to add these headers into your application and caching layers means that developers can easily see why server responses are slowing down. Further headers can be added at multiple points in the chain to further inform as to where your requests are being routed and what is taking that vital time to first byte.

Many thanks to Harry Roberts who mentioned a lack of information about Server Timing and it’s use in the wild during our workshop with him!

Join us

If you’re interested and excited by what we’ve talked about here, then you might be interested in our joining our Web team. We’re always looking for driven engineers to join us.