An Unravelling Tale of Performance, Part 3: Time to First Byte

Time to First Byte is a controversial metric, but ours is shit so let’s optimise it anyway.

Matt Perry
DriveTribe Engineering
6 min read · Dec 4, 2017


The story so far: Analysis, Render blocking CSS

Time to First Byte (TTFB) is the amount of time between the moment a client requests a resource, and the moment it receives the first byte of it.

Although Google PageSpeed Insights, Lighthouse and WebPageTest all feature it prominently as an indicator of a website’s performance, a low TTFB doesn’t automatically lead to a fast experience for the user.

For instance, it would be simple for a server to optimistically ping a quick 200 success response to a client before even resolving a route. Beyond that, it might run into all sorts of turbulence. Slow or failed API calls, poor rendering performance, heavy payload, anything. But because of that initial 200, its TTFB would be considered low even though the overall speed of the request is slow.

While it’s an odd metric, there is research suggesting that a high TTFB negatively impacts Google PageRank. So a high TTFB will hurt our SEO, and if there’s a way to correctly respond to a request sooner, we should take it.

It’s also true that things that contribute to a high TTFB, like a needlessly blocking API request, can have knock-on effects for the rest of the load order, which degrades the user experience.

In this post, I’m going to explain how DriveTribe handles incoming requests and highlight areas for improvement. When a fix can be achieved within a reasonable amount of time, I’ll implement, measure and deploy.

How DriveTribe handles requests

Here’s my first, and hopefully last ever, flow diagram, which roughly explains how we handle all incoming requests:

There are two major sources of inefficiency here.

1. Client authentication

All requests to our API need to be authenticated. At the start of each request, we check (via cookies) to see if the client is recognised. If it isn’t, we generate a UUID.

Later, when we request the route’s data dependencies, we first need to check whether the current user is authenticated. A request to /user is fired with this ID. If we receive a 401, we create an anonymous user on the API.

Finally, we’re authenticated and can make the remaining data requests.

For first-time requests, this process involves two blocking round trips, for a performance penalty of ~250ms. This is terrible.
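To make the shape of the problem concrete, here’s a rough sketch of that flow. This isn’t our actual code: the helpers (fetchUser, createAnonymousUser, fetchRouteData, generateUuid) are illustrative stand-ins for our real API client.

// Illustrative sketch of the current, fully blocking flow
type User = { id: string; anonymous: boolean };

declare function generateUuid(): string;
declare function fetchUser(clientId: string): Promise<User>; // GET /user, rejects with a 401 for unknown clients
declare function createAnonymousUser(clientId: string): Promise<User>;
declare function fetchRouteData(url: string, clientId: string): Promise<unknown>;

async function handleRequest(req: { url: string; cookies: Record<string, string> }) {
  // 1. Identify the client, minting a UUID if we have never seen it before
  const clientId = req.cookies.clientId || generateUuid();

  // 2. Blocking round trip one: is this client a recognised user?
  let user: User;
  try {
    user = await fetchUser(clientId);
  } catch (err) {
    // 3. Blocking round trip two: a 401 means we create an anonymous user
    user = await createAnonymousUser(clientId);
  }

  // 4. Only now can the route's real data dependencies be requested
  return fetchRouteData(req.url, user.id);
}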

The history and the solution

If it doesn’t look like this was a designed process, that’s because it wasn’t.

Originally, every API request required a registered user. So, if no authToken was found on the client, we could simply show a login screen. If a user was registered, they’d already be cookied with an authToken, so we could run all requests in parallel. Zero blocking requests.

Later, we opened some endpoints up and, for reasons that aren’t entirely relevant to this deep dive, introduced a form of authentication for logged-out clients: a clientId.

This has been implemented in a way that doesn’t take advantage of the fact that we already know if the client has a clientId and/or authToken.

If the client has one, we can make our user request in parallel with our other requests. Zero blocking requests.

If it doesn’t, we can make our anonymous user request straight away, without waiting for an error from the /user endpoint. One blocking request.
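Sketched with the same illustrative helpers as before, those two branches might look something like this:

// Same illustrative helpers as the earlier sketch
async function handleRequestFaster(req: { url: string; cookies: Record<string, string> }) {
  const { clientId, authToken } = req.cookies;

  if (clientId || authToken) {
    // Recognised client: the /user request runs alongside the route data.
    // Zero blocking requests.
    return Promise.all([fetchUser(clientId), fetchRouteData(req.url, clientId)]);
  }

  // First visit: create the anonymous user straight away rather than waiting
  // for /user to come back with a 401. One blocking request.
  const user = await createAnonymousUser(generateUuid());
  return [user, await fetchRouteData(req.url, user.id)];
}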

In both scenarios we remove a whole blocking request. Thanks to the laws of physics, this will definitely result in a lower TTFB.

Because this is a blocking request, whatever we save in TTFB is time saved on the entire request. For every x milliseconds saved, we can deliver the page x ms sooner, and request images and JavaScript x ms sooner too. It’s not just SEO-benefitting fluff.

Unlike:

2. Resolving all data before returning headers

The second inefficiency is that all data dependencies are resolved as part of the route matching process.

This means that if a valid route is matched, we don’t even know that we can respond with a 200 success header until all its data has been received.

If someone requests, for instance, the homepage, we know for a fact that this is a 200 response as its existence doesn’t depend on the result of a given API call.

We’re still using React Router 2. It provides a match function to resolve routes, which is configured using route definition objects.

These are provided to a function that compiles them into the route config format that React Router 2 accepts. That format allows a getComponent property, which is designed to resolve the React component. However, I’ve hijacked it (always a safe idea) to resolve the route’s data dependencies in parallel.
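Here’s a simplified sketch of that hijack, using React Router 2’s plain-route format. HomePage and fetchHomeData are illustrative placeholders; only match and getComponent come from React Router itself.

import { match } from 'react-router';

declare const HomePage: any;
declare function fetchHomeData(): Promise<unknown>;

const routes = {
  path: '/',
  getComponent(nextState: any, callback: (err: any, component?: any) => void) {
    // getComponent is meant to asynchronously resolve a component. We hijack it
    // to resolve the route's data dependencies at the same time, so match()
    // doesn't call back until the data has arrived.
    Promise.all([Promise.resolve(HomePage), fetchHomeData()])
      .then(([component, data]) => {
        nextState.routeData = data; // stash the data for the render step
        callback(null, component);
      })
      .catch((err) => callback(err));
  },
};

// On the server, match() therefore only completes once the data has resolved
match({ routes, location: '/' }, (error: any, redirectLocation: any, renderProps: any) => {
  // ...render the app and respond, or handle the error/redirect
});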

The (potential) solution

Instead, we want a solution where we can know if a route has been matched before we fetch its data dependencies.

We’ll also need to know, per route, whether a simple match is enough knowledge to respond with a status header.

With the homepage, this is clearly true. / is always going to be a correct address.

However, a route defined as post(/:id) is only a potentially matching route. Whether we respond with a 200 or a 404 depends entirely on the response from the API’s post endpoint.
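A purely hypothetical sketch of what those route definitions could record, if we ever built this:

// Hypothetical: a per-route flag saying whether matching the URL alone is
// enough to commit to a 200
type RouteDefinition = {
  path: string;
  statusKnownOnMatch: boolean;
};

const routeDefinitions: RouteDefinition[] = [
  // '/' always exists, so a match alone justifies an early 200 header
  { path: '/', statusKnownOnMatch: true },
  // post(/:id) may 404, so the status has to wait for the API response
  { path: 'post(/:id)', statusKnownOnMatch: false },
];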

React Router 4 provides a less-opinionated version of match which would allow us to write completely custom data resolution.

However…

Moving the site to React Router 4 would be a serious rewrite, which in and of itself is a black mark against a refactor that would be good for SEO but ultimately wouldn’t make any difference to overall load times or to the end user.

Crucially, React Router 4 doesn’t support server-side rendering combined with code splitting out of the box. That’s a deal-breaker.

It remains an improvement to keep in mind for the future. But it’s not for today.

The ideal flow

Dammit. This is my last flow diagram. For posterity, here’s how the server response flow would look in an ideal situation:

While we can’t respond with a status header before retrieving data dependencies, we can still refactor the user authentication flow.

In ideal situations, the user object will now be retrieved in parallel with the rest of the data. The trade-off is that if that request returns an error, we need to fetch an anonymous user and re-run all the data requests. This should be an edge case, however.
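Continuing the earlier sketches (same illustrative helpers), that fallback might look roughly like this:

// If the parallel /user request fails, fall back to an anonymous user and
// re-run the data requests
async function resolveRecognisedClient(req: { url: string; cookies: Record<string, string> }) {
  const { clientId } = req.cookies;
  try {
    // Happy path: user and route data resolve in parallel
    return await Promise.all([fetchUser(clientId), fetchRouteData(req.url, clientId)]);
  } catch (err) {
    // Edge case: authentication failed, so we create an anonymous user and
    // fetch everything again
    const user = await createAnonymousUser(generateUuid());
    return [user, await fetchRouteData(req.url, user.id)];
  }
}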

In the majority of cases we’ll be down to either zero or one blocking user requests.

Results

By moving the anonymous user request upfront for first time users, and changing the recognised user request from blocking to parallel, we managed to make some serious savings.

Here are the mean timings, from 50 samples, between the moment a homepage request is received and the moment we’ve resolved all the data and components needed to render it:

Before

  • New users: 406ms
  • Returning users: 370ms
  • Registered users: 357ms

After

  • New users: 304ms (-100ms)
  • Returning users: 282ms (-90ms)
  • Registered users: 268ms (-90ms)

Again, this was only 50 samples (with some noise removed), but it paints a clear picture: a saving of between 90 and 100ms, roughly 25% of the previous routing time, for every user.

As this optimisation has a knock-on effect for every subsequent action and request, everything will happen 90-100ms sooner.

And, because the homepage currently depends on a whopping seven endpoints, excluding user data, we can assume the relative saving will be even greater for lighter routes.

Next

We currently request dozens of images from the initial HTML payload, whether or not they reside in the viewport. This clogs up bandwidth that should be spent on the assets that are crucial to rendering the initial view.

In the next part, we’ll look at strategies for deferring the request and rendering of images and even components until they’re visible to the user.
