Val Deleplace
Jun 18, 2018 · 8 min read

I ❤ waterfall latency charts!

Backend API calls latency, while handling 1 web request

What I call a “waterfall” is a bar diagram like this sketch, where the X axis is the elapsed time and each span is an activity that starts at a specific time and has a duration. A waterfall provides great instant insight into the performance of a compound process: where was the time actually spent?

Today we’ll have a look at the server-side latency of a web request to a backend running on Google App Engine Standard.

Let’s build a simple dynamic front page for a Condor Observation Contest in 2023 (yay, the future):

The website, deployed on App Engine Standard

From the server point of view, the page involves 4 static elements:

  • the skeleton of the HTML document, with title
  • a picture of a condor
  • a CSS file
  • a favicon;

and 3 dynamic elements:

  • the definition of “Condor”, retrieved with a REST call to a Wikipedia service
  • the current winner of the contest, and the details of their discovery, retrieved from the contest database: a Cloud SQL instance (MySQL flavor)
  • the page view counter, that we programmatically read from the Datastore.

In this example, all the data for the dynamic elements is fetched server-side; the backend then uses a template to generate the HTML document sent to the browser.
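
In Go, that server-side assembly can be sketched with the standard html/template package. The PageData type, field names, and template body below are illustrative stand-ins, not the app's actual code:

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// PageData holds the three dynamic elements fetched server-side.
// The type and field names are illustrative, not the app's actual code.
type PageData struct {
	Definition string // Wikipedia REST call
	Winner     string // Cloud SQL query
	PageViews  int    // Datastore counter
}

var page = template.Must(template.New("front").Parse(
	`<h1>Condor Observation Contest</h1>
<p>{{.Definition}}</p>
<p>Current winner: {{.Winner}}</p>
<p>{{.PageViews}} views</p>`))

// render executes the template with already-fetched data and
// returns the complete HTML document.
func render(d PageData) (string, error) {
	var buf bytes.Buffer
	if err := page.Execute(&buf, d); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	html, err := render(PageData{Definition: "A large bird", Winner: "alice", PageViews: 42})
	if err != nil {
		panic(err)
	}
	fmt.Println(html)
}
```

The interesting part is what happens before render is called: the three fetches that fill PageData are where all the latency lives.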

The page view is from very early 2023: only a few observations have been collected, and the winning observation so far features 7 condors! Only the front-page rendering with the ongoing high score is detailed here, not how the data was fed into the database by the contestants (that’s a different process).

Here is the source code of the app. I opted for the Go runtime, but this is not important here because the total latency of the request handling is dominated by the latency of the API calls.

The same insights apply to all App Engine standard runtimes, so feel free to pick your favorite one among:

…and now also Node.js!

Regardless of the chosen language, here is what the waterfall looks like in Stackdriver Trace. To find a trace in your own project, open the Cloud Platform web console, go to Logging > Logs, click on the latency of a request, then View trace.

Only the API calls are traced (blue spans). The internal processing of the incoming request by the GAE instance (i.e. all your code that is not an API call) is not traced here, just the total duration.

This trace (above) was observed with an early version of the app, where the API calls are performed sequentially. But it turns out we can optimize this: as the 3 pieces of information (definition, winner, pageviews) are independent from each other, we can fetch them concurrently (see this commit).

Concurrent API calls reduce the total latency

Wow, this is a 3x speedup! The request was handled in 246ms, instead of 773ms. You can see some overlaps of the blue spans.

This includes only the server time, not the network time between the user and the Google infrastructure. Your mileage may vary: the processing time is sometimes substantially shorter or longer for no obvious reason, even when it’s not a loading request (cold boot).

Two days later, lots of observations have been reported by contestants, and a new record has been set:

This time I have to wait 3 seconds; that’s a bit slow.

By the end of January, the record is even higher:

But to view the page, we’re waiting a prohibitive 22 seconds:

In March, the page is down: Server time-out.

Ouch: Broken page

So what’s going on here? In the recent waterfalls, the span durations are obviously dwarfed by the huge total latency, which remains unexplained.

If you take a closer look at the blue spans, you’ll find a urlfetch call to Wikipedia, and a few Datastore calls that read and write the pageviews counter atomically (in a transaction). But oddly enough, nothing about the Cloud SQL queries. This actually works as documented:

Stackdriver Trace works with all App Engine APIs, with the exception of Cloud SQL.

Stackdriver Trace documentation

If we log a message before and after each data retrieval, we quickly spot the culprit (e.g. on the Jan 3rd and Jan 29th waterfalls):

The time elapsed between the start and the end of “Querying observation winner in Cloud SQL” accounts for almost 100% of the total time.

Logging custom messages is easy and useful. And the waterfall viewer displays them, which is nice.

We inspect the source code of the database access layer, and it turns out that we’re making way more SQL queries than necessary:

  • SELECT id FROM condors.observation WHERE YEAR(date)=?
  • For each id,
    SELECT id, date, region, user, nbcondors
    FROM condors.observation
    WHERE id=?

This means that if we have N observations in the database, we end up making (1+N) SQL queries. Each query incurs a (pretty high) overhead: a network round-trip to the DB, plus the DB processing the query and returning the result rows. In the app code, the loop is sequential.

When the DB contains 100 observations (early January), the page is slow.

When the DB contains 3000 observations (in March), the server gives up.

Obviously we could and should do better. Don’t laugh at this contrived, naive example! Real, complex backends with ORM or programmatically built SQL queries do have this problem. It is non-trivial to solve in general (one size doesn’t fit all), and not even easy to identify in the first place. Remember, the SQL calls don’t show up in the Stackdriver Trace viewer.

Here are 3 different strategies to solve the sluggishness problem:

  • Maybe launch the SQL queries concurrently? Well that wouldn’t change the fact that we’re making thousands of them, incurring a huge accumulated overhead. Not a good option.
  • Retrieve all the observations of year 2023 in a single SQL query? This turns out to remove the bottleneck, which is cool. But we’re still loading thousands of rows from the DB, which is overkill and could lead to memory footprint problems, and oversized network payloads.
  • The best solution is to offload the winner selection code from the app onto the DB engine:
    SELECT id, date, region, user, nbcondors
    FROM condors.observation
    ORDER BY nbcondors DESC, date ASC LIMIT 1

As a rule of thumb, the overhead of a query is so big, and the DB engines are so optimized, that it’s often a good idea to offload the heavy lifting onto the DB, in order to minimize the number of queries (very important) and the number of returned rows.

It would be nice if Stackdriver Trace supported Cloud SQL calls out of the box, but it currently doesn’t (as of June 2018).

As a workaround, one would like to insert custom spans programmatically, but this part depends on the chosen runtime/language and available libraries. Lots of rough edges and alpha tools in this area! There are client libraries, but they are not for App Engine Standard. If you use Java, try this. Otherwise, have a look at the REST API.

Take-aways for a responsive backend!

  • Be frugal, response times matter. Don’t fetch more than necessary.
  • Waterfalls are a really great visualization tool.
  • When in doubt, measure and log. The bottlenecks are often not what your intuition expects!
  • Try to minimize the number of API calls: DB requests, external HTTP, etc.
  • Less critically, try to minimize the payload of your requests (e.g. select only the rows and columns that you need).
  • It’s sometimes a good trade-off to retrieve too much data, but in fewer API calls.
  • Use Memcache! It is much faster than the Datastore or Cloud SQL.
  • Batch API requests when possible. 1 request for 20 objects is much more efficient than 20 requests for 1 object.
Batching greatly reduces latency
  • Consider launching several independent API calls concurrently instead of sequentially.
Concurrency may reduce latency
  • Consider deferring work (with Task Queues) to be executed after the response has been sent to the end user. That notification email can wait a little, if it saves 500ms of responsiveness to your form processing!
Defer what’s not essential to the HTTP response
  • Don’t try to stream the response: it’s not possible on GAE Standard. The full response is buffered, and then sent to the client.
  • Be aware of what is traced by your system and what isn’t: external API calls, network I/O, filesystem, memory allocation, garbage collection, locks… The best granularity is one where most of the latency can be explained in terms of smaller spans that we can make sense out of. More observability information is usually better.

Google Cloud Platform - Community

A collection of technical articles published or curated by Google Cloud Platform Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Thanks to Guillaume Laforge

Val Deleplace

Written by

Engineer on cloudy things @Google. Opinions my own. Twitter @val_deleplace
