Use GFE Server-Timing Header in Cloud Spanner Debugging

Yang Song
May 27, 2020

Overview

The GFE Server-Timing header has recently been enabled in all Cloud Spanner APIs, so Spanner users can now retrieve the GFE latency for every Cloud Spanner response. This helps distinguish whether latency comes from the client, the network, GFE, or the Spanner backend.

Background

  • Today Cloud customers can see the latency metrics and tracing spans of client RPCs by using OpenCensus in the client libraries. (See OpenCensus guide.)
  • Latency on Spanner requests is automatically captured and exposed as a Stackdriver system metric “spanner.googleapis.com/api/request_latencies”.
  • In addition, Spanner QueryStats are available in the responses if QueryMode is set to Profile. They include the elapsed time of query execution.
  • However, one piece of the latency breakdown is still missing: the latency on Google Front End (GFE). Without GFE latency, high-latency issues are difficult to debug, because there can be multiple causes (for example, the client application is too far away from GFE) and not enough information to tell them apart.

Format

Every response from Cloud Spanner, whether REST or gRPC, now carries an additional header that contains the total elapsed time on GFE. The format is:

server-timing: gfet4t7; dur=[GFE latency in ms]
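As a minimal sketch of reading this format, the Python helper below extracts the dur value from a header value such as "gfet4t7; dur=123". The function name parse_gfe_latency_ms is made up for illustration, and the code assumes a gfet4t7 metric with a numeric dur parameter, as in the format above.

```python
from typing import Optional


def parse_gfe_latency_ms(header_value: str) -> Optional[float]:
    """Return the GFE latency in ms from a server-timing value, or None.

    Hypothetical helper: assumes the "gfet4t7; dur=<number>" shape shown above.
    """
    # A Server-Timing header may list several comma-separated metrics.
    for metric in header_value.split(","):
        name, _, params = metric.strip().partition(";")
        if name.strip() != "gfet4t7":
            continue
        # Parameters are separated by ';' and look like "key=value".
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "dur":
                try:
                    return float(value.strip().strip('"'))
                except ValueError:
                    return None
    return None


print(parse_gfe_latency_ms("gfet4t7; dur=123"))  # prints 123.0
```

Returning None rather than raising keeps the helper safe to call on responses where the header is absent or malformed.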

Users can make use of the GFE latency information in various ways.

One-off Debugging

The simplest way is to debug queries in a web browser. The example below uses Chrome developer tools:

Users who prefer curl or the gcloud CLI can also inspect GFE latency in REST responses from the command line:
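For a header dump saved from the command line (for example, the response headers that curl prints with its -i flag), a small Python sketch can pull out the GFE latency. The sample headers and the function name gfe_latency_from_headers are illustrative, not captured from a real Spanner response.

```python
import re
from typing import Optional

# Matches a header line such as "server-timing: gfet4t7; dur=25".
# HTTP header names are case-insensitive, hence re.IGNORECASE.
_GFE_TIMING = re.compile(
    r"^server-timing:.*?\bgfet4t7\s*;\s*dur=([0-9]+(?:\.[0-9]+)?)",
    re.IGNORECASE | re.MULTILINE,
)


def gfe_latency_from_headers(raw_headers: str) -> Optional[float]:
    """Find the GFE latency (ms) in a block of raw HTTP response headers."""
    match = _GFE_TIMING.search(raw_headers)
    return float(match.group(1)) if match else None


# Made-up sample of what a REST response header block might look like.
sample = (
    "HTTP/2 200\r\n"
    "content-type: application/json; charset=UTF-8\r\n"
    "server-timing: gfet4t7; dur=25\r\n"
)
print(gfe_latency_from_headers(sample))  # prints 25.0
```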

Debugging Programmatically

Next let’s examine how to make use of this header programmatically. We will use Spanner Go, Java and Python clients in the following examples.

Go

In the Go client, simply supply a gRPC Header CallOption when calling Spanner APIs. It captures the response headers of the outgoing RPC:

See gRPC metadata documentation for more details.

Java

In the Java client, users need to implement the ClientInterceptor interface and intercept the outgoing RPCs in order to retrieve the headers:

Then when instantiating the Spanner client, supply the HeaderClientInterceptor:

Python

In the Python client, things are a bit more complicated, because the Spanner Python APIs are wrapped by google-api-core, and google-api-core doesn’t expose a client option to manipulate headers. To work around this constraint, we need to call the raw proto-generated Spanner APIs directly, then retrieve the headers from call.initial_metadata:
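A hedged sketch of that last step: in grpcio, invoking a unary stub method through with_call returns the response together with a call object, and call.initial_metadata() yields the response headers as key/value pairs. The helper below only reads that metadata. The FakeCall class is a stand-in so the example runs without a Spanner connection, and both names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def gfe_latency_from_call(call) -> Optional[float]:
    """Read the GFE latency (ms) from a gRPC call's initial metadata.

    `call` is expected to expose initial_metadata(), as grpc.Call does.
    """
    for key, value in call.initial_metadata():
        if key.lower() == "server-timing" and "gfet4t7" in value:
            # Value looks like "gfet4t7; dur=12"; take what follows "dur=".
            _, _, dur = value.partition("dur=")
            try:
                return float(dur.split(";")[0].split(",")[0].strip())
            except ValueError:
                return None
    return None


class FakeCall:
    """Stand-in for grpc.Call, used here only to make the sketch runnable."""

    def __init__(self, metadata: Iterable[Tuple[str, str]]):
        self._metadata = tuple(metadata)

    def initial_metadata(self):
        return self._metadata


call = FakeCall([("content-type", "application/grpc"),
                 ("server-timing", "gfet4t7; dur=12")])
print(gfe_latency_from_call(call))  # prints 12.0
```

With a real stub, the call object would come from something like response, call = stub.ExecuteSql.with_call(request). How the stub and channel are created depends on the client library version, so that part is left out here.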

Aggregation, Visualization and Debugging

After retrieving the GFE latency metrics, advanced users can export, aggregate and visualize the latency data in monitoring tools, and compare client application latency, GFE latency and Spanner query execution latency side by side:

(This example uses Spanner Go client, OpenCensus and Prometheus. The code snippet can be found here.)
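Independent of any particular monitoring stack, the side-by-side comparison can be sketched in plain Python. The sample numbers and field names below are invented for illustration. A real setup would export these measurements to a monitoring tool rather than compute them in-process.

```python
from statistics import mean, median

# Invented per-request samples, all in milliseconds.
samples = [
    {"client": 40.0, "gfe": 30.0, "query": 20.0},
    {"client": 55.0, "gfe": 32.0, "query": 21.0},
    {"client": 42.0, "gfe": 31.0, "query": 22.0},
    {"client": 120.0, "gfe": 33.0, "query": 20.0},
]


def summarize(rows, field):
    """Return (mean, median, max) of one latency field across requests."""
    values = [row[field] for row in rows]
    return mean(values), median(values), max(values)


for field in ("client", "gfe", "query"):
    avg, med, worst = summarize(samples, field)
    print(f"{field:>6}: mean={avg:.1f} median={med:.1f} max={worst:.1f}")
```

In this invented data the client-side maximum stands far above the GFE and query maxima, which is the kind of pattern that would point toward the path between the client and GFE.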

Alternatively, users can link the GFE latency as an attribute to Trace Spans:

(This example uses Spanner Go client, OpenCensus and Jaeger. The code snippet can be found here.)

By comparing the latencies between client, GFE and Spanner query execution, users and support engineers should now be able to narrow down the cause of slow Spanner operations. For example:

  • If the query execution latency is high, users can focus on optimizing schema, indexing and the query for better performance.
  • If GFE latency is high while Spanner latency is low, there is likely an issue with GFE. Users can include this information in their support tickets so that support engineers can focus on GFE troubleshooting.
  • If the client-side latency is high while GFE and Spanner latencies are low, there can be a networking issue between the client and the regional GFE. Users may need to fix their network routing.
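The three rules of thumb above can be written down as a small triage function. This is only a sketch: the function name and the high_ms cutoff are hypothetical, and what counts as "high" depends on the workload.

```python
def likely_latency_source(client_ms: float, gfe_ms: float, query_ms: float,
                          high_ms: float = 100.0) -> str:
    """Apply the three rules of thumb above, in order.

    high_ms is a hypothetical cutoff for "high"; tune it per workload.
    """
    if query_ms >= high_ms:
        # Slow query execution: look at schema, indexes and the query itself.
        return "query execution"
    if gfe_ms >= high_ms:
        # GFE is slow although the query is fast: mention it in a support ticket.
        return "GFE"
    if client_ms >= high_ms:
        # Only the client-side number is high: suspect the network path.
        return "network between client and GFE"
    return "no obvious bottleneck"


print(likely_latency_source(client_ms=250.0, gfe_ms=30.0, query_ms=20.0))
# prints: network between client and GFE
```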

References

W3C Server-Timing Specs: www.w3.org/TR/server-timing/

Google Front End (GFE) Design Overview: cloud.google.com/security/infrastructure/design#google_front_end_service

Google Cloud - Community

Google Cloud community articles and blogs

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Written by

Yang Song

Software Engineer at Google
