The GFE Server-Timing header has recently been enabled in all Cloud Spanner APIs, which means Spanner users can now retrieve the Google Front End (GFE) latency for every Cloud Spanner response. This helps distinguish whether latency originates in the client, the network, the GFE, or the Spanner backend.
- Today, Cloud customers can see the latency metrics and tracing spans of client RPCs by using OpenCensus in the client libraries. (See the OpenCensus guide.)
- Latency on Spanner requests is automatically captured and exposed as a Stackdriver system metric “spanner.googleapis.com/api/request_latencies”.
- In addition, Spanner QueryStats are available in responses when QueryMode is set to Profile; these include the query execution elapsed time.
- However, there is still a missing piece in the latency breakdown: the latency on the Google Front End (GFE). Without GFE latency it is difficult to debug high-latency issues, because there are multiple possible causes (for example, the client application being too far from the GFE) and not enough information to differentiate between them.
Now, every response from Cloud Spanner, whether REST or gRPC, carries an additional header containing the total elapsed time on the GFE. The format is:
server-timing: gfet4t7; dur=[GFE latency in ms]
Users can make use of the GFE latency information in various ways.
The simplest way is to debug queries in the web browser, for example by inspecting response headers in the Network panel of Chrome developer tools.
Users who are more comfortable with curl or the gcloud CLI can also inspect GFE latency in REST responses from the command line:
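A sketch of what this can look like; the project, instance, and database names below are hypothetical placeholders, the curl call requires real Spanner resources and gcloud credentials, and the sample header value is made up for illustration:

```shell
# Dump response headers (-D -) from a Spanner REST call; server-timing
# appears among them. (Commented out: needs real resources and auth.)
#
#   curl -s -D - -o /dev/null \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     "https://spanner.googleapis.com/v1/projects/my-project/instances/my-instance/databases/my-db/sessions"
#
# A returned header looks like the sample below; extract the dur= value (ms):
header='server-timing: gfet4t7; dur=42'
gfe_ms=$(printf '%s\n' "$header" | sed -n 's/.*dur=\([0-9.]*\).*/\1/p')
echo "GFE latency: ${gfe_ms} ms"   # prints "GFE latency: 42 ms"
```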
Next, let's examine how to use this header programmatically, using the Spanner Go, Java, and Python clients as examples.
In the Go client, simply supply a gRPC Header CallOption when calling Spanner APIs; it captures the headers of an outgoing RPC:
See gRPC metadata documentation for more details.
In the Java client, users need to implement the ClientInterceptor interface and intercept the outgoing RPCs in order to retrieve the headers:
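One possible sketch of such an interceptor. The class name matches the `HeaderClientInterceptor` mentioned below; the gRPC `ClientInterceptor`, forwarding-call, and `Metadata` APIs are standard, and `parseGfeLatency` is our own helper:

```java
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ForwardingClientCall.SimpleForwardingClientCall;
import io.grpc.ForwardingClientCallListener.SimpleForwardingClientCallListener;
import io.grpc.Metadata;
import io.grpc.MethodDescriptor;

class HeaderClientInterceptor implements ClientInterceptor {
  private static final Metadata.Key<String> SERVER_TIMING =
      Metadata.Key.of("server-timing", Metadata.ASCII_STRING_MARSHALLER);

  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
    return new SimpleForwardingClientCall<ReqT, RespT>(next.newCall(method, callOptions)) {
      @Override
      public void start(Listener<RespT> responseListener, Metadata requestHeaders) {
        super.start(
            new SimpleForwardingClientCallListener<RespT>(responseListener) {
              @Override
              public void onHeaders(Metadata headers) {
                // Response headers arrive here; pull out server-timing.
                String value = headers.get(SERVER_TIMING);
                if (value != null) {
                  System.out.println("GFE latency: " + parseGfeLatency(value) + " ms");
                }
                super.onHeaders(headers);
              }
            },
            requestHeaders);
      }
    };
  }

  /** Parses a value like "gfet4t7; dur=123" into ms; returns -1 if absent. */
  static double parseGfeLatency(String value) {
    for (String part : value.split(";")) {
      part = part.trim();
      if (part.startsWith("dur=")) {
        return Double.parseDouble(part.substring(4));
      }
    }
    return -1;
  }
}
```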
Then when instantiating the Spanner client, supply the HeaderClientInterceptor:
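A sketch of the wiring, assuming a `HeaderClientInterceptor` class implementing `ClientInterceptor` as described above and a hypothetical project ID; `SpannerOptions.Builder.setInterceptorProvider` accepts a provider returning the list of interceptors to install:

```java
import com.google.cloud.spanner.Spanner;
import com.google.cloud.spanner.SpannerOptions;
import java.util.Collections;

public class SpannerWithInterceptor {
  public static void main(String[] args) {
    SpannerOptions options =
        SpannerOptions.newBuilder()
            .setProjectId("my-project") // hypothetical placeholder
            .setInterceptorProvider(
                () -> Collections.singletonList(new HeaderClientInterceptor()))
            .build();
    Spanner spanner = options.getService();
    // ... use spanner.getDatabaseClient(...) as usual; every RPC now passes
    // through HeaderClientInterceptor, which logs the GFE latency.
    spanner.close();
  }
}
```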
In the Python client, things are a bit more complicated, because the Spanner Python API is wrapped by google-api-core, and google-api-core doesn't expose a client option for manipulating headers. To work around this constraint, we need to call the raw proto-generated Spanner APIs directly, then retrieve the headers from call.initial_metadata:
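A sketch under stated assumptions: the transport access below is based on the generated GAPIC layout (`SpannerGrpcTransport` exposing raw gRPC callables whose `.with_call` returns the response plus the call object) and may differ between library versions; `parse_gfe_latency` is our own helper, and the Spanner imports are kept local so the helper runs standalone:

```python
def parse_gfe_latency(value):
    """Extract the dur= value (in ms) from a header like 'gfet4t7; dur=123'."""
    for part in value.split(";"):
        part = part.strip()
        if part.startswith("dur="):
            try:
                return float(part[len("dur="):])
            except ValueError:
                return None
    return None


def query_with_gfe_latency(session_name, sql):
    """Call the proto-generated ExecuteSql directly and read the headers.

    Requires real Spanner resources and application default credentials;
    the session name must reference an existing session.
    """
    from google.cloud.spanner_v1 import ExecuteSqlRequest
    from google.cloud.spanner_v1.services.spanner.transports import (
        SpannerGrpcTransport,
    )

    transport = SpannerGrpcTransport()  # picks up application default credentials
    request = ExecuteSqlRequest(session=session_name, sql=sql)
    # .with_call returns (response, call); call.initial_metadata holds the
    # response headers, including server-timing.
    response, call = transport.execute_sql.with_call(request)
    for key, value in call.initial_metadata:
        if key == "server-timing":
            return response, parse_gfe_latency(value)
    return response, None
```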
Aggregation, Visualization and Debugging
After retrieving the GFE latency metrics, advanced users can export, aggregate, and visualize the latency data in monitoring tools, comparing client application latency, GFE latency, and Spanner query execution latency side by side:
(This example uses Spanner Go client, OpenCensus and Prometheus. The code snippet can be found here.)
Alternatively, users can link the GFE latency as an attribute to Trace Spans:
(This example uses Spanner Go client, OpenCensus and Jaeger. The code snippet can be found here.)
By comparing the latencies of the client, the GFE, and Spanner query execution, users and support engineers should now be able to narrow down the cause of slow Spanner operations. For example:
- If the query execution latency is high, users can focus on optimizing schema, indexing and the query for better performance.
- If GFE latency is high while Spanner latency is low, the issue likely lies in the GFE. Users can include this information in support tickets so that support engineers can focus their troubleshooting on the GFE.
- If client-side latency is high while GFE and Spanner latencies are low, there may be a networking issue between the client and the regional GFE. Users may need to fix their network routing.
W3C Server-Timing Specs: www.w3.org/TR/server-timing/
Google Front End (GFE) Design Overview: cloud.google.com/security/infrastructure/design#google_front_end_service