Reduce latency with Stackdriver Trace

Colt McAnlis
Oct 17, 2018

One of my favorite things about App Engine is that your requests are traced automatically, so you can see where your execution time goes during a request. I’ve also shown a small technique that lets you insert custom trace blocks (since most trace data is RPC-based).

Now I’m really excited to talk about how tracing can be rolled out to all your other systems, even the ones not running on App Engine.

Recap: Stackdriver Tracing

Stackdriver Trace allows you to analyze how customer requests propagate through your application, and is immensely useful for reducing latency and performing root cause analysis.

Trace continuously samples client requests and submits them to the Stackdriver service, which displays their propagation and latency results.
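To make "sampling" a bit more concrete, here is a minimal sketch of one common way a tracer can decide which requests to record. This is an illustration only, not how Stackdriver Trace itself chooses requests: the function name `should_sample`, the hashing scheme, and the 10% rate are all assumptions made up for this example.

```python
import hashlib


def should_sample(trace_id: str, rate: float) -> bool:
    """Decide deterministically whether a request should be traced.

    Hypothetical helper for illustration; not part of any Stackdriver API.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Hash the trace ID into a 64-bit bucket so that every service seeing
    # the same ID reaches the same decision.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    # Sample the request when its bucket falls in the lowest `rate` fraction.
    return bucket < rate * 2**64


if __name__ == "__main__":
    ids = [f"request-{i}" for i in range(10_000)]
    sampled = sum(should_sample(t, 0.1) for t in ids)
    print(f"sampled {sampled} of {len(ids)} requests")
```

Keying the decision on the trace ID (rather than rolling a random number in each service) means a request is either traced everywhere it travels or nowhere, which keeps the recorded traces complete.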

One catch, though, is that it doesn’t fully capture your call graph (as a flame chart in Stackdriver Profiler would). Instead, it focuses on key paths around RPC calls, which are generally where most bottlenecks occur in the average application.

Everybody trace now!

As mentioned, if you’ve been an App Engine standard developer, chances are you’re familiar with this setup, since your requests have been traced automatically for some time now. But tracing (and the exploration of that data) is useful enough that it was time to roll it out to other languages and compute offerings.

Customers tracing applications on VMs or containers (including Compute Engine, App Engine flexible environment, and Kubernetes Engine, as well as other non-Google services) can use a new set of instrumentation libraries to submit traces to the service, in a number of languages: Java, Python, Node.js, Go, .NET, Ruby, and PHP. We use OpenCensus (a project created by Google) to capture traces from most of these languages, and more languages will gain OpenCensus support over time.

What gets traced?

The client-side instrumentation automatically patches well-known modules to insert calls to functions that start, label, and end spans, measuring the latency of RPCs (such as mysql and redis) and incoming requests (such as express and hapi). Because each RPC is typically performed on behalf of an incoming request, this association must be accurately reflected in span data.
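If "patching a module" sounds like magic, the idea can be sketched in a few lines. The example below is a toy, not the real instrumentation library: `patch_with_span`, `current_trace_id`, `recorded_spans`, and the `fake_db` module are all hypothetical names invented here. It wraps a function so each call is timed and tagged with the trace ID of the request that triggered it.

```python
import contextvars
import functools
import time
import types

# Trace ID of the incoming request being handled (None outside a request).
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)

# Stand-in for an exporter that would ship spans to a tracing backend.
recorded_spans = []


def patch_with_span(module, func_name, label):
    """Replace module.func_name with a wrapper that records a span per call."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the span even if the call raised, and associate it
            # with whichever request is currently in flight.
            recorded_spans.append({
                "name": label,
                "trace_id": current_trace_id.get(),
                "duration_s": time.perf_counter() - start,
            })

    setattr(module, func_name, wrapper)


# Hypothetical stand-in for a database driver module.
fake_db = types.SimpleNamespace(query=lambda sql: f"rows for {sql!r}")


def handle_request(trace_id):
    """Simulate an incoming request that makes one database call."""
    token = current_trace_id.set(trace_id)
    try:
        return fake_db.query("SELECT 1")
    finally:
        current_trace_id.reset(token)


if __name__ == "__main__":
    patch_with_span(fake_db, "query", "fake_db.query")
    handle_request("trace-abc")
    print(recorded_spans)
```

The context variable is what keeps the RPC-to-request association intact: the wrapper never receives the trace ID as an argument, it just reads whatever request context is active when the call happens.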

While this happens automatically with the libraries, you can also add custom spans to your code to capture the important bits that fall between the automatically traced calls.

More info

To get started with Stackdriver APM, simply link the appropriate instrumentation library for each tool to your app and start gathering telemetry for analysis. Stackdriver Debugger is also integrated with GitHub Enterprise and GitLab, adding to our existing code mirroring functionality for GitHub, Bitbucket, Google Cloud Repositories, as well as locally-stored source code.

Written by Colt McAnlis

DA @ Google; http://goo.gl/bbPefY | http://goo.gl/xZ4fE7 | https://goo.gl/RGsQlF | http://goo.gl/4ZJkY1 | http://goo.gl/qR5WI1