Custom tracing in Profiling GAE using the Stackdriver API

Colt McAnlis
Google Cloud - Community
5 min readJun 21, 2017

--

If you remember a few weeks ago, I helped out a group of developers by adding some custom tracing information to StackDriver for Google App Engine. Since then, I’ve received a number of requests asking for more information on how I did it, so, let’s take a look.

Understanding tracing

Tracing (also called “inline profiling”) is an important tool in any performance utility belt . The tool lets you get exact timings between begin/end API calls in your code, and is extremely helpful in tracking down where your time goes, when dealing with complex dependencies. One downside to tracing, is that it requires you to modify your existing source code to track the amount of time a given segment takes. This generally involves littering your code with scope-based macros that handle timing, buffering, and logging of that particular data to some output.

This is why Stackdriver’s integration into App Engine Standard is so awesome; It’ll handle tracing for all RPC commands on your behalf, no code modification needed.

All traces are listed in the cloud console, and you can view, browse, all the captured traces. Eventually, selecting one will show you a graphical representation of where the RPC calls and time when for that request:

The Stackdriver Trace API

While RPC tracing is grand there’s sometimes local code that needs to be profiled as well. Since it would be a huge performance issue to trace all your function calls on your behalf, Stackdriver has provided a patching API that allows you to augment your trace with custom data. This way, you can specify exactly the data you want included in the trace, w/o all the noise.

You can check out the documentation, but the gist is that you hit a REST endpoint, and provide it a JSON string which includes your timing data and the trace ID you’d like to patch. The only complex part of interacting with the api lies in A) Auth, and B) The format of the JSON string.

Thankfully for B, the documentation includes a small widget that you can playing around with to figure out exactly what’s expected. If you use the values below, you’ll get a single span that properly executes the widget.

Setting up the local environment

Once you’re ready to start testing the code, you’ll need to first set up your local environment. I’m working in python, and as such, need to install the Google API python client library.

First, navigate to your project folder, and upgrade the library:

Next, you need to install the library locally to your project; Call the following to create a local copy of the library inside of a “./lib” folder. When you deploy your project, this folder will be copied up as well.

FWIW there’s tons more details on the “getting started page” for the API client (if you need to dig in more).

Creating a profiler class

If you’ve been around as long as I have, chances are you’ve written a timer class multiple times. The concept is very straightforward: Given a timer object, calling start/stop should track how much time elapsed between them. Our inline profiler is really no different, except that we add a Unique ID for the timer class (I call it a “span” for consistency), and some scoped callbacks to make it easier to use.

In order to collect traces for our request, we need some sort of management object which has persistence long enough to store our spans and return their data when requested. This can take many shapes and forms, and my simple version is intentionally bare bones, where the purpose is to call the span function each time you want to start a scoping profile.

The usage is pretty straightforward:

Submitting results to the API

With the above classes, we have the ability to time blocks of code, so the next big issue is how to properly submit them to the StackDriver API. To do this, we first need to be able to fetch the ID of our project, as well as the ID of the request object (so we can patch it’s trace).

The project ID can be easily fetched through the appengine API for it (note, this requires adding the 3rd party library for the identity library. Check out the google.appengine.ext.vendor for more).

https://gist.github.com/mainroach/5d75d882e81e1ba8ad0cd7cbb93afb0b

The request ID is also easy to acquire, as it’s generated by the Google Front Ends, and is usually annotated in the request header as “X-Cloud-Trace-Context”. We can easily fetch that information (if it exists):

Since our span & trace objects allow generation to JSON, we can create the JSON blob that the stack driver API expects with some simple data building:

Now that we’ve got all that data ready, we can finally connect to the stackdriver API. This first requires us to get some authentication in order, and then finally submit the body data we’re interested in. To get information on how to do this, I had to dig pretty deep into the API libraries, but here’s the shortest needed version:

All together now!

I modified the TraceContext object to accept the cloud tracing information from the header package, and bundled it all together:

Opening up the trace for a request, shows us exactly what we wanted:

Caveats

I must highlight that this code works, right now, as a snapshot. I expect that the format of some of these things might change, or that some of the APIs will mature over time into an official API.

Also, do note that you will see an increase in response time, as submitting your tracing results does add some overhead to push the data to the trace API:

As such, make sure that you’re very careful with how you deploy and use this tracing data. You always have to be careful that your tracing information isn’t hurting performance too bad!

--

--