Introduction to Stackdriver Profiler

Yuri Grinshteyn
Google Cloud - Community
Nov 12, 2018 · 9 min read

[update 2/28/2019 — see this blog post on new Profiler features that I did not cover]

One of the (many) great things about working at a place like Google is being surrounded by people whose work humbles your own. One of those people is Daz Wilkin, who’s probably already forgotten more about Stackdriver than I have yet to learn. I was specifically inspired by his post on Stackdriver Profiler, which reminded me that this is a tool I’ve yet to explore in any meaningful fashion. I very much appreciated the instructions he provided on getting Profiler installed and running, and I wanted to use them myself — especially to compare what Profiler can do across languages.

The product team has, of course, clearly documented what functionality is available for each language here. As of the time of writing (November 2018), it looks like this:

I decided to try my hand at implementing something in each of the 3 supported languages to see first-hand what the different metrics captured look like and how difficult it would be to get Profiler to collect data. I settled on doing something pretty basic — creating a bit of code that would use a lot of CPU for a short randomly generated amount of time before returning, and I wanted to reproduce my approach as closely as possible across languages. Let’s dive in!

NODE.JS

Setup

I started in my comfort zone — with Node.js. First, here’s what I arrived at to achieve the functionality I wanted:
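In sketch form (the route path, port, and range of the random value are placeholders rather than my exact values), it looks like this:

// app.js (sketch): a basic Express server that burns CPU for a random duration per request
const express = require('express');
const app = express();

app.get('/', (req, res) => {
  // generate a random integer (milliseconds here) and hand it to blockCpuFor()
  const duration = Math.floor(Math.random() * 5000) + 1;
  blockCpuFor(duration);
  res.send(`Blocked CPU for ${duration} ms\n`);
});

app.listen(8080, () => console.log('listening on :8080'));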

As you can see, this is a very basic Express app that simply generates a random integer and passes that integer to the blockCpuFor() function. That function itself does this:
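In sketch form (the exact arithmetic inside the loop doesn't matter, only that it keeps the CPU busy):

// Busy-loop for roughly the given number of milliseconds by generating random numbers.
function blockCpuFor(ms) {
  const end = Date.now() + ms;
  let result = 0;
  while (Date.now() < end) {
    result += Math.random() * Math.random();
  }
  return result;
}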

It’s a basic loop that generates random numbers for the amount of time specified in the function call.

Finally, it’s time to add Profiler instrumentation to the code. In the app.js file, I added:

Pretty easy! If you want, you can see the entire file in the GitHub repository. Then, it was just a matter of running npm install, authenticating to Google Cloud using gcloud auth login, and running the app with node app.js. I generated a bit of load using Apache Bench and then logged into the GCP Console to see what Profiler was reporting (note that unlike Daz, who went through the trouble of packaging everything and deploying to GKE, I ran everything locally on macOS).
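For reference, the load generation was along these lines (the request count and URL are illustrative):

ab -n 500 -c 1 http://localhost:8080/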

Profile Data

Wall Time

As described in the documentation, Profiler for Node.js collects information on wall time and heap usage. In my case, because I was profiling something using a lot of CPU, but not very much memory, the former will likely be of more interest. Let’s take a look:

Sure enough — we see the full call stack, and nearly all of the time is being consumed within my blockCpuFor function. That’s actually really cool — this is exactly what I would have expected to see, and it’s clear how this could be useful when, for example, trying to figure out why a particular service is running slowly or what’s going on in an environment constrained by infrastructure resources.

Heap

As mentioned above, my code really isn’t doing much in terms of using memory — I’m not creating or manipulating large objects. Nevertheless, let’s take a look at what we get when we look at the Heap view in Profiler:

I was able to quickly and easily find the function in question, which can be useful if you’re a developer debugging a particular piece of code, and Profiler is even telling me exactly which file the function is in (just like in the Wall Time view). Handy!

GO

Setup

With Node.js well in hand, it was time to move on to trying the same thing in Go. Here, I’m a bit more out of my depth, but let’s see what happens.

First, let’s create the basic Web server:

Again, I have to give credit to Daz — this code is almost entirely lifted directly from his post. Now, let’s add the blockCPU function:
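Here's a sketch of it; it assumes "time" and "math/rand" are in the import block, and note the rand.New() inside the loop, which comes up again later:

// blockCPU multiplies random numbers until the given number of seconds has elapsed.
// Note: rand.New() is created inside the loop, which turns out to matter later.
func blockCPU(seconds int) float64 {
    end := time.Now().Add(time.Duration(seconds) * time.Second)
    result := 0.0
    for time.Now().Before(end) {
        r := rand.New(rand.NewSource(time.Now().UnixNano()))
        result = r.Float64() * r.Float64()
    }
    return result
}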

I don’t know if this is the most elegant implementation of the same idea as above, but it works for me. I’m again simply multiplying random numbers for the amount of time provided as seconds to the function call. It took me some time to figure out the sundry details of manipulating time in Go — adding a handful of seconds to the current time turned out to be way more esoteric than I expected — but I imagine that a more experienced developer would have had much less trouble with it.

Finally, let’s add the Profiler instrumentation to the main() function:

Again, I’m stealing the instrumentation code from Daz, but it’s pretty straightforward. I’m specifying projectID by hardcoding it because I couldn’t be troubled to make it a config or environment variable, and this would never do for a production deployment, but it serves for my purposes here. As before, you can see the entire file in the repo, if you like.

Before running the code, I needed to go get “cloud.google.com/go/profiler”, and then I was ready to run the app with go run main.go. I again generated some requests with Apache Bench — let’s see what Profiler does in Go.

Profile Data

CPU

As you can guess, this is where the most interesting data should be, given that the entire purpose of my “app” is to spin the CPU for a bit.

As expected, my function is clearly identified, and the source file is provided. I'm a little surprised that the number is as low as it is: apparently, only 57% of CPU utilization is coming from my blockCPU method. I'm not quite sure what to make of this. Is it that Go is much more efficient at generating random numbers? Or that everything else needed to actually serve a basic HTTP response is comparatively expensive? A more experienced co-worker pointed out that it could be caused by the fact that I'm creating a rand.New() inside the loop, which is likely very expensive. This is actually a great example of the value of profiling your code!

Heap

This view is probably going to be the least interesting in this use case.

Sure enough — I am struggling to even find the function I care about in this view. Thankfully, Profiler provides a handy way to find the code you’re looking for — simply use the Highlight feature next to the metric filter like this:

And there it is! It’s definitely not using very much heap, as one might expect.

Threads

This view might be more interesting if I were doing something multi-threaded or, for example, seeing lots of load on my service, but I'm not. I'm not even parallelizing my load in Apache Bench. Nevertheless, let's see what we get:

I was able to find my method, but it's tiny in this view, which is no surprise. Still, this is a very useful view, just not for my particular test case.

JAVA

Setup

I started this post by talking about being humbled — and this part was humbling indeed. I was harshly reminded that I hadn’t done any Java programming since my Operating Systems class in college. It was also a nice illustration of just how much easier it is to get started quickly in what I think of as the more modern languages like Node.js and Go. Nevertheless, let’s dive in.

Note that as of the time of writing (November 2018), Profiler for Java only works on Linux. As such, I needed to use a Debian Linux VM to do this part, rather than doing everything locally on macOS. The next challenge was figuring out what to use as the basic Web framework in Java (since, again, my knowledge is more than a decade out of date). After a bit of reading, I settled on Spring Boot and confirmed this choice with a much more knowledgeable co-worker. Thankfully, there's a great Getting Started guide over at spring.io, without which I would still be struggling to figure out how to get an HTTP server to work and how to get Maven to compile anything. I have to admit that even after following the tutorial, I still don't quite have my mind wrapped around how Spring Boot works in the same way that I grok Express in Node.js, but I was able to do enough to achieve my purpose.

I used the complete example that they provide and simply modified the source files. In truth, I did not need to modify the Application.java file at all other than to remove the code to iterate through the Beans provided. So, my very simple file now looks like this:
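Roughly this, assuming the package layout from the getting-started example:

package hello;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Application.java, stripped down to just bootstrapping Spring Boot
@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}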

I think this is as minimal as this file can possibly get. My actual logic went into the HelloController.java. First, I added another @RequestMapping:
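Something like the following, added to the existing controller (the path and random range are placeholders, and it assumes java.util.Random is imported):

@RequestMapping("/block")
public String block() {
    // generate a random number of seconds and pass it to BlockCPU()
    int seconds = new Random().nextInt(10) + 1;
    BlockCPU(seconds);
    return "Blocked CPU for " + seconds + " seconds";
}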

Pretty basic — I generate a random number and pass it to BlockCPU(). That function looks like this:
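In sketch form, using System.currentTimeMillis() and a long end time as described below:

// Multiply random numbers until the requested number of seconds has elapsed.
public double BlockCPU(int seconds) {
    // long (not int) to avoid the type mismatch with currentTimeMillis()
    long end = System.currentTimeMillis() + seconds * 1000L;
    double result = 0;
    Random r = new Random();
    while (System.currentTimeMillis() < end) {
        result = r.nextDouble() * r.nextDouble();
    }
    return result;
}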

As with Go, I had quite the time figuring out how to manipulate time. I was eventually able to find the answer on Stack Overflow: I used System.currentTimeMillis(), but then ran into type mismatch issues and ended up using long instead of int. This is likely not the most elegant implementation out there, but it seems to work.

Once all the code was written and compiled, it was time to add Profiler instrumentation. This is another place where the Java experience is notably different. The process is clearly documented here, but the important part is that I needed to download the Profiler agent library and pass it to my Java runtime, rather than simply including it in my code. Specifically:

java \
-agentpath:/opt/cprof/profiler_java_agent.so=\
-cprof_service=java-profiler,\
-cprof_service_version=0.0.1,\
-logtostderr,\
-cprof_project_id=<project ID> \
-jar target/gs-spring-boot-0.1.0.jar
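For completeness, getting the agent onto the VM in the first place was roughly this (check the documentation for the current download URL and install path):

sudo mkdir -p /opt/cprof
wget -q -O- https://storage.googleapis.com/cloud-profiler/java/latest/profiler_java_agent.tar.gz \
  | sudo tar xzv -C /opt/cprof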

One thing to note here is that I ran into issues with Profiler complaining about invalid arguments when I used "javaProfiler" as the service name and had to change it to "java-profiler". Basically, the profiler wants the service name to be all lowercase and dash-separated. It would be good if the agent printed a better diagnostic message about this; after I raised it with engineering, I was told a fix is coming soon.

After conquering all of this, it was time to look at the data collected!

Profile Data

Wall Time

My expectation here was that my function to block CPU would be front and center — after all, it’s where most of the time is being spent!

However, and this was probably the most surprising part of this experiment, it’s nowhere to be found at all! I even tried the Highlight function, and that did not change the view at all. This was and remains quite confusing, and I really don’t know how to explain this.

CPU Time

My expectation here is the same as before — most of the CPU time should be in my function. It’s certainly not easy to find it, as the call stack is really tall!

I was able to zoom in on the view and find my function pretty easily, however:

As expected, it’s where most of the CPU time is spent. One thing that I find interesting, and this may simply be due to my relative ignorance of Java generally, is that the stack is very vertical when sorting by CPU time….. but much more complex when analyzing by Wall Time.

Conclusions

All in all, this was a very fun project, and I feel much more comfortable recommending Profiler than I did before. I am excited to add Python to this comparison when that support becomes available. I am also really interested in seeing how functionality within each language evolves over time; for example, I hope that CPU usage/time will eventually make its way into Profiler for Go. Thank you for reading all the way to the end, and I hope you add Profiler to your instrumentation if you haven't done so already.
