Golang’s built-in profiling tool use-cases

Peter Siman
Published in Jamf Engineering
Jun 14, 2022 · 4 min read

The more complex your application code grows, the harder it becomes to find the root cause of runtime issues that typically appear only once the app is deployed in production and under considerable load. These can include the application being killed due to an Out Of Memory error, performance degradation, or deadlocks in your code.

This article does not aim to teach you how to use the profiling tools, as there are many in-depth materials that will help you with that task (a Google search for “pprof tool” will reveal some of them). Instead, we would like to focus on three main use cases of Golang’s built-in profiling tool (referred to as “pprof” throughout this article) that have proven helpful to us in various situations.

The one that you probably already know about

Let’s start with the basics!

For those who have already read the pprof documentation page, this probably won’t be anything new or surprising. Golang’s pprof package contains tools that can give you memory, CPU, block and other profiles just by adding it as an import to your file and exposing its endpoints. You’ve probably done this many times yourself, so we will just show a typical command that opens a Go CPU profile collected over 30 seconds in the built-in profile explorer:

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
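
If you haven’t wired this up before, exposing the endpoints is typically just a blank import plus an HTTP listener; a minimal sketch (the dedicated port 6060 matches the command above):

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
    // Serve the pprof endpoints on a dedicated port, separate from application traffic.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... the rest of your application ...
    select {}
}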

We usually do this on demand, either during local development, in the DEV environment, or with the help of our Cloud Engineers in production when some metric (CPU, memory consumption) shows unexpected values or hints at a potential memory leak.

The one that your Cloud Engineers implemented after the last app crash

Nothing too interesting so far, you say? Let’s then have a look at a second way to obtain valuable profiling data from your application. With the previous approach, you wouldn’t be able to collect profiling information automatically when a SIGTERM signal is sent to the container by Kubernetes.

This is exactly what we needed to investigate a potential deadlock in one of our goroutines. Once we were unable to obtain a lock on a resource, we returned a 500 response code from the liveness probe to signal to Kubernetes that the app was not in an “up” state. This triggered a restart of the affected container and a preStop hook containing all the commands necessary to collect the profiling data.
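
For illustration, the liveness check itself can be very small; a minimal sketch, assuming a plain sync.Mutex guards the resource (the /healthz route, the port and the names are hypothetical, and TryLock requires Go 1.18+):

package main

import (
    "net/http"
    "sync"
)

// resourceMu guards the shared resource that we suspected of deadlocking.
var resourceMu sync.Mutex

// livenessHandler is a hypothetical liveness endpoint: if the lock cannot be
// acquired, it returns HTTP 500 so that Kubernetes restarts the container and
// the preStop hook below gets a chance to collect the goroutine dumps.
func livenessHandler(w http.ResponseWriter, r *http.Request) {
    if !resourceMu.TryLock() { // TryLock requires Go 1.18+
        http.Error(w, "resource lock unavailable", http.StatusInternalServerError)
        return
    }
    resourceMu.Unlock()
    w.WriteHeader(http.StatusOK)
}

func main() {
    http.HandleFunc("/healthz", livenessHandler)
    http.ListenAndServe(":8080", nil)
}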

Snippet of code from the deployment.yaml file:

...
containers:
  - ...
    volumeMounts:
      - name: pprof
        mountPath: /pprof
    lifecycle:
      preStop:
        exec:
          command:
            - sh
            - -c
            - >
              now=$(date -u +"%FT%T") &&
              curl -o "/pprof/$HOSTNAME-$now-goroutine2" "http://localhost:8079/debug/pprof/goroutine?debug=2" &&
              curl -o "/pprof/$HOSTNAME-$now-goroutine1" "http://localhost:8079/debug/pprof/goroutine?debug=1" &&
              curl -o "/pprof/$HOSTNAME-$now-trace" "http://localhost:8079/debug/pprof/trace?seconds=1"
...
volumes:
  - name: pprof
    emptyDir: {}

Note: The pprof endpoint has been exposed on port 8079 in this application. For the information to be available for later investigation, you also need to mount a dedicated volume for the pprof files. For our purposes a volume of type “emptyDir” was sufficient, as we only needed to keep the collected data for the lifetime of the pod.
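
Since an emptyDir volume lives for as long as the pod does, the dumps can still be copied off after the container restarts, for example (the namespace, pod and container names are illustrative):

kubectl cp my-namespace/my-pod:/pprof ./pprof -c my-container

The goroutine dumps are plain text and can be inspected directly, while the trace file can be opened with “go tool trace”.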

With this simple step you now have a clear view of the state of the application just before it gets killed by Kubernetes, which can shorten the time needed to debug a potential deadlock significantly.

The one that sleeps in your tech-work backlog

Still not impressed? Then wait till you see our last contestant: the continuous profiling tool named “Pyroscope”. What does Pyroscope do? It collects the profiling information produced by the pprof tooling and builds a continuous profile of the application that can then be viewed in a web UI. It can do this either by scraping the pprof endpoints exposed to it, or by collecting data via its agent running directly in the application.

Note: To be fair, it’s not built only to scrape Golang profiling data. It supports multiple profile data sources from apps written in different languages, using an agent that pushes the data to the Pyroscope server. However, with Golang’s ability to expose the pprof endpoints, you can configure it to use the aforementioned “pull model” and start scraping the profiling data without the need to attach any agent to your app.
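
For completeness, the push model is also only a few lines of Go; a rough sketch based on the pyroscope-io Go client’s documented usage (the application name and server address are placeholders, and the import path may differ between client versions):

package main

import "github.com/pyroscope-io/client/pyroscope"

func main() {
    // Starts an in-process agent that periodically pushes profiles to the Pyroscope server.
    pyroscope.Start(pyroscope.Config{
        ApplicationName: "my.golang.app",                // placeholder application name
        ServerAddress:   "http://pyroscope-server:4040", // placeholder server address
    })

    // ... the rest of your application ...
    select {}
}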

Pyroscope UI (source: https://github.com/pyroscope-io/pyroscope)

Pyroscope gives you a powerful tool to investigate various profiles of your application, compare them with historical data, and get to the root cause of an issue really quickly. Of course, you have to consider things like the retention period and the storage needed for the profiles when designing your profiling solution.

In our situation, we decided not to use Pyroscope after an initial proof of concept. It seemed to be a promising tool, but with our application running in multiple data centres around the globe, it just seemed like overkill to scrape every application for profiles continuously. We also briefly tested the Pyroscope Grafana plugin, mainly because it gives you one central point where profiles from various data centres can come together (so that you don’t need to look at the Pyroscope web UI in different DCs). It doesn’t provide as many features as the Pyroscope UI, but it would probably be sufficient for a quick look into the profiling data.

Conclusion

We hope some of these approaches might help you when investigating issues in your Go application. They are definitely not a silver bullet for all problems, but with the right tool selected for the job you might be able to get to the root cause quickly, without the need to rely on complex third-party solutions. Happy profiling!
