Introducing Prism

A tool that helps us optimize our code

Today we are introducing Prism, an open source profiling tool for Go Code. At Geckoboard, we developed Prism to help us identify performance bottlenecks and measure the performance impact of our changes across a range of Git commits.

In this blog post, we’ll explain why we wrote our own profiler instead of using an existing tool such as pprof, then we’ll look at how to instrument a sample program with prism, and finally we’ll describe how it all works behind the scenes.

The backstory

Following best practices for micro-service based architectures, we run an instrumentation layer that collects a whole bunch of metrics from our production services, such as request latency, CPU and memory usage, Garbage Collection (GC) time and number of GC pauses. A few months ago, while reviewing the runtime metrics for one of our integrations, we noticed some latency and GC spikes that seemed to be caused by the service having to process an unusually large payload.

After diving deeper into the metrics, we identified the particular service endpoint that seemed to be responsible for the spike, and set out to identify the cause so we could develop a fix. However, the relatively complex execution flow of the misbehaving service made this harder than we had initially anticipated. In particular:

  • the upstream service implements a retry policy;
  • we use locks inside the service in order to prevent concurrent processing of the same payload by different service instances;
  • we have to consider a number of different code paths, depending on cache hits/misses, or early exits (for example if a lock could not be acquired).

Initially, we tried using pprof on an instance of our service running in a staging environment. That gave us a list of potential bottlenecks, but we still wanted to be able to split the collected metrics per goroutine, effectively profiling the call graph that followed the misbehaving function call.

At this point, we began experimenting by manually going through the call graph for each potential bottleneck, and adding timing code, then grouping the results by goroutine ID and dumping them to standard output. Even though this approach produced skewed results because we failed to account for the overhead introduced by our profile hooks, it did give us a much better insight into the runtime behavior of our code. Armed with this information, we were able to optimize our code and gain a 2x performance boost, in addition to achieving a drop in the number of GC pauses. At that time we were still using Go 1.4, whose GC implementation was not as optimized as the newer Go versions; needless to say, the GC-pause drop made us very happy!

This little trick seemed useful, but our approach had some downsides, too. For a start, it was a manual and tedious process. In addition, sprinkling all that extra profiling code around made our Git branch dirty; after optimizing our code and verifying the increase in performance, we had to go back and remove all the profile hooks before committing our changes.

At Geckoboard, we’re always looking for opportunities to be lazy and write tools to automate manual tasks. And that’s how the idea for prism was born! A tool that would scan a Go package, automatically figure out the call graph for the functions we wish to profile, inject profiling hooks at the appropriate places, then run the modified code and collect timing metrics, while also tracking and accounting for the additional overhead from the profiler.

Fast forward a few days, and it’s Innovation Wednesday at Geckoboard! Every two weeks we get to put on our mad scientist hats and work on our own projects, so we started working on a proof-of-concept for prism. By the end of the day we had an initial, hacky version up and running. We organized a brown bag session to share the work with the rest of the Engineering team and, after receiving positive feedback, spent some more time polishing the code and releasing prism as open source.

Getting to know prism through a simple example

To see how prism can be used to profile and optimize a program let’s start with the following simple example:

The purpose of this program is to process a list of 1M rows and return back the row index as a string or the word"redacted" if the row index contains only odd digits. We will use prism to identify the problematic areas, optimize our code and compare our improved code's performance against the original (baseline) code.

For the rest of the article we assume that the above example file resides in $GOPATH/src/github.com/geckoboard/test.

Targets

To invoke prism we need to specify one or more profile targets, that is the fully qualified names of the functions that we want to profile. Profile target names should be unambiguous; prism must be able to tell the difference between function Foo exported by package a and a similarly named function exported by package b. We use the following two rules to construct the target names:

  • if the function has no receiver (e.g. genKey) the target is constructed as qualified_pkg_name/fn_name. In the case of genKey the target would begithub.com/geckoboard/test/genKey
  • if the function has a receiver (e.g. processRow) the target is constructed asqualified_pkg_name/receiver_type_name.fn_name. In the case of genKey the target would be github.com/geckoboard/test/processor.processRow

Running prism

For this example we will be attaching the profiler to main. Since main does not have a receiver, we apply the first rule from above to define our target: github.com/geckoboard/test/main.

prism profile --profile-target github.com/geckoboard/test/main \
--profile-label="Initial code" \
$GOPATH/src/github.com/geckoboard/test

Running the above command will output something similar to the image below. Prism will output the name of the temporary folder it used as well as the number of modified lines and patched functions.

By default, prism will run the patched project using a shell command that selects all non-test .go files and passes them to go run. If you need to override this behavior you can use the --run-cmd option to specify your own run command (e.g. make run). Moreover, if your package relies on some special setup steps prior to running (like applying migrations etc.) you can also specify an optional build command to be executed before the run command using the --build-cmd option.

While prism is running your package, it will capture and display any stdout/stderr output from the running process (colorized; if your terminal supports it)

Capturing a baseline profile from the initial code

Unless overridden by the user, prism will store captured profiles by default in $HOME/prism. Let’s print the captured profile and get a breakdown of the time spent in each function:

prism print ~/prism/profile-*
Printing the profile captured from the initial code

Looking at the above output you will probably notice some values that are 0.00ms. Prism will display time metrics using ms as the time unit. However, you can use the --display-unit option to alternatively select microseconds (us), nanoseconds (ns) or specifyauto to let prism figure the best display unit based on the displayed metrics.

When grepping through a long stack trace you may want to mask values below a certain threshold so you can reduce the display noise and focus on the metrics that matter. To this end, prism supports the --display-threshold option. Here is an example where we print the same profile but this time mask values less than 0.1ms.

prism print --display-threshold 0.1 ~/prism/profile-*
Applying a threshold to displayed values

We can also instruct prism to show us the amount of time spent in each function as a percentage of the total execution time by specifying the — display-format=percentoption. In addition, we can change the order and type of metrics displayed using the --display-columns option. This option accepts a comma-delimited list of metrics to display.

prism print --display-format=percent \
--display-columns=total \
~/prism/profile-*
Displaying the time breakdown as a percentage of total execution time

In this example we only showed the total time (as a percentage). The following metrics are supported by prism:

  • total
  • invocations
  • min
  • max
  • mean
  • median
  • p50
  • p70
  • p90
  • p99
  • stddev

As expected, the profiler output indicates that we spend the majority of time inside processRow, so that's the appropriate place to begin applying optimizations. Going through the function we can immediately observe the bottleneck: we compile the regex on every invocation of processRow. So, let's move that regex into a global variable:

We run prism again using a different label (--profile-label="Regex optimization") to collect a fresh profile.

At this point we can use the diff command to compare the two captured profiles. For brevity we will use the --display-columns option again to limit the diff to just the total time metric.

Each captured profile file name includes the target’s name, a creation timestamp and the ID of the goroutine inside which the profile was captured. This naming convention makes it really easy to pass the captured profiles to the diff command in the correct order using a simple shell expansion operation.

prism diff --display-columns=total ~/prism/profile-*
Comparing the optimized code with the original code

Great news! Our code runs much faster now. But can we do any better? Let’s take a look inside the genKey function. It just calls fmt.Sprintf to format the key as a string. The code is pretty straightforward. One thing to notice though is that it gets called 1M times.

When the Go runtime invokes a function, it incurs a small amount of overhead as it needs to do some book-keeping (set up the stack, push registers on entry and pop them when the function returns). While this overhead is fairly small, it tends to add up in situations like this when a function is invoked a significant number of times. A potential optimization would be to force-inline this function by moving it into processRow. While we are at it we can also change the formatting code to use strconv.FormatInt; fmt.Sprintf will eventually delegate the formatting to strconv.FormatIntso we may just as well call it directly!

We run prism one more time using --profile-label="Inline genKey". Let’s compare the last profile against the last optimization step and see if we managed to improve the total time metric.

ls ~/prism/profile-* | 
tail -2 |
xargs prism diff --display-columns=total
Comparing the last optimization step against the previous step

Our final implementation is ~50% faster compared to the previous optimization step. Now, let’s compare all three profiles:

prism diff --display-columns=total ~/prism/profile-*
Comparing each optimization step against the initial implementation

Success! According to our measurements our last optimization round made our example approximately 46 times faster than our initial solution.

Behind the scenes: how it works

Building the call graph

In order for prism to be able to apply its hooks, it needs to parse the project’s source code and generate a call graph for each specified profile target.

First, we generate a list of all non-test .go files in the target package folder and use it to construct a configuration that can be used by the Go loader to load and build the package.

With the help of the ssautil package we obtain a list of all functions potentially needed by the program. This list includes not only the functions in the target package but also any potentially reachable functions from the packages it imports down to the actual functions implementing the Go runtime.

We iterate the reachable functions list looking for any entries matching the targets specified as arguments to the prism profile command. Using each match as the root, we construct the call graph of reachable functions using the Rapid Type Analysis (RTA) implementation provided by the rta package.

An interesting thing to note about RTA is the way it handles interfaces while constructing reachability graphs. Whenever it detects a method invocation through an interface it will mark that method as reachable in all types that satisfy this interface and include them into the call graph. This means that prism will be able to hook the correct functions even if you are using a lazy initialization pattern like the code below:

Navigating the AST and injecting the profiler

Once we have constructed the reachability call graphs for all profile targets, we create a temporary folder to act as a Go workspace containing a copy of the project and proceed to inject our profiler hooks.

To achieve this, we parse each project’s file into an abstract syntax tree (AST) and use a Visitor to scan the tree looking for function declaration blocks whose function name matches an call graph entry.

If we find a matching function declaration block, we modify its body and prepend our profiler hooks to the list of statements comprising it:

  • a hook to indicate we entered the function
  • a hook to indicate we exited the function. This hook is wrapped in a defer block to ensure that it always gets invoked whenever the function returns

When scanning the AST, prism will automatically ignore any functions that do not belong to the target package or its sub-packages. The reason for this is that prism is only allowed to modify files residing in the temporary project copy. If you are vendoring your dependencies using godep and want to instruct prism to also inject profile hooks into them, you can use the --profile-vendored-pkg option to specify one or more regexes for matching vendored packages that should be hooked.

Once we are done processing each file, we write it back to disk if its AST was modified by our visitor. In addition, we inject additional code to the package’smain function whose purpose is to initialize the profiler and to ensure that the profiler flushes all captured profiles before the app exits.

Running the patched package

Once the target project has been patched, it’s ready to be executed by prism. To make sure that the patched package does not interfere with the original version of the package while still being able to find the packages it imports, we override GOPATH and prepend the path to the temporary Go workspace we created in the previous step.

Prism then executes the patched project as an interactive process waiting for it to finish executing. At the same time, prism will forward HUP, INT, TERM and QUIT signals to the running process, allowing you to terminate its execution by pressing CTRL+C or just by signalling the prism process.

Bonus stuff: prism git-fu

As we mentioned in the beginning of the post, one of our goals was to be able to compare the performance differences between a set of Git commits. The following script accepts as arguments two SHA hashes and a list of prism profile targets.

It checks out all commits between the two hashes and runs prism profile for each commit using its SHA as the profile label, storing the captured profiles in a temporary directory. Once all profiles have been collected it will run a prism diff command to print a table comparing each profile against the one obtained for the first commit.

Conclusion

In this post we have introduced prism, an open source tool for automatically injecting profiler hooks to Go apps and collecting/comparing profile data.

If you try prism and find it useful let us know! We are also open to contributions; please see our contribution guide for ideas and ways to contribute.