[Linux] Profiling: visualize program bottlenecks with Flamegraph
Let’s say you have a program whose performance you want to improve. Probably the first thing to try is perf or dtrace to sample the running program. That works, but the output is not in the most intuitive format. Today, let’s look at how to create an intuitive visualization of the stack traces using Flamegraph.
Prerequisite
First, I am going to assume we are running a Linux system. If you are running inside a Docker container, make sure the container has the required privileges. See this article for more details. We will need the perf utility, which you can install on Ubuntu with the following command:
sudo apt install linux-tools-common linux-tools-generic linux-tools-`uname -r`
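Before going further, it can help to confirm that perf is actually usable. The following is a minimal sketch rather than part of the original setup: the function name check_perf is made up for illustration, and it assumes a Linux system where the kernel exposes the perf_event_paranoid setting under /proc.

```shell
#!/bin/sh
# check_perf is a hypothetical helper name used only for this sketch.
check_perf() {
    # Report whether the perf binary is on PATH.
    if command -v perf >/dev/null 2>&1; then
        echo "perf: found at $(command -v perf)"
    else
        echo "perf: not found"
    fi

    # perf_event_paranoid controls what unprivileged users may profile;
    # lower values are more permissive.
    paranoid_file=/proc/sys/kernel/perf_event_paranoid
    if [ -r "$paranoid_file" ]; then
        echo "perf_event_paranoid: $(cat "$paranoid_file")"
    else
        echo "perf_event_paranoid: not readable"
    fi
}

check_perf
```

If perf record later complains about permissions, the usual workarounds are to run it with sudo or to lower the setting, for example with sudo sysctl kernel.perf_event_paranoid=1. Which one is appropriate depends on your system's policy.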
Second, we will discuss profiling a native app, that is, a program compiled to native machine code for your system rather than run on a virtual machine (VM) of some sort. In technical terms, these are programs that are compiled ahead-of-time (AOT). For example, C/C++/Rust/Go programs use AOT compilers, while Java/C#/Python programs run on a VM, so they need dedicated tools for profiling.
Finally, the program must include debug info in order to produce a helpful graph. For C/C++ programs, add the -g option when compiling. For Go programs, the default build settings will do. For Rust, add debug = true under [profile.release] in the Cargo.toml file.
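For Rust, the debug setting normally belongs under the release profile, since the optimized build is the one you would profile. As a sketch, assuming a throwaway project in a scratch directory (the directory and the package name demo are placeholders, not from the original example):

```shell
#!/bin/sh
# Write a minimal Cargo.toml into a scratch directory to show where the
# debug setting goes. "demo" is a placeholder package name.
workdir=$(mktemp -d)
cat > "$workdir/Cargo.toml" <<'EOF'
[package]
name = "demo"
version = "0.1.0"
edition = "2021"

# Keep optimizations on, but also emit debug info so that the
# profiler can resolve function names.
[profile.release]
debug = true
EOF

cat "$workdir/Cargo.toml"
```

In an existing project, only the [profile.release] section needs to be added, followed by a build with cargo build --release.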
Step by Step Guide
Alright. Let’s walk through an example with a Go program. We will use a simple gunzip program written in Go. Let’s download the source code and build the program.
# clone example source code
git clone https://github.com/TechHara/go_gunzip.git
# go into the source directory
cd go_gunzip
# compile to an executable ./gunzip
go build
This will create the gunzip executable in the directory. Now, let’s run the program under the Linux perf tool. We will use linux.tar.gz as an example file.
# download linux source code and compress as linux.tar.gz
wget https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.5.5.tar.xz -O - | xz -d | gzip > linux.tar.gz
# this is how you would run the program
./gunzip < linux.tar.gz > linux.tar
# this time, run while profiling
perf record -g ./gunzip < linux.tar.gz > linux.tar
# convert to trace output
perf script > trace.perf
Note that when we run the program, we must include ./ to indicate that we are running the ./gunzip executable in the current directory, rather than the system's built-in gunzip.
We should see a trace.perf file if everything ran successfully. If you want to look at the stack traces in text format, you can do so with perf report, but the better way is to use Flamegraph.
To visualize the stack trace, we need to download Flamegraph from its repo and run a few more commands.
# clone Flamegraph repo
git clone https://github.com/brendangregg/FlameGraph.git
# collapse the stack trace
FlameGraph/stackcollapse-perf.pl trace.perf > trace.folded
# convert to svg format
FlameGraph/flamegraph.pl trace.folded > trace.svg
# open up in firefox
firefox trace.svg
This should open a nice interactive page in the browser.
Voila! The graph gives an intuitive visualization of the program's stack traces. The vertical axis shows the stack depth, with each box being one stack frame. The horizontal axis is not a timeline: the width of each box is proportional to the number of samples in which that function was on the stack, so wider boxes account for more of the profiled run time. For this particular example program, we can see that the main.Decode, runtime.memmove, and syscall.write functions take up the majority of the program's run time. If you want to improve the run time, you should probably start with the main.Decode function.
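The same information can be read off the folded file as numbers. Each line of the folded output is a semicolon-separated stack, root frame first, followed by a sample count. The sketch below computes the share of samples in which a given function appears anywhere on the stack, which corresponds to the width of its box. The helper name frame_share and the sample data are invented for illustration; real input would be trace.folded.

```shell
#!/bin/sh
# frame_share FILE FRAME
# Print the percentage of samples whose stack contains FRAME.
# Each folded line is "frame1;frame2;...;frameN count", root frame first.
frame_share() {
    awk -v target="$2" '
    {
        count = $NF                    # sample count is the last field
        stack = $0
        sub(/ [0-9]+$/, "", stack)     # drop the trailing count
        total += count
        n = split(stack, frames, ";")
        for (i = 1; i <= n; i++) {
            if (frames[i] == target) { hits += count; break }
        }
    }
    END {
        if (total > 0) printf "%.1f\n", 100 * hits / total
    }' "$1"
}

# Made-up sample data for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
gunzip;main.main;main.Decode 60
gunzip;main.main;main.Decode;runtime.memmove 25
gunzip;main.main;syscall.write 15
EOF

frame_share "$sample" main.Decode      # prints 85.0
frame_share "$sample" runtime.memmove  # prints 25.0
```

Note that this is an inclusive share: a function is counted whenever it is on the stack, including time spent in the functions it calls.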
For other programs, whether written in C, C++, Rust, or Go, the basic steps are the same. We just need to substitute the new program and supply its own arguments.
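Since the steps do not change from program to program, they can be wrapped in a small shell function. This is only a sketch: the name profile_to_svg, the -n preview flag, and the FLAMEGRAPH_DIR variable are assumptions made for this example, and ./myapp with its arguments is a hypothetical program. It assumes perf is installed and the FlameGraph repo has been cloned as shown earlier.

```shell
#!/bin/sh
# profile_to_svg [-n] OUT.svg PROGRAM [ARGS...]
# Hypothetical wrapper around the manual steps. With -n, only print the
# commands that would be run. FLAMEGRAPH_DIR is assumed to point at a
# checkout of the FlameGraph repo (default: ./FlameGraph).
profile_to_svg() {
    dry_run=0
    if [ "$1" = "-n" ]; then
        dry_run=1
        shift
    fi
    out=$1
    shift
    fg_dir=${FLAMEGRAPH_DIR:-FlameGraph}

    if [ "$dry_run" = 1 ]; then
        echo "perf record -g $*"
        echo "perf script > trace.perf"
        echo "$fg_dir/stackcollapse-perf.pl trace.perf > trace.folded"
        echo "$fg_dir/flamegraph.pl trace.folded > $out"
        return 0
    fi

    # Same sequence as the manual steps; stop at the first failure.
    perf record -g "$@" &&
        perf script > trace.perf &&
        "$fg_dir/stackcollapse-perf.pl" trace.perf > trace.folded &&
        "$fg_dir/flamegraph.pl" trace.folded > "$out"
}

# Preview the commands for a hypothetical program ./myapp.
profile_to_svg -n myapp.svg ./myapp --input data.bin
```

For the gunzip example, the call would look like profile_to_svg trace.svg ./gunzip < linux.tar.gz > linux.tar, with the redirections passed through to the profiled program.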
Even Better
OK, if you think this is too much work, there is an easier way. There is a Rust package, flamegraph, that does the job for us. Assuming you have cargo installed on the system, you can run
# install flamegraph package
cargo install flamegraph
to install the package. Now, all you need is a single command to generate the flamegraph
# one liner with flamegraph crate
flamegraph --open --cmd "record -g" -- ./gunzip < linux.tar.gz > linux.tar
This single line will profile the program, save the trace, collapse the stacks, generate the .svg file, and open it.