Erlang/OTP profiling: eprof

"Time and space are not conditions of existence, time and space is a model for thinking." — Albert Einstein.

Viacheslav Katsuba
Erlang Battleground
11 min read · Jan 26, 2022


Hi folks! Ukrainian Erlanger is here 🤘! We continue our series of articles about Erlang/OTP profiling. In this article, I want to talk about eprof: an old but still pretty useful tool for analyzing the time spent by our Erlang programs. If you haven't read the previous article, Erlang/OTP profiling: fprof, it makes sense to take a small step back and read it, because all the information in this series of articles can make your Erlang life easier.

So, let's rock 🤘!

Did you hear about Advent of Code by Eric Wastl? If not, well, this is the elevator pitch:

© Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like. People use them as a speed contest, interview prep, company training, university coursework, practice problems, or to challenge each other.

You can find more info on their about page. Those challenges are held every year. The race starts on December 1st, and the deadline is December 25th. Of course, you can try to work on the puzzles at any time, but then you will be missing out on the race. This is also an excellent game for your team. If you haven't played it yet, you should at least try. This is a perfect opportunity to test and improve your skills and your team's skills. But be careful: the game is addictive 😋.

But why do I mention Advent of Code here? Because it provided a great example that we'll use in this article. We will try to solve one of those puzzles and see how eprof will help us improve our functions. If you haven't tried these puzzles yet, it's worth mentioning that each task has two parts. The first is the main essence, and the second part implies an extension of the functionality from the first part. Usually, you would end up with two different approaches: one for each task. Because of this, you don't think about the speed of your code in the first part. Well, this time… I did think about it 🙀.

Task

Day 6: Massive school of glowing lanternfish

In short, you need to count the number of fish that can be born in N days. The birthday timer model of lanternfish is simple©. So, suppose you have a lanternfish with an internal timer value of 3:

  • After one day, its internal timer would become 2.
  • After another day, its internal timer would become 1.
  • After another day, its internal timer would become 0.
  • After another day, its internal timer would reset to 6, and it would create a new lanternfish with an internal timer of 8.
  • After another day, the first lanternfish would have an internal timer of 5, and the second lanternfish would have an internal timer of 7.

Example with five lanternfish which have different timers:

Initial state: 3,4,3,1,2
After 1 day: 2,3,2,0,1
After 2 days: 1,2,1,6,0,8
After 3 days: 0,1,0,5,6,7,8
After 4 days: 6,0,6,4,5,6,7,8,8
After 5 days: 5,6,5,3,4,5,6,7,7,8
After 6 days: 4,5,4,2,3,4,5,6,6,7
After 7 days: 3,4,3,1,2,3,4,5,5,6
After 8 days: 2,3,2,0,1,2,3,4,4,5
After 9 days: 1,2,1,6,0,1,2,3,3,4,8
After 10 days: 0,1,0,5,6,0,1,2,2,3,7,8
After 11 days: 6,0,6,4,5,6,0,1,1,2,6,7,8,8,8
After 12 days: 5,6,5,3,4,5,6,0,0,1,5,6,7,7,7,8,8
After 13 days: 4,5,4,2,3,4,5,6,6,0,4,5,6,6,6,7,7,8,8
After 14 days: 3,4,3,1,2,3,4,5,5,6,3,4,5,5,5,6,6,7,7,8
After 15 days: 2,3,2,0,1,2,3,4,4,5,2,3,4,4,4,5,5,6,6,7
After 16 days: 1,2,1,6,0,1,2,3,3,4,1,2,3,3,3,4,4,5,5,6,8
After 17 days: 0,1,0,5,6,0,1,2,2,3,0,1,2,2,2,3,3,4,4,5,7,8
After 18 days: 6,0,6,4,5,6,0,1,1,2,6,0,1,1,1,2,2,3,3,4,6,7,8,8,8,8

Each day, every 0 becomes a 6 and adds a new 8 to the end of the list, while every other number decreases by 1.

In this example, after 18 days, there are a total of 26 fish. After 80 days, there would be a total of 5934. Question: How many lanternfish would there be after 80 days with the following initial state?

5,4,3,5,1,1,2,1,2,1,3,2,3,4,5,1,2,4,3,2,5,1,4,2,1,1,2,5,4,4,4,1,5,4,5,2,1,2,5,5,4,1,3,1,4,2,4,2,5,1,3,5,3,2,3,1,1,4,5,2,4,3,1,5,5,1,3,1,3,2,2,4,1,3,4,3,3,4,1,3,4,3,4,5,2,1,1,1,4,5,5,1,1,3,2,4,1,2,2,2,4,1,2,5,5,1,4,5,2,4,2,1,5,4,1,3,4,1,2,3,1,5,1,3,4,5,4,1,4,3,3,3,5,5,1,1,5,1,5,5,1,5,2,1,5,1,2,3,5,5,1,3,3,1,5,3,4,3,4,3,2,5,2,1,2,5,1,1,1,1,5,1,1,4,3,3,5,1,1,1,4,4,1,3,3,5,5,4,3,2,1,2,2,3,4,1,5,4,3,1,1,5,1,4,2,3,2,2,3,4,1,3,4,1,4,3,4,3,1,3,3,1,1,4,1,1,1,4,5,3,1,1,2,5,2,5,1,5,3,3,1,3,5,5,1,5,4,3,1,5,1,1,5,5,1,1,2,5,5,5,1,1,3,2,2,3,4,5,5,2,5,4,2,1,5,1,4,4,5,4,4,1,2,1,1,2,3,5,5,1,3,1,4,2,3,3,1,4,1,1

Solution

So what data do we have? Well, firstly, we know the initial number of fish, we also know the timer (initial state) of each of them, and we also know the number of days. All this information is more than enough to write our solution. We will create three functions: a main one, which calls the simulation function; the simulation function, which advances the state day by day; and a lanternfish function that, depending on each fish's timer, decides what to do: either decrease the timer or create a new fish. The main function receives the result of the other two functions and, as its final step, counts the number of fish created. To start, let's take the initial state from the example, and let's try to write out the logic so that our results match:

-module(day6).
-export([solution/2]).

solution(Days, Input) ->
    length(simulation(Days, Input)).

simulation(0, Fishs) ->
    Fishs;
simulation(Days, Fishs) ->
    simulation(Days - 1, lanternfish(Fishs)).

lanternfish([]) ->
    [];
lanternfish([0|T]) ->
    [6 | [8 | lanternfish(T)]];
lanternfish([H|T]) ->
    [H - 1 | lanternfish(T)].

And let's try to see if this works:

1> day6:solution(18, [3,4,3,1,2]).
26
2> day6:solution(80, [3,4,3,1,2]).
5934

Well, it looks like it works. Granted, this is not a very pretty implementation and, at a minimum, we could optimize the current function to skip the call to length/1. But we can see that it works as-is, without any additional changes. Now, let's try to run the code with the actual input data:
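As a hedged sketch of that length/1 optimization (this is my own variant, not the article's final code, and the count_* helper shape is an assumption), we could carry a running counter through the recursion instead of building the whole list and measuring it at the end:

```erlang
%% Sketch only: same algorithm, but the fish count is carried as an
%% accumulator, so length/1 is never called on the final list.
-module(day6_count).
-export([solution/2]).

solution(Days, Input) ->
    simulation(Days, Input, length(Input)).

%% Carry the running fish count alongside the state.
simulation(0, _Fishs, Count) ->
    Count;
simulation(Days, Fishs, Count) ->
    {NewFishs, Born} = lanternfish(Fishs, [], 0),
    simulation(Days - 1, NewFishs, Count + Born).

%% Like lanternfish/1 above, but also counts how many fish were born.
lanternfish([], Acc, Born) ->
    {lists:reverse(Acc), Born};
lanternfish([0|T], Acc, Born) ->
    lanternfish(T, [8, 6 | Acc], Born + 1);
lanternfish([H|T], Acc, Born) ->
    lanternfish(T, [H - 1 | Acc], Born).
```

Note that this still materializes the full list of fish every day, so it does not fix the real scalability problem — it only saves the final length/1 pass.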

1> In = [5,4,3,5,1,1,2,1,2,1,3,2,3,4,5,1,2,4,3,2,5,1,4,2,1,1,2,5,
4,4,4,1,5,4,5,2,1,2,5,5,4,1,3,1,4,2,4,2,5,1,3,5,3,2,3,1,
1,4,5,2,4,3,1,5,5,1,3,1,3,2,2,4,1,3,4,3,3,4,1,3,4,3,4,5,
2,1,1,1,4,5,5,1,1,3,2,4,1,2,2,2,4,1,2,5,5,1,4,5,2,4,2,1,
5,4,1,3,4,1,2,3,1,5,1,3,4,5,4,1,4,3,3,3,5,5,1,1,5,1,5,5,
1,5,2,1,5,1,2,3,5,5,1,3,3,1,5,3,4,3,4,3,2,5,2,1,2,5,1,1,
1,1,5,1,1,4,3,3,5,1,1,1,4,4,1,3,3,5,5,4,3,2,1,2,2,3,4,1,
5,4,3,1,1,5,1,4,2,3,2,2,3,4,1,3,4,1,4,3,4,3,1,3,3,1,1,4,
1,1,1,4,5,3,1,1,2,5,2,5,1,5,3,3,1,3,5,5,1,5,4,3,1,5,1,1,
5,5,1,1,2,5,5,5,1,1,3,2,2,3,4,5,5,2,5,4,2,1,5,1,4,4,5,4,
4,1,2,1,1,2,3,5,5,1,3,1,4,2,3,3,1,4,1,1].
2> day6:solution(80, In).
350917

That answer is correct. After our first victory, we feel confident and ready to move on to the second part of the puzzle. When we open it, we see the following condition©:

How many lanternfish would there be after 256 days with real input data?

Ok, let's try to compute that…

3> day6:solution(256, In).
killed

What!? 😱

Let's figure out what happened.

Profiling

At this stage, eprof enters the scene.

Our function works but is far from ideal and has several problems that need to be fixed. First of all, we need to figure out our problems and where they stem from. Once we have analyzed all the data, we can optimize the given function properly. And so let's see how the profiler eprof can help us with this task:

1> In = [5,4,3,5,1,1,2,1,2,1,3,2,3,4,5,1,2,4,3,2,5,1,4,2,1,1,2,5,
4,4,4,1,5,4,5,2,1,2,5,5,4,1,3,1,4,2,4,2,5,1,3,5,3,2,3,1,
1,4,5,2,4,3,1,5,5,1,3,1,3,2,2,4,1,3,4,3,3,4,1,3,4,3,4,5,
2,1,1,1,4,5,5,1,1,3,2,4,1,2,2,2,4,1,2,5,5,1,4,5,2,4,2,1,
5,4,1,3,4,1,2,3,1,5,1,3,4,5,4,1,4,3,3,3,5,5,1,1,5,1,5,5,
1,5,2,1,5,1,2,3,5,5,1,3,3,1,5,3,4,3,4,3,2,5,2,1,2,5,1,1,
1,1,5,1,1,4,3,3,5,1,1,1,4,4,1,3,3,5,5,4,3,2,1,2,2,3,4,1,
5,4,3,1,1,5,1,4,2,3,2,2,3,4,1,3,4,1,4,3,4,3,1,3,3,1,1,4,
1,1,1,4,5,3,1,1,2,5,2,5,1,5,3,3,1,3,5,5,1,5,4,3,1,5,1,1,
5,5,1,1,2,5,5,5,1,1,3,2,2,3,4,5,5,2,5,4,2,1,5,1,4,4,5,4,
4,1,2,1,1,2,3,5,5,1,3,1,4,2,3,3,1,4,1,1].
2> {ok, Pid} = eprof:start().
3> ok = eprof:log("log.analyze").
4> {ok, Result} = eprof:profile([Pid], day6, solution, [80, In]).
5> eprof:analyze().
6> Result.
350917

And this is the output:

****** Process <0.191.0>    -- 0.20 % of profiled time *** 
FUNCTION CALLS % TIME [uS / CALLS]
-------- ----- ------- ---- [----------]
lists:keyfind/3 2 0.00 0 [ 0.00]
gb_trees:empty/0 1 0.00 0 [ 0.00]
erlang:timestamp/0 1 0.00 0 [ 0.00]
gen_server:decode_msg/9 1 0.00 0 [ 0.00]
gen_server:try_dispatch/3 1 0.00 0 [ 0.00]
gen_server:handle_msg/6 1 0.00 0 [ 0.00]
erlang:ensure_tracer_module_loaded/2 2 0.06 1 [ 0.50]
gen_server:try_dispatch/4 1 0.11 2 [ 2.00]
erlang:trace_pattern/3 2 0.17 3 [ 1.50]
gen_server:loop/7 1 0.34 6 [ 6.00]
erlang:system_flag/2 2 13.88 245 [ 122.50]
erts_internal:trace_pattern/3 2 85.44 1508 [ 754.00]
------------------------------- ----- ------- ---- [----------]
Total: 17 100.00% 1765 [ 103.82]
****** Process <0.197.0> -- 99.80 % of profiled time ***
FUNCTION CALLS % TIME [uS / CALLS]
-------- ----- ------- ---- [----------]
day6:simulation/2 81 0.01 47 [ 0.58]
day6:solution/2 1 0.07 616 [ 616.00]
day6:lanternfish/1 3827531 99.92 877330 [ 0.23]
------------------ ------- ------- ------ [----------]
Total: 3827613 100.00% 877993 [ 0.23]
ok

Wait, wait, wait! What are we doing there? 🙃

Let's follow the process step by step…

1> In = [5,4,3,5,1,....

Preparing our input data. No more, no less.

2> {ok, Pid} = eprof:start().

Just starting the eprof server. It's useful to keep track of the Pid because sometimes it is needed later, for instance, to profile this very process and understand how much time was spent on profiling itself. Of course, this information is not always required, but it's an excellent way to learn more about the profiling server. It is also worth noting that it is not always necessary to call eprof:start/0, because if you use eprof:profile/* the eprof server will start automatically. But then you will need to use whereis(eprof) to get the process identifier if you need it. So, up to you 😉.
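For instance (a small illustrative shell session, not from the original run), if we let eprof:profile/* start the server implicitly, we can still recover its pid afterwards:

```erlang
%% eprof:profile/* starts the eprof server automatically if needed,
%% so the pid can be looked up later with whereis/1.
1> {ok, _Result} = eprof:profile([], day6, solution, [80, In]).
2> Pid = whereis(eprof).
3> is_pid(Pid).
true
```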

3> ok = eprof:log("log.analyze").

The eprof:log/1 call specifies the path of the file in which to store the analysis data. You will still also see the output on the console. So, if you skip this step, you will only see the result in the console.

4> {ok, Result} = eprof:profile([Pid], day6, solution, [80, In]).

The central part: this starts the actual profiling. In this step, we use eprof:profile/4, where the first argument is a list of process identifiers. We set it to a list with a single element: the process identifier of the eprof server. This list can be empty if you don't want to trace any other processes.
The second argument is a module name where the function to be profiled is placed. The third argument is the function that we want to profile. The fourth argument is the list of arguments passed to that function.

5> eprof:analyze().

Once the profiling is done, this API is called to format the traces of each process. There are only two available analysis types: procs and total. The default one is procs. That's why in the output, you see information broken down by processes. If you only want the grand totals, you can use eprof:analyze(total). For our current example, the output would look like…

1> eprof:analyze(total).
FUNCTION CALLS % TIME [uS /CALLS]
-------- ------- ------- ---- [---------]
lists:keyfind/3 2 0.00 0 [ 0.00]
gb_trees:empty/0 1 0.00 0 [ 0.00]
erlang:timestamp/0 1 0.00 0 [ 0.00]
gen_server:decode_msg/9 1 0.00 0 [ 0.00]
gen_server:try_dispatch/3 1 0.00 0 [ 0.00]
gen_server:handle_msg/6 1 0.00 0 [ 0.00]
erlang:ensure_tracer_module_loaded/2 2 0.00 1 [ 0.50]
gen_server:try_dispatch/4 1 0.00 2 [ 2.00]
erlang:trace_pattern/3 2 0.00 3 [ 1.50]
gen_server:loop/7 1 0.00 6 [ 6.00]
day6:simulation/2 81 0.01 47 [ 0.58]
erlang:system_flag/2 2 0.03 245 [ 122.50]
day6:solution/2 1 0.07 616 [ 616.00]
erts_internal:trace_pattern/3 2 0.17 1508 [ 754.00]
day6:lanternfish/1 3827531 99.72 877330 [ 0.23]
----------------------------- ------- ------- ------ [---------]
Total: 3827630 100.00% 879758 [ 0.23]

This formatted data was also stored in the file created by eprof:log/1, with the same format as the one you see here.

Improvements

We can see the profiled time of the eprof server itself, which is only 0.20%. The other 99.80% of the profiled time corresponds to the solution that we implemented. With this information, we can easily find the function that needs improvement. In our case, this is day6:lanternfish/1. We see that this function is called 3827531 times to generate 350917 lanternfish over 80 days from the input data received in the task. This leads us to think that we should minimize the number of function calls.

As you have already noticed, our main problem is that the number of function calls depends on the number of fish, and we cannot predict how many fish will be waiting for us after 256 days. At the same time, we can see that on any particular day, all that happens is that each fish moves to a specific timer value depending on its age, with the occasional addition of new fish. And we know precisely when our simulation will create new fish and what timer they will start with. This realization suggests that we should focus not on the individual fish but on how many fish sit at each timer value.

Let's improve our solution by tracking the nine possible timer values (0 through 8) as buckets. We parse the input, counting how many of the already existing fish sit at each timer value; that gives us the initial state. Following the fish lifecycle timer logic described above, each day we just shift the fish counts between timer buckets and add the newly created fish to the right bucket. In other words, each day we'll have a state like:

Day0, Day1, Day2, Day3, Day4, Day5, Day6, Day7, Day8

And we'll juggle this data as described in the task requirement, so the next day it'll look like…

Day1, Day2, Day3, Day4, Day5, Day6, Day0 + Day7, Day8, Day0

We apply this approach day by day. At the end of the last day, we sum up the number of our fish sorted by timer days. Our solution should now look like this:

-module(day6).
-export([solution/2]).

solution(Days, Input) ->
    State = init(Input),
    simulation(Days, State).

init(Input) ->
    State = #{0 => 0, 1 => 0, 2 => 0,
              3 => 0, 4 => 0, 5 => 0,
              6 => 0, 7 => 0, 8 => 0},
    init(Input, State).

init([], Acc) ->
    Acc;
init([H|T], Acc) ->
    init(T, Acc#{H => maps:get(H, Acc) + 1}).

simulation(0, #{0 := D0, 1 := D1, 2 := D2,
                3 := D3, 4 := D4, 5 := D5,
                6 := D6, 7 := D7, 8 := D8}) ->
    D0 + D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8;
simulation(D, #{0 := D0, 1 := D1, 2 := D2,
                3 := D3, 4 := D4, 5 := D5,
                6 := D6, 7 := D7, 8 := D8}) ->
    simulation(D - 1, #{0 => D1, 1 => D2, 2 => D3,
                        3 => D4, 4 => D5, 5 => D6,
                        6 => D0 + D7, 7 => D8, 8 => D0}).

For ease of visualization, I used maps, but it could be any data type that helps group the data. Let's see if this works…
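For example (an untested sketch of mine, not part of the original solution), the same bucket rotation could be written with a 9-tuple instead of a map:

```erlang
%% Same bucket-rotation idea using a tuple instead of a map:
%% element N+1 of the tuple holds the number of fish with timer N.
simulation(0, {D0, D1, D2, D3, D4, D5, D6, D7, D8}) ->
    D0 + D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8;
simulation(D, {D0, D1, D2, D3, D4, D5, D6, D7, D8}) ->
    simulation(D - 1, {D1, D2, D3, D4, D5, D6, D0 + D7, D8, D0}).
```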

1> In = [5,4,3,5,1,1,2,1,2,1,3,2,3,4,5,1,2,4,3,2,5,1,4,2,1,1,2,5,
4,4,4,1,5,4,5,2,1,2,5,5,4,1,3,1,4,2,4,2,5,1,3,5,3,2,3,1,
1,4,5,2,4,3,1,5,5,1,3,1,3,2,2,4,1,3,4,3,3,4,1,3,4,3,4,5,
2,1,1,1,4,5,5,1,1,3,2,4,1,2,2,2,4,1,2,5,5,1,4,5,2,4,2,1,
5,4,1,3,4,1,2,3,1,5,1,3,4,5,4,1,4,3,3,3,5,5,1,1,5,1,5,5,
1,5,2,1,5,1,2,3,5,5,1,3,3,1,5,3,4,3,4,3,2,5,2,1,2,5,1,1,
1,1,5,1,1,4,3,3,5,1,1,1,4,4,1,3,3,5,5,4,3,2,1,2,2,3,4,1,
5,4,3,1,1,5,1,4,2,3,2,2,3,4,1,3,4,1,4,3,4,3,1,3,3,1,1,4,
1,1,1,4,5,3,1,1,2,5,2,5,1,5,3,3,1,3,5,5,1,5,4,3,1,5,1,1,
5,5,1,1,2,5,5,5,1,1,3,2,2,3,4,5,5,2,5,4,2,1,5,1,4,4,5,4,
4,1,2,1,1,2,3,5,5,1,3,1,4,2,3,3,1,4,1,1].
2> {ok, Pid} = eprof:start().
3> ok = eprof:log("log.analyze").
4> {ok, Result} = eprof:profile([Pid], day6, solution, [80, In]).
{ok, 350917}
5> eprof:analyze().

The output should look like this:

****** Process <0.137.0>    -- 89.99 % of profiled time *** 
FUNCTION CALLS % TIME [uS / CALLS]
-------- ----- ------- ---- [----------]
lists:keyfind/3 2 0.00 0 [ 0.00]
gb_trees:empty/0 1 0.00 0 [ 0.00]
gen_server:decode_msg/9 1 0.00 0 [ 0.00]
gen_server:try_dispatch/3 1 0.00 0 [ 0.00]
gen_server:handle_msg/6 1 0.00 0 [ 0.00]
erlang:timestamp/0 1 0.09 1 [ 1.00]
erlang:trace_pattern/3 2 0.09 1 [ 0.50]
erlang:ensure_tracer_module_loaded/2 2 0.09 1 [ 0.50]
gen_server:try_dispatch/4 1 0.09 1 [ 1.00]
gen_server:loop/7 1 0.56 6 [ 6.00]
erlang:system_flag/2 2 9.73 105 [ 52.50]
erts_internal:trace_pattern/3 2 89.34 964 [ 482.00]
--------------------------------- --- ------- ---- [----------]
Total: 17 100.00% 1079 [ 63.47]
****** Process <0.141.0> -- 10.01 % of profiled time ***
FUNCTION CALLS % TIME [uS / CALLS]
-------- ----- ------- ---- [----------]
day6:init/1 1 0.00 0 [ 0.00]
day6:solution/2 1 1.67 2 [ 2.00]
day6:simulation/2 81 16.67 20 [ 0.25]
day6:init/2 301 81.67 98 [ 0.33]
----------------- ----- ------- ---- [----------]
Total: 384 100.00% 120 [ 0.31]

Now we can see that our new solution consumes less execution time than even the profiling server. We also see that the primary time cost sits in the state initialization function, but that is still just 301 function calls instead of the 3827531 we had before for 80 days. Now let's see if we can get the correct answer for 256 days:

1> day6:solution(256, In).                                       
1592918715629

And yeah, this is the right answer! 🤘
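As a final sanity check (using the figures from the task's own example), the rewritten solution still reproduces the expected results:

```erlang
%% The task statement tells us these two values in advance.
1> day6:solution(18, [3,4,3,1,2]).
26
2> day6:solution(80, [3,4,3,1,2]).
5934
```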

Summary

The profiling tool eprof is very useful and, unlike fprof (which we discussed in the previous article), it has pretty friendly output and formatting that is easy to read without extra manipulation. Nevertheless, this tool provides less info than fprof.

eprof (just like fprof) is not recommended in production, since these tools significantly slow down the programs you are trying to profile. But for debugging purposes, it's still a fantastic tool that comes out of the box with OTP.
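And once you are done profiling, the eprof server can be shut down explicitly so that no tracing overhead lingers (eprof:stop/0 is part of the standard eprof API):

```erlang
%% Shut down the eprof server when profiling is finished.
1> eprof:stop().
stopped
```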
