Tools to debug Memory issues in Elixir

Sep 20, 2024

After diving into memory issues in Elixir in my previous post, I had the chance to give a talk at the Curiosum Elixir meetup. Since I had already covered much of the same material at ElixirConf EU 2024, I decided to take a different approach and focus on the tools that can help you debug memory issues happening on Elixir nodes. I showcased the tools I personally use, along with one that came highly recommended by the community. Using scenarios similar to those from my ElixirConf EU talk, and previous posts on Debugging Memory Issues in Elixir (parts 1 and 2), I explored practical ways to tackle memory problems. In this continuation, I’ll guide you through the tools available for debugging memory in Elixir and how to use them effectively.

Setup

I am running my Phoenix application locally and calling an endpoint to trigger the behaviors we are going to debug. I use Postman to make the HTTP call, and I started my server with:

iex --sname dev_app -S mix phx.server

So first of all we will connect to our running application:

iex --remsh dev_app --sname rm_observer

When connecting to a remote cluster, the common approach is to first SSH into a machine running one of your nodes, then use iex --remsh from there.

Observer

Observer is a great tool that gives you an overview of your node's health and lets you find processes by traversing the supervision tree or by sorting processes by some attribute (e.g. memory). You can start Observer with the following command:

:observer.start()

On the first screenshot you can see the memory growing, which happens when I call the endpoint. On the second screenshot you can see a process occupying around 16 MB. Its initial function already hints at what kind of process it is, but you can investigate further. After pinpointing the process you can check its info or even get its state.

Here are examples of process information and state taken from other processes. You can see in depth what kind of process it is and which functions it is running.

You can also browse ETS tables, among other things.

Here you can spot the Nebulex backend occupying around 55 MB. Nebulex is a caching library, so these are most likely your caches.

To use Observer you need wxWidgets installed on your system (e.g. I installed it with brew on a Mac) and :runtime_tools and :wx added to :extra_applications in mix.exs. On Elixir 1.15 and later you will most likely need to list :observer there as well.
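As a rough sketch, the relevant part of mix.exs could look like the fragment below. The MyApp names and the version numbers are placeholders I made up for illustration, and the :observer entry is an assumption for Elixir 1.15 and later.

```elixir
# mix.exs (sketch; MyApp, :my_app and the versions are hypothetical placeholders)
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [app: :my_app, version: "0.1.0", elixir: "~> 1.15", deps: deps()]
  end

  def application do
    [
      mod: {MyApp.Application, []},
      # :runtime_tools and :wx come from the text above;
      # :observer is an assumption for Elixir 1.15 and later.
      extra_applications: [:logger, :runtime_tools, :wx, :observer]
    ]
  end

  defp deps, do: []
end
```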

Observer CLI

The main problem with Observer (aside from having to install wxWidgets) is that if you want to debug a remote node you need to connect Observer to it. To achieve that you have to:

Know and use the same cookie that your remote cluster is using
Forward the epmd port and the node's distribution port from remote to local

If you don't want to or can't use the GUI Observer, Observer CLI is a nice alternative that runs in the iex console. You can start it with:

:observer_cli.start()

With Observer CLI you can do many of the same things as with Observer: check node status, sort processes by some attribute, and get process info and state. On the screenshot we see the same Bandit.DelegatingHandler we saw with Observer before.

Process info and state are there.

Also ETS tables overview and other things are available.

Conveniently, all you need to do to use Observer CLI is add it as a dependency to your project in mix.exs! Don't forget to make sure it is also included in your production deployments.
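A minimal sketch of that dependency entry is shown below. The version requirement is my assumption, so check hex.pm for the current release before copying it.

```elixir
# mix.exs (fragment; the "~> 1.7" requirement is an assumption, not a recommendation)
defp deps do
  [
    # Not restricted with `only: :dev`, so that it also ships with production releases.
    {:observer_cli, "~> 1.7"}
  ]
end
```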

Recon

A very useful tool for debugging your code in depth is a library called Recon. Recon is a set of functions that make debugging Elixir (or Erlang) code easier. Let's go through its basic memory-related functionality.

Node status:

:recon_alloc.memory(:allocated) - memory the BEAM has allocated from the operating system
:recon_alloc.memory(:used) - memory that is actually in use
:recon_alloc.memory(:allocated_types) - allocated memory broken down by allocator type (e.g. process stacks and heaps, ETS, binaries)
:recon_alloc.memory(:allocated_types, :max) - :recon_alloc.memory/2 accepts a second argument; with :max it returns the historical maximum instead of the current value
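If Recon is not yet available on a node, the built-in :erlang.memory/0 gives a rough, dependency-free breakdown to compare against. This is only a sketch: the MemoryOverview module name and the rounding to megabytes are my own illustrative choices, not part of Recon, and the categories it reports differ from the allocator-level view of :recon_alloc.

```elixir
defmodule MemoryOverview do
  @moduledoc "Hypothetical helper: formats :erlang.memory/0 output in megabytes."

  @bytes_per_mb 1024 * 1024

  # Returns a keyword list of {category, megabytes}, largest category first.
  def in_mb do
    :erlang.memory()
    |> Enum.map(fn {category, bytes} ->
      {category, Float.round(bytes / @bytes_per_mb, 2)}
    end)
    |> Enum.sort_by(fn {_category, mb} -> mb end, :desc)
  end
end

MemoryOverview.in_mb() |> IO.inspect(label: "memory (MB)")
```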

:recon.proc_count/2

:recon.proc_count/2 will list processes sorted by some attribute:

:recon.proc_count(:memory, 3) - by the total memory a process occupies (stack + heaps + internal structures)
:recon.proc_count(:binary_memory, 3) - by the memory occupied by binaries the process references (useful for tracking down processes that hold a lot of binary memory)
:recon.proc_count(:message_queue_len, 3) - by the length of the process message queue

The cool thing about Recon functions is that they return normal Elixir data structures, so you can continue debugging with the data they return. For example, you can capture the PID of the process that occupies the most memory and retrieve general info about it:

[{pid, _, _} | _] = :recon.proc_count(:memory, 3)
:recon.info(pid)
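To make it clearer what "sorted by some attribute" means, here is a dependency-free approximation built on Process.list/0 and Process.info/2. It is a sketch under my own assumptions, not Recon's implementation: the TopProcesses module is hypothetical and it only returns {pid, value} pairs.

```elixir
defmodule TopProcesses do
  @moduledoc "Hypothetical helper: lists the processes with the largest value of an attribute."

  # Returns up to `n` `{pid, value}` tuples, largest value first.
  # `attribute` is any integer-valued Process.info/2 item,
  # e.g. :memory or :message_queue_len.
  def by(attribute, n) do
    Process.list()
    |> Enum.flat_map(fn pid ->
      case Process.info(pid, attribute) do
        {^attribute, value} -> [{pid, value}]
        # The process exited between Process.list/0 and Process.info/2.
        nil -> []
      end
    end)
    |> Enum.sort_by(fn {_pid, value} -> value end, :desc)
    |> Enum.take(n)
  end
end

TopProcesses.by(:memory, 3) |> IO.inspect(label: "top 3 by memory")
```

Unlike this sketch, :recon.proc_count/2 also returns some identifying info for each process, which is why the pattern above matches a three-element tuple.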

ETS tables overview

You can use the built-in :ets.i/0 function to print an overview of all ETS tables to the terminal. The output is a bit hard to read, though.

:ets.i()
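If the printed table is too hard to read, you can build a sorted overview yourself from :ets.all/0 and :ets.info/2. The sketch below is one way to do it; the EtsOverview module and its map keys are names I made up for illustration. Note that ETS reports memory in words, so it is multiplied by the word size to get bytes.

```elixir
defmodule EtsOverview do
  @moduledoc "Hypothetical helper: lists the ETS tables that occupy the most memory."

  # Returns up to `n` maps describing the largest tables, biggest first.
  def largest(n) do
    word_size = :erlang.system_info(:wordsize)

    :ets.all()
    |> Enum.flat_map(fn table ->
      case :ets.info(table, :memory) do
        # The table was deleted after :ets.all/0 returned.
        :undefined ->
          []

        words ->
          [
            %{
              name: :ets.info(table, :name),
              size: :ets.info(table, :size),
              memory_bytes: words * word_size
            }
          ]
      end
    end)
    |> Enum.sort_by(fn table -> table.memory_bytes end, :desc)
    |> Enum.take(n)
  end
end

EtsOverview.largest(5) |> IO.inspect(label: "largest ETS tables")
```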

Memory leak and retention

If you suspect a process is leaking or retaining memory and you have its PID (e.g., using :recon.proc_count/2), a simple way to check if its memory can be reclaimed but isn't being collected is to run :erlang.garbage_collect/1 and then monitor the process memory afterward.

:erlang.garbage_collect(pid)
:recon.info(pid, :memory)

After comparing the memory occupied before and after GC, you can draw a conclusion about whether the memory simply had not been reclaimed yet or whether the process is actually holding on to it. In the second case you can dig deeper into the process state to see whether it is leaking or genuinely needs to hold a lot of data. Because garbage collection on the BEAM happens for each process separately, running GC on a single process usually doesn't put the performance of your system at risk.
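The before/after comparison can be wrapped in a small helper. This is a sketch using only built-in functions; the GcCheck module and the key names in the returned map are hypothetical.

```elixir
defmodule GcCheck do
  @moduledoc "Hypothetical helper: compares process memory before and after a forced GC."

  # Returns a map with memory in bytes, or nil if the process is no longer alive.
  def reclaimable(pid) do
    with {:memory, before_bytes} <- Process.info(pid, :memory),
         true <- :erlang.garbage_collect(pid),
         {:memory, after_bytes} <- Process.info(pid, :memory) do
      %{
        before_bytes: before_bytes,
        after_bytes: after_bytes,
        freed_bytes: before_bytes - after_bytes
      }
    else
      _ -> nil
    end
  end
end

# Demo process: builds a large temporary list, drops it, then waits for :stop.
pid =
  spawn(fn ->
    _ = length(Enum.to_list(1..200_000))

    receive do
      :stop -> :ok
    end
  end)

Process.sleep(100)
GcCheck.reclaimable(pid) |> IO.inspect(label: "gc check")
send(pid, :stop)
```

A large freed_bytes value suggests the memory was simply waiting to be collected, while a value close to zero suggests the process really holds on to the data.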

Binary memory retention

You can quickly verify whether some processes retain references to binaries by calling :recon.bin_leak/1:

:recon.bin_leak(5)

It counts the binary references in each process, runs the garbage collector on it, and counts again. It then returns the processes that freed the most binary references (5 of them in the call above). Please be aware that :recon.bin_leak/1 runs GC on every process, which can hurt your system performance.
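To illustrate the idea for a single process without forcing GC on the whole node, here is a sketch that counts binary references before and after a GC using the :binary item of Process.info/2. This is not Recon's code: the BinRefs module is hypothetical, and the :binary item is documented by Erlang as subject to change without notice, so treat it as a debugging aid only.

```elixir
defmodule BinRefs do
  @moduledoc "Hypothetical helper: counts binary references a process releases on GC."

  # Returns the number of binary references released by a forced GC,
  # or nil if the process is no longer alive.
  def released(pid) do
    with {:binary, before_refs} <- Process.info(pid, :binary),
         true <- :erlang.garbage_collect(pid),
         {:binary, after_refs} <- Process.info(pid, :binary) do
      length(before_refs) - length(after_refs)
    else
      _ -> nil
    end
  end
end

BinRefs.released(self()) |> IO.inspect(label: "binary refs released")
```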

:recon.info and its alternatives

:recon.info is very similar to the built-in :erlang.process_info. Both can be called with one argument to return general info about a process, or with two arguments to get a specific attribute:

:recon.info(pid)
:recon.info(pid, :memory)
:recon.info(pid, :message_queue_len)
:erlang.process_info(pid)
:erlang.process_info(pid, :memory)
:erlang.process_info(pid, :message_queue_len)

The reason Recon introduces :recon.info is that :erlang.process_info/1 can return very verbose data, which can affect the performance of your system (e.g. because a lot of data has to be transferred over the wire). It is therefore preferable to use :recon.info, which tries to avoid returning excessive data. The same applies to :sys.get_status/1, which returns the process status including its state.

:sys.get_status(pid)

That can often be replaced with:

:recon.info(pid, :status)
:recon.info(pid, :current_stacktrace)
:recon.info(pid, :initial_call)
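The same "ask only for what you need" idea also works with the built-in API, because Process.info/2 accepts a list of items. A minimal sketch, where the ProcessSnapshot module and its choice of keys are my own assumptions:

```elixir
defmodule ProcessSnapshot do
  @moduledoc "Hypothetical helper: fetches a small, fixed set of process attributes."

  # Deliberately leaves out potentially huge items such as :messages.
  @keys [:memory, :message_queue_len, :status, :initial_call, :current_stacktrace]

  # Returns a keyword list with the keys above, or nil if the process is gone.
  def take(pid), do: Process.info(pid, @keys)
end

ProcessSnapshot.take(self()) |> IO.inspect(label: "snapshot")
```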

Conclusion

In conclusion, debugging memory issues in Elixir requires a solid understanding of the BEAM memory model and the right tools for the job. By identifying problematic processes, manually triggering garbage collection, and utilising tools like Observer (CLI) and Recon, you can effectively diagnose and resolve memory problems. Mastering these techniques will help you maintain optimal performance in your Elixir applications.
