Debugging Memory Issues in Elixir Part 2

Anton Frolov
6 min read · Jun 27, 2024

Navigating the BEAM Memory Model

Following up on my previous post about memory issues you can encounter when working with Elixir, it is time to touch on another topic that often comes up when thinking about application memory.

In my talk at ElixirConf EU 2024 I asked two questions: Have you ever had a memory leak in your Elixir code? Can a garbage-collected language even leak? Unlike languages such as C++, where memory is allocated and deallocated manually, Elixir’s garbage collection mechanism ensures that memory is automatically reclaimed when objects are no longer reachable. This prevents your code from leaking memory in most cases. However, like other garbage-collected languages, Elixir can still experience memory retention issues if objects remain reachable. This is not a classical memory leak, since the object can technically still be used and so its memory is not collected. In practice, though, it is a memory leak: as a software developer, you did not intend to use these objects anymore and want the memory they occupy to be reclaimed.

But first, let's take a quick look at how Elixir’s garbage collector operates.

Garbage Collection in Elixir

Elixir’s memory management relies heavily on the BEAM Virtual Machine’s garbage collection system. BEAM employs a per-process garbage collection strategy, meaning each process manages its own heap. This isolation enhances fault tolerance and simplifies garbage collection since each process can independently reclaim memory without affecting others.

The garbage collector in Elixir is designed to identify and clean up memory that is no longer in use. It categorizes memory into young and old heaps, with more frequent collection occurring in the young heap. If an object survives multiple garbage collection cycles in the young heap, it is promoted to the old heap. This generational approach optimizes the garbage collector’s performance by focusing on recently allocated memory, which is more likely to be discarded soon.

Memory Leaks in Elixir

Technically, Elixir cannot leak memory in the same way as languages with manual memory management do. In those, memory leaks happen when developers forget to free previously allocated memory. Elixir, being garbage collected, automatically reclaims unreachable memory and also frees the memory of a process and its ETS tables when the process terminates, making such leaks virtually impossible.

However, Elixir can experience memory retention issues similar to those in other garbage-collected languages like Java. If an object remains reachable, it won’t be collected by the garbage collector. This often happens in process states or ETS (Erlang Term Storage) tables, where data can linger longer than necessary. For example, if a process maintains a large map in its state where it keeps information associated with user sessions and this data is no longer needed but still referenced, it won’t be reclaimed, effectively “leaking” memory.

As we have already seen, one common source of memory retention issues in Elixir is process state. Processes in Elixir can hold onto data in their state, and there can be hundreds of thousands or even millions of processes in your cluster. Retained memory is particularly problematic in long-lived processes that continuously accumulate data and live for the lifetime of your application, such as a process managing user sessions or handling incoming messages.
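The session example above can be sketched as a small GenServer. This is a minimal illustration, not code from the talk: the module name `SessionStore` and its functions are hypothetical. The point is that an entry stays reachable, and therefore uncollectable, until something explicitly removes it from the state.

```elixir
defmodule SessionStore do
  # Hypothetical module for illustration only.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Adds or replaces the data kept for a session.
  def put(pid, session_id, data), do: GenServer.cast(pid, {:put, session_id, data})

  # Without a call like this, finished sessions stay in the map forever.
  def drop(pid, session_id), do: GenServer.cast(pid, {:drop, session_id})

  def count(pid), do: GenServer.call(pid, :count)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:put, id, data}, sessions), do: {:noreply, Map.put(sessions, id, data)}
  def handle_cast({:drop, id}, sessions), do: {:noreply, Map.delete(sessions, id)}

  @impl true
  def handle_call(:count, _from, sessions), do: {:reply, map_size(sessions), sessions}
end
```

If the code that ends a session never calls `SessionStore.drop/2`, the map only grows, and the garbage collector cannot help because every entry is still referenced from the process state.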

Similarly, ETS tables, which can be used for storing large amounts of data accessible by multiple processes, can also be a source of memory leaks. ETS tables do not automatically remove entries, so if you forget to delete data that is no longer needed, it will persist and consume memory. Forgetting to delete the tables themselves can lead to similar problems, as the data associated with a table is not freed while the table exists. This is generally a lesser concern, since an ETS table is automatically deleted when its owner process terminates.
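As a rough sketch of the cleanup described above, the snippet below deletes a single entry and then the whole table. The table name and keys are made up for illustration; the `:ets` calls are the standard ones.

```elixir
# Hypothetical table name and keys, for illustration only.
table = :ets.new(:session_cache, [:set, :public])

:ets.insert(table, {"session-1", %{user_id: 1}})
:ets.insert(table, {"session-2", %{user_id: 2}})

# Entries stay in the table until someone deletes them explicitly.
:ets.delete(table, "session-1")
remaining = :ets.info(table, :size)

# Memory used by the table, reported in machine words.
words = :ets.info(table, :memory)

# Deleting the table itself frees everything it holds.
:ets.delete(table)
deleted? = :ets.info(table) == :undefined
```

Watching `:ets.info(table, :memory)` over time is a simple way to notice a table that only ever grows.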

It is important to mention that NIFs can be another source of memory leaks. Native Implemented Functions (NIFs) run native code, which can be written in a language that allows memory to leak. Because native functions can be implemented in a variety of languages, it is hard to cover all the possible cases, so I did not look into them in my talk.

GC will not always collect your garbage

In Elixir, each process operates with its own heap, and garbage collection runs independently for each process. The Erlang documentation gives a more precise description of the garbage collector:

The garbage collector Erlang uses to manage dynamic memory is a per process generational semi-space copying collector using Cheney’s copy collection algorithm together with a global large object space.

We have already seen that the Elixir garbage collector works per process, but there is another word that should draw our attention: generational. It simply means that the garbage collector keeps more than one generation of heap. It divides a process’s memory into a young heap and an old heap, and collects the young heap frequently to quickly reclaim short-lived objects. Objects that survive collections in the young heap are promoted to the old heap, which is collected less frequently to optimize performance. The important takeaway is that the garbage collector does not sweep the old heap every time it runs. It uses heuristics to decide when to collect the old heap, and these heuristics do not always ensure that memory there is freed promptly.
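To see the two generations for a concrete process, you can ask the VM for its garbage collection details. The sketch below inspects the calling process. `:fullsweep_after` is one of the settings involved in deciding when the old heap is swept: it caps how many young-heap collections may happen before a full sweep is forced. No specific values are shown here, since they depend on the VM and the process.

```elixir
# Per-process GC settings, including :fullsweep_after and the number of
# minor (young heap) collections since the last full sweep.
{:garbage_collection, gc} = Process.info(self(), :garbage_collection)
fullsweep_after = Keyword.fetch!(gc, :fullsweep_after)
minor_gcs = Keyword.fetch!(gc, :minor_gcs)

# Detailed heap sizes in words, including the size of the old heap.
{:garbage_collection_info, info} = Process.info(self(), :garbage_collection_info)
old_heap_size = Keyword.fetch!(info, :old_heap_size)
heap_size = Keyword.fetch!(info, :heap_size)
```

A process whose `old_heap_size` keeps growing while `minor_gcs` climbs is a candidate for the retention problem described in this section.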

Garbage collector ignoring all the garbage on the shelves (old heap) by SDXL

One common scenario where memory retention can occur is when a process stores a significant amount of data in its state and frequently modifies this data. For example, a process might manage user data, including statuses and updates. Each modification creates new data structures while the old ones are discarded, generating a considerable amount of “garbage.” When most users are inactive for some time, their data is likely to be moved to the old heap. When changes for such users finally arrive, the replaced data is left behind as garbage on the old heap. This can cause the old heap to grow rapidly, especially if the process state is large and changes often.

Another situation arises when a process handles a high volume of message traffic. Messages in Elixir are allocated on the process heap by default, so a process with a busy message queue can accumulate a lot of data quickly. This can lead to increased memory usage, as messages pile up in the process heap until they are processed. During a garbage collector run, the memory occupied by queued messages will likely be moved to the old heap.

To address these issues, developers can manually trigger a full garbage collection cycle for a process, covering both heaps, using the :erlang.garbage_collect() function. This is useful when debugging, to confirm that uncollected garbage is what is driving the large memory consumption of the process.
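A minimal way to run that check is to compare a process’s memory before and after a forced collection. The helper module below is hypothetical; it only wraps `Process.info/2` and `:erlang.garbage_collect/1`.

```elixir
defmodule GcCheck do
  # Hypothetical helper for illustration only.
  # Returns {bytes_before, bytes_after} for a live process.
  def memory_before_and_after(pid) do
    {:memory, before_bytes} = Process.info(pid, :memory)
    # Forces a full collection of the given process.
    true = :erlang.garbage_collect(pid)
    {:memory, after_bytes} = Process.info(pid, :memory)
    {before_bytes, after_bytes}
  end
end
```

If the second number is much smaller than the first, most of the process’s memory was garbage waiting for a full sweep rather than data that is still referenced.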

A common method to alleviate memory retention issues when storing large amounts of data for a process is to move frequently modified data to an ETS table. Since ETS tables manage their own memory, modifications or deletions do not generate any garbage that needs to be collected.
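As a sketch of that approach, the snippet below keeps per-user status in an ETS table and updates fields in place. The table name and the tuple layout `{user_id, status, last_seen}` are assumptions made for this example.

```elixir
# Hypothetical table and record layout: {user_id, status, last_seen}.
table = :ets.new(:user_status, [:set, :public])

:ets.insert(table, {42, :offline, 0})

# Update single fields in place; element positions are 1-based,
# so position 2 is the status and position 3 is last_seen.
true = :ets.update_element(table, 42, [{2, :online}, {3, System.system_time(:second)}])

[{42, status, _last_seen}] = :ets.lookup(table, 42)
```

Keep in mind that `:ets.lookup/2` copies the stored term onto the caller’s heap, so reads still allocate. The benefit is that the long-lived data itself no longer sits on, or leaves garbage in, a process’s old heap.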

When handling a large volume of messages, it’s generally advisable to avoid letting messages accumulate in the process mailbox, as this can cause memory retention and delays in processing new messages, leading to overall system performance degradation. If preventing message pile-up isn’t feasible, consider allocating memory for messages off the process heap using the +hmqd off_heap VM argument or the :message_queue_data process flag.
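The process flag can be set from inside the process itself. The sketch below spawns a throwaway consumer (hypothetical, for illustration) that switches its message queue to off-heap storage and reports the setting back.

```elixir
# Hypothetical consumer process, for illustration only.
pid =
  spawn(fn ->
    # Store queued messages outside this process's heap.
    Process.flag(:message_queue_data, :off_heap)

    receive do
      {:ping, from} ->
        send(from, {:pong, Process.info(self(), :message_queue_data)})
    end
  end)

send(pid, {:ping, self()})

setting =
  receive do
    {:pong, {:message_queue_data, value}} -> value
  after
    1_000 -> :timeout
  end
```

To change the default for every process instead, the VM argument can be passed at startup, for example `iex --erl "+hmqd off_heap"`.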

Conclusion

In conclusion, while Elixir’s garbage collector effectively reclaims unreachable memory, Elixir can still experience memory retention issues similar to those in other garbage-collected languages. Temporary retention can occur in long-lived processes that store and frequently modify large amounts of data or handle high volumes of message traffic, leading to garbage piling up in the old heap. Understanding these mechanisms helps maintain efficient memory usage and good performance in Elixir applications.

If you want to learn about practical tools for debugging memory in Elixir applications, check this post out.

Please clap and follow for more Elixir-related stories. Until then, happy hacking!
