Debugging Memory Issues in Elixir
Navigating the BEAM Memory Model
ElixirConf EU 2024 has just wrapped up, and amidst a gathering of passionate developers and enthusiasts, I had the privilege of delivering a talk on memory management in Elixir. I decided to capture its contents, along with some other ideas, in this article, so here we go.
Elixir’s use of the BEAM Virtual Machine brings with it a distinctive approach to memory management, designed to avoid many pitfalls developers commonly encounter. Still, as I emphasized during the talk, tracking down memory issues in Elixir applications is a daunting task without a solid grasp of the underlying BEAM memory model. Rather than diving into the internals of BEAM memory management, the focus was on pragmatic approaches to resolving memory issues effectively.
In this article, based on the talk, we will unravel some of the mysteries of memory management in Elixir. Building on the fundamentals of the BEAM memory model and the practical examples presented during the talk, the goal is to equip you with the knowledge and skills needed to optimize memory usage and build resilient Elixir applications.
Elixir’s Unique Landscape
Elixir, a functional programming language built on the Erlang VM, offers a unique landscape for developers. With its lightweight processes, immutable data structures, and built-in concurrency primitives, Elixir enables the development of highly concurrent and fault-tolerant systems. However, this power comes with its own set of challenges, particularly concerning memory management.
At the heart of Elixir lies the BEAM Virtual Machine, renowned for its reliability and effectiveness when it comes to concurrency. Unlike traditional memory models, BEAM employs a process-centric approach, where each process owns its memory space, including heap and stack. This isolation ensures that no concurrent modifications of data can occur and that errors in one process do not affect others, enhancing the system’s fault tolerance. It does not fully protect your system from out-of-memory faults, though: a single process can still consume all the memory available on your machine, terminating the BEAM Virtual Machine or causing your Docker container to exit.
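One mitigation worth knowing about: the BEAM can cap the heap of an individual process via the `:max_heap_size` process flag, so a runaway process is killed instead of taking the whole node down. A minimal sketch (the limit below is purely illustrative):

```elixir
pid =
  spawn(fn ->
    # Cap this process at ~10 million words of heap. On breach the VM
    # kills the process and reports it through the error logger,
    # instead of the whole node dying with an out-of-memory error.
    Process.flag(:max_heap_size, %{
      size: 10_000_000,
      kill: true,
      error_logger: true
    })

    # ... memory-hungry work ...
  end)
```

Note that the limit is in machine words, not bytes, and choosing a sensible value requires knowing your workload.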
Pillars of BEAM Memory Management
To comprehend memory management in Elixir, one must understand its foundational principles. Memory in Elixir is immutable, meaning that any modification creates a new value; the old value is usually left behind to be garbage collected later. Processes manage their memory independently, with garbage collection occurring on a per-process basis. Additionally, there are ETS tables, which manage their memory manually and are not garbage collected. Each ETS table has an owner process and is deleted when that process terminates, making it another memory resource the process owns.
Despite Elixir’s elegance, developers may encounter memory-related challenges, particularly when dealing with large datasets or intensive computation.
Let’s look at an example
But first meet Choosy, your food companion on a mission to revolutionize meal planning. At Choosy, we believe that eating well should be easy, affordable, and environmentally friendly. That’s why we specialize in creating personalized weekly meal plans tailored to your specific goals, whether it’s improving your health, reducing your carbon footprint, or sticking to a tight budget.
Powered by Elixir and Phoenix, Choosy boasts a robust backend infrastructure capable of handling a multitude of simultaneous connections and asynchronous traffic. This means that even when several household members are planning meals simultaneously from their devices, Choosy keeps everything running smoothly.
Let’s now take a look at a simplified data schema that is inspired by what we have in Choosy.
A Recipe is the basic building block of a meal plan, containing instructions on how to cook a dish. A Product is an abstract item you can cook with, think “tomatoes” or “pasta”. Recipes and Products have a many_to_many relation through the RecipeIngredient table, which also carries the amount and unit; a RecipeIngredient is the “two tomatoes” or the “300 grams of pasta”. We have 1,000 Recipes but only 100 Products (these are not the exact numbers we have in Choosy, but the ratio is similar). Each Recipe has from 5 to 20 ingredients, so there will be around 12,000 RecipeIngredient entries. Note that Product is a comparably larger structure carrying a lot of information. In code, we can describe this relation with Ecto.
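A sketch of what those Ecto schemas could look like; the module and field names here are my assumptions, not Choosy’s actual code:

```elixir
defmodule Choosy.Recipe do
  use Ecto.Schema

  schema "recipes" do
    field :title, :string
    field :instructions, :string

    has_many :recipe_ingredients, Choosy.RecipeIngredient
    many_to_many :products, Choosy.Product, join_through: Choosy.RecipeIngredient
  end
end

defmodule Choosy.Product do
  use Ecto.Schema

  schema "products" do
    field :name, :string
    # ... plus many more fields, making Product a comparably large struct
  end
end

defmodule Choosy.RecipeIngredient do
  use Ecto.Schema

  schema "recipe_ingredients" do
    field :amount, :decimal
    field :unit, :string

    belongs_to :recipe, Choosy.Recipe
    belongs_to :product, Choosy.Product
  end
end
```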
Now let’s create a simple endpoint that queries all recipes, preloading the ingredients for each Recipe and the related Product for each RecipeIngredient. The endpoint doesn’t even return anything, as that won’t matter for this example. It passes all recipes to a function called suggest_user_recipes() that uses AI to create the most fitting meal plan for a family.
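Roughly, such a controller action could look like this (the controller module and `suggest_user_recipes/1` body are hypothetical, only the query shape matters here):

```elixir
defmodule ChoosyWeb.RecipeController do
  use ChoosyWeb, :controller

  alias Choosy.{Recipe, Repo}

  def suggest(conn, _params) do
    Recipe
    |> Repo.all()
    # Preload ingredients per Recipe and the Product per RecipeIngredient.
    # Ecto issues one extra query per association, then stitches the
    # structs together in memory.
    |> Repo.preload(recipe_ingredients: :product)
    |> suggest_user_recipes()

    # The response body is irrelevant for this example.
    send_resp(conn, 204, "")
  end

  defp suggest_user_recipes(_recipes) do
    # ... hand the recipes to the AI meal planner ...
    :ok
  end
end
```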
Let’s call this endpoint and track the memory it consumes with :observer.
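For reference, this is how the measurement can be done, both via the observer GUI and programmatically (assuming you have the `pid` of the request process at hand):

```elixir
# Start the observer GUI; requires the :observer, :wx and :runtime_tools
# applications to be available in the release or dev environment.
:observer.start()

# Alternatively, inspect a single process from IEx. Process.info/2 with
# the :memory key returns the total size in bytes of the process,
# including heap, stack and internal structures.
{:memory, bytes} = Process.info(pid, :memory)
IO.puts("Process memory: #{div(bytes, 1_000_000)} MB")

# Total memory allocated for all ETS tables on the node, in bytes:
:erlang.memory(:ets)
```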
You can observe that it only consumes 20 MB on the process heap. This number doesn’t tell us much by itself yet, so let’s just remember it as a baseline for how much memory a request takes.
Fetching all recipes on every request puts unnecessary pressure on the database; it would be better to cache the result, as Recipes do not change that often. We add a cache with Nebulex using the local backend. This means our data will be cached in an ETS table (with the compression flag turned on, I believe).
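A minimal sketch of such a cache, assuming a hypothetical `Choosy.Cache` module backed by Nebulex’s local (ETS) adapter:

```elixir
defmodule Choosy.Cache do
  use Nebulex.Cache,
    otp_app: :choosy,
    adapter: Nebulex.Adapters.Local
end

defmodule Choosy.Recipes do
  alias Choosy.{Cache, Recipe, Repo}

  # Read-through caching: hit the database only when the key is missing.
  def cached_recipes do
    case Cache.get(:all_recipes) do
      nil ->
        recipes =
          Recipe
          |> Repo.all()
          |> Repo.preload(recipe_ingredients: :product)

        Cache.put(:all_recipes, recipes)
        recipes

      recipes ->
        recipes
    end
  end
end
```

The adapter is configured in `config.exs` (e.g. `config :choosy, Choosy.Cache, ...`), where options such as ETS compression can be set; check the Nebulex local adapter documentation for the exact option names in your version.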
Call the endpoint a few more times and observe quite unexpected results:
The first request after the cache was enabled still takes only 20 MB of heap memory, but 55 MB is now also allocated in ETS for the cache. This alone should alert seasoned developers, since one would usually expect memory consumption to roughly double when a cache is enabled. But it gets worse on the second request, which reads the data from the cache: we can observe that the request now occupies 100 MB of heap memory! The reason the same data went from 20 MB to 100 MB is the so-called “loss of sharing”.
Loss of sharing
Remember we mentioned that each Elixir process manages its own memory, which means no data is shared between processes. It also implies that any data you send to another process is fully copied, and the same is true when you write data to or read it from an ETS table. You likely know this already, as it is one of the basic properties of both Elixir and Erlang. The detail you might not know is that during this full copy, your data is flattened and any sharing of terms inside it is lost.

You might protest that we just said data in Elixir is not shared between processes. That still holds, but data is quite actively shared within a process. Say you have a variable holding a list, and you prepend one element to the head of that list and assign the result to another variable. The first variable points to a list that is, at the same time, the tail of the new list referenced by the second variable. So this tail is shared between the two variables and remains a single memory allocation.

Going back to our example: remember we had around 12,000 RecipeIngredients and only 100 Products? When preloading Products, Ecto loads those 100 Products and shares them between the 12,000 RecipeIngredient entries. When a full copy of those entries happens, a new allocation occurs every time BEAM sees a Product, even if it was already seen and copied before. Hence you end up with 12,000 Product entries after the full copy, and this is what flattening, or “loss of sharing”, means.
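You can make this effect visible in IEx with `:erts_debug.size/1`, which counts shared terms once, and `:erts_debug.flat_size/1`, which counts a term as it would be after a full copy. Both report sizes in machine words. The list below stands in for a Product struct; it is a deliberately simplified model, not Choosy’s data:

```elixir
# Sharing within a process: prepending reuses the tail, it is not copied.
tail = [2, 3, 4]
list = [1 | tail]

# Simulate 100-Products-shared-by-12,000-ingredients with one
# heap-allocated term referenced 12,000 times:
product = Enum.to_list(1..1_000)
shared = List.duplicate(product, 12_000)

# Counts the shared `product` term only once:
:erts_debug.size(shared)

# Counts `product` once per reference, i.e. as it would look after
# being copied to another process or into an ETS table. This number
# is orders of magnitude larger than the one above:
:erts_debug.flat_size(shared)
```

This is exactly the gap between the 20 MB the request took before the cache and the 100 MB it takes after reading the flattened copy back out of ETS.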
Conclusion
In conclusion, knowing how Elixir manages its memory is crucial for avoiding unexpected memory spikes on your nodes, and for quickly finding the culprit when they occur. By understanding the underlying principles of the BEAM memory model and familiarising themselves with the common challenges and optimization techniques involved, developers can unlock the full potential of Elixir for building scalable and fault-tolerant applications.
This was one of the topics I covered in my talk at ElixirConf EU. If you liked it, be sure to clap and follow for new articles: I am going to write about another Elixir memory-related topic and will cover some tools you can use to debug memory issues arising in your cluster. Until then, happy hacking!
Edit: part 2 is available here.