Optimising memory consumption on a Rails monolith
Around last December, our application servers were using ~70 GB of memory to serve requests at peak hours. As part of the platform team, I started investigating ways to bring down our memory usage and discovered jemalloc.
In this article, I’ll be sharing the root cause of our memory bloat and how jemalloc affected our overall memory utilisation.
Memory Allocators
Memory allocation in Ruby involves three layers, ordered from high to low level:
- The Ruby interpreter manages Ruby objects.
- The operating system’s memory allocator library.
- The kernel.
Ruby interpreter
On the Ruby side, Ruby organises objects in memory areas called Ruby heap pages. Each Ruby heap page is split into equal-sized slots, where one object occupies one slot. Whether it’s a string, hash table, array, or class, each object occupies exactly one slot.
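You can observe these heap pages and slots from any Ruby process via GC.stat. A minimal sketch (the exact key names shown here are stable across recent Ruby versions, but counts will differ on your machine):

```ruby
# Inspect Ruby's heap pages and object slots.
stats = GC.stat
puts "heap pages allocated: #{stats[:heap_allocated_pages]}"
puts "total slots:          #{stats[:heap_available_slots]}"
puts "slots in use:         #{stats[:heap_live_slots]}"

# Each live Ruby object occupies one slot, so allocating objects
# grows the live-slot count (modulo other garbage collected in between).
before = GC.stat(:heap_live_slots)
objs = Array.new(1000) { Object.new }
puts "new live slots: #{GC.stat(:heap_live_slots) - before}"
```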
Operating System’s Memory Allocator
On Linux, the default memory allocator is a library that is part of glibc (the C runtime). It has a simple API:
- Memory is allocated by calling malloc(size). You pass it the number of bytes you want to allocate, and it returns either the address of the allocation or NULL on failure.
- Allocated memory is freed by calling free(address).
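You can call this API directly from Ruby using the stdlib Fiddle extension, which is a handy way to experiment with the allocator underneath the interpreter:

```ruby
require "fiddle"

# Allocate 1024 bytes straight from the process's malloc implementation.
# Fiddle.malloc returns the address as an Integer.
address = Fiddle.malloc(1024)
puts format("allocated 1024 bytes at 0x%x", address)

# Release the allocation again. Forgetting this call is a memory leak,
# because this memory is invisible to Ruby's garbage collector.
Fiddle.free(address)
```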
Kernel
The kernel can only allocate memory in units of 4 KB. One such 4 KB unit is called a page (not to be confused with Ruby heap pages, which are a separate concept).
The reason for this is complicated, but suffice it to say that all modern kernels have this property.
Allocating memory via the kernel also has a significant performance impact, so memory allocators try to minimize the number of kernel calls.
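You can ask the kernel for its page size from Ruby via Etc.sysconf. Note that 4 KB is the common size on x86-64 Linux, but some platforms (e.g. Apple Silicon) use 16 KB pages:

```ruby
require "etc"

# Query the kernel's memory page size via sysconf(_SC_PAGESIZE).
page_size = Etc.sysconf(Etc::SC_PAGESIZE)
puts "kernel page size: #{page_size} bytes"
```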
Memory issues due to memory fragmentation
What is memory fragmentation?
Imagine the heap as your Lego box. It's where Ruby objects are allocated and stored.
- Over time, your app creates and destroys objects, leaving behind “holes” in the heap like missing Lego pieces.
- These holes are small, scattered chunks of memory that can’t be used for larger objects.
- When your app needs a big block of memory (e.g., to load a large image), it has to scan through the fragmented heap, potentially taking longer and even failing if no suitable chunk is found.
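The same effect can be sketched at the Ruby level with GC.stat: allocate many objects, drop every other one, and the freed slots end up scattered as "holes" across heap pages. This is an illustration of the concept, not a measurement of allocator-level fragmentation:

```ruby
# Allocate 100,000 small strings, then punch holes by dropping every other one.
objs = Array.new(100_000) { +"x" * 40 }
objs.each_index { |i| objs[i] = nil if i.even? }
GC.start

live  = GC.stat(:heap_live_slots)
total = GC.stat(:heap_available_slots)
puts format("slot occupancy: %.1f%%", 100.0 * live / total)
# The freed slots are scattered across many pages, so Ruby cannot hand
# most of those pages back even though half the objects are gone.
```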
Mitigating Memory fragmentation
Reducing arenas
The major cause of fragmentation appears to be the large number of glibc memory arenas created in heavily multi-threaded programs. “Heavily multi-threaded” — sound familiar? That’s Sidekiq. Capping the number of arenas via the MALLOC_ARENA_MAX environment variable reduces the “heavily multi-threaded” trigger and leads to less bloat.
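As a sketch, the cap is set in the environment that launches the process; 2 is a commonly cited starting value, but the right number is workload-dependent and worth benchmarking:

```shell
# Cap glibc malloc arenas for everything launched from this shell.
# Set this in your process manager (systemd unit, Procfile, Dockerfile, etc.).
export MALLOC_ARENA_MAX=2
bundle exec sidekiq
```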
Change memory allocator
Change the default memory allocator from glibc to a different one, like jemalloc.
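There are two common ways to switch to jemalloc; a sketch, assuming a Debian/Ubuntu layout (the library path is distro-specific):

```shell
# Option 1: preload jemalloc without recompiling Ruby.
# On Debian/Ubuntu the library comes from `apt install libjemalloc2`.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
bundle exec rails server

# Option 2: build Ruby itself against jemalloc,
# e.g. with ruby-build:
#   RUBY_CONFIGURE_OPTS=--with-jemalloc rbenv install 3.2.2
```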
Jemalloc vs glibc
Jemalloc in action
The results were very impressive; in fact, the outcome was much better than we expected.
We were able to bring our peak memory consumption down from ~70 GB to ~20 GB.