
Investigating a memory leak in a Scala application — A beginner’s guide

Where to start? What tools would help?

Anas Anjaria
Aug 27, 2023

Recently, I worked on a task to fix a memory leak in one of our applications written in Scala. It was my first time tackling such an issue, and I found it quite challenging.

I had many questions swirling in my mind: Where should I start? What should I look for? How do I inspect memory usage on a remote application running in the production system?

I couldn’t find a straightforward guide online to help me answer these questions, so I turned to my colleague for guidance. With their help, I successfully resolved the memory leak 😃. That’s why I decided to document my experience in this blog post.

While this post covers theoretical concepts and provides a high-level guide, my latest post delves into practical applications with real-world examples.

How to determine if there is a memory leak?

When your application has a memory leak, memory utilization increases over time (over a day, a week, or longer). How quickly it grows depends on the size and number of the leaked objects.

We collect system metrics to monitor memory utilization and have a watcher that alerts us when it is above 85%.

That is how we came to know that our application was experiencing a memory leak.
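
If you don’t have such metrics in place, a quick (if crude) alternative is to watch the container’s memory usage over a few days with a standard Docker command (the container name is a placeholder):

docker stats --no-stream <container_name>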

Bottom line — Make sure you really have a memory leak before diving in.

What should I do now if there is a memory leak?

My setup — Our application runs on an AWS EC2 instance as a dockerized application.

Attach VisualVM to your live application — Created by author

The first step is to attach a profiler (such as VisualVM) to your live application experiencing a memory leak.

Step 1 — Start your application with JVM parameters

We need to run our application with the following JVM parameters.

-Dcom.sun.management.jmxremote.rmi.port=9010 
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.local.only=false
-Djava.rmi.server.hostname=localhost
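
How these flags reach the JVM depends on how your image is built. As a rough sketch, assuming an image that picks up the JAVA_OPTS environment variable (image and container names are made up; your image may expect the options differently):

docker run -d \
  --name my-app \
  -e JAVA_OPTS="-Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=9010 \
    -Dcom.sun.management.jmxremote.rmi.port=9010 \
    -Dcom.sun.management.jmxremote.ssl=false \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.local.only=false \
    -Djava.rmi.server.hostname=localhost" \
  my-company/my-app:latest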

Not sure what I'm talking about 😃 … No worries. Here is a nice Stack Overflow post that explains these parameters very well.

Step 2 — Create an SSH tunnel

  • Connect to the EC2 instance via SSH.
  • Find the internal container IP address.
docker inspect \
-f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' \
<container_name>
  • Disconnect from the EC2 instance (or close the terminal).
  • Open a new terminal and type the following command.
ssh -L 9010:<internal-container-ip>:9010 \
-i ~/.ssh/valid/key ec2-user@<your-ec2-instance>

This forwards port 9010 on the container's internal IP, via the SSH tunnel, to port 9010 on your local machine.

Step 3 — Attach VisualVM

Ideally, VisualVM detects the connection automatically and displays your remote process in the sidebar menu. If that is not the case, follow these steps.

  • Open VisualVM and add a JMX connection.
  • Right-click Local -> Add JMX Connection -> localhost:9010
Adding a JMX connection

You can now inspect the remote application using this tool.

To get a first overview, check the distribution of heap, stack (number of threads), and metaspace to narrow down where to look.

In most cases, the heap consumes by far the most memory, as shown in the picture below.

High memory utilization — Heap utilized almost all the memory
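
If you prefer a quick command-line cross-check, jstat (shipped with the JDK, which the container needs anyway for the heap dump later) can sample the heap regions directly; container ID and PID as in the later steps:

docker exec -it <container-id> jstat -gc <PID> 5000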

Step 4 — Collect samples using memory sampler

The next step is to collect memory samples and analyze what's going on behind the scenes.

Collect memory samples via VisualVM

It will provide you with two views:

  • Heap histogram.
  • Per thread allocations.

Use the heap histogram and filter for classes or packages written by you, for instance com.mycompany.api or de.myawesome.app. Then sort the filtered classes by live objects.

Sorting memory samples using live objects.

In my opinion, the object count gives a better overview of memory utilization than bytes, because a large (or steadily recurring) number of small objects can pollute memory more than a small number of big objects.

It gets even more interesting if the count keeps increasing over time; that alone can be an indicator of a memory leak.

Step 5 — Critical thinking and educated guess

By now, you have a high-level picture of your system. Combine this knowledge with your codebase and try to make an educated guess about the potential memory leak.

If the leak was introduced recently, review recently merged pull requests (with peers) and look for potential culprits such as large or unbounded data structures, unclosed resources, excessive object creation, or misuse of memory-intensive libraries.

If the profiler didn’t help, what’s next? — Heap dump analysis

If the profiler didn't help, you can take a heap dump while the application is in its problematic state.

Heap dumps provide a snapshot of the memory’s current state, allowing you to analyze the objects and references that might be causing the leak.

You can also use VisualVM to take and analyze a heap dump. Unfortunately, my JVM did not allow me to do so, so I had to take the heap dump manually.

How to analyze a heap dump?

If you're proactively investigating a memory leak and want to dump the heap manually, then …

  • (!!! Important) You need to run your application on a JDK as opposed to a JRE, since jmap ships with the JDK.
  • Connect to your remote machine using SSH.
ssh your-user@your-instance-ip
  • Run the following command on the EC2 instance; it executes jmap inside the container.
docker exec -it <container-id> \
jmap -dump:live,format=b,file=/tmp/dump.hprof <PID>

where <PID> is the process ID of your application (VisualVM can show you this PID).
  • Copy dump.hprof from the container to your local machine, as shown below.
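
A minimal sketch of that copy step, reusing the placeholders from earlier (run the first command on the EC2 instance, the second one on your local machine):

docker cp <container_name>:/tmp/dump.hprof /tmp/dump.hprof
scp -i ~/.ssh/valid/key ec2-user@<your-ec2-instance>:/tmp/dump.hprof .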

Now, you can use Eclipse MAT (Memory Analyzer Tool) for analysis. I find the following YouTube video very helpful in this regard.

> [!WARNING]
> You need to ensure that there is enough disk space on your remote machine.
> A heap dump can be as big as the full RAM of your instance.
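
Before triggering the dump, a quick check of the free disk space doesn't hurt (df is a standard Linux tool; /tmp matches the dump path used above):

df -h /tmp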

Tip — Automatically creating a heap dump

The following options allow you to automatically create a heap dump. If you are running a containerized application, make sure to mount a volume from the container to the host so that you don't lose the heap dump (see the example after the flags).

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/your/path/to/dump
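
For a dockerized application, mounting the dump directory could look roughly like this (image name, container name, host path, and the JAVA_OPTS variable are assumptions; adjust to your setup):

docker run -d \
  --name my-app \
  -v /var/dumps:/your/path/to/dump \
  -e JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/your/path/to/dump" \
  my-company/my-app:latest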

Things to consider before diving deep into memory leak investigation

  • Only start the investigation when memory utilization is considerably high (say, above 60%) so that you can actually see the problem.
  • Check whether setting the minimum and maximum heap size to the same value resolves the memory issue (Source).
  • Please don't over-provision heap memory. For instance, if a system has only 1 GB of RAM, don't set the max heap size to 1 GB, as this would also lead to OOM. Set reasonable values and leave enough memory for the other processes running on the same machine (a sizing sketch follows below).
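
As a rough sizing sketch (the exact numbers are an assumption and should be tuned for your workload): on an instance with 1 GB of RAM, a fixed heap of about half the memory leaves room for metaspace, thread stacks, and the operating system.

-Xms512m -Xmx512m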

Conclusion

And there you have it, a beginner-friendly guide to tracking down memory leaks in your code.

Identifying a memory leak can be really challenging, and there is no one-size-fits-all solution. Remember, it's all about patience, persistence, and a sprinkle of detective work.

The most important thing I learned during this whole task is that you need to have a good understanding of your codebase (context). This context will really help you to investigate the root cause.

Happy coding, and may your memory be forever leak-free! 🕵️‍♂️💻
