Java memory leak investigation

Nov 13, 2023

I’ve occasionally investigated memory leaks during my career, with varying levels of success: it’s not a task most developers perform regularly, some guidance changes between Java versions, and identifying memory leaks remains a bit of a dark art. Recently I investigated a memory leak issue and decided to document the steps I followed and the tools I used.

This is not an exhaustive list or a comprehensive recipe, so it may not apply to your situation. With that disclaimer in place, please read and support the sources listed in the references section; they provide detailed information on this subject and guided my investigation.

What is a memory leak?

A memory leak occurs when objects are created on the Java heap and are no longer used, but cannot be removed from memory, i.e. the garbage collector is unable to reclaim them because live references to the objects still exist.

These objects build up over time; eventually the available memory is exhausted and the application throws a java.lang.OutOfMemoryError and fails.
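
As a contrived sketch of this failure mode (the class name and allocation sizes are illustrative), the program below keeps adding objects to a collection that is never cleared. The static reference keeps every object reachable, so the garbage collector cannot reclaim any of them and the heap eventually fills up.

    import java.util.ArrayList;
    import java.util.List;

    public class LeakDemo {
        // The long-lived static reference keeps every added element reachable.
        private static final List<byte[]> RETAINED = new ArrayList<>();

        public static void main(String[] args) {
            while (true) {
                // Each iteration allocates 1 MB that can never be collected,
                // eventually causing java.lang.OutOfMemoryError: Java heap space.
                RETAINED.add(new byte[1024 * 1024]);
            }
        }
    }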

The usual suspects

Memory leaks can occur for various reasons, which luckily have been well documented. Below are a few of the usual suspects that I check when investigating memory leaks.

  1. java.lang.String intern() for applications pre JDK 7 (doesn’t apply to later versions).
    - Interned strings go into PermGen space and live for the duration of the application.
    - Repeatedly interning strings will fill the PermGen.
  2. new java.lang.String("…") called repeatedly.
    - A new distinct String object is created on the heap each time the 'new' keyword is used.
    - The new objects live outside of the String constant pool.
  3. static java.util.* Collections (Maps, Lists, Sets etc.).
    - Objects are added but never removed.
    - Because the collection is referenced by a long-lived static field, its contents remain reachable and are never eligible for garbage collection.
  4. Caches.
    - Objects are added but never removed or expired.
    - Incorrectly generated object keys can cause a cache miss resulting in duplication of objects in the cache.
  5. ThreadLocal variables.
    - Incorrect cleanup of ThreadLocals, particularly on re-used/pooled threads, can prevent garbage collection (see the sketch after this list).
    - ThreadLocals are notorious for memory leaks and should be used with care.
  6. Unclosed connection or stream resources.
    - File reader/writer streams, HTTP connections, and JDBC connections and/or statements that are never closed (the sketch after this list shows try-with-resources as a safeguard).
  7. Incorrect or unplanned creation of new objects that should be singletons.
    - Marshallers, parsers, formatters — follow the recommendations and best practice of the relevant library i.e. single static instance vs new instance per request.
    - Spring beans configured with the incorrect scope (Prototype) — this creates a new bean per request instead of the default singleton.
  8. Web Server objects.
    - HTTP Sessions that are not correctly expired and build up.
    - If the application provides stateless REST API endpoints then ensure that it is correctly configured w.r.t. HTTP session creation behaviour.
  9. ORM bugs or misconfiguration.
    - ORMs use various caches for performance improvement which impact resource usage.
    - Check the content and number of SQL queries generated by your ORM — see https://hibernate.atlassian.net/browse/HHH-9576 as an example of unexpected ORM behaviour that generates a large number of SQLs (i.e. String objects) if not correctly configured.
  10. Outdated dependencies.
    - One of the application’s dependencies may have a known memory leak issue that is resolved in a later version.
    - Review the changelogs for the newer versions of your dependencies.
  11. Scheduled jobs.
    - Check that background jobs responsible for resource cleanup are executing as expected and not failing silently.
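
To illustrate suspects 5 and 6, here is a minimal sketch (the class and method names are hypothetical, not from any particular library): on pooled threads, as used by most web servers, a ThreadLocal value must be removed in a finally block, and streams are best closed with try-with-resources.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class CleanupExamples {
        // Hypothetical per-request context stored in a ThreadLocal.
        private static final ThreadLocal<String> REQUEST_CONTEXT = new ThreadLocal<>();

        static void handleRequest(String requestId) {
            REQUEST_CONTEXT.set(requestId);
            try {
                // ... business logic that calls REQUEST_CONTEXT.get() ...
            } finally {
                // Without remove(), a pooled worker thread keeps the value
                // reachable long after the request has completed.
                REQUEST_CONTEXT.remove();
            }
        }

        static String readFirstLine(String path) throws IOException {
            // try-with-resources guarantees the reader is closed, even on error.
            try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
                return reader.readLine();
            }
        }
    }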

Tools of the trade

The tools below help inspect the behaviour of memory in a JVM-based application.

  1. JMX.
    - Used for management and monitoring of applications.
    - Popular frameworks, libraries and web servers (e.g. Spring Boot, Hibernate and Tomcat respectively) provide a JMX interface which may simply need to be enabled (example startup flags follow this list).
    - https://www.oracle.com/technical-resources/articles/javase/jmx.html
  2. JConsole.
    - JDK provided monitoring tool to view memory, CPU and JMX statistics.
    - https://docs.oracle.com/en/java/javase/17/management/using-jconsole.html#GUID-77416B38-7F15-4E35-B3D1-34BFD88350B5
  3. VisualVM.
    - Lightweight JVM profiler to view memory, CPU, threads.
    - https://visualvm.github.io/
  4. IntelliJ Profiler.
    - Application performance profiling, real-time monitoring and memory dumps.
    - https://www.jetbrains.com/help/idea/profiler-intro.html
  5. Spotbugs.
    - Static code analysis tool that can be used to identify unclosed streams.
    - https://spotbugs.readthedocs.io/en/latest/gui.html
  6. Microsoft Sysinternals suite.
    - A collection of diagnostics and troubleshooting tools to investigate application behaviour, e.g. per-process file handle diagnostics.
    - https://learn.microsoft.com/en-us/sysinternals/downloads/sysinternals-suite
  7. Load.
    - Some type of load is needed to simulate the behaviour that causes the leak.
    - Apache JMeter — (Java based) : https://jmeter.apache.org/.
    - Locust — (Python based) : https://locust.io/
  8. Time and patience.
    - Memory leaks are a slow killer and require the application under review to execute for extended periods so that trends can be identified.
    - Load, inspect, adjust… repeat.
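
As an example for tool 1 above, the standard JVM flags below expose a JMX port that JConsole or VisualVM can attach to. Authentication and SSL are disabled here purely for convenience on a non-critical environment; the port number and jar name are arbitrary placeholders.

    java -Dcom.sun.management.jmxremote \
         -Dcom.sun.management.jmxremote.port=9010 \
         -Dcom.sun.management.jmxremote.authenticate=false \
         -Dcom.sun.management.jmxremote.ssl=false \
         -jar application.jar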

Code analysis

My first step is to analyse the code using a static code analyser like Spotbugs to identify any known bad or discouraged code practices such as unclosed streams, incorrect hashCode()/equals() implementations, finalize() usage etc.

I aim to understand any code which makes use of the usual suspects above and confirm that the implemented code matches the expected usage.

Any identified issues are recorded and the behaviour checked during the runtime investigation step.
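
One way to run SpotBugs on a Maven project without modifying the build is via the spotbugs-maven-plugin (this assumes the project uses Maven; Gradle plugins and a standalone distribution exist too):

    mvn com.github.spotbugs:spotbugs-maven-plugin:check

The check goal fails the build when bugs are found; the gui goal opens the SpotBugs GUI mentioned above to browse the findings.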

Runtime investigation

After getting familiar with the code and identifying any potential code issues it’s time to see the code in action.

  1. Enable JMX monitoring in the system and any dependencies (an example configuration follows this list), e.g.:
    - The application
    - The web server
    - The ORM
    - The connection pool library
  2. With JMX monitoring enabled, run the application under load in a non-critical environment and observe the behaviour of the resources in JConsole/VisualVM to identify any odd trends, e.g.:
    - Counts that constantly increase e.g. sessions, active connections, statements in cache.
    - Counts that never increase from 0 e.g. closed sessions.
    - Configuration that is disabled (-1 setting) e.g. max size limits, timeouts.
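
As a concrete example of step 1, assuming a Spring Boot application using Hibernate (an assumption; adjust for your own stack), the properties below enable JMX and Hibernate’s statistics so that query and cache counters can be observed:

    # application.properties (assuming Spring Boot + Hibernate)
    spring.jmx.enabled=true
    spring.jpa.properties.hibernate.generate_statistics=true
    # Optionally log a statistics summary per session
    logging.level.org.hibernate.stat=DEBUG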

The application

  1. First check the behaviour of the running application without any activity for a period.
    - Periodically use a JVM profiler to take a memory snapshot — this step can identify objects created by dependencies which are not being garbage collected.
  2. Once you understand how the dependencies behave while the application’s business logic is at rest, add some load to see what objects the application is creating in memory.
  3. Check the caches.
    - Ensure that data added to a cache is retrieved or removed from the cache as expected.
    - Incorrectly configured cache keys cause a cache miss and data is then duplicated in the cache under different keys.
    - Check that caches are set with correct expiration times/rules (see the sketch after this list).
    - Setting breakpoints in cache loading methods can help identify when the methods are called and if they are being called unexpectedly due to cache misses.
  4. Inspect the file handles of the Java application process in Process Explorer to track the trend of open file handles over time.
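
As a sketch for point 3, here is a bounded, expiring cache using the Caffeine library (an assumption; most cache implementations offer equivalent settings). The size cap and expiry rule prevent the cache itself from becoming the leak.

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;

    import java.time.Duration;

    public class BoundedCache {
        // Entries expire 10 minutes after write and the cache is capped at
        // 10,000 entries, so stale or mis-keyed entries cannot pile up forever.
        private static final Cache<String, Object> CACHE = Caffeine.newBuilder()
                .expireAfterWrite(Duration.ofMinutes(10))
                .maximumSize(10_000)
                .build();

        static Object lookup(String key) {
            // The key's equals()/hashCode() must be stable; inconsistent keys
            // cause cache misses and duplicate entries under different keys.
            return CACHE.get(key, BoundedCache::loadFromSource);
        }

        private static Object loadFromSource(String key) {
            return new Object(); // placeholder for the real load
        }
    }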

The web server

  • Observe the behaviour of the web server resources and caches.
  • Constantly increasing resources such as HTTP sessions can indicate misconfiguration.
  • If your application is a stateless REST API then consider alternatives to using HTTP sessions (a configuration sketch follows below).
  • If sessions are required in the application then ensure that they are correctly configured to expire after a reasonable time.
[Screenshot: Unhealthy Tomcat active and expired HTTP session counts]
[Screenshot: Healthy Tomcat active and expired HTTP session counts]
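
As a configuration sketch for the stateless case, assuming Spring Security 6 is in use (an assumption; other stacks have equivalent settings), session creation can be switched off so that sessions never accumulate:

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.security.config.annotation.web.builders.HttpSecurity;
    import org.springframework.security.config.http.SessionCreationPolicy;
    import org.springframework.security.web.SecurityFilterChain;

    @Configuration
    public class SecurityConfig {

        @Bean
        SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
            // STATELESS tells Spring Security to never create or use an HTTP session.
            http.sessionManagement(session ->
                    session.sessionCreationPolicy(SessionCreationPolicy.STATELESS));
            return http.build();
        }
    }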

The ORM

  • If an ORM is used then enable any provided statistics and observe the ORM resources over time to identify negative trends (a sketch for reading Hibernate’s statistics follows the screenshots below).
  • A continuously increasing statement count can be an indicator of a bug or misconfiguration (e.g. https://hibernate.atlassian.net/browse/HHH-9576).
  • A persistently increasing cache miss count can be another indicator.
  • The screenshots below show unhealthy and healthy trends for the same business logic executed under load. After correcting the ORM configuration the number of generated queries and query plan miss counts stabilised at 115 and 119 respectively instead of constantly increasing.
[Screenshot: Unhealthy query and cache hit-vs-miss counts]
[Screenshot: Healthy query and cache hit-vs-miss counts]
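
To read the counters behind trends like these programmatically, Hibernate’s Statistics API can be queried directly. This is a sketch: it assumes hibernate.generate_statistics is enabled and a reasonably recent Hibernate version (the query plan cache counters were added in 5.4); with JPA, the SessionFactory can be obtained via entityManagerFactory.unwrap(SessionFactory.class).

    import org.hibernate.SessionFactory;
    import org.hibernate.stat.Statistics;

    public class OrmStats {

        static void printStatistics(SessionFactory sessionFactory) {
            Statistics stats = sessionFactory.getStatistics();
            // Continuously increasing counts under steady load are a warning sign.
            System.out.println("Queries executed:        " + stats.getQueryExecutionCount());
            System.out.println("Query plan cache hits:   " + stats.getQueryPlanCacheHitCount());
            System.out.println("Query plan cache misses: " + stats.getQueryPlanCacheMissCount());
        }
    }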

Final thoughts

I’d appreciate hearing any tips you have or tools you use for identifying memory leaks.

Thank you for your time.

References
