Java ThreadLocal — The Boon and The Bane

Aditya Athalye
Published in Javarevisited
Jan 31, 2020

Recently, my team and I were doing performance testing on a certain piece of code (specifically an HTTP endpoint) that was largely inherited, but also had some additions of our own.

To ensure that the overall latency of the operation met the SLA requirements with the new additions, we ran a load test using JMeter.

The inherited code also asynchronously published messages to our in-house message bus, to be consumed by other systems as well as by the analytics pipeline. The high-level flow looked something like this.

Basic high level flow of message publish. Other irrelevant details omitted for brevity.

The message publishing library was built on the STOMP protocol and was maintained by a different team.

The message producer (shown above) was a shared instance, safe to use across multiple threads, and one instance per process was sufficient (each producer established a physical TCP connection to the message broker).

Snippet 1. Single instance of Message Publisher used from within the endpoint controller code.
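The original snippet is not reproduced here, but it was roughly equivalent to the sketch below. The MessageProducer interface, MessageProducerFactory, and PublishingEndpoint names are hypothetical stand-ins for the real library and controller code:

```java
// Hypothetical stand-ins for the real publishing library, which is not shown here.
interface MessageProducer extends AutoCloseable {
    void send(String payload);
    @Override void close();
}

final class MessageProducerFactory {
    static MessageProducer create() {
        // In the real library this opens a physical TCP (STOMP) connection to the broker.
        return new MessageProducer() {
            public void send(String payload) { /* write the frame to the connection's stream */ }
            public void close()              { /* tear down the TCP connection */ }
        };
    }
}

// Snippet 1 (sketch): one shared producer per process, used by every request-handling thread.
public class PublishingEndpoint {

    private static final MessageProducer PRODUCER = MessageProducerFactory.create();

    public void handleRequest(String requestPayload) {
        // ... endpoint business logic ...
        PRODUCER.send(requestPayload);   // asynchronous publish to the in-house message bus
    }
}
```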

When the load on this endpoint was increased to match the expected additional production load, we started observing degradation in performance.

It is always good practice to introspect a Java process by looking at the health of its threads, as this gives a good indication of any issues in the system. JVisualVM is a useful tool for getting insight into the health of the process.

VisualVM output

Note the *default-dispatcher-* threads. These are the threads tasked with publishing the messages. The red blocks in those thread timelines indicate that they spend a lot of time waiting on an object monitor (synchronized blocks and exclusive reentrant locks in the Java world).

Green areas indicate threads in running condition.

Red areas indicate threads waiting on locks/conditions and not doing any useful work.

Ideally, such red blocks should be avoided as they severely impact the application’s concurrency as well as its overall performance.

This also had an adverse impact on the heap size due to the pileup of messages in the thread queues.

Cause

Upon taking thread dumps, we realized that the library internally used streaming/blocking I/O, which meant that the underlying stream could not be safely shared across threads.

This entailed locking the stream for reads/writes, thereby serializing access to it across multiple threads.
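A simplified illustration of the problem, reusing the hypothetical MessageProducer interface from the earlier sketch (this is not the library’s actual code): every send must hold a lock on the single underlying stream, so concurrent publishers queue up behind one another.

```java
// Simplified illustration of a producer backed by a single blocking stream.
// Every caller must acquire the same lock, so sends are fully serialized.
class BlockingStreamProducer implements MessageProducer {

    private final Object streamLock = new Object();

    @Override
    public void send(String payload) {
        synchronized (streamLock) {
            // Slow, blocking network write performed while holding the lock.
            // Other publishing threads pile up here, which is exactly what the
            // red "monitor wait" blocks in VisualVM were showing.
            writeFrameToSocket(payload);
        }
    }

    @Override
    public void close() {
        // Close the underlying socket.
    }

    private void writeFrameToSocket(String payload) {
        // Placeholder for the blocking socket write.
    }
}
```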

To mitigate this issue, we had the following options.

Producer per message

Create a new producer (effectively establishing a new connection to the message broker) for every message and close it after use.

Pros

  • Easy to implement.
  • No sharing between threads, which means no contention.

Cons

  • Every connection establishment would have to pay the cost of a TCP handshake (making it an expensive proposition).
  • This would directly impact performance: latency would go up again, causing message pileups, as the publish rate would always lag behind the message creation rate.

Producer per Thread

This is where we thought of exploring the excellent Java ThreadLocal. A ThreadLocal gives each thread its own independent copy of a value, precluding any sharing of state between threads, as each thread has exclusive access to its copy of the state held by a static ThreadLocal.

The code in Snippet 1 above was modified to look like this:

Snippet 2 : Using ThreadLocal for the producer.
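As a rough sketch (reusing the hypothetical MessageProducer and MessageProducerFactory from Snippet 1), the change amounted to wrapping the producer in a ThreadLocal so that every dispatcher thread lazily creates and keeps its own instance:

```java
// Snippet 2 (sketch): one producer (and therefore one broker connection) per thread.
public class PublishingEndpoint {

    private static final ThreadLocal<MessageProducer> PRODUCER =
            ThreadLocal.withInitial(MessageProducerFactory::create);

    public void handleRequest(String requestPayload) {
        // ... endpoint business logic ...
        // No shared stream, hence no monitor contention between dispatcher threads.
        PRODUCER.get().send(requestPayload);
    }
}
```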

The new VisualVM snapshot looked far better: there were no more red blocks, and the slow memory growth stopped. Further, the number of threads (managed by the application container) was limited, so the effective number of additional connections was well within acceptable limits.

Note the dispatcher threads sans monitor blocks.

To our shock, the number of physical connections to the message broker went through the roof a few hours after going live with this fix.

Note the rise in connections from 6–8K (normal) to 36K

This rise in physical connections to the broker had an adverse impact on other applications which relied on the message broker infrastructure.

Sample application which had latency spikes.

Why did this happen?

Container-managed thread pools usually have an idle timeout: threads that remain inactive for a certain period are discarded by the pool. This is a very plausible scenario in practice, where traffic varies over the course of the day.

However, even though the threads were discarded, the producer held in each ThreadLocal (which mapped to a physical connection) was never closed.

This meant that physical connections would pile up, because the thread pool would create new threads (via its thread factory) to handle future requests, each opening yet another connection.

Physical connections on the server would take much longer to close (mostly lying in a CLOSE_WAIT state).

ThreadLocals do NOT give you any opportunity to clean up resources when a thread dies, and this can have very serious ill effects like the one above.
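The failure mode can be reproduced with a small sketch (again using the hypothetical MessageProducer from earlier): a pool that times out idle threads silently drops them, but nothing ever closes the producers their ThreadLocals were holding.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of the leak: idle worker threads are discarded, but the connections
// cached in their ThreadLocals are never closed.
public class ThreadLocalLeakDemo {

    private static final ThreadLocal<MessageProducer> PRODUCER =
            ThreadLocal.withInitial(MessageProducerFactory::create);

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(8, 8, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // behaves like a container pool with an idle timeout

        // Burst of traffic: each worker thread opens its own broker connection.
        for (int i = 0; i < 8; i++) {
            pool.execute(() -> PRODUCER.get().send("hello"));
        }

        // Idle period: the pool discards its workers, but nobody calls close()
        // on their producers, so the physical connections are left dangling.
        TimeUnit.SECONDS.sleep(60);

        // When traffic resumes, brand new threads open brand new connections,
        // and the connection count on the broker keeps climbing.
        pool.execute(() -> PRODUCER.get().send("hello again"));
        pool.shutdown();
    }
}
```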

Lesson

Wrapping physical resources inside ThreadLocals should be avoided when using container-managed dynamic thread pools, as the absence of a cleanup opportunity can leave those resources dangling and possibly exhaust a shared resource quota.

If you are using this pattern to cache connections to cloud message brokers, databases, and the like, reconsider your choices.

Unless you have full control over your thread pools, and are willing to pay the price of keeping idle threads running, it is best to avoid using ThreadLocal for caching expensive resources.

Potential Fixes

However, we still needed to fix this problem. I will present two solutions here.

Producer per Actor

This is the fix we implemented for our use case, courtesy of my colleague Mahendra Chhimwal, who proposed it.

Our application makes heavy use of the excellent Actor model via its worthy implementation Akka for Java/Scala. A minimal sketch of this approach follows the list below.

Why does this work?

  • Actors are software abstractions, unlike physical resources such as threads, which are usually managed by application containers.
  • Actors consume a negligible amount of memory and other resources, so maintaining a pool of Actors has negligible cost.
  • The internal state of an Actor cannot be accessed from outside the actor, giving the same isolation benefit as a ThreadLocal.
  • The Actor pool was entirely under the application’s control and could be run without any idle timeouts, whose presence was the main reason the ThreadLocal approach failed.
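Here is the sketch, using Akka classic actors and the hypothetical MessageProducer from earlier (the real message types and routing setup differed): each actor owns one producer, creating it in preStart and, crucially, closing it in postStop, a cleanup hook that ThreadLocal never offered.

```java
import akka.actor.AbstractActor;
import akka.actor.Props;

// Sketch: one producer (one broker connection) per actor, with a proper lifecycle.
public class PublisherActor extends AbstractActor {

    // Hypothetical message type asking the actor to publish a payload.
    public static final class Publish {
        final String payload;
        public Publish(String payload) { this.payload = payload; }
    }

    private MessageProducer producer;

    public static Props props() {
        return Props.create(PublisherActor.class);
    }

    @Override
    public void preStart() {
        // The connection is opened when the actor starts...
        producer = MessageProducerFactory.create();
    }

    @Override
    public void postStop() {
        // ...and, unlike with a ThreadLocal, there is a hook to close it.
        producer.close();
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(Publish.class, msg -> producer.send(msg.payload))
                .build();
    }
}
```

A fixed-size router pool of such actors (for example, akka.routing.RoundRobinPool) then plays the role the ThreadLocal-cached producers were meant to play, with the connection count capped by the pool size.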

Producer Pool

Applications that do not want to bring in an Actor framework just for this use case can implement a pool for their expensive resources. Pool implementations are common for HTTP connections, DB connections, LDAP connections and so on.

If you do not have a pool implementation readily available, you can write one with very little effort using the well-known Apache Commons Pool library.

The pool size can be tuned by your application (depending on the load), and the pool ensures that a pooled resource is accessed by only a single thread at a time, thus avoiding any contention between threads.
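A minimal sketch using Apache Commons Pool 2 follows (MessageProducer and MessageProducerFactory are the same hypothetical stand-ins as before): the factory tells the pool how to create and, importantly, destroy a producer, and the pool caps the total number of broker connections.

```java
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;

// Factory that tells the pool how to create, wrap and destroy producers.
class ProducerFactory extends BasePooledObjectFactory<MessageProducer> {

    @Override
    public MessageProducer create() {
        return MessageProducerFactory.create(); // opens a physical connection
    }

    @Override
    public PooledObject<MessageProducer> wrap(MessageProducer producer) {
        return new DefaultPooledObject<>(producer);
    }

    @Override
    public void destroyObject(PooledObject<MessageProducer> p) {
        p.getObject().close(); // the cleanup hook the ThreadLocal approach lacked
    }
}

// Publisher that borrows a producer per publish, guaranteeing single-threaded access.
class PooledPublisher {

    private final GenericObjectPool<MessageProducer> pool;

    PooledPublisher(int maxConnections) {
        pool = new GenericObjectPool<>(new ProducerFactory());
        pool.setMaxTotal(maxConnections); // hard cap on broker connections
    }

    void publish(String payload) throws Exception {
        MessageProducer producer = pool.borrowObject(); // exclusive access, no contention
        try {
            producer.send(payload);
        } finally {
            pool.returnObject(producer); // always give the connection back
        }
    }
}
```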

Conclusion

  • ThreadLocals are an excellent construct available in the standard JDK and allow a multi-threaded application to scale well.
  • However, one should be very careful with the resources managed within a ThreadLocal.
  • Avoid: objects that map to physical resources outside the boundary of the JVM (especially with managed thread pools). Examples of such resources: DB connections, file handles, network sockets, etc.
  • Good candidates: context/session objects that need to be accessed at different places within the application, or any objects that are safe to have one of per thread.
  • ThreadLocals used for the right use cases with the right objects can be a boon, whereas misused they can seriously impact your application or any dependent system.

Credits

Thanks a lot to Mahendra Chhimwal for helping me with the charts needed for this article. Thanks also to Gargi Dasgupta for providing valuable feedback.


Thanks for reading this article so far. If you found it useful, please share it with your friends and colleagues. If you have any questions or feedback, please drop a note.
