EHCache Object Management Optimizations in Promise Engine

Gaurav Gupta
Published in Myntra Engineering · 3 min read · Jan 23, 2024

Authors: Gaurav Gupta, Mohammed Abdul Azeem

Valuable Contributions: Relix Johnrose, Gurudutt S

In the previous post of this series, we covered the GC parameter optimizations in our application and their benefits, and briefly mentioned the eager eviction of EHCache elements from the heap. Here we will cover that in detail and discuss other optimizations we tried around EHCache.

The Young Generation and Old Generation defined by the G1GC algorithm will be abbreviated as follows:

Young Generation: YG

Old Generation: OG

Eager eviction of EHCache elements

EHCache elements are allocated on application heap memory and get lazily evicted only when necessary (either through LFU eviction when the EHCache size limit is reached or when an expired element is accessed). These objects survive YG collection cycles and eventually move to OG.

EHCache exposes a method called evictExpiredElements, which does exactly what the name says. We added a background polling thread that calls this method every few minutes, clearing expired elements from all the caches present in memory before they are touched and tenured.

This drastically reduced OG buildup, and GC cycles became much more relaxed.
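The polling approach can be sketched with a stdlib-only stand-in. The TtlCache below is illustrative, not the Ehcache API; in the real service the sweeper calls Ehcache's evictExpiredElements() on each registered cache, and the period would be a few minutes rather than milliseconds:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Stdlib-only stand-in for a TTL cache, used to illustrate eager eviction.
 * evictExpired() plays the role of Ehcache's evictExpiredElements().
 */
public class EagerEvictionSketch {
    static final class TtlCache {
        private static final class Entry {
            final Object value; final long expiresAt;
            Entry(Object value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
        }
        private final Map<String, Entry> store = new ConcurrentHashMap<>();

        void put(String key, Object value, long ttlMillis) {
            store.put(key, new Entry(value, System.currentTimeMillis() + ttlMillis));
        }
        Object get(String key) {
            Entry e = store.get(key);
            return (e == null || e.expiresAt <= System.currentTimeMillis()) ? null : e.value;
        }
        // Eagerly drop expired entries instead of waiting for them to be touched,
        // so the underlying objects can be collected while still young.
        void evictExpired() {
            long now = System.currentTimeMillis();
            store.entrySet().removeIf(en -> en.getValue().expiresAt <= now);
        }
        int size() { return store.size(); }
    }

    /** Background polling thread sweeping all caches at a fixed period. */
    static ScheduledExecutorService startSweeper(List<TtlCache> caches, long periodMillis) {
        ScheduledExecutorService sweeper = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-sweeper");
            t.setDaemon(true);
            return t;
        });
        sweeper.scheduleAtFixedRate(() -> caches.forEach(TtlCache::evictExpired),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return sweeper;
    }
}
```

The key point is that an expired-but-unevicted entry still occupies heap: it is invisible to readers yet alive to the GC, which is exactly what the sweeper prevents.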

Reduce heap space taken by EHCache

OG build-up and GC spikes caused by EHCache are common problems, as garbage collectors are not built to handle medium-lived in-memory cache objects. Terracotta offers a solution called BigMemory Go with EHCache 2.10, which adds an off-heap tier with memory management of its own.

We experimented with BigMemory Go using three different configurations.
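For reference, combination 1 maps onto an Ehcache 2.x cache configuration roughly like the fragment below. The cache name and TTL are illustrative, not our actual values; maxBytesLocalHeap and maxBytesLocalOffHeap are the Ehcache 2.x attributes that size the two tiers (the off-heap tier requires the BigMemory Go jars and license):

```xml
<!-- Hypothetical ehcache.xml fragment; name and timeToLiveSeconds are illustrative. -->
<cache name="promiseCache"
       maxBytesLocalHeap="100M"
       maxBytesLocalOffHeap="4G"
       overflowToOffHeap="true"
       timeToLiveSeconds="300"/>
```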

Combination 1: 4GB off-heap storage and 100MB on-heap

Latency spikes were observed for this combination and application throughput dropped. Latency grew because moving an object from off-heap to on-heap requires deserialization, which is expensive. About 75% of cache calls resulted in page faults: the requested objects had to be deserialized from off-heap and swapped into the on-heap tier.

4GB off-heap/100MB on-heap results

Combination 2: 4GB off-heap storage and 400MB on-heap

On-heap memory was increased to 400MB. This was enough to hold all the cached objects used in our tests, so the setup behaved almost the same as not using off-heap at all.

Combination 3: 4GB off-heap only

One suspected cause of the latency increase in combination 1 was excessive page faults and element swapping. To rule this out, we ran one trial with only an off-heap tier. Latency shot up to 1500ms and throughput couldn’t cross 16K RPM.

Based on these results it was clear that deserialization was very expensive and off-heap storage was not an option for us. We decided to stick with our existing model of storing the cache entirely on-heap.

Refresh-ahead for EHCache

Since EHCache is not a blocking cache, at high scale hundreds of threads can request a cache entry at the moment it expires, causing a thundering-herd situation on the downstream data source. Refreshing an entry just before expiry tackles this problem and should reduce latency for highly queried combinations (hot keys) around expiry time.

EHCache’s refresh-ahead strategy was implemented with a 30s refresh window before expiry. After this change, p99.9 latency stayed bounded under high load and the random spikes were gone. There were some gains in p99 as well.
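The mechanism can be sketched with a stdlib-only stand-in (Ehcache 2.x ships refresh-ahead as a cache decorator; this class only illustrates the idea, and all names here are hypothetical). An entry inside the refresh window is reloaded once in the background while readers keep being served the current value, so no herd of threads piles up on the loader at expiry:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Stdlib-only sketch of a refresh-ahead cache. */
public class RefreshAheadSketch {
    private static final class Entry {
        final Object value; final long expiresAt;
        Entry(Object value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();
    private final Map<String, Boolean> refreshing = new ConcurrentHashMap<>();
    private final ExecutorService refresher = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "refresh-ahead");
        t.setDaemon(true);
        return t;
    });
    private final Function<String, Object> loader;
    private final long ttlMillis;
    private final long refreshWindowMillis;   // e.g. 30_000 for a 30s window

    public RefreshAheadSketch(Function<String, Object> loader, long ttlMillis, long refreshWindowMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.refreshWindowMillis = refreshWindowMillis;
    }

    public Object get(String key) {
        long now = System.currentTimeMillis();
        Entry e = store.get(key);
        if (e == null || e.expiresAt <= now) {
            // Cold miss or fully expired: load synchronously.
            Object v = loader.apply(key);
            store.put(key, new Entry(v, now + ttlMillis));
            return v;
        }
        // Inside the refresh window: trigger at most one background reload
        // (putIfAbsent dedupes concurrent triggers) and serve the current value.
        if (e.expiresAt - now <= refreshWindowMillis && refreshing.putIfAbsent(key, Boolean.TRUE) == null) {
            refresher.submit(() -> {
                try {
                    store.put(key, new Entry(loader.apply(key), System.currentTimeMillis() + ttlMillis));
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return e.value;
    }
}
```

This is also why the hit ratio improves: a hot key that gets refreshed in time never actually expires from a reader's point of view.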

p99.9 spikes bounded with refresh ahead

This helped us in improving the hit ratio for the caches as well.

Conclusion

When using an in-memory cache solution, be aware that cache objects are allocated on the application heap and can be a major contributor to heap and GC behavior. Managing cache objects on the heap therefore becomes critical and should be optimized so that application latencies stay in check. In addition, a refresh-ahead strategy can improve the cache hit ratio and spread downstream calls for data over time, instead of concentrating them in a narrow window around expiry.

References

www.ehcache.org
