A Fast and Efficient LRU Cache in Java

Uday Sagar Shiramshetty
5 min read · May 18, 2020


Introduction

In this post, I want to experiment with and talk about an efficient LRU cache implementation in Java that holds primitive data types for specific use cases. Primitive data types like int, boolean, byte, etc. are used on many performance-intensive paths. Inside a hot loop, saving mere nanoseconds can be a huge benefit. For example: an integer (int) representing the index of an object in an array, and a boolean representing the result of an expensive operation on that object. To cache such expensive operation results, an LRU cache is usually a popular solution.

A simple LRU cache can be built on top of LinkedHashMap, as explained here. But it suffers from auto-boxing, unnecessary object creation, garbage collection cycles, etc. Another implementation of an LRU cache that uses a HashMap and a Doubly Linked List, as explained here, also suffers from those problems, but it gives us the flexibility to optimize. On the latter implementation, I used three optimization strategies to get up to a 30–40% improvement on cache get operations and a 50–70% improvement on cache put operations. They are:

1) Pack data in memory as closely as possible

2) Combine Map and Doubly Linked List functionality

3) Reduce Garbage Collection time

Below are the graphs that show the time taken per get/put operation in nanoseconds, measured using JMH benchmarks. IntIntLRUCache is the optimized cache implementation from this article. IntEntryViewLRUCache and IntEntryLRUCache are the next best implementations, and IntIntLinkedHashMapAsCache is the basic implementation on top of LinkedHashMap.

Note: IntIntLinkedHashMapAsCache benchmarks suffer from a lot of noise when compared to the other implementations, mostly due to heavy garbage collection. For example, the noisy stats look like 619.890 ± 637.581 ns/op at 25M cache size.
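For reference, the LinkedHashMap-based baseline is essentially the following. This is a minimal sketch in access-order mode with boxed keys and values; the actual IntIntLinkedHashMapAsCache in the repository may differ in details:

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache on top of LinkedHashMap (access-order mode).
// The boxed Integer keys/values mean every put/get can allocate,
// which is where the garbage collection noise comes from.
class LinkedHashMapLruCache extends LinkedHashMap<Integer, Integer> {
    private final int capacity;

    LinkedHashMapLruCache(int capacity) {
        super(capacity, 0.75f, true); // accessOrder = true keeps entries in LRU order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
        return size() > capacity; // evict the least recently used entry when full
    }
}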

Basic Implementation Drawbacks

An LRU cache based on a HashMap and a Doubly Linked List works this way: for a given key, the HashMap is used to locate the entry containing the key’s value. That entry is part of a Doubly Linked List that stores the order in which values were accessed, so that the least recently used (LRU) item can be evicted when the cache is full.
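A rough sketch of that structure (the class and field names here are illustrative, not the exact code from the linked article):

import java.util.HashMap;
import java.util.Map;

// Skeleton of the classic HashMap + doubly linked list LRU design.
// Each cached value lives in its own Entry object: the map finds it by key,
// and the linked list tracks access order so the tail can be evicted.
class ClassicLruCache {
    static final class Entry {
        int key;
        int value;
        Entry left;   // neighbour towards the most recently used end
        Entry right;  // neighbour towards the least recently used end
    }

    private final Map<Integer, Entry> index = new HashMap<>();
    private Entry head; // most recently used
    private Entry tail; // least recently used, evicted when the cache is full
    // get(key):  look up the Entry in the map, unlink it and move it to head
    // put(k, v): insert or update the Entry, move it to head, evict tail if over capacity
}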

There are various drawbacks in this design:

  1. Data is spread across independent Entry objects that are located in different places in memory, away from the HashMap.
  2. HashMap in Java doesn’t use a CPU-cache-friendly collision resolution algorithm.
  3. Entry objects are created on demand, causing a lot of garbage.

Due to these drawbacks, the basic implementation gets worse as the cache size is increased.

Optimized LRU cache:

Let’s establish some constraints that will allow us to optimize for certain use cases easily. Remember that these constraints can each be resolved independently if necessary with some extra effort.

  1. Primitive ‘int’ values are used as both keys and values in the cache. Keys are always greater than or equal to 0, for example: array indices.
  2. The cache size, including load factor, is less than Integer.MAX_VALUE.
  3. The LRU cache is not a thread-safe data structure.

The highlights of the optimized implementation are:

  1. Data is laid out in memory as closely as possible on a single integer array. The Map and Doubly Linked List are no longer two separate data structures: a custom map implementation provides built-in doubly linked list functionality, with the key-value and left-right data sitting next to each other.
  2. It uses linear probing, a CPU-cache-friendly map collision resolution algorithm. On key collisions, the next entry is often already available in the cache line. Each entry is written across 4 integers (16 bytes), so a 64-byte cache line read makes 4 consecutive entries available to the CPU.
  3. Lowest memory usage of all. Java objects are not used to store data, so we don’t have object alignment padding creeping into the storage.

The data structure for the optimized cache implementation looks like this:
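Roughly, the flat integer array described above can be sketched as follows. The EMPTY sentinel and offset helpers are illustrative assumptions based on the constraints above; the exact constants live in IntIntLRUCache.java:

// Sketch of the flat storage layout (illustrative names; the EMPTY sentinel
// relies on the constraint that valid keys are >= 0).
final class LayoutSketch {
    static final int ENTRY_INTS = 4;  // key, value, left, right
    static final int EMPTY = -1;      // marks a free slot, since real keys are >= 0

    // One flat int[] holds everything; a 64-byte cache line covers 4 entries:
    //
    //   slot 0                       slot 1
    //   [key0][val0][left0][right0]  [key1][val1][left1][right1]  ...
    //
    // left/right hold the slot numbers of the neighbouring entries in LRU
    // order, so the map storage doubles as the doubly linked list.

    static int keyOffset(int slot)   { return slot * ENTRY_INTS; }
    static int valueOffset(int slot) { return slot * ENTRY_INTS + 1; }
    static int leftOffset(int slot)  { return slot * ENTRY_INTS + 2; }
    static int rightOffset(int slot) { return slot * ENTRY_INTS + 3; }
}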

The code for this cache is present here, see IntIntLRUCache.java. Map and LRU cache operations are both handled in put(key, value), get(key) and remove(key) functions. The most interesting piece of code is the put operation, link here. Pseudocode for put(key, value) is:

void put(int key, int value) {
    position = hash(key);
    while (true) {
        if (key exists at current position) {
            update the value;
            move the entry to the tail end;
            return;
        } else if (current position is empty) {
            if (cache is full) {
                evict the LRU item;
                shift keys to fill the hole;
                find the empty slot again;
            }
            add key-value at the empty slot;
            move the new entry to the tail end;
            return;
        }
        move current position forward;   // linear probing
    }
}

Another interesting thing: since the map uses the linear probing technique, removing a key requires the subsequent keys in the array to be shifted back to their appropriate probe locations to fill the hole created by the removal. This looks like it would be an expensive operation, but given the load factor and a good hash function, it is reasonably fast.
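To make the shifting concrete, here is a sketch of backward-shift deletion for a plain linear-probing table of int keys. It is simplified (one int per slot, a toy hash); the real IntIntLRUCache would also have to move the value/left/right ints and patch the linked-list pointers while shifting:

// Backward-shift deletion for an open-addressed, linear-probing int table.
final class LinearProbingRemoval {
    static final int EMPTY = -1; // keys are >= 0, so -1 marks a free slot

    static void removeAt(int[] keys, int hole) {
        int capacity = keys.length;
        keys[hole] = EMPTY;                     // open the hole
        int slot = hole;
        while (true) {
            slot = (slot + 1) % capacity;       // walk the probe sequence forward
            int key = keys[slot];
            if (key == EMPTY) {
                return;                         // end of the cluster, nothing left to shift
            }
            int home = key % capacity;          // toy hash: slot the key originally hashed to
            // The key may stay put only if its home slot lies cyclically in (hole, slot].
            boolean reachable = (hole < slot)
                    ? (hole < home && home <= slot)
                    : (hole < home || home <= slot);
            if (!reachable) {
                keys[hole] = key;               // pull the key back into the hole
                keys[slot] = EMPTY;
                hole = slot;                    // its old slot becomes the new hole
            }
        }
    }
}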

It’s really cool to see how packing data compactly and lowering memory usage can significantly improve performance. The optimized version is a highly customized implementation for int-int key-values, but it can easily be adjusted to support short-short, long-long, int-long pairs, etc.

If you are willing to compromise slightly on performance, an easier way to beat the basic implementation is to reduce garbage collection cycles and use primitive collections wherever possible. See IntEntryLRUCache for an example: garbage collection is kept low by allocating all the required objects at the time of cache creation and re-using them throughout the cache’s lifetime, and Koloboke collections help avoid auto-boxing of keys by dealing with primitives directly. Instead of creating heavyweight Entry objects, another implementation, IntEntryViewLRUCache, packs data more compactly by using an Entry-like view over an integer array that holds the key, value, left and right elements. I would like to thank this article for making it easy to understand and extend a map implementation to support LRU cache functionality.
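To illustrate the “view” idea (the names below are made up for the sketch; see IntEntryViewLRUCache for the real code), a single reusable object can expose Entry-style accessors over the flat array instead of allocating one Entry object per key:

// Flyweight "entry view": one long-lived object that can be pointed at any
// slot of the flat int[] storage, avoiding per-entry allocations.
final class EntryView {
    private final int[] data; // key, value, left, right per slot
    private int base;         // offset of the slot this view currently points at

    EntryView(int[] data) { this.data = data; }

    EntryView moveTo(int slot) { this.base = slot * 4; return this; }

    int key()   { return data[base]; }
    int value() { return data[base + 1]; }
    int left()  { return data[base + 2]; }
    int right() { return data[base + 3]; }

    void setValue(int v) { data[base + 1] = v; }
}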

That’s it, let me know if you can find any other ways to improve the efficiency.
