Case Study: Choosing Local Cache Over Distributed

IAS Tech Blog

By Karishma Agarwal

Integral Ad Science (IAS) processes trillions of data requests globally each month and maintains several mapping tables in MySQL to manage these events in real time. We typically use a distributed cache such as Redis or Aerospike to keep this data in memory and perform real-time lookups at very high throughput. However, we recently worked on a project to evaluate two new approaches:

  • Using a local in-memory cache: ConcurrentHashMap
  • Using a remote cache: Amazon ElastiCache for Redis

We wanted to add a cache to the existing pipeline for storing data from the mapping tables in MySQL and use it to retrieve the integer value corresponding to a given string; this would minimize the queries against the MySQL tables. In this blog, we’ll share the benchmarks and other key factors for both approaches that helped us decide on the best one.

The Dataset & Implementation Strategy

In MySQL, there are several mapping tables, each with a string and an integer field. Performing lookups for every request results in a few billion lookups per day. Meanwhile, the mapping tables’ total size, including overhead, is approximately 5MB and will continue to increase. With a few hundred daily inserts, the size is expected to reach 25MB in five years.

A small Redis cluster (cache.t3.small node type, with one read replica in a different Availability Zone) was provisioned with the AWS CDK to match these size requirements. We opted for the Jedis Java client and, of the data structures Amazon ElastiCache for Redis supports, chose hashes, since this project required a key-value data store.
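To keep this sketch self-contained and runnable without a live cluster, the class below is an in-memory stand-in for the two Redis hash commands we relied on; with the real Jedis client, `hset(key, field, value)` and `hget(key, field)` have the same shape. The hash key `mapping_table` is illustrative, not a name from our production schema.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the Redis hash commands used here (hset/hget).
// In the real implementation this role is played by redis.clients.jedis.Jedis,
// whose hset(key, field, value) and hget(key, field) calls look the same.
public class HashStore {
    private final Map<String, Map<String, String>> hashes = new HashMap<>();

    // Set one field of a hash, creating the hash on first use.
    public void hset(String key, String field, String value) {
        hashes.computeIfAbsent(key, k -> new HashMap<>()).put(field, value);
    }

    // Return the field's value, or null if the hash or field is absent
    // (matching Redis HGET semantics for missing keys).
    public String hget(String key, String field) {
        Map<String, String> h = hashes.get(key);
        return h == null ? null : h.get(field);
    }
}
```

One hash per mapping table keeps each table’s entries under a single Redis key, so a lookup is a single `hget` round trip.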

The cache was populated once during application startup and updated whenever there was a cache miss. There were two possible cases during lookup:

  • Cache hit: the data was present in the cache, and the value was returned directly.
  • Cache miss: an insert query was executed on the MySQL table, the resulting value was returned, and the cache was updated. MySQL’s strong transactional guarantees kept concurrent inserts consistent.
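The hit/miss flow above maps naturally onto ConcurrentHashMap’s computeIfAbsent. The sketch below simulates the MySQL round trip with a counter instead of a real JDBC call; `dbLookupOrInsert` and the class name are illustrative, not code from our pipeline.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Cache-aside lookup sketch: resolve a string key to its integer id,
// hitting the (simulated) MySQL mapping table only on a cache miss.
public class MappingCache {
    private final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1);
    public final AtomicInteger dbCalls = new AtomicInteger(0); // visible for illustration

    // computeIfAbsent runs the loader at most once per key, even when
    // many threads miss on the same key concurrently.
    public int lookup(String key) {
        return cache.computeIfAbsent(key, this::dbLookupOrInsert);
    }

    // Stand-in for the real "select, insert on miss" MySQL round trip.
    private int dbLookupOrInsert(String key) {
        dbCalls.incrementAndGet();
        return nextId.getAndIncrement();
    }
}
```

A second lookup of the same key is served entirely from memory, which is why this path adds no network cost to the pipeline.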

The Benchmarking Process & Key Observations

In this experiment, benchmarking was critical to ensure that the additional billions of cache lookups did not significantly slow the pipeline. To benchmark the two approaches, we sent batches of 100K and 500K requests to the application, at least three times each, and measured both the time to send the requests and the time for the data pipeline to process them. We compared these figures with benchmarks of the original pipeline, without any caching, to confirm that processing time was unaffected. The following table outlines the results of our test:

The ConcurrentHashMap implementation had no measurable performance impact on the original pipeline, whereas the Redis implementation was nearly 50% slower, well below the throughput we expected. To investigate the cause, we benchmarked Redis hget and hset operations in isolation, outside the pipeline, where they were much faster. We concluded that the Redis implementation was limited by a network bottleneck: the pipeline’s tasks were already network bound (reading from a remote source, writing to a remote source, and performing several lookups per request), and each Redis lookup added another round trip. With ConcurrentHashMap, lookups cost only CPU and memory, not network, so pipeline processing time was unaffected.
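The intuition is easy to reproduce with a toy harness: time N lookups through an in-process map versus a lookup function that simulates one network round trip per call. The latency (0.2 ms), key count, and iteration counts below are illustrative, not our production figures.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Toy benchmark: time N lookups through a local map vs. a "remote"
// lookup that pays a simulated network round trip on every call.
public class LookupBench {
    public static long timeLookups(Function<String, Integer> lookup, int n) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < n; i++) {
            sink += lookup.apply("key-" + (i % 100)); // sink prevents dead-code elimination
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) System.out.println(sink); // never true; keeps sink "used"
        return elapsed;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> local = new ConcurrentHashMap<>();
        for (int i = 0; i < 100; i++) local.put("key-" + i, i);

        long localNs = timeLookups(local::get, 10_000);
        long remoteNs = timeLookups(k -> {
            try {
                Thread.sleep(0, 200_000); // simulated 0.2 ms round trip
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            return local.get(k); // same data, plus the network cost
        }, 1_000);

        System.out.println("local ns/op:  " + localNs / 10_000);
        System.out.println("remote ns/op: " + remoteNs / 1_000);
    }
}
```

Even with a generously low simulated latency, the per-operation gap is orders of magnitude, which matches why the extra round trips dominated our pipeline numbers.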

Reviewing the Results

In this experiment, the local cache performed well and a distributed cache was overkill, for the following reasons:

  • The cache data size was small.
  • There were no updates and very few inserts.
  • Other tasks were already heavy on the network and adding more network calls created a bottleneck.

If you are working with a very large cache data size and a cache miss is an expensive operation, a remote cache or a hybrid cache could be a more suitable solution.
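For completeness, a hybrid cache can be sketched as a local tier backed by a remote tier: check the local map first, fall back to the remote store on a miss, and backfill the local map on a remote hit. The `RemoteCache` interface here is a hypothetical stand-in for a Redis client, not an API from this project.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hybrid (two-tier) cache sketch: local ConcurrentHashMap in front of a
// remote tier. RemoteCache is a hypothetical stand-in for a Redis client.
public class HybridCache {
    public interface RemoteCache {
        String get(String key);
    }

    private final ConcurrentHashMap<String, String> local = new ConcurrentHashMap<>();
    private final RemoteCache remote;

    public HybridCache(RemoteCache remote) {
        this.remote = remote;
    }

    public String get(String key) {
        String v = local.get(key);
        if (v != null) return v;          // local hit: no network call
        v = remote.get(key);              // local miss: one round trip
        if (v != null) local.put(key, v); // backfill so the next read stays local
        return v;
    }
}
```

This keeps hot keys off the network entirely while the remote tier absorbs large or rarely-read data; a production version would also need an invalidation or TTL policy for the local tier.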


Ultimately, after benchmarking these results and key factors, such as data growth and maintenance, ConcurrentHashMap was best suited for our project. At IAS, we continue to experiment with technologies to constantly enhance how we process data, deliver insights for our customers, and provide quality results.
