Drop hit ratio as cache measure, now!

Nelson Gomes
Published in Pipedrive R&D Blog
6 min read · Oct 18, 2022


Caching is about making a positive, perceptible impact on your application, not about complicating it.

Let me ask you: What is the main purpose of a cache? Wouldn't you agree it's a bit bizarre that the most common measure of cache performance doesn't take time into consideration, despite the fact that a cache is meant to speed up applications and save time?

Introduction to Hit Ratio

When talking about cache metrics, the industry-standard metric has become the hit ratio, obtained as hits / (hits + misses). This ratio tells you how often content was served from the cache. As almost all cache systems implement this metric, and as it's easy to measure, everyone uses it.
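
For example, 95 hits and 5 misses give:

hit ratio = 95 / (95 + 5) = 0.95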

But is it a reliable indicator of time saved? The answer is no. While it reliably tells you how much the cache is being used, it doesn't mean you're saving time. In fact, the cache might even slow you down if retrieval from it is slow, even with a 99.99% hit ratio.

Caches were created to provide information at high speed and low cost, and both of those are ultimately measured in time. In my experience, the hit ratio metric fails in two situations: small gains at high throughput and big gains at low throughput. In both cases, the hit ratio isn't a reliable indicator of the cache's actual value.

Instead, to estimate your time savings, you need to compare the time it usually takes to obtain the data (a cache miss) with the time it takes to retrieve it from the cache (a cache hit). The difference is the amount of time you save per request. When you add the number of requests to the equation, the result may paint a very different picture.

Time-saved ratio

Let me introduce you to the time-saved ratio, a more stable and accurate metric than hit ratio. At its core are the average hit time (AHT), the time it takes to get data from the cache, and the average miss time (AMT), the time it takes to generate the data when it isn't cached.

With these, we can calculate the average time saved (ATS) by subtracting the average request time (ART, the mean duration across both hits and misses) from the average miss time. This value indicates the average time saved per request:

ATS = AMT - ART = (AMT - AHT) × hits / (hits + misses)

To calculate the time-saved ratio (TSR), we divide the average time saved (ATS) by the average miss time (AMT):

TSR = ATS / AMT

Note that this formula already incorporates frequency into the math (through the share of hits among all requests), allowing it to shine where hit ratio fails.
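
To make the formulas concrete, here is a small TypeScript helper (the function name and signature are my own, purely for illustration) that derives both ratios from raw counters and average timings:

function cacheRatios(hits: number, misses: number, ahtMs: number, amtMs: number) {
  const total = hits + misses;
  const hitRatio = hits / total;
  // Average request time (ART) across both hits and misses.
  const artMs = (hits * ahtMs + misses * amtMs) / total;
  // Average time saved (ATS) per request and the time-saved ratio (TSR).
  const atsMs = amtMs - artMs;
  const timeSavedRatio = atsMs / amtMs;
  return { hitRatio, timeSavedRatio };
}

// 1 hit at 1 ms and 1 miss at 5 ms: hit ratio 0.5, TSR 0.4.
console.log(cacheRatios(1, 1, 1, 5));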

Here are some examples:

Let's look at row #1: 2 requests in total, 1 hit, 1 miss, and a hit ratio of 0.5 (ideal hit ratios should be above 90%). Since the time saved per hit is large, we save almost 50% of our time with a single hit. This is the big-gains, low-throughput case.

Now, if we look at row #2, with a hit ratio of 0.95, we might be tempted to believe it's an awesome cache when, in fact, we saved even less time than in row #1: only 0.475, or 47.5%. How is that even possible? The answer is frequency: we didn't save much time on each request, but we had 950k hits. This is the small-gains, high-throughput case.
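
To see how, assume for illustration (these are not necessarily the exact timings behind the table) that in row #2 a miss takes 10 ms and a hit takes 5 ms, with 950,000 hits out of 1,000,000 requests:

TSR = (10 - 5) × 950,000 / (10 × 1,000,000) = 0.475

Each hit only halves the retrieval time, so even with 95% of requests served from cache, less than half of the total time is saved.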

In these examples the hit ratio fails, while the time-saved ratio provides a stable and effective measure of the cache's benefit to your application. Note that to implement this metric we need to measure the duration of hits and misses, not just count them.

The hard part of implementing these metrics is that they work well in a single instance, but things become much more complicated when you have n instances to collect the data from. This is especially challenging because the instance that generates a cached value isn't always the one that benefits from it, which can lead to situations where you observe hits without any misses, breaking our math.

Measuring locally

in-process-metrics.ts
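
Below is a minimal sketch of what such in-process metric collection could look like, assuming a simple class that counts and times hits and misses per operation (the names and structure here are illustrative, not necessarily those of the original file):

import { performance } from 'perf_hooks';

interface OperationMetrics {
  hits: number;
  misses: number;
  hitTimeMs: number;
  missTimeMs: number;
}

class CacheMetrics {
  private ops = new Map<string, OperationMetrics>();

  private counters(operation: string): OperationMetrics {
    let m = this.ops.get(operation);
    if (!m) {
      m = { hits: 0, misses: 0, hitTimeMs: 0, missTimeMs: 0 };
      this.ops.set(operation, m);
    }
    return m;
  }

  recordHit(operation: string, elapsedMs: number): void {
    const m = this.counters(operation);
    m.hits++;
    m.hitTimeMs += elapsedMs;
  }

  recordMiss(operation: string, elapsedMs: number): void {
    const m = this.counters(operation);
    m.misses++;
    m.missTimeMs += elapsedMs;
  }

  report(operation: string) {
    const m = this.counters(operation);
    const total = m.hits + m.misses;
    const aht = m.hits ? m.hitTimeMs / m.hits : 0;                // average hit time
    const amt = m.misses ? m.missTimeMs / m.misses : 0;           // average miss time
    const art = total ? (m.hitTimeMs + m.missTimeMs) / total : 0; // average request time
    return {
      operation,
      hitRatio: total ? m.hits / total : 0,
      timeSavedRatio: amt ? (amt - art) / amt : 0,
      performanceGain: aht ? amt / aht : 0,
    };
  }
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
const metrics = new CacheMetrics();

async function main(): Promise<void> {
  // Simulate one miss: generating the data takes roughly 5 ms.
  let start = performance.now();
  await sleep(5);
  metrics.recordMiss('get-customer', performance.now() - start);

  // Simulate one hit: reading it back from cache takes roughly 1 ms.
  start = performance.now();
  await sleep(1);
  metrics.recordHit('get-customer', performance.now() - start);

  console.log(metrics.report('get-customer'));
}

main();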

Run it with:

ts-node in-process-metrics.ts

The above code simulates a hit and a miss with a 0.5 hit ratio, but its time-saved ratio states that we saved 40% of the time. Let's do the math: a miss takes about 5 ms (AMT), but a cache hit takes about 1 ms (AHT). This means we save roughly 4 ms on each hit:

TSR = (5 - 1) × 1 / (5 × 2) = 4/10 = 0.4

In other words, the cache saves 40% of the time the "get-customer" operation would otherwise spend without it. This example is relatively straightforward, as everything happens within a single service's memory, making it easy to calculate the average hit and miss times.

Notice also that the performance gain (AMT/AHT) shows a hit is about 5.23 times faster than a miss, which is a great improvement introduced by this cache.

Measuring multiple instances

Imagine you have n instances of the same service, where hits might occur on different instances than the misses, producing inconsistent stats. In such a case, what can we do to ensure our metrics work correctly?

The answer is simple: We need to collect these metrics from all instances and combine them.

collect-distributed.ts
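
Here is a simplified sketch of how this could work, assuming the ioredis npm package: each instance publishes its counter deltas to a shared Redis stream every 5 seconds and then aggregates everything published so far (the broadcast/request approach described below works on the same principle, and the combining math is identical):

import Redis from 'ioredis';
import { randomUUID } from 'crypto';

const STREAM = 'cache-metrics';
const INSTANCE_ID = randomUUID();
const redis = new Redis(6379, 'localhost');

interface Counters {
  hits: number;
  misses: number;
  hitTimeMs: number;
  missTimeMs: number;
}

const emptyCounters = (): Counters => ({ hits: 0, misses: 0, hitTimeMs: 0, missTimeMs: 0 });

// Local counters for two example operations.
const local: Record<string, Counters> = {
  'get-customer': emptyCounters(),
  'get-deal': emptyCounters(),
};

// Simulate some cache traffic so every instance has something to report.
function simulateTraffic(): void {
  for (const counters of Object.values(local)) {
    for (let i = 0; i < 100; i++) {
      if (Math.random() < 0.8) {
        counters.hits++;
        counters.hitTimeMs += 1;
      } else {
        counters.misses++;
        counters.missTimeMs += 5;
      }
    }
  }
}

// Publish this instance's counters as deltas and reset them, so summing
// every stream entry yields the global totals across all instances.
async function publish(): Promise<void> {
  for (const [operation, c] of Object.entries(local)) {
    await redis.xadd(
      STREAM, '*',
      'instance', INSTANCE_ID,
      'operation', operation,
      'hits', String(c.hits),
      'misses', String(c.misses),
      'hitTimeMs', String(c.hitTimeMs),
      'missTimeMs', String(c.missTimeMs),
    );
    local[operation] = emptyCounters();
  }
}

// Read everything published so far (by all instances) and combine it.
async function aggregate(): Promise<void> {
  const entries = await redis.xrange(STREAM, '-', '+');
  const totals: Record<string, Counters> = {};
  for (const [, fields] of entries) {
    const data: Record<string, string> = {};
    for (let i = 0; i < fields.length; i += 2) {
      data[fields[i]] = fields[i + 1];
    }
    const t = (totals[data.operation] ??= emptyCounters());
    t.hits += Number(data.hits);
    t.misses += Number(data.misses);
    t.hitTimeMs += Number(data.hitTimeMs);
    t.missTimeMs += Number(data.missTimeMs);
  }
  for (const [operation, t] of Object.entries(totals)) {
    const total = t.hits + t.misses;
    const amt = t.misses ? t.missTimeMs / t.misses : 0;
    const art = total ? (t.hitTimeMs + t.missTimeMs) / total : 0;
    console.log(operation, {
      hitRatio: total ? t.hits / total : 0,
      timeSavedRatio: amt ? (amt - art) / amt : 0,
    });
  }
}

// Every 5 seconds: generate traffic, publish this instance's numbers and
// print the combined view across all instances.
setInterval(async () => {
  simulateTraffic();
  await publish();
  await aggregate();
}, 5000);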

To run the above code, we first need to start a Redis container (by the way, read our blog post about how the Redis lock can be used excessively yet successfully to achieve more than seems feasible at first):

docker run --name collect-metrics-redis -p 6379:6379 -d redis

This gives us a Redis stream to act as a communication layer. We can use it to broadcast a message to all instances, requesting that each one send its metrics data for all operations.

Now, you just need to run:

ts-node collect-distributed.ts

But to properly test this code, you should run multiple instances of it simultaneously, so you can see data being collected from different instances.

In this example, we are collecting data from two different operations. Each service collects data from multiple instances, and since each instance gathers its own data every 5 s, I recommend collecting the aggregated data only once per minute to avoid overloading the system.

Conclusion

It's time to abandon hit ratio as a cache success measure and embrace better ones. Because it doesn't take frequency and time into account, it doesn't allow you to accurately estimate the effectiveness of your cache; instead, it might mislead you into thinking the cache has a bigger impact than it actually does.

We should also start taking cache implementations more seriously as professionals and stop treating caching as a quick fix to speed up applications.

Poorly implemented caches can slow down your application, crash it, and prevent it from scaling correctly. I have witnessed countless debates, ad-hoc implementations and bugs, and I feel we should all give more attention to caches if we want to get the most out of them.

Happy coding!


Interested in working at Pipedrive?

We're currently hiring for positions in several countries/cities.

Take a look and see if something suits you

Positions include:

  • Principal solutions architect
  • Front-end developer
  • Back-end developer
  • And several more


Nelson Gomes works at Pipedrive as a senior SRE, has a degree in Informatics from the University of Lisbon and a postgraduate degree in project management.