Cache things up

Ievgen Mykolenko
Published in reachnow-tech
May 19, 2020

Phil Karlton once said: “There are only two hard things in Computer Science: cache invalidation and naming things.” This quote reminded us that you need to have a cache first before you can consider invalidating it. This is the story of how we dove deep into the caching infrastructure used primarily by our ride hailing experts.

One task that exchanges quite a lot of data between services is an “M to N” coordinate matrix evaluation. Each cell of the matrix is represented by a pair of coordinates, which is processed by a time-aware function. Despite the huge number of calculations to be done, only a completely calculated matrix is valuable as the end result.

Our challenge was that during regular high-traffic intervals some services became fragile, in particular by being part of cascading failures caused by the behaviour of upstream clients and downstream services.

In this post we take a look at which optimizations can be applied to provide better results in terms of quality and latency while using less computing power, starting from a review of client behaviour and moving on to applying various caches.

End-to-end data flows

An incoming request contains two vectors of coordinates (a matrix), where every combination of coordinates (a cell) has to be evaluated via several levels of external function calls. In a nutshell, every incoming request causes m*n outgoing requests, each of which in turn creates another x remote API calls. The only good news so far is that segment calculations can be both parallelized and batched. The complete matrix can be marked as finished once all results are combined.

Generally speaking, we have three layers of processing here, where things may go wrong mainly due to external calls.
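To make the fan-out more concrete, here is a minimal sketch in Scala of what such a three-layer evaluation could look like; the type names and the stubbed segment function are hypothetical and only illustrate the structure, not our actual service code.

```scala
import scala.concurrent.{ExecutionContext, Future}

object MatrixFanOutSketch {
  // Hypothetical types, used only to illustrate the request structure.
  final case class Coordinate(lat: Double, lon: Double)
  final case class CellResult(durationSeconds: Double)

  // Layer 3: a single segment evaluation, backed by remote API calls in reality.
  def evaluateSegment(from: Coordinate, to: Coordinate): Future[CellResult] =
    Future.successful(CellResult(0.0)) // stub standing in for the external calls

  // Layer 2: one cell of the matrix; a cell may be split into several segments.
  def evaluateCell(from: Coordinate, to: Coordinate): Future[CellResult] =
    evaluateSegment(from, to)

  // Layer 1: the full m*n matrix; only the completely combined result is useful.
  def evaluateMatrix(origins: Seq[Coordinate], destinations: Seq[Coordinate])
                    (implicit ec: ExecutionContext): Future[Seq[Seq[CellResult]]] =
    Future.traverse(origins) { from =>
      Future.traverse(destinations)(to => evaluateCell(from, to))
    }
}
```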

Traffic review is key to cache strategy

We started out in a situation where every instance of the service was running its own cache locally. Running a cache locally has tradeoffs. One upside: it is extremely fast to access data. Some cache implementations, like the one in Akka HTTP, put the Future[T] of a result directly into the cache, making duplicate requests wait for the same calculation to be completed (https://doc.akka.io/docs/akka-http/current/common/caching.html). A downside: every service instance needs to warm up its cache first, which makes scaling the service a challenge. A shared cache approach is simpler to scale and monitor; in our case, AWS ElastiCache Redis enables easy scaling.
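For illustration, here is a minimal sketch of such a local cache based on the Akka HTTP caching module linked above; MatrixKey, MatrixResponse and the capacity/TTL values are hypothetical placeholders rather than our actual types and configuration.

```scala
import scala.concurrent.Future
import scala.concurrent.duration._

import akka.actor.ActorSystem
import akka.http.caching.LfuCache
import akka.http.caching.scaladsl.{Cache, CachingSettings}

object LocalCacheSketch {
  // Hypothetical key/response types standing in for the real request and response models.
  final case class MatrixKey(requestUri: String)
  final case class MatrixResponse(json: String)

  implicit val system: ActorSystem = ActorSystem("local-cache-sketch")

  // Illustrative capacity and TTL values, not production configuration.
  private val defaultSettings = CachingSettings(system)
  private val cachingSettings = defaultSettings.withLfuCacheSettings(
    defaultSettings.lfuCacheSettings
      .withMaxCapacity(10000)
      .withTimeToLive(60.seconds)
  )

  val localCache: Cache[MatrixKey, MatrixResponse] = LfuCache(cachingSettings)

  // getOrLoad stores the Future[MatrixResponse] itself, so concurrent identical
  // requests wait for the same in-flight computation instead of recomputing it.
  def cachedMatrix(key: MatrixKey)(compute: MatrixKey => Future[MatrixResponse]): Future[MatrixResponse] =
    localCache.getOrLoad(key, compute)
}
```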

We learned that analysing incoming traffic is the first thing you should do when considering shared caches. We consulted the corresponding AWS Load Balancer access logs for the request URLs of our service and their timestamps. How to set up access log recording for an AWS Load Balancer is described here: Access Logs for Your Application Load Balancer — Elastic Load Balancing.
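As a rough illustration of this kind of analysis, the sketch below estimates the share of repeated URLs per short time window; it assumes the access log lines have already been parsed into (timestamp, URL) pairs, and the window size is an arbitrary example.

```scala
object TrafficAnalysisSketch {
  // Estimate how many requests could have been served by a front cache:
  // group requests into short time windows and count repeated URLs per window.
  // Assumes log lines have already been parsed into (epochSecond, url) pairs.
  def duplicateShare(entries: Seq[(Long, String)], windowSeconds: Long = 10L): Double =
    if (entries.isEmpty) 0.0
    else {
      val repeated = entries
        .groupBy { case (epochSecond, url) => (epochSecond / windowSeconds, url) }
        .values
        .map(_.size - 1) // every occurrence beyond the first could be a cache hit
        .sum
      repeated.toDouble / entries.size
    }
}
```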

The results were impressive: ~70% of URLs within short time intervals were not unique, and could therefore be served by simply introducing a front cache. Additionally, the same coordinates appear in different combinations inside incoming matrices.

The caching approach that covers most of the cases above is to cache coordinate pairs. Such a coordinate pair matrix can be quite huge. Given that ~70% of the incoming matrices had been seen within the last seconds, we decided to keep the complete matrix cache as level 1 and to cache the processing result of a single coordinate pair as level 2. See CPU cache for more details on cache levels. Level 3, the results of the segment function, should be switched to a shared model.

A shared cache across multiple service instances allows existing calculation results to be reused. Complete matrices are valuable results; complete matrices served fast are even better.
Note the distinction between a latency cache and a capacity cache: when a latency cache is employed, the service can sustain its expected load with an empty cache, but a service using a capacity cache cannot sustain its expected load with an empty cache. https://landing.google.com/sre/sre-book/chapters/addressing-cascading-failures/

We further improved the caching strategy (see the sketch after this list) so that

  • Level 1 caches the complete response JSON
  • Level 2 caches the result of processing a single coordinate pair
  • Level 3 caches the result of a segment function call
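To make the layering concrete, here is a simplified sketch of how the three levels could be wired together. SharedCache, MatrixService and all type names are illustrative stand-ins, not our production code; a Redis-backed implementation such as AWS ElastiCache could sit behind the shared levels.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Illustrative types; repeated here so the sketch is self-contained.
final case class Coordinate(lat: Double, lon: Double)
final case class Segment(from: Coordinate, to: Coordinate)
final case class SegmentResult(durationSeconds: Double)
final case class CellResult(durationSeconds: Double)
final case class MatrixRequest(origins: Seq[Coordinate], destinations: Seq[Coordinate])

// Stand-in for a cache shared between service instances (e.g. backed by Redis).
trait SharedCache[K, V] {
  def getOrLoad(key: K)(load: K => Future[V]): Future[V]
}

class MatrixService(
    level1: SharedCache[MatrixRequest, String],                // complete response JSON
    level2: SharedCache[(Coordinate, Coordinate), CellResult], // single coordinate pair result
    level3: SharedCache[Segment, SegmentResult]                // segment function result
)(implicit ec: ExecutionContext) {

  // Level 1: serve a recently seen matrix directly from the cached JSON.
  def response(request: MatrixRequest): Future[String] =
    level1.getOrLoad(request) { req =>
      Future.traverse(req.origins) { from =>
        Future.traverse(req.destinations) { to =>
          // Level 2: reuse coordinate pairs seen in other matrices.
          level2.getOrLoad((from, to)) { case (f, t) => evaluateCell(f, t) }
        }
      }.map(renderJson)
    }

  private def evaluateCell(from: Coordinate, to: Coordinate): Future[CellResult] =
    Future.traverse(segmentsBetween(from, to)) { segment =>
      // Level 3: share segment function results across service instances.
      level3.getOrLoad(segment)(callSegmentFunction)
    }.map(combine)

  // Placeholders for segment splitting, the remote segment function and aggregation.
  private def segmentsBetween(from: Coordinate, to: Coordinate): Seq[Segment] =
    Seq(Segment(from, to))
  private def callSegmentFunction(segment: Segment): Future[SegmentResult] =
    Future.successful(SegmentResult(0.0))
  private def combine(results: Seq[SegmentResult]): CellResult =
    CellResult(results.map(_.durationSeconds).sum)
  private def renderJson(cells: Seq[Seq[CellResult]]): String =
    cells.toString
}
```

The getOrLoad pattern keeps the caching concern out of the calculation code, and each level only falls through to the next one on a miss.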

With the implementation approaches outlined above, we experienced a 75% traffic decrease on downstream services during peak traffic hours 🎉

Based on our cache hit rate, we also see that the cache is used rather often.

Conclusion

Caching is one option for reducing latency. With the appropriate approach for a given challenge, the effect can be quite significant. In our case it resulted in a 75% reduction in traffic, justifying the engineering effort invested in the project.
