How we use Redis (and especially its managed AWS integration) for a real-world web application

Eyal Flato
Published in NI Tech Blog
Oct 28, 2018 · 7 min read

It was late 2015, and I was on a rare business trip to Boston. We had an HTML widget that was being adopted by more and more sites, with quite a few page views. We had the servers running, strong enough to handle everything, until the crisis hit. Suddenly many errors appeared, the servers got stuck, and auto scaling started to scale up: instead of 2 strong servers serving everything smoothly, within moments 8 servers were up and still couldn't serve all the traffic. And the traffic hadn't even increased by that much.

The problem? There were too many requests to the MongoDB database, which couldn't serve them fast enough, so everything got stuck.

Before you ask: we were already using some layers of caching, but it wasn't enough.

Here’s a general description of the system:

As shown in the diagram below, the user searches for a certain service, wants to compare a few providers, or is looking for reviews. We use information about the user provided by the search engine, such as the keywords used, current time, and location, to find the most relevant services, create a web page, and display it to the user.

We use an operational database (including CMS) on MongoDB, statistics and a data warehouse on Redshift, and 3rd party APIs (for example, geo services, service availability, 3rd party content, reviews from Google, etc.).

Before Redis

Why We Use Redis

With our type and amount of data, everything on the server side, like sorting lists, rendering web pages from templates, searching, etc., happens fast, even if it's not implemented perfectly and has suboptimal complexity. What takes a lot of time is getting the data from outside: the DB and the APIs. That is what we needed to cache.

With auto-scaling schemes, we don't know how many servers there are or how many processes run on each one. It's much easier to assume no state; preferably, the cache should not live in a server's own memory.

This is why we chose Redis.

With Redis

While the Redis implementation in AWS requires API calls, they are very fast, and the fact that everything is in the same AWS region makes the latency of these calls negligible. That makes Redis a perfect match for the task.

Comparing Performance

To prove that Redis would match our performance needs, we used the following real-life (but simple) task, implemented 4 solutions, and compared their query time performance.

Our system needs to resolve geo targets from AdWords campaigns (see details: https://developers.google.com/adwords/api/docs/appendix/geotargeting). The dataset has about 100,000 records. We get a Criteria ID and should return the geographic details related to it.

Here are the performance results for loading the geo-targeting records into each DB and querying them:

We can see that the in-memory cache has the most efficient query time. The problem is that we have to reload the data into memory every time a server process starts, and once for every process (a 4-CPU server runs 4 processes, for example). Redshift is very efficient at loading the data: it is done once, fast, and lasts forever. But its query time is slow. While the MongoDB implementation is not slow at all, we found out that loading MongoDB with too many concurrent queries can reduce its performance dramatically. Redis provides a balanced solution: query time is fast enough, less than 0.5 msec per query. Although loading the data takes a long time, it does not occupy the main server's CPU or RAM and can be done by an unloaded worker process, once a week, for example.

(Note: all DBs are in the same AWS region as the server. The field we query is indexed.)
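
For concreteness, here is a rough sketch of what the Redis variant of this lookup could look like with the Node.js redis client. The "geo:" key prefix, the loadGeoTargets helper, and the record format are illustrative assumptions, not the exact production code:

    const redis = require('redis');
    const client = redis.createClient(process.env.REDIS_URL); // illustrative env var

    // Loader (run by a worker, e.g. once a week): store each geo-target
    // record as a JSON string keyed by its Criteria ID.
    function loadGeoTargets(records, done) {
      const batch = client.batch();
      records.forEach((rec) => {
        batch.set('geo:' + rec.criteriaId, JSON.stringify(rec));
      });
      batch.exec(done);
    }

    // Query: resolve a Criteria ID to its geographic details.
    function getGeoTarget(criteriaId, callback) {
      client.get('geo:' + criteriaId, (err, json) => {
        if (err || !json) return callback(err, null);
        callback(null, JSON.parse(json));
      });
    }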

Have you been in a similar situation before? What did you do?

Some use cases for Redis in our implementation:

1. Caching layer for Mongo objects

We have a few objects that are fetched many times from MongoDB (settings items, for example). Instead of fetching them from the DB each time, we use the implementation below to lazy-load the objects into the cache when first needed. After that, each query goes to Redis and is therefore very fast.
When an object is changed by an API call, we make sure to call touchObject to force a cache refresh. This is very efficient compared to the in-memory cache approach, in which we would somehow have to message every process on every server that the object has changed (or skip that and serve a stale version of the object). See the code here: https://gist.github.com/cappsool/7d2de0531e156ef942d778a426248b9e
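
The linked gist is the full implementation; the following is only a minimal sketch of the pattern. fetchFromMongo, the "obj:" key prefix, and the 1-hour TTL are illustrative assumptions (touchObject is the name used above):

    // Assumes a connected node_redis `client`, as in the earlier sketch.
    // Lazy-load: serve from Redis when possible, fall back to MongoDB once.
    function getObject(id, callback) {
      const key = 'obj:' + id;
      client.get(key, (err, json) => {
        if (json) return callback(null, JSON.parse(json)); // cache hit
        fetchFromMongo(id, (err2, obj) => {                // cache miss: go to the DB
          if (err2) return callback(err2);
          // Cache for next time; the TTL bounds how long a stale copy can live.
          client.setex(key, 60 * 60, JSON.stringify(obj), () => {});
          callback(null, obj);
        });
      });
    }

    // Called whenever an API call changes the object: drop the cached copy
    // so the next getObject() reloads a fresh version from MongoDB.
    function touchObject(id, callback) {
      client.del('obj:' + id, callback);
    }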

2. Complex fetching

We have code that fetches data via several complex MongoDB queries. We don't want it to run on every user request, of course. We use Redis to cache these data fetches, with one of two approaches:

Approach 1 (lazy loading): if the key does not exist in Redis, go to the DB, run the queries, do whatever calculation and restructuring is needed, and save the result to Redis.
Pros:
- A single code path for all cases
Cons:
- The first user gets bad performance
- Serving might access the DB sometimes, which may slow everything down

Approach 2 (preload): a worker process runs the DB queries every few minutes and saves the results to Redis for every possible mix of parameters.
Pros:
- Serving doesn't touch the DB at all
Cons:
- What should be done if the key doesn't exist in Redis?
- A lot of memory may be used for rare/unused cases

We use Approach 1, lazy loading, most of the time. It is good enough for queries that take no more than 1 sec. A sketch of such a lazy-loading wrapper follows.
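
As a rough sketch of Approach 1, assuming a generic wrapper (the cacheKey naming, the ttlSeconds parameter, computeFn, and the names in the usage line are all illustrative):

    // Assumes a connected node_redis `client`, as in the earlier sketches.
    // Generic lazy-loading wrapper: computeFn runs the complex MongoDB
    // queries plus any calculation/restructuring; its result is cached.
    function cachedFetch(cacheKey, ttlSeconds, computeFn, callback) {
      client.get(cacheKey, (err, json) => {
        if (json) return callback(null, JSON.parse(json)); // fast path: Redis only
        computeFn((err2, result) => {                      // slow path: hit the DB
          if (err2) return callback(err2);
          client.setex(cacheKey, ttlSeconds, JSON.stringify(result), () => {});
          callback(null, result);
        });
      });
    }

    // Hypothetical usage: cache a multi-query fetch for 10 minutes per
    // parameter mix (runProviderQueries and renderPage are placeholders).
    cachedFetch('compare:' + category, 600, runProviderQueries, renderPage);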

3. Heavy calculation/analytics (> 1sec)

Once every hour or so (depending on how often the data tends to change) we perform a heavy calculation that includes an analytics DB query, and load its results into the cache. The background server performs the heavy calculation (which can take tens of seconds, for example). The serving code only loads the calculation result from Redis and continues from there.
The serving process looks for the relevant key in Redis and has a default behaviour if it is not found.
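
A minimal sketch of this worker/serving split. The runAnalyticsQuery helper, the "stats:hourly" key, the 2-hour TTL, and the default value are illustrative assumptions:

    // Assumes a connected node_redis `client`, as in the earlier sketches.
    // Worker side (run every hour or so): do the heavy analytics query and
    // publish the result to Redis. runAnalyticsQuery is a placeholder.
    function refreshStats(done) {
      runAnalyticsQuery((err, stats) => {
        if (err) return done(err);
        // TTL of 2 hours: if the worker stops, stale results eventually expire.
        client.setex('stats:hourly', 60 * 60 * 2, JSON.stringify(stats), done);
      });
    }

    // Serving side: never computes, only reads, and falls back to a default
    // behaviour when the key is missing (e.g. right after a cache reset).
    const DEFAULT_STATS = {}; // illustrative default
    function getStats(callback) {
      client.get('stats:hourly', (err, json) => {
        if (err || !json) return callback(null, DEFAULT_STATS);
        callback(null, JSON.parse(json));
      });
    }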

Things to consider:

  1. Add a way to view/reset the cache from an API or management UI:
    Sometimes you make a change and want it to take effect immediately, without waiting for the next load cycle. In addition, while debugging an issue, it can be helpful to see what's in the cache. There are a few Redis UIs that could be worth looking into (see, for example, https://redislabs.com/blog/so-youre-looking-for-the-redis-gui/), but implementing a simple solution on your server is easy enough.
  2. Assume no persistence: Redis is a great key-value DB, but we don't store data there in order to keep it for a long time. This way:
    * We can assume it can be reset or deleted.
    * There is no problem with using another Redis for another region.
    * There is no problem if some keys get purged when their expiry arrives.
    * This is an in-memory DB, and RAM is expensive and limited to the amount we are willing to pay.
  3. Expiry time (TTL): use a TTL wisely on every key you store, to save RAM on unused data. If you are lazy-loading the data into the cache, it is a great way to force a refresh every few minutes/hours, depending on the data freshness you need.
    The following code saves the object to Redis and sets a TTL of 3 hours. After 3 hours the key will no longer be there, and the next process to fetch it will have to set it again, with fresh data:
    client.setex(key, 60 * 60 * 3, JSON.stringify(obj), next);
  4. Count cache hits and cache misses: it can help you understand where the bottlenecks are in your implementation.
  5. Understand what's using Redis memory. Make sure you don't over-use the cache memory, and enlarge or optimize if needed. Redis will keep accepting reads and writes, but when its storage is full, existing keys may be evicted (depending on the configured maxmemory-policy) to free space for newer data. Beware!
  6. The AWS implementation (ElastiCache) also provides a very easy-to-manage solution; see https://aws.amazon.com/elasticache/. You only have to choose the size and performance needed and set up the endpoint on your servers. The price for cache.m4.large with 6.5GB RAM, for example, is less than $250/month for a 2-node, multi-AZ solution. If you assume no persistence and make sure to load fresh data into the cache (timed or lazy-loading), you can save a lot on synchronization, backup maintenance, and setup.
  7. Use on a local machine is not possible with the AWS implementation; use redis-mock (https://github.com/yeahoffline/redis-mock).
    As opposed to a DB you can connect to while developing on a local machine, an AWS ElastiCache Redis endpoint is only reachable from inside your AWS network. Here is a sketch of setting up a Redis client in code that supports both local and server implementations:
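
    // Sketch: use redis-mock in local development, the real client on servers.
    // The IS_LOCAL flag and REDIS_ENDPOINT variable are illustrative.
    const redis = process.env.IS_LOCAL
      ? require('redis-mock')   // in-process fake, no Redis server needed
      : require('redis');       // real node_redis client, talks to ElastiCache

    const client = process.env.IS_LOCAL
      ? redis.createClient()
      : redis.createClient({ host: process.env.REDIS_ENDPOINT, port: 6379 });

    client.on('error', (err) => console.error('Redis error:', err));

    module.exports = client;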

By implementing these solutions, we were able to almost completely eliminate the need for our serving code to access the operational and analytics DBs. The pressure on those DBs was reduced, and they were able to perform their other queries faster. Latency on end-user requests dropped dramatically. We were able to serve 10x the user requests with the same 2 servers we started with, and scale up even more.

Conclusion

Caching with Redis is a well-integrated and balanced solution. It is slower than a naive in-memory cache, but it saves the hassle of rebuilding the cache for every process every time it starts. A single worker process can build the cache once in a (long) while, and the data is available to all servers all the time. Redis query time is quite fast; although it involves an asynchronous call to a different server, the performance is great.
