How to add a Redis caching layer to your Elasticsearch queries
Written by Clement Ruffinoni, Hugo SCHOCH & Ludovic Francois
Problem: Time wasted for TrackIt Users Due to Redundant ES Requests
TrackIt app users were experiencing poor performance due to slow data processing from multiple back-end Elasticsearch requests triggered to access the same information. To solve this problem, our team decided to introduce a caching layer. The primary purpose of a caching system here was to eliminate redundant back-end ES queries which returned the same result each time they were triggered.
The TrackIt App: Cloud Cost Optimization
A vast majority of companies using the Cloud are spending a lot more than they should have to. Estimates have shown that 35% of the average cloud customer’s bill is wasted spend. In fact, cloud customers themselves estimate that they are overspending by about 30%.
The TrackIt app is a cloud cost and resource optimization platform that allows companies to optimize their ROI on AWS. TrackIt allows users to monitor their cloud deployments, identify trends, detect spending anomalies, and ultimately optimize their spending on AWS.
Solution: Redis
A cache works much like human short-term memory: a small amount of recently accessed, “fresh” data can be retrieved faster than older “memories” that require a slower search.
To implement this caching feature, we chose Redis, an in-memory data structure store that can be used as a database or, as here, as a cache.
Organization
The cache is organized according to a very simple scheme:
- Intercept the user’s request.
- Return the data from the cache if it is there.
- Otherwise, retrieve the data from the datastore (Elasticsearch), send it to the user, and store it in the cache.
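The three steps above can be sketched in Go as follows. This is a minimal illustration rather than TrackIt's actual code: the `store` type is a plain in-memory map standing in for Redis, and `fetch` and the `query` callback (standing in for the Elasticsearch request) are hypothetical names.

```go
package main

import "fmt"

// store is a hypothetical stand-in for Redis: a plain in-memory map.
type store map[string]string

// fetch returns the cached value for key when there is one. Otherwise it
// runs query (standing in for the Elasticsearch request), stores the result
// in the cache, and returns it. The bool reports whether the cache was hit.
func fetch(cache store, key string, query func() (string, error)) (string, bool, error) {
	if v, ok := cache[key]; ok {
		return v, true, nil
	}
	v, err := query()
	if err != nil {
		return "", false, err // failed lookups are not cached
	}
	cache[key] = v
	return v, false, nil
}

func main() {
	cache := store{}
	calls := 0
	query := func() (string, error) {
		calls++
		return `{"total": 42}`, nil
	}
	// The first call misses and queries the backend; the second is a hit.
	for i := 0; i < 2; i++ {
		v, hit, err := fetch(cache, "example-key", query)
		fmt.Println(v, hit, err)
	}
	fmt.Println("backend calls:", calls)
}
```

The point of the sketch is that the backend query runs only on a miss, so repeated identical requests cost a single Elasticsearch round trip.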
Life span
- The front-end may order the cache to be deleted when sending a request using a header provided for this purpose. In this case, the cache is deleted if it exists and a new instance will be created at the end of the request.
- The API tasks modify the data regularly, which makes the data held in the caches obsolete. When this happens, all of the obsolete caches are deleted automatically.
- The data stored in the cache expires automatically after 24 hours, so a new cache is created the next time the same key (route + arguments + one of the AWS accounts) is requested.
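The 24-hour expiry can be illustrated with a small Go sketch. Redis handles expiry natively (for example through `EXPIRE` or the `EX` option of `SET`), so the code below only emulates that behavior with an in-memory map; `ttlStore`, `entry`, and `cacheTTL` are hypothetical names, and the current time is passed in explicitly to keep the example deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// cacheTTL mirrors the 24-hour life span described in the article.
const cacheTTL = 24 * time.Hour

// entry pairs a cached value with the moment it stops being valid.
type entry struct {
	value     string
	expiresAt time.Time
}

// ttlStore is a hypothetical in-memory stand-in for Redis key expiry.
type ttlStore map[string]entry

// set stores value under key and schedules it to expire cacheTTL after now.
func (s ttlStore) set(key, value string, now time.Time) {
	s[key] = entry{value: value, expiresAt: now.Add(cacheTTL)}
}

// get returns the value for key unless it is missing or has expired.
// Expired entries are removed so that a new cache can be created.
func (s ttlStore) get(key string, now time.Time) (string, bool) {
	e, ok := s[key]
	if !ok {
		return "", false
	}
	if !now.Before(e.expiresAt) {
		delete(s, key)
		return "", false
	}
	return e.value, true
}

func main() {
	s := ttlStore{}
	created := time.Now()
	s.set("example-key", "cached report", created)
	_, fresh := s.get("example-key", created.Add(23*time.Hour))
	_, stale := s.get("example-key", created.Add(25*time.Hour))
	fmt.Println("valid after 23h:", fresh, "valid after 25h:", stale)
}
```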
Code Sample 1: Creating the Key
- The first function, formatKey, creates a key for each specific cache. This function is crucial for the steps that follow, since the key is needed for most Redis operations.
- rdCache contains all the information required to create the key (the route, its arguments, and the AWS account IDs).
- The second function, parseRouteFromUrl, simply retrieves the route and its arguments, if they exist.
The source for this function is available in the TrackIt repository on GitHub.
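To illustrate the second function, here is one way a route and its arguments could be extracted from a URL with Go's standard `net/url` package. The helper `splitRoute` is hypothetical and is not the project's parseRouteFromUrl; the `/costs` route and its `begin` and `end` arguments are made up for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// splitRoute is a hypothetical helper. It returns the path of rawURL and
// its query arguments as sorted "name=value" pairs (nil when there are none).
func splitRoute(rawURL string) (string, []string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", nil, err
	}
	var args []string
	for name, values := range u.Query() {
		for _, v := range values {
			args = append(args, name+"="+v)
		}
	}
	// Sorting gives equal requests the same argument order, and so the same key.
	sort.Strings(args)
	return u.Path, args, nil
}

func main() {
	route, args, err := splitRoute("/costs?end=2019-01-31&begin=2019-01-01")
	if err != nil {
		fmt.Println("invalid URL:", err)
		return
	}
	fmt.Println(route, strings.Join(args, "&"))
}
```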
Code Sample 2: Creating the User Cache
- The function createUserCache creates a cache from rdCache (containing the route, its arguments, and the AWS account IDs), data (the content to store in the cache), and a logger used to log errors.
- The function first checks whether a cache already exists for the key derived from rdCache. If it does, the API logs a warning saying “The user has already a cache attributed for the current route.”
- If the cache doesn’t exist, json.Marshal serializes the content of data to JSON, a format suitable for caching. The serialized content is stored in cacheContent.
- Once this is done, cacheContent is stored in Redis under that key.
The source for this function is available in the TrackIt repository on GitHub.
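The steps performed by createUserCache can be approximated as follows. This is a simplified sketch under stated assumptions, not the project's code: `saveCache` and `errCacheExists` are hypothetical names, a map stands in for Redis, and the existing-cache case returns an error where the original logs a warning.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// store is a hypothetical in-memory stand-in for Redis.
type store map[string][]byte

// errCacheExists signals that a cache is already present for the key.
var errCacheExists = errors.New("the user already has a cache for the current route")

// saveCache serializes data to JSON and stores it under key, unless a
// cache already exists for that key.
func saveCache(cache store, key string, data interface{}) error {
	if _, ok := cache[key]; ok {
		return errCacheExists
	}
	cacheContent, err := json.Marshal(data)
	if err != nil {
		return fmt.Errorf("serializing cache content: %w", err)
	}
	cache[key] = cacheContent
	return nil
}

func main() {
	cache := store{}
	report := map[string]float64{"ec2": 12.5, "s3": 3.2}
	fmt.Println(saveCache(cache, "example-key", report)) // first call stores the cache
	fmt.Println(string(cache["example-key"]))
	fmt.Println(saveCache(cache, "example-key", report)) // second call is refused
}
```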
- This sample of code performs the majority of the cache-related operations: deleting a cache if an instruction to do so is given; checking whether a cache already exists for the user’s request and, if so, returning the data it contains; and creating a new cache if no prior cache exists.
- updateCacheByHeaderStatus checks if the instruction to delete the cache has been given in the header of the HTTP request. If this instruction has been given, the cache is deleted using the deleteUserCache function. If no instructions are given or the instruction given is unknown, nothing happens.
- The code then checks whether the user has a cache for this request. If so, it tries to retrieve that cache. If an error occurs during retrieval, the cache is deleted so that it can be recreated later. If no error occurs, the data retrieved from the cache is returned directly to the user.
- If the user does not have a cache, the cache system lets the code run normally and retrieves the final result. Then, if and only if no error occurred, a new cache is created using the createUserCache function seen above.
The source for this function is available in the TrackIt repository on GitHub.
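The flow described above can be condensed into a single hypothetical function. Several details are assumptions made for the sketch: the `action` string represents the value read from the request header (the article does not name the header or its values, so `"delete"` is invented), a map stands in for Redis, and after dropping an unreadable cache the sketch falls through to recompute the result.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// store is a hypothetical in-memory stand-in for Redis.
type store map[string][]byte

// handle serves one request. action is the (hypothetical) header value sent
// by the front-end, and compute stands in for the normal request processing.
func handle(cache store, key, action string, compute func() (interface{}, error)) (interface{}, error) {
	if action == "delete" {
		delete(cache, key) // missing or unknown actions are ignored
	}
	if raw, ok := cache[key]; ok {
		var data interface{}
		if err := json.Unmarshal(raw, &data); err == nil {
			return data, nil // cache hit
		}
		delete(cache, key) // unreadable cache: drop it so it can be rebuilt
	}
	data, err := compute()
	if err != nil {
		return nil, err // a failed request is never cached
	}
	if raw, err := json.Marshal(data); err == nil {
		cache[key] = raw
	}
	return data, nil
}

func main() {
	cache := store{}
	compute := func() (interface{}, error) {
		fmt.Println("querying the backend")
		return map[string]int{"instances": 3}, nil
	}
	// Miss, then hit, then a forced deletion followed by a rebuild.
	for _, action := range []string{"", "", "delete"} {
		data, err := handle(cache, "example-key", action, compute)
		fmt.Println(data, err)
	}
}
```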
Challenges Encountered
The challenges we encountered were predominantly associated with the code architecture of the feature.
Challenge #1 — Finding the appropriate key format
In Redis, each cache has a unique key that identifies it. Our first challenge was to choose the most suitable key format for this feature.
The key format we chose is as follows: “ROUTE-ARGUMENTS-AWS_IDENTITIES-”.
Every part of the key except AWS_IDENTITIES (that is, ROUTE and ARGUMENTS) is hashed using MD5.
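A key in this format could be assembled as in the sketch below. It is an illustration only: `buildKey` and `hashPart` are hypothetical names (not the project's formatKey), and the separators used inside the ARGUMENTS and AWS_IDENTITIES parts (`&` and `:`) are assumptions, since the article specifies only the `-` between the three parts.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// hashPart returns the hexadecimal MD5 digest of s.
func hashPart(s string) string {
	sum := md5.Sum([]byte(s))
	return hex.EncodeToString(sum[:])
}

// buildKey is a hypothetical helper producing "ROUTE-ARGUMENTS-AWS_IDENTITIES-".
// The route and arguments are hashed; the AWS identities stay readable.
func buildKey(route string, args, awsIDs []string) string {
	// Copy before sorting so the caller's slices are left untouched.
	sortedArgs := append([]string(nil), args...)
	sortedIDs := append([]string(nil), awsIDs...)
	sort.Strings(sortedArgs)
	sort.Strings(sortedIDs)
	return hashPart(route) + "-" +
		hashPart(strings.Join(sortedArgs, "&")) + "-" +
		strings.Join(sortedIDs, ":") + "-"
}

func main() {
	key := buildKey(
		"/costs",
		[]string{"end=2019-01-31", "begin=2019-01-01"},
		[]string{"222222222222", "111111111111"},
	)
	fmt.Println(key)
}
```

Because both lists are sorted before the key is built, two requests that differ only in the order of their arguments or accounts map to the same cache.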
Challenge #2 — When API tasks make all the content associated with the different caches obsolete, how to find these caches and delete them?
This bigger challenge governed our choice of key format. With our key format in place (“ROUTE-ARGUMENTS-AWS_IDENTITIES-”), the ROUTE, ARGUMENTS, and AWS_IDENTITIES are kept separate, and the AWS_IDENTITIES are the only part left unhashed. The ARGUMENTS of the ROUTE, as well as the different AWS_IDENTITIES, are sorted in alphabetical (or numerical) order. Because the identities remain readable in the key, we can easily search the database for every cache tied to a given AWS account and delete the obsolete ones.
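The search-and-delete step might look like the following sketch. It assumes keys of the form `<route hash>-<arguments hash>-<id>:<id>-`, where the `:` separator between identities is an assumption, and it scans an in-memory map; against a real Redis instance the keys would more likely be found with something like `SCAN` and a `MATCH` pattern. The names `keyHasIdentity` and `deleteForIdentity` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// store is a hypothetical in-memory stand-in for Redis.
type store map[string][]byte

// keyHasIdentity reports whether the identities part of key contains id.
func keyHasIdentity(key, id string) bool {
	// Expected parts: route hash, arguments hash, identities, and an
	// empty tail after the trailing "-".
	parts := strings.Split(key, "-")
	if len(parts) != 4 {
		return false
	}
	for _, candidate := range strings.Split(parts[2], ":") {
		if candidate == id {
			return true
		}
	}
	return false
}

// deleteForIdentity removes every cache whose key mentions id and returns
// the number of caches removed.
func deleteForIdentity(cache store, id string) int {
	removed := 0
	for key := range cache {
		if keyHasIdentity(key, id) {
			delete(cache, key) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	cache := store{
		"routeA-argsA-111111111111:222222222222-": []byte("{}"),
		"routeB-argsB-222222222222-":              []byte("{}"),
		"routeC-argsC-333333333333-":              []byte("{}"),
	}
	removed := deleteForIdentity(cache, "222222222222")
	fmt.Println("removed:", removed, "remaining:", len(cache))
}
```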
Results
Cache speed table comparison
The table below shows a detailed comparison in speed for three different requests with and without the caching system in place. If you’re interested in gaining a better understanding of the various routes and arguments used in TrackIt’s app, use this document as a reference: https://docs.trackit.io/.
- Creation: This column represents the time taken by the previous version of the TrackIt app to access the required information without the caching system.
- Deletion + Creation: This column represents the time taken for the deletion of a cache + creation of a new cache.
- Cache utilization: This column represents the time taken for a user to access the required information a second time. With the new caching system in place, the cache is returned by the API instead of having to re-process the information with Elasticsearch.
A Boost In Performance
The caching upgrade saved a single TrackIt user an average of 4 minutes 38 seconds during navigation. Every time a user accesses a page in TrackIt’s app (which typically triggers multiple ES requests), new caches are created, so the user benefits from better performance through direct access to the cached content on subsequent visits.
This project has been open-sourced at https://github.com/trackit/trackit.
Feel free to contact us at team@trackit.io