In this article, we will talk about how we scaled our inventory services to handle ~100M inventory cache reads per minute. Inventory services at Myntra are expected to handle millions of API hits, and a subset of these calls translates into ~100M inventory lookup hits on our underlying application layer. With that as the goal, we redesigned our core inventory services to handle the expected scale with a very low resource footprint. We will talk in detail about our initial approach and how we finally arrived at the proposed solution by leveraging the near cache functionality offered by Hazelcast.
What are Myntra’s inventory systems and what do they do?
Like most other retail supply chain systems, Myntra’s inventory systems play a critical role in keeping inventory data accurate, available, and accessible in a reliable manner. These systems are directly responsible for ensuring that Myntra shows the right products to customers based on inventory availability, and for making sure there is no underselling or overselling. We do this by keeping track of the total inventory available at any given time and the total number of orders taken against it, along with a bunch of other details, to accurately determine whether there is any inventory available for a given product. The inventory is stored at different granularities, such as locations and sellers, and also at an aggregate level, i.e., the total inventory available for a given product across sellers and locations.
The inventory sub-system has multiple microservices to block/read/write the inventory data and offers all other supporting functionality related to inventory management. Currently, the inventory data is not directly accessed by the order-taking flows (customer order placement path), as the inventory data is mirrored in the user-facing system’s central cache through an async pipeline. The one exception is inventory blocking, for which there are APIs hit in the customer order-taking flow.
In the current setup, all of the inventory data is stored in a MySql cluster with master-slave replication. We have a single master and multiple slaves for redundancy; all writes go to the master and reads are distributed among the slaves, with async replication between the MySql nodes. There is no sharding, so the data is not partitioned.
The service layer has a bunch of Java-based microservices, each having a separate schema in MySql. For this discussion, we would like to focus on the inventory promise engine, which today offers two functionalities -
- Blocking the inventory for order taking
- Pushing inventory changes to the central cache service
Central Cache Service
The central cache service hosts a lot of other data fetched from multiple source systems (inventory data is one among them). It is primarily used by the storefront (user-facing tier-1 services) for different use cases, and it uses Redis as the backing data store. Even though it is called a cache, there is no fallback to the source systems on a data miss, and it is treated almost as a persistent store.
Whenever there is a change in inventory data, the inventory promise engine pushes a notification event (async) to the central cache service. The cache service invalidates the inventory entry and re-fetches it from the inventory database by making a read-inventory call to the inventory promise engine. We have employed different mechanisms to make sure the inventory database and the central cache service are in sync and free from discrepancies. With this design, the read throughput of the inventory promise engine is only limited by the rate at which the inventory changes, which is much lower than the rate of actual inventory reads made on the central cache service. To be more precise, whenever a user searches for products on Myntra, the inventory check happens at the central cache service level and the core inventory services are untouched. It’s only when the user places an order that we make a call to the inventory promise engine to block the inventory.
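The invalidate-and-refetch flow described above can be sketched as follows. This is an illustrative simplification, not Myntra’s actual code: the class and method names are hypothetical, and plain maps stand in for the Redis-backed cache and the promise engine’s read API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch: on an async "inventory changed" event, the central
// cache invalidates its copy and re-fetches from the inventory promise engine.
class CentralCache {
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();
    private final Function<String, Integer> readFromPromiseEngine; // read-inventory API call

    CentralCache(Function<String, Integer> readFromPromiseEngine) {
        this.readFromPromiseEngine = readFromPromiseEngine;
    }

    // Handler for the async notification event (assumes the product still exists).
    void onInventoryChanged(String skuId) {
        cache.remove(skuId);                                  // invalidate the stale entry
        cache.put(skuId, readFromPromiseEngine.apply(skuId)); // re-fetch from the source
    }

    // Storefront reads hit only this cache, never the inventory database.
    Integer read(String skuId) {
        return cache.get(skuId);
    }
}
```

The key property is visible in the sketch: the promise engine is called only on change events, so its read load tracks the rate of inventory changes, not the storefront read rate.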
Why was this setup this way?
Historically, we have used the central cache service as a single source for different types of data. Data from different sources arrives here, and the application layer stitches all of this data together for rendering the search and product detail pages on the Myntra platform. Inventory systems followed the same pattern, and inventory data is one among the few other types of data stored in the central cache.
Why do we want to change this?
The primary reason we want to change this is not read throughput; the cache service backed by Redis is quite capable of handling the expected scale. We want to redesign this to eliminate the cache layer altogether for inventory reads and scale the core inventory service (promise engine) to directly handle the user-facing inventory reads (~100M cache hits per min.). The following are some of the reasons why we need to remove the central cache layer for inventory reads -
- As the inventory data is propagated to the central cache layer asynchronously, there is a possibility of momentary inconsistencies creating issues. These issues are more prominent during sale days, when there is a sudden spike in inventory changes from the sellers; one of the biggest issues is overselling.
- There is an operational cost involved as the data is stored in two places, and we need to make sure the data stays consistent.
- Whenever there is a change in the data model in the inventory database, the same needs to be changed at the central cache service. This is a big maintenance issue given that the two are handled by two completely different teams, and every change needs to be communicated explicitly.
- The central cache service is becoming a single point of failure.
So, for the reasons mentioned above, we decided to remove the dependency on the central cache service and make the inventory system power the inventory reads for user-facing flows. What this means is scaling the inventory promise engine from less than a few thousand inventory reads to ~100M hits per min. at the inventory cache.
The goal: scale the inventory promise engine to directly handle the inventory reads (~100M hits per min. at the inventory cache) and eliminate the need for pushing the inventory data to the central cache.
Initial Approach (Iteration 1)
Our first attempt was to look at scaling out MySql, as it was the biggest bottleneck. The service layer is already horizontally scalable, so if MySql could support ~100M inventory reads per min., we thought we could handle the expected load with small improvements in the service layer. We looked at the following changes at the MySql and service layers -
- Add more read slaves in the MySql cluster
- Enable sharding
But then we quickly realized that this will create the following issues -
- A lot of resources would be required, as MySql is not an ideal datastore for such a high read throughput
- Adding more slaves means added replication lag, causing momentary inconsistencies in data
- Maintenance and operational overhead, as we would be managing many slaves and monitoring their replication delays
- The complexity involved in application-managed MySql sharding, such as adding/removing shards, query routing, etc.
Out of all this, the biggest challenge was the number of resources required. This approach might have been inevitable, and we would have explored it further, if the requirement were to also scale inventory writes. But as the writes are very minimal and a single master can handle the current and projected load, we are currently only interested in scaling the reads. So we decided to move on and find a better approach.
Let’s Cache — Again? (Iteration 2)
From the previous iteration, we figured that scaling MySql is not a good idea. But we also can’t replace MySql with something else, as we need a highly reliable persistent store with strong ACID properties for storing the inventory data; MySql was the de facto choice, and we would like to retain it. So we decided to add some sort of caching layer on top of MySql just for reads, as our aim was to scale only the reads while the write throughput stays in the tens of thousands.
We introduced Redis as a cache and decided to update it synchronously whenever there is an update to the inventory data. The idea was to use MySql for writes and make a parallel write to Redis, while Redis powers all the inventory reads. Redis was our de facto choice when it comes to caching, as it is used extensively at Myntra for various use cases.
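The synchronous dual-write idea can be sketched as follows, with in-memory maps standing in for MySql and Redis. The names are illustrative, and the failure handling shown (dropping the cache entry when the mirror write fails, so a later read can fall through to a rebuild) is one possible policy, not necessarily the one used in production.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative write-through sketch: MySql stays the source of truth for
// writes, and the read cache is updated synchronously in the same request.
class InventoryWriter {
    private final Map<String, Integer> mysql;  // stands in for the MySql master
    private final Map<String, Integer> cache;  // stands in for the Redis read cache

    InventoryWriter(Map<String, Integer> mysql, Map<String, Integer> cache) {
        this.mysql = mysql;
        this.cache = cache;
    }

    void updateInventory(String skuId, int count) {
        mysql.put(skuId, count);     // 1. commit to the persistent store first
        try {
            cache.put(skuId, count); // 2. synchronously mirror into the read cache
        } catch (RuntimeException e) {
            cache.remove(skuId);     // on failure, drop the entry rather than serve stale data
            throw e;
        }
    }

    Integer readInventory(String skuId) {
        return cache.get(skuId);     // all inventory reads are served by the cache
    }
}
```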
How is this different from the current setup?
It may seem like this approach is the same as the current setup (a central cache powered by Redis). However, if we put aside the fact that we also want to use Redis, this setup is a bit different and solves the problems we talked about -
- In the case of the central cache, we were using the caching service that is shared by multiple teams and we have no control over it
- The central cache consumes changes via async notifications, and we can’t change that; with our own cache, we can propagate changes synchronously.
- Having our own cache means there is a better chance of keeping the data model consistent
- The central cache is no longer a single point of failure for inventory reads, as we can have a fallback when we have our own cache
It is fair to say this approach is nothing but bringing the central cache into the inventory systems for deeper integration and better control. But since the central cache holds other data that can’t be owned by the inventory systems, we figured that having our own central-cache-like setup is a better approach.
With this in mind, we did a quick POC and performance benchmark, only to find that this approach works fine and we are able to achieve the scale we are looking for. But it doesn’t seem to be the optimal approach, for the following reasons -
- We would require a ~80 to 100 node Redis cluster to handle the scale of ~100M RPM. This is a lot for the given data size of ~10 GB.
- Underutilization of resources, as each node handles a tiny amount of data.
- Operability/maintenance overhead, as we would be looking at a huge Redis cluster.
Even though we felt that we needed to rethink this approach of adding an external caching layer, we proceeded to explore two other caching solutions in the hope that we could improve resource utilization and bring down the total infra requirement. Along with Redis, we also explored Hazelcast and Aerospike.
When we benchmarked all three by setting them up as standalone clusters, we saw comparable performance for our use case. There were some differences in the numbers but not enough to pick one over the other for our use case. The payload size for each key was a maximum of 2500 Bytes and an average of 1500 Bytes. Also, there were different settings used for benchmarking:
Redis only supports benchmarking a cluster-based setup starting from Redis 6. The default Redis benchmark utility (redis-benchmark) also doesn’t provide a way to run the benchmark on custom data, in contrast to Aerospike.
Application Embedded Cache (Iteration 3)
As we concluded in the previous iteration that a standalone cache would need huge resources, we decided to move on from the idea of a standalone cache to an embedded cache, as the data size is very small and we could comfortably fit the entire data in the memory of the application. This approach is super efficient in terms of resource utilization: keeping the data within the application memory drastically improves throughput and latencies. However, there are a few challenges -
- Every time we restart a node, we need to warm up the cache, thereby increasing the application startup time
- We need to keep all the nodes in sync with the MySql
Even with a distributed, shared, and replicated embedded cache, we see the following issues -
- A multi-node failure can cause data loss, requiring us to rebuild the cache from MySql
- Every time we restart a node, we need to warm up the cache (node-specific shards), thereby increasing the application startup time
As restarting application services is a routine activity and the frequency is quite high, application startup time is a big concern.
Near Cache for Rescue (Iteration 4)
Even though there are different ways to solve the problems mentioned in the above iteration, we wanted a cleaner and easier approach with less complexity. So we decided to explore further and arrived at a hybrid approach, where we have a standalone cache cluster with client-level caching (more like an embedded cache). To be more specific, we would like a cache system where the data is kept in a standalone cache cluster but also cached on the client side for faster retrieval. By pushing the data closer to the application proactively/reactively, we not only improve the throughput dramatically but also improve the overall availability and resiliency of the cache.
This approach has many advantages -
- We don’t have to set up a huge standalone cluster as most of the data is already cached within the application and the standalone cache is just a fallback
- The application can be restarted anytime without the worry of data loss and doesn’t need to warm up the cache at the start
- The frequency of re-build of cache from MySql will be less as the standalone cache cluster is used as a fallback
- The network chatter between the application and the cache cluster nodes is much lower
With this approach in mind, we started hunting for caching solutions that offer the above functionality, and we quickly found that it exists in both Hazelcast and Redis under the name Near Cache.
What exactly is Near Cache?
A near cache effectively means creating a local copy of the data on the client side whenever the data is read the first time from the cache cluster. This omits the network trips between the cache cluster and the app nodes and hence reduces latency, as data is read from the application’s local cache in subsequent calls.
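A minimal sketch of this idea (not Hazelcast’s actual implementation; all names are illustrative): the first read fetches from the remote cluster and keeps a local copy, subsequent reads are served locally, and an invalidation drops the local copy so the next read fetches fresh data.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal near-cache sketch: a local map in front of the remote cache cluster.
class NearCache {
    private final Map<String, Integer> local = new ConcurrentHashMap<>();
    private final Map<String, Integer> remote; // stands in for the cache cluster
    int remoteHits = 0;                        // counts the saved network round trips

    NearCache(Map<String, Integer> remote) {
        this.remote = remote;
    }

    Integer get(String key) {
        return local.computeIfAbsent(key, k -> {
            remoteHits++;          // a network trip happens only on a local miss
            return remote.get(k);
        });
    }

    // Called when the cluster broadcasts an invalidation event for this key.
    void invalidate(String key) {
        local.remove(key);
    }
}
```

Note the trade-off the sketch makes visible: between a remote update and the corresponding invalidation event, the local copy is stale, which is why automatic invalidation support matters.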
Near Cache support in Hazelcast
We found that Hazelcast has great support for near cache, and our benchmarks and POCs proved it to be quite reliable. Near cache support in Redis currently seems to be available with only one client type, and overall we feel that Hazelcast is a clear winner in providing excellent support for it. It offers the following features out of the box -
- Both client and cache server-side support (Near caching among cache cluster nodes)
- Automatic client-side invalidation
- Limit on number of near cache entries
- Preloading of cache
Here is a snapshot of configuration options available in Hazelcast for the near cache -
More details here — https://docs.hazelcast.com/imdg/4.2/performance/near-cache.html
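To make the options concrete, here is a hypothetical client-side near cache configuration for an inventory map, sketched from the Hazelcast 4.2 documentation linked above. The map name, entry limit, and eviction policy are our illustrative choices, not the production values.

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.EvictionConfig;
import com.hazelcast.config.EvictionPolicy;
import com.hazelcast.config.InMemoryFormat;
import com.hazelcast.config.MaxSizePolicy;
import com.hazelcast.config.NearCacheConfig;
import com.hazelcast.core.HazelcastInstance;

class NearCacheSetup {
    // Hypothetical configuration: map name and limits are illustrative.
    static HazelcastInstance newClient() {
        NearCacheConfig nearCacheConfig = new NearCacheConfig("inventory")
                .setInMemoryFormat(InMemoryFormat.OBJECT)
                .setInvalidateOnChange(true)   // auto-invalidate local copies on updates
                .setTimeToLiveSeconds(0);      // no TTL; rely on invalidation events

        nearCacheConfig.setEvictionConfig(new EvictionConfig()
                .setEvictionPolicy(EvictionPolicy.LRU)
                .setMaxSizePolicy(MaxSizePolicy.ENTRY_COUNT)
                .setSize(200_000));            // cap on near-cache entries per client

        ClientConfig clientConfig = new ClientConfig();
        clientConfig.addNearCacheConfig(nearCacheConfig);
        return HazelcastClient.newHazelcastClient(clientConfig);
    }
}
```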
With this hybrid approach of a standalone cache with the near cache support of Hazelcast, we were able to cut down the resource requirement by 80%, as we were pretty much serving the inventory data right out of the application nodes; the standalone Hazelcast cluster is just a minimal fallback. Also, we found that the Hazelcast community license was good enough for our use case, and we are using the same in production.
Data Consistency (MySql → Hazelcast)
While the above architecture, with near caching on the client side and a standalone Hazelcast cluster as a fallback, worked beautifully with the least amount of resources, there was still one more problem we needed to solve with respect to data consistency between the source (MySql) and the cache. Since we push the data to Hazelcast in an async way after committing the changes in MySql, there is always a possibility of discrepancies, for reasons ranging from the way we deployed the cache to failures in publishing or consuming the event. While momentary discrepancies are automatically resolved by employing retries on failures, any prolonged discrepancies need to be addressed proactively. We have figured out the following two approaches to solve this issue -
- Use-case triggered reconciliation — Certain scenarios can be used as trigger points for automatically reconciling specific inventory data. For example, let’s say a new order is created but we find that there is no inventory available; this is a good indicator to re-sync that specific product’s inventory, because if the cache had the actual inventory picture, the order should not have been created in the first place.
- Incremental reconciliation — We run a periodic job that incrementally reconciles data between MySql and the cache. The job is pretty lightweight, as it reconciles the data from the last checkpoint; since our data volumes are pretty low, we can run this job every few minutes without any issues.
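The incremental reconciliation job might look roughly like this. All names are hypothetical, and the timestamp index and in-memory maps are stand-ins for the real MySql tables and Hazelcast maps.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the incremental reconciliation job: each run scans only the rows
// modified after the last checkpoint, overwrites any cache entry that
// disagrees with MySql, and then advances the checkpoint.
class Reconciler {
    private final TreeMap<Long, String> updatedSince; // (updateTimestamp -> skuId) index over MySql rows
    private final Map<String, Integer> mysql;         // source of truth
    private final Map<String, Integer> cache;         // Hazelcast stand-in
    private long checkpoint = 0;

    Reconciler(TreeMap<Long, String> updatedSince,
               Map<String, Integer> mysql, Map<String, Integer> cache) {
        this.updatedSince = updatedSince;
        this.mysql = mysql;
        this.cache = cache;
    }

    // Returns the number of discrepancies repaired in this run.
    int reconcileOnce(long now) {
        int fixed = 0;
        // Only rows updated in (checkpoint, now] are examined.
        for (String sku : updatedSince.subMap(checkpoint, false, now, true).values()) {
            Integer truth = mysql.get(sku); // assumes the row still exists
            if (truth != null && !truth.equals(cache.get(sku))) {
                cache.put(sku, truth);      // repair the discrepancy
                fixed++;
            }
        }
        checkpoint = now; // the next run starts from here
        return fixed;
    }
}
```

Because each run only looks at rows changed since the last checkpoint, the work per run stays proportional to the change rate, which is what keeps the job lightweight.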
By employing the above two approaches, we can lower the possibility of discrepancies to almost zero. Even in the worst-case scenario, a momentary discrepancy is not going to be a big issue, as a discrepancy impacting the customer experience or revenue requires a lot of parameters to align at exactly the same time: the inventory becoming zero/available, a discrepancy arising at that moment for the same product, and a user trying to place an order for that same product at the same time, which is a rare possibility.
Scaling the service layer — Final Piece of the Puzzle
We initially thought we would plug Hazelcast with the current service layer (promise engine) and scale the service layer horizontally by adding more nodes, but then we figured that
- The current promise engine is written in Java + Spring as a traditional servlet-based application and does not offer great throughput. If we were to scale the same application to handle ~100M hits per min. at the inventory cache, we would again need huge resources.
- As there is a lot of legacy code, a lot of effort would be required to upgrade the current setup to use a better framework; more or less, it would be a complete rewrite.
- The promise engine is a tier-1 service offering the critical functionality of blocking the inventory, and we don’t want to risk making big changes, as we don’t have a fallback for this functionality.
Due to the above reasons, we decided to create a new service only to serve the inventory reads, with a plan that we would eventually and incrementally migrate the entire promise engine’s functionality to this new service.
Choosing the right framework
As the throughput requirement was in the order of millions, we decided to explore event-based non-blocking frameworks, as these frameworks are proven to offer great throughput. We explored Vert.X and Spring WebFlux and benchmarked them against the traditional Spring + Netty-based setup.
To run a performance test, we used a setup that consisted of:
- One application node [8 cores, 32G RAM]
- Hazelcast cluster of 3 nodes [3 x 8 cores, 32G RAM]
The first run was for a batch size of one. What it means is that one API call will result in one Hazelcast read operation. The response payload size was 1 KB on average.
With the above RPM, the application node was running at full capacity in terms of CPU.
Based on the above numbers, we picked Vert.x and ran more tests with increased batch size. A batch size of N means that one API call will result in N Hazelcast operations.
We decided to go with Vert.X for two reasons
- It offered maximum throughput per core.
- It is already being used in Myntra so we have some prior experience.
After experimenting with multiple caching solutions and application frameworks to scale our inventory read layer, we finally proceeded to use Hazelcast with near cache support and Vert.X as this combination offered maximum performance per core for our use case. We learned that this design can be leveraged at multiple places to effectively reduce resource usage.
Coauthors: Neeraj Gangwar, Priyank Pande
Thanks to Ramalingam R and Manoj Kansal for their review and support.