Scaling the Walmart Inventory Reservations API for Peak Traffic

Shanawaaz Mohammed
Walmart Global Tech Blog
May 3, 2022



When a customer places an order on the Walmart.com website or mobile app, an inventory reservation call is made. This captures the demand for the items in the customer’s cart. During the Thanksgiving holidays or big sales events (e.g., PS5 or Xbox events), the number of inventory reservation requests can be significantly high. In this post, I would like to explain how we overcame our scaling issues and are now able to seamlessly handle peak traffic.

Before going into the optimizations, let us understand how sellable inventory is calculated.

Available To Sell = OnHand Inventory - Orders In Progress

The above formula, in simple terms, calculates the units available to sell. OnHand inventory is updated based on the inventory ingestion feeds; these feeds can come from Walmart stores, distribution centers, or the Marketplace interface. Demand (Orders In Progress) is updated based on the number of inventory reservations made. A simplified inventory snapshot document looks like this:

{"id": "5a3adr7sehfe4y9a8ezb80da8sch6afz","sId": "65CED97DC02248478EAF8242FC808601-ss-1",//partition key"onHand" : 10,"ordersInProgress" : 4,"availableToSell" : 6...other fields omitted}

The first design of the inventory reservation service was simple. It was a horizontally scaled API, where each instance tried to execute the reserve command (the execute-command and apply-event concepts are part of the event sourcing pattern) and write the resulting reserved event to the events collection (a CosmosDB container). Once the reserved event is written, a snapshot service applies it to the inventory snapshot. That means the quantities from the reserved event are aggregated into the ordersInProgress field on the inventory snapshot (see the JSON above).

Reservation API without sticky session and header-based routing
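To make the flow concrete, here is a rough sketch of the execute/apply steps under this first design. All names here (execute_reserve, apply_reserved, the event shape) are illustrative, not the actual service code:

import uuid

def execute_reserve(snapshot, order_id, qty):
    # Command handler: validate the reserve command against the current snapshot.
    if snapshot["onHand"] - snapshot["ordersInProgress"] < qty:
        raise ValueError("not enough inventory to reserve")
    # The executed command becomes a reserved event written to the events collection.
    return {"type": "reserved", "eventId": str(uuid.uuid4()),
            "sId": snapshot["sId"], "orderId": order_id, "qty": qty}

def apply_reserved(snapshot, event):
    # Snapshot service: aggregate the reserved quantity into ordersInProgress.
    snapshot["ordersInProgress"] += event["qty"]
    snapshot["availableToSell"] = snapshot["onHand"] - snapshot["ordersInProgress"]
    return snapshot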

With this approach, we quickly noticed a lot of write contention as the number of API calls increased: when more than one instance tries to write a reserved event for the same partition key, the writes conflict and have to be retried.
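Sketched with an optimistic-concurrency check, the contention looks roughly like this; events_store and its methods are hypothetical stand-ins for the CosmosDB events container:

class ConflictError(Exception):
    pass

def write_with_retry(events_store, partition_key, build_event, max_retries=5):
    # Several instances may race to append the next event for the same
    # partition key; all but one hit a conflict and must retry.
    for _ in range(max_retries):
        version = events_store.latest_version(partition_key)
        event = build_event(version + 1)
        try:
            # Conditional append: fails if another instance already wrote
            # an event at this version for the same partition key.
            events_store.append(partition_key, event, expected_version=version)
            return event
        except ConflictError:
            continue  # another instance won the race; re-read and retry
    raise RuntimeError("too many write conflicts on partition " + partition_key)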

To overcome this write contention, we utilized the scatter-gather technique in the new design. The inventory reservation API is split into a Main Reservation API and a Line Reservation API. When the Main Reservation API receives a reservation request, it splits the reservation lines, grouped by sellingId (the database partition key), and calls the Line Reservation API for each group. Each call carries a custom HTTP header, which the Istio sidecar uses to route requests to the same Kubernetes pod via a consistent hash-based DestinationRule.

Main and Line Reservation API with sticky session using batch-id header
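The routing piece can be expressed as an Istio DestinationRule that consistent-hashes on the custom header. The resource below is only a sketch: the name and host are placeholders, and the header name follows the batch-id header shown above:

# Sketch only: name and host are placeholders.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: line-reservation-api
spec:
  host: line-reservation-api.inventory.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: batch-id  # same header value -> same pod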

With this design change, we make sure that only one instance receives the requests for a given partition key. Within that pod, the incoming request is then routed to the same thread, which runs a MailBoxProcessor. This technique is used to achieve in-memory concurrency. With this request routing, the number of processes/threads trying to write to a single partition in the distributed data store (CosmosDB) is reduced to one.

Mailbox processing using actor pattern
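Here is a minimal Python sketch of the mailbox idea (the actual service uses a MailBoxProcessor; the queue-and-thread analog below uses illustrative names):

import queue
import threading

class PartitionMailbox:
    # One mailbox and one worker thread per partition key, so reservations
    # for the same sellingId are always processed by a single thread.
    def __init__(self, handler):
        self.handler = handler            # e.g., execute reserve + write event
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def post(self, message):
        self.inbox.put(message)           # hand the request off to the mailbox

    def _run(self):
        while True:
            self.handler(self.inbox.get())

mailboxes = {}
registry_lock = threading.Lock()

def route(selling_id, message, handler):
    # All requests for the same partition key land in the same mailbox/thread.
    with registry_lock:
        if selling_id not in mailboxes:
            mailboxes[selling_id] = PartitionMailbox(handler)
        mailbox = mailboxes[selling_id]
    mailbox.post(message)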

Although this design reduces the write contention, the next bottleneck we encountered was the number of reserved events being written to the database from a single instance. CosmosDB has an upper limit of 10,000 RUs per second per partition. That means if each reserved event document write uses 5 RUs, our write throughput is capped at 10,000 / 5 = 2,000 orders per second (OPS). In other words, for any single item we can only take 2,000 OPS, or 120,000 orders per minute (OPM).

To overcome this per-partition write limit, another optimization we created was to batch the reserved events and create a single batchReserved event that contains multiple reservations in a single document. That means we can push our write throughput well beyond 2,000 OPS. We have stress-tested it to handle more than 5,000 OPS, or more than 300,000 OPM.
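A sketch of the batching step might look like this; the drain_batch helper, the event shape, and the events_store call are all illustrative:

def drain_batch(inbox, max_batch=100):
    # Block for the first reservation, then greedily drain whatever else is
    # already queued for this partition.
    batch = [inbox.get()]
    while len(batch) < max_batch and not inbox.empty():
        batch.append(inbox.get_nowait())
    return batch

def write_batch_reserved(events_store, partition_key, reservations):
    # One batchReserved document carries many reservations, so the per-order
    # RU cost of the write is amortized across the whole batch.
    event = {
        "type": "batchReserved",
        "sId": partition_key,
        "reservations": [{"orderId": r["orderId"], "qty": r["qty"]}
                         for r in reservations],
    }
    events_store.append(partition_key, event)
    return event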

Another optimization that I want to highlight is snapshot caching. Before a reserve command is executed, the current state, or snapshot, of the partition key must be read from the database. Since this snapshot read was happening for every reservation call, and only one instance is responsible for processing the reservations for a given partition key, an in-memory snapshot cache with a TTL is maintained within that instance. An async background process keeps this snapshot cache in sync with the database. This optimization has reduced the total number of snapshot reads and brought down the read RU cost of the reservation service.
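A rough sketch of such a snapshot cache, with an illustrative TTL and refresh loop:

import threading
import time

class SnapshotCache:
    def __init__(self, load_snapshot, ttl_seconds=5):
        self.load_snapshot = load_snapshot     # reads the snapshot doc from the database
        self.ttl = ttl_seconds
        self.entries = {}                      # sellingId -> (snapshot, fetched_at)
        self.lock = threading.Lock()
        threading.Thread(target=self._refresh_loop, daemon=True).start()

    def get(self, selling_id):
        with self.lock:
            entry = self.entries.get(selling_id)
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]                    # fresh enough: skip the database read
        snapshot = self.load_snapshot(selling_id)
        with self.lock:
            self.entries[selling_id] = (snapshot, time.time())
        return snapshot

    def _refresh_loop(self):
        # Async background refresh keeps cached snapshots in sync with the database.
        while True:
            time.sleep(self.ttl)
            with self.lock:
                keys = list(self.entries)
            for selling_id in keys:
                snapshot = self.load_snapshot(selling_id)
                with self.lock:
                    self.entries[selling_id] = (snapshot, time.time())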

To summarize, below is a list of the optimizations that can be applied to any write-heavy API.

  1. Scatter-gather the API requests with a sticky session so that a database partition is always processed by the same instance.
  2. In-memory concurrency using the actor pattern with a mailbox to restrict the processing of a single partition to a single thread. This also enables batch processing of requests for the same partition.
  3. In-memory snapshot state caching to reduce the number of reads.

These optimizations were possible because of a combined team effort from our awesome team: Shanawaaz Mohammed, Devesh Singh, Ammad Shaikh, Shanil Sharma, Mohammad Tariq, and Murali Gadde.

