Cost Orchestration at Walmart
Running an eCommerce store, especially at Walmart's scale, means keeping track of a lot of data. Every item Walmart sells has a large amount of information associated with it: for example, the item's price, the warehouse where it is stored, and whether it needs to be refrigerated. With over 150 million different items in its online store, managing all these items becomes a distributed data problem that thousands of software engineers work on. In this article we will talk about Walmart's system for organizing cost. Cost is how much Walmart pays to buy each item from distributors. The price of an item is its cost, plus logistics, plus shipping, plus Walmart's profit per item, as shown in the equation below.

$$P = C + L + S + p$$

C represents the cost, L represents the logistics cost, S represents the shipping cost, p represents Walmart's profit per item, and P represents the price Walmart charges customers for each item online.
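The simplified price equation sums four components: cost C, logistics L, shipping S and profit p. The sketch below computes P from them. It uses Python's `Decimal` to avoid floating-point rounding on currency, and the dollar amounts are made up for illustration, not Walmart data. `compute_price` is a hypothetical helper name.

```python
from decimal import Decimal


def compute_price(cost: Decimal, logistics: Decimal, shipping: Decimal,
                  profit: Decimal) -> Decimal:
    """Return the customer-facing price P = C + L + S + p.

    This mirrors only the simplified equation; the real pricing
    system uses many more inputs.
    """
    return cost + logistics + shipping + profit


# Hypothetical per-item amounts in dollars (illustrative only).
price = compute_price(Decimal("4.10"), Decimal("0.35"),
                      Decimal("0.80"), Decimal("0.25"))
print(price)  # 5.50
```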
Five microsystems use cost data: pricing, merchants, shippers, logistics, and cost. Each of these microsystems has its own software team and many microservices that manage Walmart's data. Pricing uses cost to update the price of an item. Suppliers use cost to better represent supply chain dynamics. Logistics uses cost for logistical transport information. Merchants are the originators of cost itself, since they sell the items to Walmart. The equation above is simplified: many more pieces of data go into calculating price, and each microsystem tracks far more data than the equation shows.
The cost of each item is always changing. For example, a merchant (the seller Walmart buys the item from) may want to change its price because the cost of the ingredients used to make the item has changed. It is critical that every team mentioned above has the same information, with every piece of cost-related data in sync. If one team holds a different cost than another, the item's price may be unfairly skewed, and because Walmart sells thousands, if not millions, of units of the same item, it could lose a significant amount of profit.
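To make the risk concrete, here is a minimal sketch of a consistency check, assuming the merchant's cost is treated as the source of truth (the article says merchants originate cost). The team names, the snapshot values and both helper functions are hypothetical. If pricing still holds a cost of $4.10 after the merchant raised it to $4.30, every unit sold carries $0.20 less margin than intended.

```python
from decimal import Decimal

# Hypothetical snapshot of the cost each team currently holds for one item.
team_costs = {
    "pricing": Decimal("4.10"),
    "logistics": Decimal("4.10"),
    "merchants": Decimal("4.30"),  # the merchant has already raised the cost
}


def out_of_sync(costs: dict[str, Decimal], source: str) -> dict[str, Decimal]:
    """Return each team whose cost differs from the source team, with the gap."""
    truth = costs[source]
    return {team: truth - c for team, c in costs.items() if c != truth}


def margin_lost(stale_cost: Decimal, true_cost: Decimal, units_sold: int) -> Decimal:
    """Profit lost when price is computed from a stale, too-low cost."""
    return (true_cost - stale_cost) * units_sold


gaps = out_of_sync(team_costs, "merchants")
loss = margin_lost(Decimal("4.10"), Decimal("4.30"), 1_000_000)
```

With one million units sold, a 20-cent discrepancy grows into $200,000 of lost margin, which is why the teams' copies of cost must stay in sync.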
Three pieces of software keep the data in sync: one that retrieves the data, one that organizes it, and one that distributes it. Retrieval means providing a user interface through which people or other software programs can submit data to Walmart. Organizing means transforming that user input into data that Walmart's internal systems recognize. Distribution means delivering each piece of data to every microsystem that needs it.
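The three stages can be sketched as a small pipeline. This is an illustration under stated assumptions, not Walmart's implementation: the `CostUpdate` schema, the function names and the `Cost` column are hypothetical. The `Gtin` and `ShipnodeID` fields come from the merchant grid described later in this article, and subscribers are modeled as plain callables.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable


@dataclass(frozen=True)
class CostUpdate:
    """Internal representation of one cost change (hypothetical schema)."""
    gtin: str
    ship_node_id: str
    cost: Decimal


def retrieve(raw_rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Stage 1: accept raw submissions (rows already read from a UI or file)."""
    return [row for row in raw_rows if row]  # drop empty rows


def organize(rows: list[dict[str, str]]) -> list[CostUpdate]:
    """Stage 2: transform user input into the internal format."""
    return [
        CostUpdate(row["Gtin"].strip(), row["ShipnodeID"].strip(), Decimal(row["Cost"]))
        for row in rows
    ]


def distribute(updates: list[CostUpdate],
               subscribers: dict[str, Callable[[CostUpdate], None]]) -> None:
    """Stage 3: deliver every update to every microsystem that needs it."""
    for update in updates:
        for deliver in subscribers.values():
            deliver(update)


# Usage with two hypothetical consumers whose "inboxes" are plain lists.
received: dict[str, list[CostUpdate]] = {"pricing": [], "logistics": []}
subscribers = {team: inbox.append for team, inbox in received.items()}
rows = [{"Gtin": "00012345678905", "ShipnodeID": "7001", "Cost": "4.30"}, {}]
distribute(organize(retrieve(rows)), subscribers)
```

Keeping the stages as separate functions mirrors the article's split: each stage can change (a new input format, a new consumer) without touching the others.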
The orchestration piece happens through Kafka or an HTTP call. Kafka is a messaging system in which one team writes a message and another team reads it at a later time, recording (by committing an offset) that it has processed the message. Each message is associated with a particular key, and by default messages with the same key go to the same partition, so they are read in the order they were written. Because messages are stored rather than delivered live, the receiving team does not have to be online when the message is sent. Think of Kafka like text messaging, whereas HTTP calls are like phone calls, where both parties must be on the line at the same time.
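The behavior described above can be modeled with a toy in-memory topic. This is deliberately not the Kafka client API: `ToyTopic` and `ToyConsumer` are hypothetical classes, `zlib.crc32` stands in for Kafka's default partitioner hash, and using a GTIN as the message key is an assumption for illustration. The consumer is created only after both messages are written, yet still reads them in order, which is the "text message" property.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a Kafka topic (NOT the Kafka client API).

    Each message goes to a partition chosen by hashing its key, so all
    messages with the same key stay in order within one partition.
    """

    def __init__(self, num_partitions: int = 3) -> None:
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        """Append a message and return the partition it landed in."""
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p


class ToyConsumer:
    """Reads at its own pace and remembers (commits) how far it has read."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        self.committed: defaultdict[int, int] = defaultdict(int)  # partition -> next offset

    def poll(self) -> list[tuple[str, str]]:
        batch: list[tuple[str, str]] = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.committed[p]:])
            self.committed[p] = len(log)  # "commit" the new offset
        return batch


topic = ToyTopic()
topic.produce("00012345678905", "cost=4.10")
topic.produce("00012345678905", "cost=4.30")  # same key, same partition
consumer = ToyConsumer(topic)  # created after the messages were sent
first = consumer.poll()   # both messages, in the order they were written
second = consumer.poll()  # nothing new since the last commit
```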
Below are redacted snippets of the messages passed from team to team, along with the calculations some teams perform before passing data on to the next consumer. Every delivery and message is coordinated by the orchestration team and its software.
Piece 1
Merchants write most of their information in Excel spreadsheets, in a grid like the following.
Gtin (Global Trade Item Number) is a numerical code that identifies a particular item.
ShipnodeID is a numerical code that represents a particular warehouse or distribution unit.
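Before the grid can be transformed, each row has to be read and validated. The sketch below assumes the spreadsheet has been exported to CSV, since reading `.xlsx` directly would need a third-party library. The `Gtin` and `ShipnodeID` columns come from the article; the `Cost` column, the sample values and `parse_cost_sheet` are hypothetical. Because both identifiers are described as numerical codes, rows where either one is not all digits are rejected, as are rows whose cost does not parse.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Hypothetical CSV export of a merchant grid (values are made up).
SHEET = """Gtin,ShipnodeID,Cost
00012345678905,7001,4.30
00012345678905,7002,4.25
00098765432109,7001,not-a-number
"""


def parse_cost_sheet(text: str) -> tuple[list[dict], list[int]]:
    """Return (valid rows, file line numbers of rejected rows)."""
    valid: list[dict] = []
    rejected: list[int] = []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            cost = Decimal(row["Cost"])
        except InvalidOperation:
            rejected.append(line_no)
            continue
        if not (row["Gtin"].isdigit() and row["ShipnodeID"].isdigit()):
            rejected.append(line_no)
            continue
        valid.append({"gtin": row["Gtin"], "ship_node_id": row["ShipnodeID"], "cost": cost})
    return valid, rejected


rows, bad_lines = parse_cost_sheet(SHEET)
```

Reporting rejected line numbers instead of failing the whole upload lets a merchant fix individual rows while the valid ones continue through the pipeline.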
Piece 2
The data is broken into snippets for each team that needs the data. For example, the Logistics team gets the following snippet of data derived from the grid above.
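Splitting one record into per-team snippets amounts to projecting it onto the fields each consumer needs. The actual snippet contents are not reproduced here, so the field sets in `TEAM_FIELDS`, the `merchant_id` field and the `snippets_for` helper are all hypothetical.

```python
# Hypothetical field sets: which attributes each microsystem receives.
TEAM_FIELDS: dict[str, tuple[str, ...]] = {
    "pricing": ("gtin", "cost"),
    "logistics": ("gtin", "ship_node_id", "cost"),
}


def snippets_for(record: dict, team_fields: dict[str, tuple[str, ...]]) -> dict[str, dict]:
    """Split one full cost record into the subset of fields each team needs."""
    return {
        team: {field: record[field] for field in fields}
        for team, fields in team_fields.items()
    }


# Hypothetical full record after the "organize" step.
record = {"gtin": "00012345678905", "ship_node_id": "7001",
          "cost": "4.30", "merchant_id": "M-42"}
snippets = snippets_for(record, TEAM_FIELDS)
```

Sending each team only its own fields keeps messages small and prevents consumers from depending on data they do not own.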
Now each team has a clear picture and all the data it needs. The system can send each of its consumers the correct cost information, and the price of each item can be determined.
Walmart receives roughly 40 million cost updates per day and recalculates the prices of 80 million items per day. In total, the data transformation system and orchestrator handle the delivery of 300 million attribute updates per day.
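Converting the daily volumes to per-second rates gives a feel for the load. The calculation below assumes traffic is spread evenly over 86,400 seconds, so the results are averages; peak rates are necessarily at least this high. `per_second` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes as stated in the article.
daily_volumes = {
    "cost updates": 40_000_000,
    "price recalculations": 80_000_000,
    "attribute updates delivered": 300_000_000,
}


def per_second(per_day: int) -> float:
    """Average rate, assuming load is uniform across the day."""
    return per_day / SECONDS_PER_DAY


for name, per_day in daily_volumes.items():
    print(f"{name}: ~{per_second(per_day):,.0f} per second")
# cost updates: ~463 per second
# price recalculations: ~926 per second
# attribute updates delivered: ~3,472 per second
```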
This was a team effort by several engineers on the hyperloop team at Walmart. I would especially like to acknowledge my manager Chintan Shah, my fellow team member Pavankumar Pasala, and our Cost Resource Tier counterparts Eddy Wang and Dzmitry Verkovin.