A critical component at Walmart, as at any e-commerce store, is data. One particular aspect of that data is the information about the items Walmart sells online. Every item has a huge amount of information associated with it, from the price of the item to the logistics of which warehouse it is stored in. Because Walmart has over 150 million different items in its online store, managing all of them becomes a distributed data problem that thousands of software engineers work on. To narrow the scope of this problem, we will focus on one particular attribute: cost. Cost is important because it is how much Walmart has to pay to buy each item from distributors. The price of an item is its cost, plus logistics, plus shipping, plus Walmart's profit per item, as shown in the equation below.
P = C + L + S + p
C represents the cost, L represents the logistics cost, S represents the shipping cost, p represents Walmart's profit per item, and P represents the price Walmart presents to customers to purchase each item online.
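To make the pricing relationship concrete, here is a minimal sketch that assumes the four components simply add up, P = C + L + S + p, using the symbols defined above. The function name and the dollar figures are hypothetical, and Walmart's actual pricing logic is surely more involved. It uses Python's `Decimal` instead of `float` so cent amounts don't pick up binary rounding error.

```python
from decimal import Decimal

def price(cost: Decimal, logistics: Decimal, shipping: Decimal, profit: Decimal) -> Decimal:
    """Customer-facing price under the additive model P = C + L + S + p."""
    return cost + logistics + shipping + profit

# Hypothetical per-item amounts in dollars, for illustration only.
p_before = price(Decimal("4.10"), Decimal("0.35"), Decimal("1.20"), Decimal("0.85"))  # 6.50

# If the merchant raises the cost by 25 cents and profit stays fixed,
# the whole increase flows straight through to the customer price.
p_after = price(Decimal("4.35"), Decimal("0.35"), Decimal("1.20"), Decimal("0.85"))   # 6.75
```

Under this additive model, any error in C shows up one-for-one in P, which is why every team needs the same cost.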
The pricing, supplier, logistics, and merchant software teams all use cost. Pricing uses cost data to update the price of an item. Supplier uses cost to better represent supply chain dynamics. Logistics uses cost for logistical transport information. Merchants are the originators of cost itself, because they are the ones selling the items to Walmart.
Because the cost of an item is always changing, whether because a merchant decided to change its price or because the ingredients used to make the item changed, it is critical that each of these teams has the same information. If one team has a different cost than another, the item's price may be skewed. And because Walmart sells thousands, if not millions, of units of the same item, even a small discrepancy can cost it a significant amount of profit.
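The risk can be illustrated with a small sketch. It assumes that each team keeps its own copy of an item's cost and that the merchant-supplied value is the source of truth. The team names mirror the ones above. The function names, the numbers, and the simple "difference times units sold" estimate are hypothetical.

```python
from decimal import Decimal

def cost_mismatches(source_cost: Decimal, team_costs: dict[str, Decimal]) -> dict[str, Decimal]:
    """Return {team: team_cost - source_cost} for every team whose copy has drifted."""
    return {team: cost - source_cost for team, cost in team_costs.items() if cost != source_cost}

def profit_exposure(difference: Decimal, units_sold: int) -> Decimal:
    """Rough dollar impact of pricing every unit off a cost that is wrong by `difference`."""
    return abs(difference) * units_sold

# Hypothetical example: the merchant raised the cost to $4.10, but Pricing still holds $4.00.
merchant_cost = Decimal("4.10")
copies = {"pricing": Decimal("4.00"), "supplier": Decimal("4.10"), "logistics": Decimal("4.10")}

drift = cost_mismatches(merchant_cost, copies)           # {"pricing": Decimal("-0.10")}
exposure = profit_exposure(drift["pricing"], 1_000_000)  # Decimal("100000.00")
```

A ten-cent gap looks harmless on one item. Across a million units, it adds up to $100,000.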
Three pieces of software keep the data in sync: one retrieves the data, one transforms it into a form other software teams can use, and one distributes it to those teams. Retrieval consists of providing a user interface where people or other software programs can submit data to Walmart. Transformation consists of converting that user input into data that Walmart's internal systems recognize. Orchestration consists of delivering each individual piece of data to the teams that need it.
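Here is a minimal sketch of the three stages chained together. The `Gtin` and `ShipnodeID` field names come from the merchant grid described later. The `Cost` column, the `CostUpdate` record, the function names, and the in-memory "inboxes" are all hypothetical stand-ins for the real user interface, internal schema, and delivery channels.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable

@dataclass(frozen=True)
class CostUpdate:
    """Internal representation of one cost change (hypothetical schema)."""
    gtin: str
    ship_node_id: str
    cost: Decimal

def retrieve(raw_row: dict[str, str]) -> dict[str, str]:
    """Retrieval: accept a row as submitted through the UI or by another program."""
    # Trim stray whitespace that often comes along with hand-entered data.
    return {key.strip(): value.strip() for key, value in raw_row.items()}

def transform(row: dict[str, str]) -> CostUpdate:
    """Transformation: turn user input into a record internal systems recognize."""
    return CostUpdate(gtin=row["Gtin"], ship_node_id=row["ShipnodeID"], cost=Decimal(row["Cost"]))

def orchestrate(update: CostUpdate, subscribers: dict[str, Callable[[CostUpdate], None]]) -> list[str]:
    """Orchestration: hand the same update to every subscribing team."""
    delivered = []
    for team, deliver in subscribers.items():
        deliver(update)
        delivered.append(team)
    return delivered

# Usage: three teams, each with an in-memory inbox standing in for a real delivery channel.
inboxes: dict[str, list[CostUpdate]] = {team: [] for team in ("pricing", "supplier", "logistics")}
subscribers = {team: inbox.append for team, inbox in inboxes.items()}
update = transform(retrieve({" Gtin ": "00012345678905", "ShipnodeID": "7001", "Cost": " 4.10 "}))
delivered_to = orchestrate(update, subscribers)
```

Keeping the stages separate means the orchestrator never has to understand raw merchant input. It only ever sees the cleaned-up internal record.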
Orchestration, the delivery piece, happens either through Kafka or through an HTTP call. Kafka is a messaging platform where one team (the producer) writes a message to a topic, and another team (the consumer) reads it later, at its own pace, keeping track of which messages it has already processed. Each message is associated with a particular key, which ties it to the record it describes. Because messages are stored rather than delivered live, the consumer does not have to be online at the moment a message is sent. Think of Kafka as text messaging and HTTP calls as phone calls: a text waits until the recipient reads it, while a phone call only works if both parties are on the line.
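The "text message" behavior can be sketched with a toy, in-memory model. This is not the Kafka client API. It only mimics two ideas: messages with the same key land in the same partition, so updates for one item stay in order, and a consumer tracks its own read position (offset), so it can show up after the messages were sent. Kafka's default partitioner hashes keys with murmur2; `zlib.crc32` is used here only as a convenient deterministic stand-in. The class names and the `gtin-1` key are hypothetical.

```python
import zlib
from collections import defaultdict

class ToyTopic:
    """In-memory stand-in for a Kafka topic: an append-only log split into partitions."""

    def __init__(self, num_partitions: int = 3):
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        # Same key -> same partition, so all updates for one item stay in order.
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition

class ToyConsumer:
    """Reads at its own pace and remembers its position (offset) in each partition."""

    def __init__(self, topic: ToyTopic):
        self.topic = topic
        self.offsets: defaultdict[int, int] = defaultdict(int)

    def poll(self) -> list[tuple[str, str]]:
        records = []
        for partition, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[partition]:])
            self.offsets[partition] = len(log)  # "commit": the next poll starts after these
        return records

topic = ToyTopic()
p1 = topic.produce("gtin-1", "cost=4.10")
p2 = topic.produce("gtin-1", "cost=4.25")

# The consumer is created after the messages were sent; they are still waiting.
consumer = ToyConsumer(topic)
first_batch = consumer.poll()   # both updates, in the order they were produced
second_batch = consumer.poll()  # [] because nothing new has arrived
```

An HTTP call, by contrast, would fail outright if the receiving service were down at the moment of the request.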
Below are redacted snippets of the messages transmitted between software teams, along with the calculations some teams perform before passing data on to the next consumer. Each delivery and message is coordinated by the orchestration team and its software.
Merchants write most of their information in Excel documents, in a grid like the following.
Gtin (Global Trade Item Number) is a numerical code that identifies a particular item.
ShipnodeID is a numerical code that represents a particular warehouse or distribution unit.
The data is broken into snippets for each team that needs the data. For example, the Logistics team gets the following snippet of data derived from the grid above.
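Here is a sketch of how per-team snippets could be cut from the merchant grid. It assumes the grid has been exported to CSV with `Gtin`, `ShipnodeID`, and `Cost` columns. Only the first two names come from the text. The `TEAM_FIELDS` mapping of which team receives which columns is entirely hypothetical, and the GTINs and costs are made up.

```python
import csv
import io

# Hypothetical mapping of which grid columns each team receives.
TEAM_FIELDS = {
    "pricing": ("Gtin", "Cost"),
    "logistics": ("Gtin", "ShipnodeID", "Cost"),
}

def split_into_snippets(grid_csv: str) -> dict[str, list[dict[str, str]]]:
    """Project each row of the merchant grid onto the columns each team needs."""
    snippets: dict[str, list[dict[str, str]]] = {team: [] for team in TEAM_FIELDS}
    for row in csv.DictReader(io.StringIO(grid_csv)):
        # Values stay strings, so leading zeros in GTINs are not lost.
        for team, fields in TEAM_FIELDS.items():
            snippets[team].append({field: row[field] for field in fields})
    return snippets

grid = """Gtin,ShipnodeID,Cost
00012345678905,7001,4.10
00098765432109,7002,12.99
"""
snippets = split_into_snippets(grid)
```

Each team receives only the columns it needs, and every snippet still carries the Gtin, so all teams can agree on which item a cost belongs to.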
Now each team has a clear picture and all the data it needs. The system can give each of its consumers accurate cost information, and the price of each item can be determined.
Walmart receives roughly 40 million cost updates per day and recalculates the prices of 80 million items per day. In total, the data transformation system and orchestrator handle the delivery of 300 million attribute updates per day.
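Converting these daily figures into per-second averages gives a sense of the load. The quick calculation below uses only the volumes stated above. It yields a flat daily average, so it says nothing about peak traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rate(per_day: int) -> float:
    """Convert a daily volume into an average per-second rate."""
    return per_day / SECONDS_PER_DAY

daily_volumes = {
    "cost updates received": 40_000_000,
    "item prices recalculated": 80_000_000,
    "attribute updates delivered": 300_000_000,
}

for name, per_day in daily_volumes.items():
    # A flat daily average; real traffic is not evenly spread over the day.
    print(f"{name}: about {average_rate(per_day):,.0f} per second")
```

This works out to roughly 463 cost updates, 926 price recalculations, and 3,472 attribute deliveries every second, around the clock.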