Performance Enhancements for the Item Setup Orchestrator

Jimmy Chen
Walmart Global Tech Blog
7 min read · Jun 22, 2020

Recently, the Item Setup Team realized its systems were not performing optimally. Over the last year, Walmart's sales and item catalog grew by 40%, but the number of machines and their capacity did not grow by the same amount. So, the Item Setup Team had to optimize its systems and improve performance.

The performance improvements will be explained in two parts: first, a brief overview of Walmart's item setup backend, and second, the actual performance changes.

Walmart's backend is where the data that powers the frontend is created. For example, Walmart has a database that contains the images you see online. When you open the page on Walmart's website to buy Colgate toothpaste, the Item Read Orchestrator pulls images from Walmart's Product Database and pricing from Walmart's Pricing Database.

This article will focus on how each of these databases gets its data, or in other words, how merchants and sellers get data into Walmart's databases, using a system called Hyperloop.

Hyperloop consists of three systems: the Spec Parser, the Feed Ingestor, and the Item Setup Orchestrator. The first step of selling an item online is storing all of an item's data. For example, an item is made up of seller information, offer information, product description information, pricing information, logistics information, and shipping information. Seller information is itself made up of a series of attributes: the name of the seller, the seller description, the seller id, and item-specific information such as the item's identifier. Sometimes storing data is more complicated because certain pieces of data depend on each other. For example, you can't change the seller name without changing the seller id.
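To make the shape of this data concrete, here is a minimal sketch of how such an item record might be modeled. The class and field names are hypothetical and are not taken from Hyperloop's actual schema.

```java
// Hypothetical model of an item's attribute groups (not Hyperloop's real schema).
public class Item {
    static class SellerInfo {
        String sellerName;
        String sellerDescription;
        String sellerId;
        String itemIdentifier; // item-specific information
    }

    SellerInfo seller;
    // ... offer, product description, pricing, logistics, and shipping
    // information would be modeled as similar attribute groups ...

    // Dependency rule between attributes: the seller name cannot change
    // unless the seller id changes with it.
    void updateSeller(String newName, String newId) {
        if (!newName.equals(seller.sellerName) && newId.equals(seller.sellerId)) {
            throw new IllegalArgumentException(
                "seller name cannot change without a new seller id");
        }
        seller.sellerName = newName;
        seller.sellerId = newId;
    }
}
```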

The next system is the Feed Ingestor, also referred to as the data ingestor. The data ingestor is a state and transformation manager. First, it keeps track of the process of changing an item's attributes. This is especially important because of the number of systems involved and the way information is grouped across them. For example, logistics information depends on pricing and seller information. If seller information is changed, logistics and other downstream systems need to be looped through and updated as well. Additionally, if one system fails, the data ingestor needs to retry and reprocess the entire update.
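As an illustration of that bookkeeping, the sketch below shows one way a seller change could be fanned out to dependent downstream systems and retried on failure. The system names and interfaces are hypothetical, not the Feed Ingestor's actual code.

```java
import java.util.List;

// Hypothetical propagation of a seller change to downstream systems,
// retrying each system that fails (not the Feed Ingestor's actual code).
interface DownstreamSystem {
    String name();
    void apply(String itemId) throws Exception; // push the updated attributes
}

class SellerChangePropagator {
    private final List<DownstreamSystem> downstream; // e.g. logistics, shipping
    private static final int MAX_RETRIES = 3;

    SellerChangePropagator(List<DownstreamSystem> downstream) {
        this.downstream = downstream;
    }

    void propagate(String itemId) {
        for (DownstreamSystem system : downstream) {
            int attempts = 0;
            while (true) {
                try {
                    system.apply(itemId);
                    break; // this system is up to date, move on to the next one
                } catch (Exception e) {
                    if (++attempts >= MAX_RETRIES) {
                        throw new RuntimeException(
                            "Giving up on " + system.name() + " for item " + itemId, e);
                    }
                }
            }
        }
    }
}
```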

Another vital function the data ingestor performs is data transformation. Data from merchants is transformed from Excel documents into Walmart data objects.
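A simplified sketch of that kind of transformation is shown below: a row parsed out of a merchant spreadsheet is mapped onto an internal item object. The column layout and field names are assumptions made for illustration, not Walmart's real format.

```java
import java.util.Map;

// Hypothetical transformation of a parsed spreadsheet row into an internal
// item object. The column names are assumptions, not Walmart's real format.
class MerchantRowTransformer {
    static class ItemUpdate {
        String itemId;
        String sellerId;
        double price;
    }

    ItemUpdate transform(Map<String, String> row) {
        ItemUpdate update = new ItemUpdate();
        update.itemId = row.get("item_id");
        update.sellerId = row.get("seller_id");
        update.price = Double.parseDouble(row.get("price"));
        return update;
    }
}
```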

The final system is the Item Setup Orchestrator (ISO). The Item Setup Orchestrator transports the data from the data ingestor to each of the systems responsible for that data; for example, it carries price data from the data ingestor to the pricing system itself. To facilitate these final transactions, the Item Setup Orchestrator is full of REST APIs and Kafka tunnels. The orchestrator is especially useful when you want to chain messages and send information to multiple resource tiers at the same time. For example, cost information is sent to three systems: the Supplier System, the Logistics System, and the Cost System.
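The sketch below illustrates that fan-out pattern: one cost update is sent to several resource tiers in parallel. The endpoint URLs and class names are placeholders, not ISO's real APIs.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical fan-out of a cost update to several resource tiers in parallel.
// The endpoint URLs are placeholders, not ISO's real APIs.
class CostFanOut {
    private final HttpClient client = HttpClient.newHttpClient();
    private final List<String> endpoints = List.of(
        "https://supplier-system.example.com/cost",
        "https://logistics-system.example.com/cost",
        "https://cost-system.example.com/cost");

    void send(String costPayloadJson) {
        List<CompletableFuture<HttpResponse<String>>> calls = endpoints.stream()
            .map(url -> HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(costPayloadJson))
                .build())
            .map(req -> client.sendAsync(req, HttpResponse.BodyHandlers.ofString()))
            .collect(Collectors.toList());
        // Wait for every resource tier to acknowledge the update.
        calls.forEach(CompletableFuture::join);
    }
}
```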

Now that you understand this ecosystem, we can begin to talk about the performance improvements that were made during the holidays. The item setup orchestrator was the one optimized because it takes the longest, touches the most systems, and has the most room for optimization.

Performance Enhancements

ISO was optimized in three ways. First, we split the work and dedicated machines to certain jobs. Certain jobs that ISO performs have higher priorities and shorter SLAs (service level agreements). A service level agreement is the length of time a system has to finish processing a task. Using these different priorities, we can dedicate machines and resources accordingly. The second optimization was adding a local cache and an external distributed cache to minimize external reads. The third optimization was careful memory management using YourKit.

ISO's most important functions are setting up items and setting up prices for items. These jobs are extraordinarily critical and require short SLAs. Prices should be reflected online as soon as merchants update them, and ISO has two hours to update every system with the merchant data it received. ISO dedicated 60 machines to process all the item updates it receives. Additionally, as soon as the resources are available to execute another item setup, a machine starts processing the next one. These item setup updates are prioritized higher than the other jobs on those machines. ISO executes roughly 20 million item setups every day and has a peak volume of roughly 40 million item setups.
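One simple way to express that kind of prioritization in code is a priority-ordered work queue, sketched below. This is only an illustration of the idea, not ISO's actual scheduler.

```java
import java.util.concurrent.PriorityBlockingQueue;

// Illustrative sketch of priority-based job processing (not ISO's actual scheduler).
// Higher-priority jobs, such as item setups, are taken off the queue first.
class PrioritizedWorker {
    record Job(String payload, int priority) implements Comparable<Job> {
        public int compareTo(Job other) {
            return Integer.compare(other.priority, this.priority); // higher priority first
        }
    }

    private final PriorityBlockingQueue<Job> queue = new PriorityBlockingQueue<>();

    void submit(Job job) {
        queue.offer(job);
    }

    void runWorker() throws InterruptedException {
        while (true) {
            Job job = queue.take(); // blocks until a job is available
            process(job);           // the highest-priority job is processed next
        }
    }

    private void process(Job job) {
        // ... call downstream systems for this job ...
    }
}
```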

Price updates were prioritized a little differently because they arrive directly through REST calls, whereas item setups arrive through a Kafka topic. A good visualization of Kafka is an assembly line: the Kafka topic is a long line of products at the same stage, and the consumer, ISO, picks the speed at which it takes items off the assembly line.
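A minimal consumer loop, sketched below, shows how the consumer controls its own pace: it only asks Kafka for the next batch of records when it is ready for them. The topic name and group id are placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Minimal Kafka consumer loop; the topic name and group id are placeholders.
public class ItemSetupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "item-setup-orchestrator");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("item-setup-updates"));
            while (true) {
                // The consumer decides when to pull the next batch off the "assembly line".
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    processItemSetup(record.value());
                }
            }
        }
    }

    static void processItemSetup(String payload) {
        // ... transform and forward to the owning resource tiers ...
    }
}
```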

A REST call is the opposite: rather than ISO picking the speed at which it progresses, the caller of ISO determines how quickly the item is updated. ISO handles REST calls in two stages. First, a load balancer distributes the REST calls from callers across 60 machines. The load balancer sends messages in a round-robin style, which means each machine gets the same amount of traffic; the first message goes to the first machine, the second message goes to the second machine, and the messages loop, so the 61st message goes back to the first machine. After the messages get past the load balancer, ISO rate limits: if the traffic and speed of messages are too fast for the downstream services, ISO throttles and responds to the REST call that the service is unavailable. The price ingestor will then retry those messages once the service is available again. ISO receives roughly 60 million price updates per day and has a peak volume of roughly 80 million messages.
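A rough sketch of that throttling behavior is below, using Guava's RateLimiter to cap request throughput and return a 503 when the limit is exceeded. The limit value and handler shape are assumptions for illustration, not ISO's real implementation.

```java
import com.google.common.util.concurrent.RateLimiter;

// Rough sketch of request throttling: the permits-per-second value is assumed,
// and the handler shape is illustrative, not ISO's real API.
class PriceUpdateHandler {
    // Allow at most 500 price updates per second on this machine (assumed value).
    private final RateLimiter limiter = RateLimiter.create(500.0);

    /** Returns an HTTP status code for the caller. */
    int handle(String priceUpdateJson) {
        if (!limiter.tryAcquire()) {
            // Downstream services cannot keep up; tell the caller to retry later.
            return 503; // Service Unavailable
        }
        forwardToPricingSystem(priceUpdateJson);
        return 200;
    }

    private void forwardToPricingSystem(String priceUpdateJson) {
        // ... send the update on to the pricing resource tier ...
    }
}
```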

The second optimization was introducing a local cache and an external distributed cache. Caches are especially critical in orchestration because not all the operations in ISO are writes; a significant amount of the operations are reads. A write operation changes the data in a system, while a read operation does not. For read operations, rather than making a call to an external system, which takes time and may fail, you can read from a local or external cache. A cache is a memory optimization in which previous copies of the data are stored. For example, ISO looks up the price of a water bottle from the pricing resource tier and stores the price in the local cache. The next time ISO looks up the price of the same water bottle, ISO can retrieve the price from the local cache rather than having to look up the data from the pricing resource tier again. Reading from the local cache is more than 20 times faster than reading from the pricing resource tier.

The drawback of the local cache is that it has limited capacity, and the data must have been fetched at least once before it can be served from the cache. The local cache only has the capacity to store roughly 20 minutes of data. The limited capacity is partially solved by the external distributed cache: when writing to the local cache, every machine also asynchronously writes to a larger external cache that is shared by all the machines in the system. When looking for data in that cache, you are not only getting a single machine's history, you are getting all the machines' historical data. Between the two caches, we were able to reduce the number of read calls by 25%, speeding up the entire item setup process by more than 5%. Our hit rate across all the caches combined is close to 15%.
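A minimal sketch of that two-level lookup is shown below, using Caffeine for the local cache and a hypothetical remote client for the distributed tier. The capacity, TTL, and client interfaces are assumptions, not ISO's real configuration.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Minimal two-level cache sketch: a local Caffeine cache backed by a distributed
// cache. Capacity, TTL, and the remote client interface are assumptions.
class PriceLookup {
    interface RemoteCache {            // hypothetical distributed cache client
        String get(String key);
        void put(String key, String value);
    }
    interface PricingTier {            // hypothetical pricing resource tier client
        String fetchPrice(String itemId);
    }

    private final Cache<String, String> local = Caffeine.newBuilder()
        .maximumSize(100_000)
        .expireAfterWrite(Duration.ofMinutes(20))   // roughly 20 minutes of data
        .build();
    private final RemoteCache remote;
    private final PricingTier pricingTier;

    PriceLookup(RemoteCache remote, PricingTier pricingTier) {
        this.remote = remote;
        this.pricingTier = pricingTier;
    }

    String getPrice(String itemId) {
        String price = local.getIfPresent(itemId);  // 1. fastest: this machine's memory
        if (price != null) return price;

        price = remote.get(itemId);                 // 2. shared history of all machines
        if (price != null) {
            local.put(itemId, price);
            return price;
        }

        price = pricingTier.fetchPrice(itemId);     // 3. slow path: the real resource tier
        local.put(itemId, price);
        final String value = price;
        CompletableFuture.runAsync(() -> remote.put(itemId, value)); // async write-through
        return price;
    }
}
```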

Let's move on to the third optimization: profiling and looking for bottlenecks. YourKit is a Java profiler that looks for memory leaks, bottlenecks, and other performance problems. It is especially useful because you can examine each thread in your application and see which Java classes are taking the longest and using the most memory. When profiling our application, we found two bottlenecks: our logging and our JSON serialization. Because so many messages stream through our system, we are constantly converting the messages from strings to Java objects and writing those messages into a log to debug and track what's happening. We worked to optimize both of those pieces. The first version of log4j has many known deadlocks, and the next generation, log4j2, is 10 times faster than the old version. ISO has now been upgraded to log4j2.
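For reference, the snippet below shows the Log4j 2 API in its simplest form. The comment about asynchronous loggers is a general Log4j 2 feature; whether ISO enables it is an assumption, not something stated in this article.

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Basic Log4j 2 usage. For higher throughput, Log4j 2 also offers async loggers,
// enabled with the system property:
//   -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
// (whether ISO uses async loggers is an assumption, not stated in the article).
public class ItemSetupStep {
    private static final Logger log = LogManager.getLogger(ItemSetupStep.class);

    void process(String itemId) {
        // Parameterized messages avoid string concatenation when the level is disabled.
        log.info("Processing item setup for {}", itemId);
    }
}
```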

The second bottleneck was the JSON serializer. The item setup team chose JsonIter as the replacement. JsonIter is significantly faster than the previous JSON serializer and is especially optimized for multithreaded use, which matches the needs of our application.

(Figure: Json Iterator benchmark)
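For reference, the core jsoniter calls look like this. The ItemUpdate class is a placeholder, and this is not ISO's actual serialization code.

```java
import com.jsoniter.JsonIterator;
import com.jsoniter.output.JsonStream;

// Core jsoniter calls; ItemUpdate is a placeholder class, not ISO's real model.
public class JsonExample {
    public static class ItemUpdate {
        public String itemId;
        public double price;
    }

    public static void main(String[] args) {
        ItemUpdate update = new ItemUpdate();
        update.itemId = "WM-12345";
        update.price = 3.99;

        // Serialize a Java object to a JSON string.
        String json = JsonStream.serialize(update);

        // Deserialize the JSON string back into a Java object.
        ItemUpdate parsed = JsonIterator.deserialize(json, ItemUpdate.class);
        System.out.println(parsed.itemId + " -> " + parsed.price);
    }
}
```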

The sum total of these three optimizations increased the performance of the system by 50%. This increase in performance was critical for the winter holidays and for the increase in demand caused by COVID. Walmart was able to save money and live better, and did not have to buy more machines. Over the last holidays, the system handled over 300 million transactions per day flawlessly.
