Optimizing the inbound process with a machine learning model

How we predict the adequate number of delivery trucks needed for vendors to send their products to Coupang’s fulfillment centers

Coupang Engineering

Published in Coupang Engineering Blog · 7 min read · Mar 9, 2023

By Austin Yang & JY Cho

This post is also available in Korean.

Coupang continuously strives to optimize its inbound logistics process. By minimizing the resources wasted while receiving products at fulfillment centers, we can make products available for sale sooner and deliver them to more customers faster. To this end, we have used machine learning to improve how our fulfillment centers receive the products that Coupang purchases directly from vendors. This post walks through the improvements we have made so far.

· Background and challenges
· Training a model to predict the number of trucks
∘ Feature extraction
∘ Model learning: LightGBM algorithm
∘ Model hyperparameter search: Bayesian optimization
∘ Inbound reservation system integrated with the model
∘ Trade-off between underprediction and overprediction
· Result of applying the model
· Future plan

Background and challenges

Every day, thousands of vendors in many different locations load various types of products onto trucks and send them to Coupang’s fulfillment centers. Each fulfillment center has docks where trucks park and goods are unloaded. The number of docks available per hour at each center is fixed. To unload goods, one truck occupies one dock for a set period of time; this unit is called a slot.

The number of slots required for each inbound shipment must be predicted precisely so that products from many vendors can be unloaded efficiently within the fixed number of slots. If the predicted number of slots is smaller than the number actually required, the inbound process can be delayed. Conversely, if the predicted number is larger than necessary, slots sit idle and a limited resource is wasted.

Figure 1. Potential resource waste in the supply process from vendors to Coupang fulfillment centers

To address this issue, we developed a system that predicts the appropriate number of slots from the characteristics of the goods to be supplied and the characteristics vendors show when reserving slots. The system aims to reduce the number of wasted slots and to prevent inbound (IB) delays caused by a lack of slots. The next section explains in more detail the techniques we used to achieve this goal.

Training a model to predict the number of trucks

We decided to solve the problem with data, in three steps:

1. We extracted features that affect the number of trucks from the logistics data and inbound requests accumulated over the years, and then prepared training data by combining them with the number of trucks actually used for each inbound.

2. We trained a machine learning model to predict the adequate number of trucks that should arrive at a dock.

3. We integrated the trained model with the reservation system so that the adequate number of trucks is displayed right away when a vendor makes an inbound request.

By adding this automated prediction to the inbound reservation system, we achieved the project’s goal: more efficient slot operations.

Feature extraction

We began with exploratory data analysis (EDA) on the massive logistics data accumulated at Coupang to find the right features for a model that predicts the number of trucks used for inbound. However, we soon learned that we needed the knowledge of domain experts to read between the lines of the data. Through a series of interviews with Coupang logistics managers, we identified certain patterns in the inbound process, and from these findings we discovered multiple features useful for predicting the number of trucks. After processing these features through feature engineering, we defined the final feature set.

Model learning: LightGBM algorithm

We extracted about 800,000 training examples from inbound request data collected over more than two years. Because the data set was large, we looked for an algorithm that could be trained and tuned quickly. In addition, many of the identified features were categorical. We therefore chose LightGBM, which achieves high predictive accuracy on data sets with such features.

LightGBM is a tree-based gradient boosting algorithm that has performed well on many machine learning problems. Most other tree-based algorithms grow trees level-wise (horizontally): every node at the current depth is split before the tree goes one level deeper. LightGBM instead grows trees leaf-wise (vertically): at each step it splits only the leaf whose split reduces the loss the most. Because splits are spent where they reduce the loss most, a leaf-wise tree typically reaches a given loss with fewer splits, which contributes to LightGBM’s fast training.
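To make the difference between the two growth strategies concrete, here is a minimal sketch in plain Python. It is not LightGBM’s implementation (which works on gradient histograms); it fits a single regression tree on one numeric feature with squared error, and the toy data and all function names are assumptions made for illustration.

```python
import heapq
from itertools import count

def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(points):
    """Best threshold split of one leaf: (gain, left_points, right_points)."""
    points = sorted(points)
    parent = sse([y for _, y in points])
    best = (0.0, None, None)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        left, right = points[:i], points[i:]
        gain = parent - sse([y for _, y in left]) - sse([y for _, y in right])
        if gain > best[0]:
            best = (gain, left, right)
    return best

def grow_level_wise(points, max_depth):
    """Level-wise growth: split every splittable leaf at each depth."""
    leaves = [points]
    for _ in range(max_depth):
        next_leaves = []
        for leaf in leaves:
            _, left, right = best_split(leaf)
            next_leaves.extend([leaf] if left is None else [left, right])
        leaves = next_leaves
    return leaves

def grow_leaf_wise(points, max_leaves):
    """Leaf-wise growth: always split the leaf with the largest loss reduction."""
    tie = count()   # tie-breaker so the heap never compares point lists
    done = []       # leaves that cannot be split any further
    heap = []       # splittable leaves, ordered by negated gain (max-heap)

    def push(leaf):
        gain, left, right = best_split(leaf)
        if left is None:
            done.append(leaf)
        else:
            heapq.heappush(heap, (-gain, next(tie), leaf, left, right))

    push(points)
    while heap and len(done) + len(heap) < max_leaves:
        _, _, _, left, right = heapq.heappop(heap)
        push(left)
        push(right)
    return done + [item[2] for item in heap]

def total_sse(leaves):
    return sum(sse([y for _, y in leaf]) for leaf in leaves)

# Toy data: a nearly flat, slightly noisy region followed by a staircase.
noisy_flat = [(x, x % 2) for x in range(8)]
staircase = [(x, 1000 + 10 * ((x - 8) // 2)) for x in range(8, 16)]
points = noisy_flat + staircase

level_wise = grow_level_wise(points, max_depth=2)   # 4 leaves
leaf_wise = grow_leaf_wise(points, max_leaves=4)    # 4 leaves
print(total_sse(level_wise), total_sse(leaf_wise))
```

With the same budget of four leaves, the level-wise tree spends one split on the nearly flat region, while the leaf-wise tree spends all of its splits on the staircase and ends with a lower squared error on this toy data (102 versus roughly 201.7).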

In addition, unlike most other algorithms, LightGBM does not require separate one-hot encoding of categorical features, because it applies the Fisher algorithm to find the optimal split over categories. Thanks to this, LightGBM generally offers both high predictive accuracy and fast training. When we compared it with other major tree-based algorithms on the validation data set, LightGBM delivered the training speed and predictive performance we wanted.
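The idea behind this kind of categorical split can be sketched as follows. With k categories there are 2^(k-1) - 1 ways to divide them into two groups, but for squared error it is enough to sort the categories by their mean target and scan the k - 1 cut points of that ordering. The sketch below applies this to raw targets; LightGBM itself sorts by accumulated gradient statistics rather than raw means, and the category names and truck counts here are hypothetical.

```python
from collections import defaultdict

def best_categorical_split(categories, targets):
    """Sort categories by mean target, then scan prefixes of that order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, y in zip(categories, targets):
        sums[category] += y
        counts[category] += 1

    ordered = sorted(sums, key=lambda c: sums[c] / counts[c])
    total_sum, total_n = sum(sums.values()), sum(counts.values())

    best_gain, best_left = 0.0, set()
    left_sum, left_n = 0.0, 0
    for i, category in enumerate(ordered[:-1]):
        left_sum += sums[category]
        left_n += counts[category]
        right_sum, right_n = total_sum - left_sum, total_n - left_n
        # Reduction in squared error from splitting into the two groups.
        gain = (left_sum ** 2 / left_n + right_sum ** 2 / right_n
                - total_sum ** 2 / total_n)
        if gain > best_gain:
            best_gain, best_left = gain, set(ordered[: i + 1])
    return best_left, best_gain

# Hypothetical product categories and the trucks each request needed.
categories = ["apparel", "beverage", "furniture", "appliance"] * 2
trucks = [1, 2, 5, 6, 1, 2, 5, 6]

best_left, best_gain = best_categorical_split(categories, trucks)
print(sorted(best_left), best_gain)
```

On this toy data the scan groups the two small-shipment categories on one side and the two bulky ones on the other, without ever building one-hot columns.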

Figure 2. Predictive performance of major tree-based algorithms on the validation set of inbound requests to Coupang’s fulfillment centers

Model hyperparameter search: Bayesian optimization

We configured the model’s hyperparameters to be chosen automatically using Bayesian optimization. Bayesian optimization is a method for finding the global optimum x* that maximizes an objective function f(x). It is particularly efficient when the objective function has no known closed form and each evaluation of f(x) is computationally expensive. It is also one of the most commonly used methods in machine learning for finding good combinations of hyperparameters.

The Bayesian optimization process is as follows.

(1) Randomly select N sets of hyperparameter values within the configured search ranges, train a model with each, and calculate the function value from each trained model.

(2) Using the collected pairs of input values and function values, estimate the objective function with a probabilistic method such as a Gaussian process.

(3) Based on the current estimate of the objective function, select the input value candidate that looks most promising for finding the optimum x*. Train the model with that candidate, calculate the function value, and add the new pair of input value and function value to the collection.

(4) Repeat (2) and (3) for a set number of rounds, updating the estimated function each time. Finally, select the hyperparameters x* that maximize the function value.
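The four steps can be sketched in plain Python as follows. This is an illustration under several assumptions, not our production setup: it searches a single hyperparameter, uses a Gaussian process with a fixed squared-exponential kernel, picks candidates with an upper-confidence-bound rule (one of several possible selection rules), and replaces the expensive "train the model and score it" step with a cheap toy function. All names are hypothetical.

```python
import math
import random

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def gp_posterior(xs, zs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    gram = [[rbf(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_new = [rbf(a, x_new) for a in xs]
    weights = solve(gram, k_new)  # K^-1 k*, valid because K is symmetric
    mean = sum(w * z for w, z in zip(weights, zs))
    var = rbf(x_new, x_new) - sum(w * k for w, k in zip(weights, k_new))
    return mean, math.sqrt(max(var, 0.0))

def bayes_opt(objective, low, high, n_init=4, n_rounds=10, kappa=2.0, seed=0):
    rng = random.Random(seed)
    # Step (1): evaluate N randomly chosen inputs.
    xs = [rng.uniform(low, high) for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [low + (high - low) * i / 100 for i in range(101)]
    for _ in range(n_rounds):  # Step (4): repeat (2) and (3).
        # Step (2): estimate the objective from the standardized observations.
        mu = sum(ys) / len(ys)
        sigma = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mu) / sigma for y in ys]

        def ucb(x):
            mean, std = gp_posterior(xs, zs, x)
            return mean + kappa * std

        # Step (3): evaluate the most promising unseen candidate and record it.
        x_next = max((x for x in grid if x not in xs), key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

def toy_objective(x):
    """Stand-in for a validation score; its maximum is at x = 0.3."""
    return -((x - 0.3) ** 2)

best_x, best_y = bayes_opt(toy_objective, 0.0, 1.0)
print(best_x, best_y)
```

In a real search, `toy_objective` would train the model with the candidate hyperparameters and return its validation score, which is exactly the expensive call that this procedure tries to make as few times as possible.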

We run this process on a monthly basis with new data, continuously updating the model.

Inbound reservation system integrated with the model

The model is deployed on SageMaker. When a vendor requests a slot reservation, the reservation system calls the SageMaker endpoint, receives the model’s prediction, and informs the vendor of the appropriate number of trucks.
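A call of this kind might look like the sketch below. The `invoke_endpoint` call with `EndpointName`, `ContentType`, and `Body` is the SageMaker runtime API exposed by boto3; everything else is an assumption for illustration, including the JSON request and response shapes, the feature names, the `predicted_trucks` field, and the rounding rule. A small fake client stands in for `boto3.client("sagemaker-runtime")` so the example runs without AWS access.

```python
import io
import json

def recommend_trucks(runtime_client, endpoint_name, features):
    """Ask a deployed model for a truck count and round it to whole trucks."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"features": features}),
    )
    # The response body is a stream; the payload format is assumed here.
    prediction = json.loads(response["Body"].read())["predicted_trucks"]
    return max(1, round(prediction))  # hypothetical rule: at least one truck

class FakeRuntimeClient:
    """Offline stand-in that mimics the shape of the runtime client's reply."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        payload = json.dumps({"predicted_trucks": 2.6}).encode()
        return {"Body": io.BytesIO(payload)}

# Hypothetical endpoint name and request features.
trucks = recommend_trucks(
    FakeRuntimeClient(),
    "inbound-truck-model",
    {"sku_count": 12, "total_boxes": 340},
)
print(trucks)
```

Passing the client in as an argument keeps the reservation-side logic testable without a live endpoint.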

Figure 3. How Coupang’s inbound (IB) reservation system and the ML model are integrated

Trade-off between underprediction and overprediction

Errors are inevitable in any machine learning model. In our case, predicting fewer slots than are actually needed is underprediction, and predicting more slots than needed is overprediction. There is a trade-off between the two. At the same overall error rate, a model that tends to predict fewer trucks reduces overprediction but increases underprediction. Conversely, a model that tends to predict more trucks reduces underprediction but increases overprediction.
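The trade-off can be illustrated with a small sketch. It assumes the model outputs a fractional truck count that must be turned into a whole number, and it uses a hypothetical rounding threshold as the knob that makes the model lean toward fewer or more trucks. The data, the threshold values, and the definition of the rates (share of requests predicted too low or too high) are assumptions for illustration, not our actual tuning method.

```python
import math

def to_trucks(raw, threshold):
    """Round up when the fractional part reaches `threshold`, else round down."""
    whole = math.floor(raw)
    return max(1, whole + (raw - whole >= threshold))

def error_rates(actual, predicted):
    """Share of requests predicted too low and share predicted too high."""
    n = len(actual)
    under = sum(p < a for a, p in zip(actual, predicted)) / n
    over = sum(p > a for a, p in zip(actual, predicted)) / n
    return under, over

# Hypothetical requests: trucks actually needed and raw model outputs.
actual = [1, 2, 3, 2, 4, 1, 5, 3]
raw = [1.2, 1.7, 2.4, 2.6, 4.1, 0.9, 4.6, 3.3]

rates = {}
for threshold in (0.25, 0.5, 0.75):
    predicted = [to_trucks(r, threshold) for r in raw]
    rates[threshold] = error_rates(actual, predicted)
    print(threshold, rates[threshold])
```

A low threshold rounds up readily, which removes underprediction at the cost of more overprediction; a high threshold does the opposite. The raw predictions are identical in all three runs, so the only thing that changes is which kind of error the system accepts.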

Our basic goal is to minimize the number of slots reserved unnecessarily by reducing overprediction. But if we train a model to minimize overprediction too aggressively, vendors may feel that they are allocated too few slots, which causes unintended inconvenience.

After discussions with the relevant departments, we agreed that it would benefit both Coupang and vendors to reduce overprediction as much as possible while keeping the underprediction ratio at an appropriate level. The final model reflects this consensus.

Result of applying the model

The final model has an underprediction rate of 2.53% and an overprediction rate of 5.04%. This is a significant improvement over the underprediction rate of 8.71% and overprediction rate of 44.45% observed when vendors estimated the appropriate number of trucks themselves and registered it in the reservation system.
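To put these figures in relative terms, the short calculation below derives the reduction in each rate from the four numbers quoted above. The helper name is ours, and the derived percentages are simple arithmetic rather than separately measured results.

```python
def relative_reduction(before, after):
    """Fraction by which a rate dropped relative to its starting value."""
    return (before - after) / before

under_drop = relative_reduction(8.71, 2.53)    # underprediction rate, in %
over_drop = relative_reduction(44.45, 5.04)    # overprediction rate, in %

print(f"underprediction reduced by {under_drop:.1%}")
print(f"overprediction reduced by {over_drop:.1%}")
```

In other words, the underprediction rate fell by about 71% and the overprediction rate by about 89% relative to vendor estimates.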

Since we adopted this machine learning model, the number of cases in which a vendor changes the delivery date due to a lack of slots has decreased by 67.9%. Vendors can now supply their products to fulfillment centers on the schedule they want, and Coupang can reduce unnecessary expenses and receive as many products as it needs on the desired date.

Table 1. Result of applying the ML model that predicts the number of trucks a vendor needs to supply products to Coupang fulfillment centers

Future plan

As Coupang expands its business, we are handling a wider variety of products and operating recently built fulfillment centers that handle only home appliances. We therefore need predictions of the number of trucks that remain accurate under these new conditions. To make the model perform well for these new product types and environments, we will continue to improve it by identifying derived features and adding data.

If innovation in the fulfillment process interests you, come and check out our open positions.