Delivering Happiness: Finding the Sweet Spot Between Customer Delight and Cost Efficiency in Online Shopping

Yahya Ertuğrul Geçkil
Published in Trendyol Tech

8 min read · Feb 19, 2024

Introduction

Ordering restaurant meals and groceries online may be the most lasting habit we acquired during the pandemic. Many of us tried these applications, which had been serving us for decades, for the first time in that period. The surge in demand helped many new companies raise funding and cover the costs of their start-up phase. Customers accustomed to this comfort can now choose among many companies competing to provide the best service, and one of the most important criteria shaping their preference is delivery time: the faster the delivery, the higher the customer satisfaction. In this article, we will examine the points where we can intervene to balance customer satisfaction against operational costs, illustrate them with sample actions, and try to make the final decision easier. Two obvious, low-cognitive-load ways to increase delivery speed are increasing the number of couriers and narrowing the area a restaurant/market serves; the former raises the operational cost per delivery, while the latter hurts the customer experience and reduces revenue. Actions like these simple examples can shift the balance between customer satisfaction and costs, at the risk of higher operational or opportunity costs, but they are beyond the scope of this article. You can therefore treat the number of couriers and the restaurant/market selection as fixed.

Defining Metrics

To begin with, it will make our job easier to reduce the concepts of customer satisfaction and operational cost to metrics that are as inclusive as possible, and then track the issue through those metrics. We assume that our decisions do not affect product quality or customer satisfaction except through delivery time and its side effects, such as cold soup or melted ice cream. Trying to incorporate factors such as the impact of courier satisfaction on customer satisfaction would make the decision more complex than it needs to be; the processes related to courier satisfaction can instead be treated as an issue independent of customer satisfaction, and they are beyond the scope of this article. Under this assumption, it seems reasonable to measure customer satisfaction directly by delivery time, which we define as the time from the moment the order is placed to the moment it is handed to the customer.

Assuming the number of couriers is fixed, we can define operational cost in terms of how effectively their time is used. In other words, we can increase efficiency by minimizing the time in which couriers are neither available to take a new order, nor on their way to a restaurant to pick up a delivery, nor on their way to a customer to deliver it. This gain in efficiency can also reduce delivery times, much as adding couriers would even though we have not: with more couriers waiting for orders, we are more likely to assign each order to a courier who is nearby, which shortens delivery times.

So what is this dead time window we are talking about? To answer that, consider when an order would be assigned to a courier in an ideal world. If we could accurately predict the time the restaurant/market needs to prepare the order, the time needed to find a suitable courier for it, and the time the assigned courier needs to reach the restaurant/market, we could make the courier's arrival coincide exactly with the moment the order is ready. In reality these predictions are not accurate, so couriers either arrive and wait before the order is ready, or arrive some time after it is ready, which is reflected in the delivery time. Delivery time is the metric we already settled on; we now add the courier's wait time at the restaurant as a second metric.
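A minimal sketch of how these two metrics could be computed from order event timestamps (the field names here are illustrative, not our production schema):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class OrderEvents:
    placed_at: datetime          # customer places the order
    courier_arrived_at: datetime  # courier arrives at the restaurant/market
    order_ready_at: datetime     # restaurant/market finishes preparing the order
    delivered_at: datetime       # order handed to the customer


def delivery_time_minutes(o: OrderEvents) -> float:
    """Time from order placement to delivery -- the customer-satisfaction metric."""
    return (o.delivered_at - o.placed_at).total_seconds() / 60


def wait_at_pickup_minutes(o: OrderEvents) -> float:
    """Courier idle time at the restaurant: positive only when the courier
    arrives before the order is ready."""
    return max(0.0, (o.order_ready_at - o.courier_arrived_at).total_seconds() / 60)
```

If the courier arrives after the order is ready, the wait at pickup is zero and the lateness shows up in the delivery time instead, which is exactly the trade-off the rest of the article is about.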

Figure 1: Matching start time estimation steps. We would like to extend our gratitude to Muhammed Enes Oral for the illustration.

Now that we have defined our metrics, we can move on to testing an iteration of our meal prep time prediction model at Trendyol local commerce. The improvement under test is a small change in the definition of meal prep time, the quantity our model tries to predict. Of course we had to retrain the model after this change, but how can we evaluate whether the results are better or worse?

Performance Evaluation

Normally we would check whether we are making better predictions through generic metrics such as mean absolute error and mean squared error. But while a better mean squared error may lead to a better average wait time at pickup, the average delivery time may end up worse than before, and vice versa. It is impossible to say for sure until we test in simulation, or even live in a small region. The main reason simulation and A/B testing are critical for the test we want to run is that we have directly changed the target: the scores, such as mean squared error, that previously measured the model's success are no longer comparable. In other words, even though our old model's mean squared error is 26.74 and our new model's is 30.88, these scores cannot be used to choose between the two.
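To make the incomparability concrete, here is a toy sketch with made-up numbers (not our real scores): each model is evaluated against its own ground truth, because each predicts a differently defined target.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean squared error between true and predicted values."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


# Illustrative numbers only: the same three orders, labeled under two
# different definitions of "meal prep time" (minutes).
old_target = np.array([10.0, 12.0, 15.0])   # old target definition
new_target = np.array([12.0, 14.5, 18.0])   # redefined target

old_score = mse(old_target, [11.0, 11.0, 16.0])   # 1.0
new_score = mse(new_target, [13.5, 16.0, 19.5])   # 2.25

# new_score > old_score tells us nothing about which model to ship:
# the two errors are measured against different ground truths, with
# different scales and variances.
```

The only fair comparison is downstream, on the operational metrics themselves.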

Before the simulation, we started by analyzing how delivery time and wait time at pickup would change, assuming that the decisions we make for individual deliveries do not affect each other. In reality, assigning a courier to one order a minute earlier affects many things, including that courier's distance to the next restaurant/grocery store; increasing the wait time at pickup by one minute on one order may also increase the delivery time of another order by one minute, and we cannot capture this in single-order observations. Nevertheless, this analysis saved us the trouble of testing every alternative in simulation, or worse, live. In fact, using it, we evaluated six different target definitions that it would have wasted our time to test even in simulation.
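One way such a pre-simulation analysis could be sketched (a simplification under the stated independence assumption; the function and field names are hypothetical, not our actual pipeline): for each historical order, replay the courier-arrival times implied by the old and the new model against the actual ready time, and aggregate the per-order changes.

```python
import numpy as np


def offline_deltas(ready_time, old_arrival, new_arrival):
    """Estimate the average change in wait time at pickup and in courier
    lateness (a proxy for the delivery-time impact) when switching models,
    assuming each order is independent of the others.

    All arguments are per-order timestamps in minutes from some epoch.
    Returns (delta_avg_wait, delta_avg_lateness).
    """
    ready, old_a, new_a = map(np.asarray, (ready_time, old_arrival, new_arrival))
    # Courier waits when arriving before the order is ready...
    old_wait = np.maximum(0.0, ready - old_a)
    new_wait = np.maximum(0.0, ready - new_a)
    # ...and the order waits (delivery time grows) when the courier is late.
    old_late = np.maximum(0.0, old_a - ready)
    new_late = np.maximum(0.0, new_a - ready)
    return (float(new_wait.mean() - old_wait.mean()),
            float(new_late.mean() - old_late.mean()))
```

In reality an earlier or later assignment ripples through the shared courier pool, so this can only screen out clearly bad candidates, not replace the simulation.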

Improvement in both metrics may not be necessary to adopt the new model. You may observe an improvement in a third metric, derived from these two and aligned with business objectives; it may even be possible to train your model directly on such a metric. After all, customer satisfaction and effective use of your resources may not carry the same weight for you. Another alternative is to pick, among the different model trials, the one closest to the business objectives, though you should keep in mind that business goals are more cyclical and may not guide you well in the long run. Combining all of these alternatives, we made our choice among six different target definitions. We evaluated each alternative by its decrease in wait time at pickup per unit increase in delivery time relative to the base model. Our observations revealed that this marginal decrease in wait time at pickup was highest for minimal changes; to avoid being drawn to those alternatives, we set thresholds on both wait time at pickup and delivery time so as not to deviate more than a certain amount from the business targets. Thus we decided which target definition to pursue before starting the simulations. The version we chose, version 5, reduced wait time at pickup by two units for every one unit increase in delivery time, 20% ahead of the closest alternative.
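The selection rule described above can be sketched as follows (a simplified illustration with hypothetical names and numbers, assuming every candidate trades some delivery time for wait time):

```python
def pick_target_definition(candidates, max_wait, max_delivery):
    """candidates maps a name to (avg_wait_at_pickup, avg_delivery_time),
    and must include a 'base' entry for the current model.

    Choose the candidate with the largest decrease in wait time at pickup
    per unit increase in delivery time, subject to business thresholds
    on both metrics. Returns (best_name, best_ratio).
    """
    base_wait, base_delivery = candidates["base"]
    best_name, best_ratio = None, float("-inf")
    for name, (wait, delivery) in candidates.items():
        if name == "base" or wait > max_wait or delivery > max_delivery:
            continue  # filtered out by the business-target thresholds
        extra_delivery = delivery - base_delivery
        if extra_delivery <= 0:
            continue  # simplifying assumption: every candidate pays in delivery time
        ratio = (base_wait - wait) / extra_delivery
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name, best_ratio
```

With the thresholds in place, an aggressive candidate with a great marginal ratio but an unacceptable absolute delivery time is excluded before the ratio comparison ever happens.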

It is also worth mentioning the difficulties of simulating such a problem. Some effects are hard to include in the simulation: bad weather conditions, how results change with different numbers of couriers, and the behavior of couriers while waiting for orders to be assigned to them. Going further, these details also make it difficult to draw meaningful conclusions from live tests, because the decision we make here directly affects assignment decisions made over a shared courier pool. In other words, a store-based or courier-based A/B test is not possible. We have to compare regions served by different courier pools with a difference-in-differences approach, or find similar days for the same region and compare those days. Such a test may have more flaws than can be papered over, but it is obvious that at some point you have to stop testing in order to make operational decisions.

Figure 2: Comparing the distribution of our two key metrics between version 5 and the old model.

We fixed the order and courier pools, ran our simulation from start to finish many times with the old model and the new model separately, and logged the statistics. Since the results were in line with our pre-simulation study, we moved on to the A/B testing phase and started serving the new predictions in fifty restaurants across different regions. To evaluate the live results against the pre-test days of the same restaurants, we defined a ruleset that detects similar days. We checked whether the change on similar days was significant with a t-test and found it to be so. We also performed a Mann-Whitney U test on each restaurant's average difference from its own history and again found the results significant. On average, we observed a 13.9% decrease in wait time at pickup and a 4.7% increase in delivery time.
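For readers who want to reproduce this kind of check, the two significance tests could look roughly like this with SciPy (the data below is synthetic stand-in data, not our test results):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in: average wait time at pickup (minutes) for fifty
# restaurants on matched "similar" days before and after the rollout.
before = np.linspace(5.0, 7.0, 50)
after = before - 0.8 + 0.1 * np.sin(np.arange(50))  # hypothetical improvement

# Paired t-test: is the change on similar days of the same restaurants significant?
t_stat, t_p = stats.ttest_rel(before, after)

# Mann-Whitney U test: does "before" tend to be larger than "after"?
u_stat, u_p = stats.mannwhitneyu(before, after, alternative="greater")

print(f"t-test p={t_p:.4g}, Mann-Whitney p={u_p:.4g}")
```

The paired test exploits the similar-day matching, while the Mann-Whitney U test makes no normality assumption, which is why running both gives more confidence in the result.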

Conclusion

Optimizing delivery time and balancing it with operational costs is critical to the success of online shopping platforms. With online shopping habits becoming widespread during the pandemic, customers’ expectations have changed and their demand for fast delivery has increased. However, interventions to meet these demands can have a negative impact on operational costs. Therefore, it is important to find the right balance and carefully monitor the impact of interventions.

The analysis and testing process plays a critical role in assessing the complexity and potential impact of interventions to optimize delivery time. Using the right metrics and aligning interventions with operational objectives is the foundation for a successful strategy.

In conclusion, analysis and testing to ensure the balance between customer satisfaction and operational efficiency is vital for the sustainable growth and success of online shopping platforms. With the right strategy and interventions, customer satisfaction can be increased and operational costs can be optimized, which can increase companies’ competitiveness and solidify customer loyalty.

Join Us

We’re building a team of the brightest minds in our industry. Interested in joining us? Visit the pages below to learn more about our open positions.
