Sequential Decisions: To be optimal or not to be optimal?

Mert Parcaoglu
Published in Trendyol Tech
Dec 20, 2023

Introduction

In various problem-solving scenarios across different domains, decisions are not isolated but interconnected, forming a sequence of actions. The outcome of each decision directly or indirectly influences subsequent actions, altering the course of actions to follow. Our objective is to navigate a series of decisions, ultimately reaching the global optimal solution. To achieve this, a strategy tailored to the problem and environmental conditions must be implemented.

The primary question in strategizing sequential decision making is when to proceed with the next decision. We can examine three basic options: responding when conditions are met, responding at specific time intervals, and responding when decision requests arrive.

The real-world scenario at Trendyol explored in this article falls under the third option, but let’s briefly explain all three. Consider the challenge of dispatching incoming jobs throughout the day in service-oriented organizations such as hospitals or electricity distribution companies. In an environment where jobs accumulate, a rule can be established to dispatch jobs when the pool reaches a certain size, denoted as ’N’. Alternatively, in more dynamic scenarios, decisions might be time-driven, initiated once a specific time interval has passed since the preceding decision. In the last strategy, a decision is prompted by an external user or a microservice and requires a response based on the data available at that moment.
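The three triggers can be sketched in a few lines of Python. This is only an illustration of the idea, not Trendyol's dispatching code: the `Dispatcher` class, its field names, and the default values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Toy job pool illustrating the three decision triggers (hypothetical names)."""
    batch_size: int = 3        # the threshold 'N' for the condition-driven rule
    interval: float = 60.0     # seconds between decisions for the time-driven rule
    pool: list = field(default_factory=list)
    last_decision: float = 0.0

    def add_job(self, job):
        self.pool.append(job)

    def count_trigger(self):
        # Option 1: decide when the pool has reached N jobs.
        return len(self.pool) >= self.batch_size

    def time_trigger(self, now):
        # Option 2: decide when enough time has passed since the last decision.
        return now - self.last_decision >= self.interval

    def on_request(self, now):
        # Option 3: an external caller asks, so decide with whatever is in the pool.
        return self.decide(now)

    def decide(self, now):
        # Dispatch everything currently in the pool and record the decision time.
        batch, self.pool = self.pool, []
        self.last_decision = now
        return batch
```

In the first two options the caller knows when `decide` will run; in the third, the pool contents at the moment of the request are not known in advance.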

In the first two strategies, the conditions and durations are known, whereas in the last approach, uncertainty prevails, posing the most significant challenge.

In a sequential decision-making scenario where responses are triggered upon receiving a request, the response must be based on the most current information available at that moment. Typically, the best available solution is returned. However, it is crucial to examine whether delivering the best solution for each individual request actually leads to the optimal daily average. This article explores that question through the picklist model used in warehouse operations at Trendyol.

Picking Operations

Trendyol, a leading e-commerce company in Turkey, manages the delivery processes for millions of daily users. The operational efficiency in processing orders, from procurement to delivery, is critical. Internally, we employ various optimization algorithms to ensure efficiency without causing resource shortages or bottlenecks amid high operational volumes.

Within each warehouse, worklists are created to allocate and cluster customer orders. These worklists are then used to generate efficient picklists based on the capacities of picking vehicles. Each picker sends a “generate a picklist” request from their handheld terminal to the system. This step introduces the uncertainty mentioned earlier. The philosophy we follow in the optimization team is to write efficient algorithms that respond in a short time with limited resources.

In the current setup, the system acts as a microservice, responding to every request within a tight one-second timeframe. Although this time limit is not a bottleneck for our algorithm, as a Data Science team we constantly seek iterative improvements and opportunities for advancement. This pursuit led us to question how to move beyond a myopic design.

Answer: Shadow Picklists

Our algorithm works in such a way that each request generates one picklist from the worklist backlog pool. The goal of the algorithm is to find a picklist that collects the most items with the least effort.

Pulling the best picklist from the pool each time inevitably leaves worse options for the requests that follow. The first solution that comes to mind is to create multiple picklists at once, either as many as the pool allows or a fixed number. However, committing to all of them may cause us to miss new orders entering the system, and therefore the opportunities that the new worklists bring. In other words, there is the problem of system dynamism on one side and myopic solution generation on the other.

To counter this, we devised an approach: create multiple picklists and return only one, the most efficient among them, while treating the rest as shadow picklists.

The algorithm solves the multi-picklist problem, shadow picklists included, but presents only one picklist as the result: the one that collects the most products with the least effort. The challenge lies in finding a balance, creating multiple picklists without locking up the pool against new orders, so that both system dynamism and myopic solution generation are addressed. The picklists that are not returned, the shadow picklists, contribute to a forward-looking, opportunistic strategy. Also, since this approach changes nothing in the input/output structure, it is very easy to implement.
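The idea can be sketched with a toy model. Everything below is an assumption made for illustration and is not the production algorithm: the `Worklist` fields, the `metric` (items collected per unit of aisle span), the objective of the joint problem (sum of metrics), and the brute-force search, which is only practical for a handful of worklists.

```python
from itertools import combinations
from typing import NamedTuple


class Worklist(NamedTuple):
    name: str
    position: int  # hypothetical aisle position
    items: int     # hypothetical number of items to collect


def metric(picklist):
    """Assumed efficiency metric: items collected per unit of aisle span."""
    items = sum(w.items for w in picklist)
    span = max(w.position for w in picklist) - min(w.position for w in picklist) + 1
    return items / span


def best_joint_plan(pool, size, count):
    """Brute force: up to `count` disjoint picklists of `size` worklists, max total metric."""
    if count == 0 or len(pool) < size:
        return 0.0, []
    best_total, best_plan = float("-inf"), []
    for combo in combinations(pool, size):
        rest = [w for w in pool if w not in combo]
        sub_total, sub_plan = best_joint_plan(rest, size, count - 1)
        total = metric(combo) + sub_total
        if total > best_total:
            best_total, best_plan = total, [combo, *sub_plan]
    return best_total, best_plan


def generate_picklist(pool, size=3, shadows=0):
    """Plan `shadows + 1` picklists jointly, return only the most efficient one."""
    _, plan = best_joint_plan(pool, size, shadows + 1)
    if not plan:
        return None
    return max(plan, key=metric)  # the other picklists in the plan are the shadows


def serve(pool, requests, size=3, shadows=0):
    """Answer successive requests and collect the metric of each returned picklist."""
    pool = list(pool)
    served = []
    for _ in range(requests):
        chosen = generate_picklist(pool, size, shadows)
        if chosen is None:
            break
        served.append(metric(chosen))
        pool = [w for w in pool if w not in chosen]  # shadow worklists stay in the pool
    return served


# Made-up pool: six worklists along one aisle.
pool = [Worklist(n, p, i) for n, p, i in zip("ABCDEF", range(1, 7), [3, 4, 5, 5, 4, 3])]
print(serve(pool, requests=2, shadows=0))  # as-is (myopic) behaviour
print(serve(pool, requests=2, shadows=1))  # one shadow picklist
```

On this made-up pool, the myopic call takes the dense middle of the aisle first and strands the remaining worklists at both ends, while the call with one shadow picklist returns a slightly less efficient first picklist and keeps the second one compact, so its average metric over the two requests is higher. Note that `generate_picklist` has the same inputs and output regardless of `shadows`, which mirrors the point about the unchanged input/output structure.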

The number of shadow picklists, which is set up front, affects solution time: a higher number increases solution time, while a lower number converges toward the myopic solution. Striking this balance is crucial to achieving good results.

The goal of this approach is to maximize the average, over all picklists, of the metric we use to evaluate picklist efficiency.
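One way to write this goal down is shown below. The notation is ours and is an assumption, not the article's formal model: $T$ is the number of picklist requests in a day, $W_t$ is the worklist pool at request $t$, $p_t$ is the picklist returned for request $t$, $m(\cdot)$ is the efficiency metric, and $k$ is the number of shadow picklists. Using the sum of metrics as the objective of the joint problem is likewise an assumption.

```latex
% Overall goal: maximize the average metric of the returned picklists
\max \; \frac{1}{T} \sum_{t=1}^{T} m(p_t)

% Myopic (as-is) rule: return the single best picklist in the current pool
p_t = \arg\max_{p \subseteq W_t} m(p)

% Shadow rule: solve for k+1 disjoint picklists jointly, return the best of them
(q^1, \dots, q^{k+1}) = \arg\max_{\substack{q^1, \dots, q^{k+1} \subseteq W_t \\ q^i \cap q^j = \emptyset \; (i \neq j)}} \; \sum_{j=1}^{k+1} m(q^j),
\qquad
p_t = \arg\max_{q \in \{q^1, \dots, q^{k+1}\}} m(q)
```

With $k = 0$ the shadow rule reduces to the myopic rule.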

An Illustrative Example

To illustrate the concept, consider a scenario that compares the as-is and shadow picklist approaches. Initially, the worklist backlog pool consists of eight worklists, and each picklist consists of three worklists. Once two picklist requests have been received, two additional worklists are added to the pool. For the first request, the as-is method returns picklist-1, which has the highest metric value of 3.5, leaving a depleted pool for subsequent requests. For the second request, it returns picklist-2, which has the best metric value among the remaining candidates. Lastly, picklist-3 is created with the new worklists arriving in the system.

For this demonstration, let’s set the shadow picklist number to one. Consequently, upon receiving incoming picklist requests, the model generates solutions for two picklists but only returns the one with the highest metric value.

Comparing the two approaches through their average metric values showcases the potential impact of shadow picklists. We are still in the testing phase and have not concluded trials within operations, but even in this limited example the method shows potential for averting myopic decision-making.

Conclusion

To conclude, in highly dynamic decision-making scenarios, where choices affect subsequent actions, taking a step back to reassess problems from a broader perspective becomes essential. Despite striving for optimal solutions, our immediate decisions may deviate from the global optimum. Given the impossibility of integrating all processes due to time and resource constraints, employing strategies like the one discussed here brings us closer to the global optimum.

To adapt Shakespeare’s famous line to an operations research perspective: “To be optimal, or not to be optimal, that is the question.”

Join Us

We’re building a team of the brightest minds in our industry. Interested in joining us? Visit the pages below to learn more about our open positions.