How we calculated 70 million prices in 2 minutes for Global Platforms — Part 2

Published in Trendyol Tech · 5 min read · Jul 24, 2023

In the previous article, we explored the challenges faced while implementing a price recalculation system capable of handling up to 70 million prices within a short timeframe. We discussed two solutions: pagination with tasks and sequential search-after with ElasticSearch. The first approach showed promise but suffered from ElasticSearch’s limitations, while the second one was not efficient enough for large-scale price recalculation due to its sequential nature. Now, in Part 2, we will introduce our new solution that allows us to return to a concurrent approach, significantly reducing the price recalculation time for millions of records.

The Problem with ElasticSearch and Pagination

In our initial attempt, we used pagination to split the price calculation into tasks that could be processed concurrently. However, the sheer volume of data in ElasticSearch caused significant performance issues. ElasticSearch’s index.max_result_window setting limits how deep from/size pagination can go (from + size may not exceed 10,000 by default), which forced us to search in smaller chunks and slowed down calculations across millions of documents.
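To make the limitation concrete, here is a minimal sketch (not our production code) that counts how many from/size pages a plain paginated search can reach under the default result window. The function name and the page size of 500 are assumptions chosen for illustration.

```python
# Default value of Elasticsearch's index.max_result_window setting.
MAX_RESULT_WINDOW = 10_000


def reachable_pages(total_docs: int, page_size: int,
                    window: int = MAX_RESULT_WINDOW) -> tuple:
    """Return (reachable_pages, total_pages) for from/size pagination.

    A page is reachable only while from + size <= window.
    """
    total_pages = -(-total_docs // page_size)  # ceiling division
    reachable = min(total_pages, window // page_size)
    return reachable, total_pages


# Hypothetical example: one channel of 2 million prices, 500 per page.
print(reachable_pages(2_000_000, 500))
```

With these assumed numbers, only the first 20 of 4,000 pages fall inside the window, which is why page-based tasks alone could not cover the whole data set.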

The Virtual Partitioning Approach (The Virtual Bucket Approach)

In the Virtual Partitioning Approach, our primary goal was to narrow down search results in ElasticSearch efficiently, which required an effective way to filter the data at query time. To achieve this, we opted for a partitioning strategy. Other options were available, such as partitioning on existing fields, but we decided to use virtual buckets.

Couchbase has a built-in partitioning strategy that divides the dataset into virtual buckets (vBuckets), efficiently distributing the data across nodes. By default, Couchbase utilizes 1,024 virtual buckets, and this number can be adjusted as needed.

This approach builds on the concurrent processing introduced in Part 1 of this series. The key improvement is how the data is partitioned: instead of relying on traditional pagination, we divide the data into virtual buckets and use the vBucket ID as a filter parameter in ElasticSearch queries, which narrows the search range of each query.

To implement this approach, we used a metadata collection in Couchbase to manage virtual bucket IDs and propagated them to ElasticSearch through CBES (the Couchbase Elasticsearch Connector).
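As a rough illustration of the idea, the sketch below maps a document key to a stable bucket ID by hashing it. This is an assumption made for the example: it is not Couchbase's internal key-to-vBucket mapping, and not necessarily how the IDs described here are produced. The function name and key format are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # same count as the Couchbase default mentioned above


def virtual_bucket_id(doc_key: str, num_buckets: int = NUM_VBUCKETS) -> int:
    """Map a document key to a stable bucket ID in [0, num_buckets)."""
    # CRC32 is deterministic, so the same key always lands in the same bucket.
    return zlib.crc32(doc_key.encode("utf-8")) % num_buckets


# Hypothetical price document key.
bucket = virtual_bucket_id("price::12345")
print(bucket)
```

Because the ID is stored with each document, a query can restrict itself to one bucket at a time instead of paging through the whole index.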

Improving the Task Creation Process

When a new business price setting is added, the Price Setting Listener requests the number of price documents in each virtual bucket from ElasticSearch through the Price Read API. With this information, we can determine the number of tasks required for the price calculation based on the desired page size. For example, the Price Read API returns the vBucket IDs along with the number of prices that each vBucket contains, filtered by criteria such as channel, brand, or other parameters relevant to the price calculation:

{
  "0": 522,
  "1": 532,
  "2": 500,
  …
}
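Turning these counts into tasks can be sketched as follows. The function name, the (vBucket ID, page) task shape, and the page size are assumptions for illustration; pages are numbered from 1.

```python
import math


def tasks_per_bucket(counts: dict, page_size: int) -> list:
    """Return (vbucket_id, page) pairs covering every counted price."""
    tasks = []
    # Sort numerically so tasks come out in bucket order.
    for vbucket_id, count in sorted(counts.items(), key=lambda kv: int(kv[0])):
        pages = math.ceil(count / page_size)
        tasks.extend((int(vbucket_id), page) for page in range(1, pages + 1))
    return tasks


counts = {"0": 522, "1": 532, "2": 500}
print(tasks_per_bucket(counts, page_size=500))
# [(0, 1), (0, 2), (1, 1), (1, 2), (2, 1)]
```

A page size at or above the largest bucket count yields exactly one task per bucket.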

Similar to our first approach, we create a calculation job that includes the virtual bucket IDs and store it:

{
  "id": "0050833c-c9c4-4e97-a242-42981158118f",
  "type": "PriceSetting",
  "externalId": 431,
  "taskStatuses": {
    "0050833c-c9c4-4e97-a242-42981158118f-0-1": true,
    "0050833c-c9c4-4e97-a242-42981158118f-1-1": false,
    …
  }
}
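A job document of this shape could be assembled roughly as below. This sketch assumes that each task key is the job ID followed by the vBucket ID and a page number, and that a value of true marks a completed task; both are readings of the example above rather than confirmed details, and the helper names are hypothetical.

```python
import uuid


def build_job(external_id: int, tasks: list, job_type: str = "PriceSetting") -> dict:
    """Create a calculation job whose tasks all start as not completed."""
    job_id = str(uuid.uuid4())
    return {
        "id": job_id,
        "type": job_type,
        "externalId": external_id,
        # Assumed key format: <job id>-<vBucket id>-<page>.
        "taskStatuses": {f"{job_id}-{vb}-{page}": False for vb, page in tasks},
    }


def is_finished(job: dict) -> bool:
    """A job is finished once every task status is True."""
    return all(job["taskStatuses"].values())


job = build_job(431, [(0, 1), (1, 1)])
print(len(job["taskStatuses"]), is_finished(job))
```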

Again, we generate an event for each task, this time including the virtual bucket ID in addition to the relevant parameters. This allows us to process each virtual bucket page independently and concurrently, just like we did in the initial approach.

The Concurrent Calculation Process

The Price Setting Listener consumes task events and begins processing each of them concurrently. By including the virtual bucket ID and other parameters as ElasticSearch filters when fetching prices from the Price Read API, we can efficiently narrow down the search range and significantly improve the speed of the price calculation process.
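The kind of filtered search this describes might look like the request body built below. It uses the standard Elasticsearch bool/filter and term query syntax, but the field names (vbucketId, channelId) and the function itself are hypothetical; the actual Price Read API request is not shown in this article.

```python
def build_price_query(vbucket_id: int, filters: dict,
                      page: int, page_size: int) -> dict:
    """Build a search body restricted to one virtual bucket page."""
    # Filter clauses do not affect scoring and can be cached by Elasticsearch.
    clauses = [{"term": {"vbucketId": vbucket_id}}]
    clauses += [{"term": {field: value}} for field, value in filters.items()]
    return {
        "from": (page - 1) * page_size,  # pages are numbered from 1
        "size": page_size,
        "query": {"bool": {"filter": clauses}},
    }


query = build_price_query(7, {"channelId": 3}, page=2, page_size=500)
print(query)
```

Since each bucket holds only a few hundred documents in the example counts, from + size stays far below the result window. A real query would also need a stable sort so that pages within a bucket do not overlap.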

Because each task operates independently, a failure affects only that task, and we can easily retry it if needed.
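The independence and retry behavior can be sketched with a thread pool standing in for the event consumers. This is a simplified stand-in, not the listener's actual implementation: the handler, worker count, and retry limit are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_with_retry(task, handler, max_attempts=3):
    """Run handler(task), retrying on any exception; return True on success."""
    for _ in range(max_attempts):
        try:
            handler(task)
            return True
        except Exception:
            continue  # only this task is retried; other tasks are unaffected
    return False


def process_tasks(tasks, handler, workers=8, max_attempts=3):
    """Process tasks concurrently and return {task: succeeded}."""
    statuses = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(run_with_retry, task, handler, max_attempts): task
            for task in tasks
        }
        for future in as_completed(futures):
            statuses[futures[future]] = future.result()
    return statuses
```

The returned mapping plays the same role as taskStatuses in the job document: a task that keeps failing stays false and can be re-queued without touching the tasks that already succeeded.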

Benefits of the Virtual Partitioning Approach

The Virtual Partitioning Approach provides several benefits that improve the overall efficiency and performance of our price calculation system:

Better Performance: By dividing the data into virtual buckets and processing them concurrently, we can achieve faster price calculations even for millions of price records.
Error Tolerance: Each task is independent, allowing for easy retries in case of errors, which reduces the impact of failures on the entire system.
Scalability: Because virtual buckets can be calculated in parallel, we can scale the system to handle more data and meet the growing demand for price calculations while maintaining high performance levels.

Outcomes

The Virtual Partitioning Approach has revolutionized our price recalculation system, allowing us to complete the recalculation of 70 million prices in just 2 minutes. By leveraging virtual buckets and concurrent processing, we overcame previous limitations and achieved remarkable efficiency.

Furthermore, a single channel of around 2 million prices can be calculated in 1–2 minutes, and the parallel calculation takes around 10–15 minutes, so we have the flexibility to scale our system and improve it even further. Businesses can now initiate calculations on demand or schedule campaigns at their convenience, which gives them greater control and agility in price management.

With the adoption of the Virtual Partitioning Approach, we have achieved a significant improvement in our pricing system. Kafka lag now drains faster, resulting in faster price recalculations and real-time updates for our customers.

Additionally, the new approach has reduced ElasticSearch’s CPU usage, enhancing the system’s stability. Moreover, the inherent scalability of the Virtual Partitioning Approach allows us to meet the growing demands of the business without being constrained by ElasticSearch’s limits. These combined benefits make our system efficient and reliable as we continue to provide seamless, real-time pricing experiences to our customers.