Optimizing Promotion Search with Elasticsearch: Our Experience

Bilge Tekkursun
Published in Trendyol Tech
8 min read · Apr 4, 2023


Magnifying glass hovering over a price tag (DALL-E)

In this blog post, we will discuss how we utilize Elasticsearch, the decisions we’ve made to cater to our needs, and the reasons behind these choices. Before diving into our use of Elasticsearch, let me first provide an overview of a promotion.

What is a promotion?

A promotion offers customers discounts or incentives to encourage them to purchase products. These promotions can take various forms, such as “Buy 3, Pay 2”, “20% discount on orders over 100 TL”, or “30% off the second product”, among others.

Promotions can include various conditions such as brand, category, minimum basket amount, seller and so on. If a product (a “content”) satisfies the conditions of a defined promotion, the discount is applicable to that product.

Elasticsearch is an essential search solution that we use to find applicable promotions in the shopping cart.

The following topics will be covered:

  1. Why use -1 and “emptyList” as default values?
  2. Why don’t we use sharding?
  3. The purpose of having two different indices
  4. Deletion of expired promotions from indices
  5. Why don’t we use cross-datacenter replication between Elastic clusters?
  6. MSSQL-Elastic Comparator

Why use -1 and “emptyList” as default values?

When storing promotion fields and conditions, not all fields may be filled. For example, a promotion assigned to a campaign will have a non-empty campaignId field, but many other fields such as sellerId, contentId and brandId may be empty. So that a search for promotions applicable to a shopping cart can still match these unrestricted conditions, we assign default values to empty fields: -1 for numeric fields and “emptyList” for string fields.

Below is part of a document that shows how we store promotion conditions in Elasticsearch.

    "siBrandIds": [-1],
    "siCategoryIds": [-1],
    "siCampaignIds": [619933],
    "siSectionIds": [-1],
    "siSellerIds": [-1],
    "siListingIds": ["emptyList"],
    "siStoreIds": [-1],
    "siContentIds": [-1],
    "siApplicationIds": [-1],
    "siUserIds": [-1],
    "siUserSegments": ["emptyList"],
    "siEmailExtensions": ["emptyList"]

Additionally, here’s an example of how we send queries to Elasticsearch:

    {
      "query": {
        "bool": {
          "filter": [
            {
              "terms": {
                "siSellerIds": [-1, 123123],
                "boost": 1.0
              }
            },
            {
              "terms": {
                "siContentIds": [-1, 123123],
                "boost": 1.0
              }
            },
            {
              "terms": {
                "siFilterableLabelIds": ["emptyList"],
                "boost": 1.0
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1.0
        }
      }
    }
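As a rough illustration of how such a query could be assembled, here is a minimal Python sketch. The helper name build_promotion_query and the split of fields into string and numeric sets are assumptions made for this example, not our production code; the field names and default values are the ones shown above.

```python
# Default values used for "no restriction" on a condition field.
NUMERIC_DEFAULT = -1
STRING_DEFAULT = "emptyList"

# Assumption: these are the string-typed condition fields; all others are numeric.
STRING_FIELDS = {
    "siListingIds",
    "siUserSegments",
    "siEmailExtensions",
    "siFilterableLabelIds",
}


def build_promotion_query(cart_values: dict[str, list]) -> dict:
    """Build a bool/filter query with one terms clause per condition field.

    Each clause contains the field's default value plus the values found in
    the cart, so it matches promotions that either do not restrict the field
    or restrict it to one of the cart's values.
    """
    filters = []
    for field, values in cart_values.items():
        default = STRING_DEFAULT if field in STRING_FIELDS else NUMERIC_DEFAULT
        filters.append({"terms": {field: [default, *values]}})
    return {"query": {"bool": {"filter": filters}}}


if __name__ == "__main__":
    import json

    cart = {"siSellerIds": [123123], "siContentIds": [123123], "siFilterableLabelIds": []}
    print(json.dumps(build_promotion_query(cart), indent=2))
```

A field for which the cart has no value still gets a clause containing only the default, which mirrors the siFilterableLabelIds clause in the query above.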

How does this approach help us to optimize performance?

  • Reducing Elasticsearch’s processing time: Without a default, a promotion that places no restriction on a field would simply lack a value for it, and matching it would need an extra clause for the missing field (for example, a must_not exists check) next to the value match. With a predefined value such as -1 stored in empty fields, a single terms filter covers both cases. This removes the ambiguity and helps queries execute more quickly.
  • Improving query comprehensibility: Using predefined values makes query results more predictable, which makes the query more understandable and manageable. Predefined values help determine if a field is empty and how it should be handled in the query results. For example, using -1 for a numeric field indicates that the field is missing, making it clear whether the query should ignore the field or process it in a specific way.

Using predefined values can enhance the performance and predictability of Elasticsearch queries. To ensure success, it is crucial that these values do not overlap with natural values in your dataset. In our case, we set these initial values on the fields in code. No real condition or promotion field is ever set to -1, and we have not experienced any issues with our chosen default string value, “emptyList”.

Setting initial values to fields that will be used in Elasticsearch query
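The original snippet is not reproduced here, so the following is only a minimal Python sketch of the idea, assuming the condition fields from the document excerpt above. The function name with_default_conditions is hypothetical.

```python
# Condition fields taken from the document excerpt earlier in this post.
NUMERIC_FIELDS = (
    "siBrandIds", "siCategoryIds", "siCampaignIds", "siSectionIds",
    "siSellerIds", "siStoreIds", "siContentIds", "siApplicationIds",
    "siUserIds",
)
STRING_FIELDS = ("siListingIds", "siUserSegments", "siEmailExtensions")


def with_default_conditions(conditions: dict[str, list]) -> dict[str, list]:
    """Return the condition fields with defaults filled in for empty ones.

    Missing or empty numeric fields become [-1]; missing or empty string
    fields become ["emptyList"]. Filled fields are copied unchanged.
    """
    doc = {}
    for field in NUMERIC_FIELDS:
        doc[field] = list(conditions.get(field) or [-1])
    for field in STRING_FIELDS:
        doc[field] = list(conditions.get(field) or ["emptyList"])
    return doc


if __name__ == "__main__":
    # A promotion that is only restricted to one campaign.
    print(with_default_conditions({"siCampaignIds": [619933]}))
```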

Why don’t we use sharding?

We prefer not to use sharding in our Elasticsearch infrastructure. One of the main reasons we don’t need it is that we split our data across two indices and keep the data in each as lean as possible.

Our reasons are as follows:

  • Maintaining query performance: Sharding involves splitting data across multiple shards. In situations with heavy querying needs, such as our promotion apply process during events, query performance may decline due to coordination between shards.
  • Low data volume: With a low volume of data in Elasticsearch, sharding is an unnecessary cost. A small data set can be served with high performance and availability by using a few replicas instead. We also ran load tests with different shard counts and compared Elasticsearch’s CPU usage and response time; a single shard turned out to be the most suitable setup for us.
  • Optimizing replica count: By optimizing the number of replicas, we can use Elasticsearch with high availability and performance. In this context, not using sharding prevents unnecessary complexity and costs.
  • Reducing system complexity: Sharding configuration and management can introduce additional complexity and workload for system administrators and developers. If data volume and usage requirements don’t necessitate sharding, as is the case for us, not using it will result in a simpler and more manageable structure.
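To make the setup concrete, here is a small Python sketch of index settings with a single primary shard and a configurable replica count. number_of_shards and number_of_replicas are standard Elasticsearch index settings; the helper name and the replica counts used below are illustrative assumptions, not our actual values.

```python
def index_settings(replicas: int) -> dict:
    """Settings body for an index with one primary shard and N replicas."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,          # no sharding: a single primary shard
                "number_of_replicas": replicas,  # availability and read throughput
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical replica counts, chosen only for the example.
    print(index_settings(replicas=3))
    print(index_settings(replicas=1))
```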

Why do we have two different indices in Elasticsearch? What are their purposes?

We have two indices: all-promotions and active-promotions. The all-promotions index stores promotions whose end date is no more than one year in the past, while the active-promotions index contains ongoing promotions and those that have not yet started.

The two indices serve different purposes. The all-promotions index keeps past promotions so that we can handle refund processes and reporting requests that need discounts applied retroactively. The active-promotions index is used to respond quickly to requests that apply active and upcoming promotions.

The objectives of having two indices include:

Performance and efficiency: Each index has specific use cases and responsibilities. Therefore, they can respond to requests more efficiently and with better performance.

Improved scalability: Managing active-promotions and all-promotions indices separately allows each index to be scaled according to its specific requirements. For example, we can use more replicas for the active-promotions index to ensure high availability and performance while reducing costs by using a lower replica count for the all-promotions index.

Data management: By using two indices, we can manage the lifecycle of the data more effectively based on the promotion statuses. When promotions in the active index expire, we move them to the all-promotions index while also removing promotions with expired end date from the all-promotions index to maintain performance, as we will discuss in the next section.

Faster data access: Separating active-promotions and all-promotions indices allows us to provide optimized querying and data access processes for each index. This enables users to apply and refund promotions more quickly and efficiently. By decreasing the amount of data stored in the active index, we were able to achieve a notable decrease in response time, reducing it from 20ms to 10ms.
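The division of responsibilities between the two indices can be summarized in a few lines of Python. This is only a sketch: the UseCase enum and index_for function are hypothetical names, while the index names and their purposes come from the description above.

```python
from enum import Enum


class UseCase(Enum):
    APPLY = "apply"          # applying promotions to a shopping cart
    REFUND = "refund"        # refunds that refer to past promotions
    REPORTING = "reporting"  # backdated discount calculations


# Which index serves which kind of request.
INDEX_BY_USE_CASE = {
    UseCase.APPLY: "active-promotions",
    UseCase.REFUND: "all-promotions",
    UseCase.REPORTING: "all-promotions",
}


def index_for(use_case: UseCase) -> str:
    """Return the name of the index that should serve the given request."""
    return INDEX_BY_USE_CASE[use_case]


if __name__ == "__main__":
    for use_case in UseCase:
        print(use_case.value, "->", index_for(use_case))
```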

Deletion of expired promotions from indices

At a certain point, we delete promotions that will no longer be used from both indices. A script running every minute removes promotions that ended more than one day ago from the active-promotions index, and promotions that ended more than one year and two days ago from the all-promotions index. The advantages this provides include:

Controlled data volume: Continuously deleting old promotions helps us keep the data volume in the indices under control, which in turn aids in maintaining performance.

Improved query performance: Limiting the size of the live promotion data set enhances query performance. Querying less data results in faster responses and a better user experience.

Data lifecycle management: This process is a crucial part of data lifecycle management. By regularly removing expired promotion data, we ensure that our users work only with valid and current promotions.

Optimized resource utilization: By regularly deleting old promotion data, we optimize resource usage, allowing Elasticsearch servers to utilize CPU, memory, and disk resources more effectively.

The script we use takes the alias names for each environment and runs a query to delete the expired promotions.

Deleting inactive promotions from the active index and promotions that expired more than 1 year ago from the all-promotions index
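As an approximation of what such a cleanup does, the Python sketch below builds the range query for each index using Elasticsearch date math. The field name endDate and the use of the index names as aliases are assumptions for illustration; the retention windows are the ones described above.

```python
# Cut-off per alias, expressed in Elasticsearch date math.
RETENTION_CUTOFF = {
    "active-promotions": "now-1d",    # ended more than one day ago
    "all-promotions": "now-1y-2d",    # ended more than one year and two days ago
}


def expired_promotions_query(alias: str, end_date_field: str = "endDate") -> dict:
    """Build a query body matching promotions past the alias's cut-off.

    Note: `end_date_field` is a hypothetical field name for this sketch.
    """
    cutoff = RETENTION_CUTOFF[alias]
    return {"query": {"range": {end_date_field: {"lt": cutoff}}}}


if __name__ == "__main__":
    for alias in RETENTION_CUTOFF:
        print(alias, expired_promotions_query(alias))
```

A body like this could be sent to the delete-by-query endpoint of the corresponding alias (POST /<alias>/_delete_by_query) by a job scheduled to run every minute.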

Why don’t we use cross-datacenter replication?

Cross-datacenter replication is a technique used to replicate data between data centers to ensure data availability in case of a disaster or failure in one data center. However, in our case, we have opted not to use this approach due to our specific architecture.

We run an Elasticsearch consumer and a Couchbase Elastic Connector (CBES) in each of our two data centers, “Moon” and “Mars”. If we decided to replicate data between the Elastic clusters, we would only need a single Elastic consumer and a single CBES. While this may seem like an efficient solution, it would come with a significant drawback: the loss of one data center would result in a complete loss of availability.

By not using replication between our Elastic clusters, we are able to provide fault tolerance in the event of a data center failure. This is achieved by deploying separate Elastic consumers and Couchbase Elastic Connectors in each data center, so that if one data center goes down, the other can continue to function independently.

MSSQL-Elastic Comparator

The process of synchronizing promotions from MSSQL to Elastic

We use MSSQL as our primary database. When a promotion record is added to it, we dispatch a promotion-created event to Kafka. The Promotion Elastic Consumer then receives the event and creates the corresponding promotion record in Couchbase. We use the Couchbase Elastic Connector (CBES) to stream the promotion data into Elastic. However, issues in the Couchbase Connector or the Promotion Elastic Consumer can prevent data from being synchronized from MSSQL to Elastic. In such cases, newly created or updated promotions may not be applied to the shopping cart, because we read promotions from Elastic, which is a significant problem.

To proactively detect these potential issues quickly, we developed a comparator in Golang. This comparator checks whether the last 30 promotions created or updated within the last 2 minutes have been saved in Elastic and whether the values in the saved fields match the original values.

By implementing this solution, we can ensure that all promotions created or updated in MSSQL are correctly synchronized with Elastic. If any issues arise, we can quickly detect and take necessary steps to resolve them.
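The comparison logic itself is simple. Our comparator is written in Golang; the Python sketch below only illustrates the idea, and every name in it (diff_fields, find_unsynced, the fetch callback, the id key) is hypothetical. It assumes the recently changed promotions have already been read from MSSQL and that fetch returns the matching Elastic document, or None if it is missing.

```python
from typing import Callable, Optional


def diff_fields(source: dict, indexed: dict) -> list[str]:
    """Names of fields whose value in Elastic differs from the source row."""
    return sorted(
        key for key, value in source.items()
        if key not in indexed or indexed[key] != value
    )


def find_unsynced(
    recent: list[dict],
    fetch: Callable[[int], Optional[dict]],
) -> dict[int, list[str]]:
    """Map promotion id -> problem for every promotion that is out of sync.

    A promotion is reported if it is missing from Elastic or if any of its
    fields does not match the original value.
    """
    problems: dict[int, list[str]] = {}
    for promotion in recent:
        document = fetch(promotion["id"])
        if document is None:
            problems[promotion["id"]] = ["<missing>"]
            continue
        mismatched = diff_fields(promotion, document)
        if mismatched:
            problems[promotion["id"]] = mismatched
    return problems


if __name__ == "__main__":
    recent = [{"id": 1, "discount": 20}, {"id": 2, "discount": 30}]
    elastic = {1: {"id": 1, "discount": 20}}  # promotion 2 was never indexed
    print(find_unsynced(recent, elastic.get))
```

A non-empty result is the signal to raise an alert and investigate the consumer or the connector.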

In conclusion, the approaches we have adopted for utilizing Elasticsearch have enabled us to optimize query performance, improve data management, and enhance the user experience. By implementing predefined values, managing multiple indices, and regularly deleting expired data, we have achieved an efficient search.

Ready to take your career to the next level?
Join our dynamic team and make a difference at Trendyol.
Want to be a part of our growing company? We’re hiring!
Check out our open positions.
