Trendyol Coupon Journey: Under The Hood, How Do We Show Collectable Coupons with Couchbase

Caner Yeşildağ
Published in Trendyol Tech
Sep 20, 2022 · 9 min read

In this series of articles, we describe the technical problems we encountered while designing the collect-and-win coupon architecture at Trendyol, and how we solved them. There were multiple candidate solutions for each problem we encountered. Rather than replacing the database, language, or framework we use whenever we hit a problem, we always tried to use the technologies we already have as effectively as possible. By approaching the problems from a broader perspective, we tried to produce solutions at the architectural level.

What were the Technical Challenges?

  • Post the events to Kafka when the time comes

The first of our biggest challenges was notifying the team that shows these coupons on product content when a collect-and-win coupon is created by a seller.

The coupon states we needed to notify were:

  1. Collectable started
  2. Collectable expired
  3. Collectable canceled
  4. Collectable collect limit reached

A coupon may not be usable as soon as it is created: when a coupon is defined with a future startDate, it should be activated for users to collect once the start date arrives.

For this reason, we needed to be able to tell the Client "this coupon's time has come, show it", or to stop the collecting of coupons whose endDate had passed or whose collect limit had been reached.

The first solution that came to mind was to write a scheduler method that would run every x seconds, constantly querying Couchbase to check whether any coupons' start dates had arrived.

If a start date had arrived, the scheduler would post a "coupon started" event to the Client's Kafka topic. It would also have to check for coupons whose end date had passed or whose collect limit had been reached, in order to post end events for them. This solution had the following trade-offs.

Cons

  • Increased throughput on the database (Couchbase)
  • Increased CPU and memory usage in the service
  • A hard-to-manage exception handling mechanism
  • The high cost of adding a field to the document to avoid posting duplicate events to Kafka
  • Our service runs as multiple instances on K8s, which would make this even harder to manage

Pros

  • Easy to change the code in service
  • Easy to scale
  • Easy to manage as all logic is in the same place on the service

In addition, if the scheduler method ran at very short intervals, throughput on the database would increase; if it ran less often than necessary, we might be too late to stop coupons that had expired or reached their limit. Too many users would then get errors and a bad user experience. The polling approach is sketched below.
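For illustration, a minimal sketch of this polling approach (which we ultimately rejected) might look like the following; couponRepository, kafkaProducer, and POLL_INTERVAL_MS are hypothetical placeholders, not our actual service code:

// A sketch of the rejected polling approach; every tick queries Couchbase
// whether or not anything has changed.
setInterval(async () => {
    const now = new Date().toISOString();
    const started = await couponRepository.findStarted(now);    // startDate <= now, not yet announced
    const finished = await couponRepository.findFinished(now);  // endDate <= now, or collect limit reached
    for (const coupon of started) {
        await kafkaProducer.send("collectable-events", { eventType: "Start", collectableId: coupon.id });
    }
    for (const coupon of finished) {
        await kafkaProducer.send("collectable-events", { eventType: "End", collectableId: coupon.id });
    }
}, POLL_INTERVAL_MS); // short interval -> heavy DB load; long interval -> late stop events

Every instance of the service would run this loop, which is exactly where the duplicate-event and coordination problems listed above come from.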

How Did We Solve This Problem?

Couchbase Eventing Service

The Couchbase Eventing Service is natively integrated with the Couchbase Data Platform; it does not require any third-party solution to manage and respond to data mutations. The term mutation refers to document changes happening in the Couchbase cluster: create, update, expiry, or delete operations. In an event-based architecture, all data changes are handled reactively and in real time.

How Did We Implement and Use Eventing Service?

When we analyzed Couchbase Eventing, we concluded that this service would fit our needs. Being a native service, it lets us design the architecture without extra third-party services or tools.

In addition, Couchbase Eventing is a highly available, performant, and scalable service that enables any business logic to be triggered in real time on the server when application interactions change data.
It has reduced our maintenance costs, since functions can be developed, deployed, and managed easily from a single centralized platform.

Collectable Eventing function and Kafka Connector System Diagram

Functions developed within the Eventing Service framework track data changes in your cluster. Therefore, when a new collectable document is created or a mutation is performed, the entry point of our eventing function, OnUpdate(), runs for each document. After a few business validations, the document is passed to the methods that decide whether or not to process the mutation.

function OnUpdate(doc, meta) {
    log(`OnUpdate started with document id: ${meta.id}`);
    collectableStateChanged(doc, meta.id);
    log(`OnUpdate finished. id: ${meta.id}`);
}

The function saves events containing the collectable metadata the Client needs, keyed by documentId, in the CollectableStateEvent bucket. If the start time of a collectable document has arrived, the id takes the form Start:CollectableId; if the end date has arrived, the id takes the form End:CollectableId.

So how do the start and end dates "know" when they have arrived? There is no data change on a collectable coupon that has not yet started; likewise, when the end date arrives there may be no mutation on the document at that moment. So how do we save the end event?

Here we can see the function called when a new collectable is inserted.

function newCollectableOperation() {
    this.run = function (doc, documentId) {
        const startDate = new Date(doc.startDate);
        const endDate = new Date(doc.endDate);
        const startTimerId = documentId + ":starttimer";
        const endTimerId = documentId + ":endtimer";
        doc.id = documentId; // timerCallBack reads the id from its context
        // the timer context is captured when createTimer is called, so the
        // first timer keeps eventType "Start" even after we overwrite it below
        doc.eventType = "Start";
        createTimer(timerCallBack, startDate, startTimerId, doc);
        doc.eventType = "End";
        createTimer(timerCallBack, endDate, endTimerId, doc);
    }
}

Here we made use of the eventing framework's createTimer() method. When a new collectable document is inserted, the newCollectableOperation() function is called after OnUpdate(). In this function, createTimer() is called after reading the startDate and endDate from the collectable document. createTimer() arms the timerCallBack function to run when the time comes, like a time bomb.

function createTimer(timerCallBack, timerDate, timerId, context)

timerCallBack, the first parameter of createTimer(), is the function that will be called back when timerDate arrives. In this method, we prepare and save the collectable start, end, limit-reached, and cancel events that we write to the CollectableStateEvent bucket.

Below we can see the timerCallBack() function body

function timerCallBack(context) {
    // build the event key, e.g. Start:CollectableId or End:CollectableId
    const eventId = context.eventType + ":" + context.id;
    // writing through the bucket alias persists the event document
    collectableStateEvent_bucket[eventId] = {
        collectableId: context.id,
        eventType: context.eventType,
        brandIds: context.brandIds,
        sellerIds: context.sellerIds,
        categoryIds: context.categoryIds,
        eventingDocCreateDate: new Date()
    };
}

The second parameter, timerDate, is the date at which timerCallBack() will be invoked. The third parameter, timerId, can be thought of as an identifier expressing whether this timer marks the start or the end of the collectable coupon. The fourth parameter, context, is the collectable document that entered our eventing function after the mutation operation.
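As a concrete example, a start event written by timerCallBack() for a collectable with id 123 would be stored under the key Start:123 and would look roughly like this (ids and dates are illustrative):

{
    "collectableId": "123",
    "eventType": "Start",
    "brandIds": [42],
    "sellerIds": [7],
    "categoryIds": [1001],
    "eventingDocCreateDate": "2022-09-20T09:30:00.000Z"
}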

The same process is done for collectable coupons that have been cancelled or whose limits have been reached. When a cancel operation arrives, the collectable status changes from active to cancelled. The eventing function notices this because there is a change on the document, and the process starts: createTimer() is called again, but for these two flows the timerDate is set to run 5 seconds from now. So, for a document that is cancelled or has reached its limit, timerCallBack is invoked asynchronously after 5 seconds and an event document is created according to the event type: a cancel event is inserted into the CollectableStateEvent bucket with the document id Cancel:CollectableId, and a limit-reached event with the id LimitReach:CollectableId. A minimal sketch of this flow is shown below.
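The exact handler is not shown in this article, so the following is a hypothetical sketch of the cancel / limit-reached flow, reusing createTimer() and timerCallBack() from the snippets above; the names stateChangedOperation, status, collectLimit, and collectedCount are assumptions:

// Hypothetical sketch of the cancel / limit-reached flow (names are assumptions).
function stateChangedOperation() {
    this.run = function (doc, documentId) {
        // fire ~5 seconds from now rather than at a business date
        const fireDate = new Date(Date.now() + 5000);
        doc.id = documentId;
        doc.eventType = (doc.status === "Cancelled") ? "Cancel" : "LimitReach";
        createTimer(timerCallBack, fireDate, documentId + ":statetimer", doc);
    }
}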

How did we transfer events from Couchbase to Kafka with Couchbase Kafka Connector?

After inserting these events into Couchbase, the remaining task was to post the events the Client would consume to Kafka. We could have read these events from the data source and produced them to Kafka ourselves, but a new producer had some cons:

  • Adding a producer to the collectable service would increase the resources consumed by the service.
  • After posting an event to Kafka, a new field, eventProduced, would have to be added to the document to avoid reprocessing the same event. Saving this field after each event was produced would increase the number of write operations.
  • With multiple service instances running, duplicate events could occur when two different pods produced the same event.
  • Querying the unprocessed events would require creating a new index in Couchbase.
  • A scheduled job would be required to query these events.

Considering these cons, we decided to use Couchbase Kafka Connector instead of writing a new producer. We designed the system as shown in the diagram below.

Collectable Eventing function System Diagram

Since the Kafka connector is a ready-made product, it gave us convenience: we could use it by adding only the relevant configuration, without writing code. One of its biggest advantages was the delivery guarantee it provides for the events we store in Couchbase. It also eliminated the duplicate-event problem, since it keeps track of which events it has already delivered.
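For illustration, a Couchbase source connector configuration posted to the Kafka Connect REST API might look roughly like the following; the bucket, topic, and credential values are placeholders, and the exact property names depend on the connector version:

{
    "name": "collectable-state-event-source",
    "config": {
        "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
        "couchbase.seed.nodes": "couchbase.example.local",
        "couchbase.bucket": "CollectableStateEvent",
        "couchbase.username": "connect-user",
        "couchbase.password": "***",
        "couchbase.topic": "collectable-state-events",
        "couchbase.source.handler": "com.couchbase.connect.kafka.handler.source.RawJsonSourceHandler",
        "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
        "couchbase.stream.from": "SAVED_OFFSET_OR_BEGINNING"
    }
}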

Continuous Integration and Continuous Delivery with Eventing Function

After we started using the eventing function, we added CI/CD steps to deploy changes to the function quickly and safely. We wrote unit and acceptance tests for our eventing function and added a code review process. Thanks to fully automated tests and code review, we can prepare code that is safe to deploy to the production environment.
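For example, since eventing functions are plain JavaScript, timerCallBack() can be unit tested by stubbing the bucket alias with an in-memory object. A hypothetical Jest-style test, assuming the eventing function file is loaded into the test context:

// Hypothetical Jest-style test; collectableStateEvent_bucket is stubbed so the
// bare identifier inside timerCallBack resolves to an in-memory object.
test("timerCallBack writes a start event keyed as eventType:id", () => {
    global.collectableStateEvent_bucket = {};
    timerCallBack({ eventType: "Start", id: "123", brandIds: [42], sellerIds: [7], categoryIds: [1001] });
    const saved = global.collectableStateEvent_bucket["Start:123"];
    expect(saved.collectableId).toBe("123");
    expect(saved.eventType).toBe("Start");
});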

Once the continuous integration process was in place, we wanted to automate the next step, the continuous delivery process. Couchbase provides a REST API for the Eventing Service; using it, we decided to add a deployment step that runs in the GitLab pipeline.

Eventing function GitLab pipeline

In this way, we eliminated the need to access the production database to deploy our eventing function. Moving all CI/CD processes through the GitLab pipeline in a fully automated manner gives us a reliable, safe, error-resistant, easy-to-manage pipeline where we can quickly roll back to a previous version if something goes wrong.
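A simplified .gitlab-ci.yml for such a pipeline might look like the following; the stage names and paths are illustrative, not our exact configuration, and the deployment.sh script it calls is described next:

# Hypothetical, simplified pipeline; the real one also includes acceptance
# tests and code review gates.
stages:
  - test
  - deploy

unit_tests:
  stage: test
  script:
    - npm ci
    - npm test

deploy_eventing_function:
  stage: deploy
  when: manual            # triggered on demand for production
  script:
    - ./deployment.sh     # pause, update, and redeploy via the Eventing REST API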

We created a file named deployment.sh for deployment. First, we pull the eventing code currently running on the Couchbase server by sending a GET request with curl.

CB_EVENTING=$(curl -XGET http://Administrator:password@192.168.1.5:8096/api/v1/functions/[function_name])

Then, we read the new version of the eventing function source code, which was modified in GitLab.

EVENTING_SOURCE_CODE=$(cat ./src/"$FUNC_FOLDER"/eventing_file_name.js)

In the next step, the old function body is swapped for the new one and the result is assigned to a variable. appcode is the field that contains the eventing function's code in the response returned by the GET request.

CB_NEW_EVENTING=$(echo "$CB_EVENTING" | jq -rc --arg code "$EVENTING_SOURCE_CODE" '.appcode |= $code')

After assigning the new version of the code to the variable, we can deploy it. But first, to proceed safely, we must pause the eventing function running on the server.

CB_PAUSE_STATUS=$(curl -o /dev/null -s -w "%{http_code}\n" -XPOST http://Administrator:password@192.168.1.5:8096/api/v1/functions/[function_name]/pause)

If CB_PAUSE_STATUS returns 200, we may continue; otherwise, we break the process and log the error.

if [ "$CB_PAUSE_STATUS" != "200" ]; then
echo "Couchbase eventing function PAUSE request failed."
exit 1
fi

After the pause process is successful, we can send the curl request required for deployment.

CB_DEPLOY_STATUS=$(curl -o /dev/null -s -w "%{http_code}\n" -d "$CB_NEW_EVENTING" -XPOST http://Administrator:password@192.168.1.5:8096/api/v1/functions/[function_name])

If the deployment curl response status is not 200, we should resume the old function. If the response is successful, there is no need to send a resume request.

if [ "$CB_DEPLOY_STATUS" != "200" ]; then

CB_RESUME_STATUS=$(curl -o /dev/null -s -w "%{http_code}\n" -XPOST http://Administrator:password@192.168.1.5:8096/api/v1/functions/[function_name]/resume)
fi

Conclusion

In conclusion, in this article we shared the technical problems we encountered and the architecture we designed to solve them. We also showed the technologies we use and how we implemented them. We proved one of these technologies, the Couchbase Eventing Service, in the production environment, and described how it solved our problems. We adapted existing solutions to our own system in the most appropriate way, and at every step we pushed the opportunities and solutions available to us one step further. I hope this series of articles has been inspiring and useful for you. Join us to develop high-throughput, scalable, and solid applications. Thanks for reading…
