Real-Time Calculation With Couchbase Eventing Service on Growth Center Level Architecture
This article will discuss the Seller Level system for Trendyol sellers, the challenges we encountered while implementing the level mechanism, and the solution we applied for real-time calculations with Couchbase Eventing Service and Timers.
What is Seller Level:
As a seller growth team, we provide gamification on the seller center to Trendyol sellers in the pages we called Growth Center and decided to implement a level mechanism to increase the engagement of the sellers.
In this level system, each seller is assigned to a level according to the experiment point collected from sellers’ Seller Center actions such as joining campaigns, adding products to a hunt, creating coupons, etc., and as business logic requires that each experience point seller earns from any action that they did has an expiration period of 6 months.
Let’s talk about the overall architecture in 3 pieces:
Consuming Messages
Various teams are feeding different topics on Kafka for seller activities and all those topics are consumed by our small dedicated applications to related domains.
Due to the changing environment of the Trendyol seller business logic, teams change their strategy occasionally. Therefore, we separated each consumer application to be scaled separately and have its logic without affecting one another.
Storing Point For Seller Actions And Calculation
Each consumer application sends an event to Kafka to let sellers earn points for their associated actions. For this purpose, all action point activities are kept in Postgres to provide efficient results through pageable and time-based queries.
Each save operation also triggers a calculation operation for the specific seller that earned a point from the action. Each calculation queries all non-expired (done within the last six months) actions and sums the total point. After the calculation, the seller is associated with a level that will be saved to Couchbase to get this information with key-value pairs. Since the seller-level information is the one that is widely retrieved from the client, accessing seller-level details on the key was very beneficial.
Expire Point
Until now, the system is already calculating the data in real time; however, expiring the points is the challenging part of the design.
The first way we found was by setting up a job and expiring the points on each scheduled period. And even set up some calculation time to inform sellers that their calculation is ongoing. However, we began to think this would introduce a bad experience on the seller's side, so we decided that the point needs to expire in real-time when their expiration date has come.
At this point, we come across Couchbase Timers and Couchbase Eventing.
Couchbase Eventing
In this article, we focus on how to use the Eventing Service of Couchbase on production and real-life usage. However, if you want to deep-dive into this topic, you can check the following article that my colleague mentioned about capturing data mutation in Couchbase through Eventing Service.
Couchbase Timers
Their documentation says Couchbase Timers are “asynchronous compute, which offers Eventing Functions the ability to execute about wall-clock events. Timers also measure and track the elapsed time and can be used while archiving expired documents at a preconfigured time”.
Also, Couchbase guarantees that each timer will be executed at least once despite node failures and cluster rebalances; however, the execution of the timer may happen in different nodes rather than its created node.
To set timers to the expiration time of the point (created date plus six months), after the data is stored in Postgres, we also save a copy of it to a bucket we call SellerLevelActions that also behaves like a long-term cache. Then, with the Eventing Service that Couchbase provided, we create a Couchbase timer on each upsert action done to the bucket.
function OnUpdate(doc, meta) {
var context = {
id: meta.id,
sellerId: doc.sellerId
}
var expirationDate = new Date(doc.expirationDate);
createTimer(DocTimerCallback, expirationDate, meta.id, context);
log('From OnUpdate: timer created', context);
}
When Couchbase fires the timer, it runs the function we provided as a callback action. In this function, we call an endpoint that will send calculated events to Kafka to update the seller’s level, the same process as the calculation after the Postgres insert happened.
How to make it more reliable?
We automated the expiration process of each point and seller-level update by using basic Couchbase Eventing. However, the following system is vulnerable to API call failure. In order to achieve a more reliable system, each action that saves to SellerLevelActions has a status field.
function OnUpdate(doc, meta) {
var context = {
id: meta.id,
sellerId: doc.sellerId
}
if (doc.status === 'INITIAL') {
createTimer(DocTimerCallback, expirationDate, meta.id, context);
N1QL(
"UPDATE SellerLevelAction " +
"USE KEYS $1 " +
"SET status = 'ACTIVE'",
[meta.id]
);
log('From OnUpdate: timer created', context);
}
else if (doc.status === 'FAILED') {
if (doc.retryCount < doc.backoffMaxAttempt) {
var timerTime = nowPlusMilliseconds(doc.backoffIntervalMs);
createTimer(DocTimerCallback, timerTime, meta.id, context);
log('From OnUpdate: timer created for retry', context);
}
}
}
On each upsert to Couchbase, we set this status field to INITIAL, and after the upsert event triggered by Couchbase, the status is set to ACTIVE, which indicates the timer is active and waiting to be fired.
When Couchbase fires an expired event, if the endpoint returns 200, the record is deleted from SellerLevelActions. However, if the endpoint comes across an exception with a status other than 200, the corresponding record’s action field is updated to FAILED, and retryCount is increased by 1. And when the counter reaches the given max value as backoffMaxAttempt, which is also provided in the SellerLevelAction record, the system stops the retry process.
In order to have knowledge about failed timers and have them retried some other time, we set a cron job that collects the SellerLevelAction records that have FAILED status once a day and lets us know the failed count through the Slack channel we set to.
Advantages
- Available with high traffic, in our production environment, we consume millions of events from other teams to track seller actions. Therefore, the event cache we store in Couchbase is permanently active. However, the timers are consistent and can work with more significant numbers of instances without a problem.
- Easily accessible, both eventing and timer management of Couchbase are easy to access from their UI. Also, all the timer logic can be managed from a singular JavaScript code.
- Consistent timing on timers, as we tested by using the Eventing Service, and their documentation says that timers will be fired at max 7 seconds delay. In our case, it is highly acceptable if the timers are consistently working.
Disadvantages
- Accessing the event logs is complex, the default log tracking window has a limited amount of space for logs and has no capability for any filtering.
Couchbase suggests creating a log bucket if advanced log tracking is necessary. However, in a production environment where millions of transactions happen, the log bucket can get larger and larger over time, so this must be considered.
In our case, we only required the logs once to identify the pause / undeploy situation. Logs are stored in a specific location at the Couchbase server; therefore, they can be obtained if you can access these directories.
Conclusion
In this article, we tried to explain how we used Couchbase Eventing Service and its Timers in Seller Level architecture for real-time calculations in millions of event flow. We also tried to explain how you can make these systems more reliable with a retry mechanism that sends the desired API call as much as the defined maximum back of attempt in case of any failure on the Couchbase Eventing.
Hopefully, this article will give you another solution rather than setting up a job that works in specific periods when you need real-time calculation in any project.
Ready to take your career to the next level? Join our dynamic team and make a difference at Trendyol.