How do we reach success in our CQRS Data Synchronization?

Emre Odabas
Published in Trendyol Tech
6 min read · Sep 5, 2023

Last year, I wrote an article titled “How did we fail in our CQRS Data Synchronization?”. You can read the details of that failure there, but in summary:

  • Static membership of Couchbase to Elasticsearch (CBES) is not a good solution, for the following reasons:
    → Scaling requires code changes
    → No fault tolerance
    → Instances that need individual care are not cloud native
    → Hard to alert on and fix
  • Autonomous CBES adds a lot of complexity through its Couchbase, Elasticsearch, and Consul connections.
  • It couples our team's dependencies.

From that point on, we managed our CBES services in static mode and surrounded them with alerts for every possible failure. Now, it is time to talk about our success story.

Denying DCP

CBES Design

Our default CBES flow is shown above. If you look closely, you can see that we produce the content ID to Kafka after saving the content to Couchbase. This step is required by our business needs, and its existence encouraged us to replace CBES with a more straightforward approach.
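The write path described above (save to Couchbase, then produce the content ID to Kafka) can be sketched as follows. This is a minimal illustration under assumptions: `DocumentStore`, `IDPublisher`, the topic name `content-changed`, and the in-memory fakes are all hypothetical stand-ins, not our production code or any Couchbase/Kafka client API. The point it shows is the ordering: the ID is only published after the save succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

// DocumentStore is a hypothetical stand-in for the Couchbase collection holding content.
type DocumentStore interface {
	Upsert(id string, doc []byte) error
}

// IDPublisher is a hypothetical stand-in for the Kafka producer of content IDs.
type IDPublisher interface {
	Publish(topic, id string) error
}

// SaveContent persists the document first and only then publishes its ID,
// so consumers never receive an ID for content that was not written.
func SaveContent(store DocumentStore, pub IDPublisher, id string, doc []byte) error {
	if err := store.Upsert(id, doc); err != nil {
		return fmt.Errorf("save %s: %w", id, err)
	}
	// "content-changed" is a hypothetical topic name.
	if err := pub.Publish("content-changed", id); err != nil {
		return fmt.Errorf("publish %s: %w", id, err)
	}
	return nil
}

// In-memory fakes so the sketch runs without Couchbase or Kafka.
type memStore map[string][]byte

func (m memStore) Upsert(id string, doc []byte) error { m[id] = doc; return nil }

type memPublisher struct{ sent []string }

func (p *memPublisher) Publish(topic, id string) error {
	p.sent = append(p.sent, topic+":"+id)
	return nil
}

// failingStore simulates a Couchbase write failure.
type failingStore struct{}

func (failingStore) Upsert(string, []byte) error { return errors.New("timeout") }

func main() {
	store, pub := memStore{}, &memPublisher{}
	fmt.Println(SaveContent(store, pub, "42", []byte(`{"name":"shirt"}`)), pub.sent)
	// A failed save must not publish anything.
	fmt.Println(SaveContent(failingStore{}, pub, "43", nil) != nil, len(pub.sent))
}
```

Note that this is still a dual write: if the publish fails after a successful save, the ID is lost unless the caller retries.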

DCPless Design

In this DCPless design, we consume events from Kafka, get the latest data from Couchbase, and index it in Elasticsearch. This is a very intuitive flow between Couchbase and Elasticsearch.
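The DCPless consumer loop can be sketched like this. It is a simplified illustration, not our implementation: `Source`, `Index`, `ErrNotFound`, and the map-backed fakes are hypothetical stand-ins for the Couchbase and Elasticsearch clients, and the Kafka consumer is reduced to a slice of IDs. Each event carries only an ID, so the handler always reads the latest version from the source and mirrors it, or deletes it if the content is gone.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound signals that the document no longer exists in the source store.
var ErrNotFound = errors.New("not found")

// Source is a hypothetical read-side view of Couchbase.
type Source interface {
	Get(id string) ([]byte, error)
}

// Index is a hypothetical write-side view of Elasticsearch.
type Index interface {
	Put(id string, doc []byte) error
	Delete(id string) error
}

// HandleEvent processes one Kafka message carrying a content ID: it reads the
// latest version from the source and mirrors it into the index, deleting the
// entry if the content no longer exists.
func HandleEvent(src Source, idx Index, id string) error {
	doc, err := src.Get(id)
	switch {
	case errors.Is(err, ErrNotFound):
		return idx.Delete(id)
	case err != nil:
		return fmt.Errorf("get %s: %w", id, err)
	}
	return idx.Put(id, doc)
}

// Map-backed fakes so the sketch runs without real clusters.
type memSource map[string][]byte

func (m memSource) Get(id string) ([]byte, error) {
	doc, ok := m[id]
	if !ok {
		return nil, ErrNotFound
	}
	return doc, nil
}

type memIndex map[string][]byte

func (m memIndex) Put(id string, doc []byte) error { m[id] = doc; return nil }
func (m memIndex) Delete(id string) error         { delete(m, id); return nil }

func main() {
	src := memSource{"1": []byte(`{"v":2}`)}
	idx := memIndex{"2": []byte(`{"v":1}`)}
	// IDs as they would arrive from Kafka: "1" was updated, "2" was deleted.
	for _, id := range []string{"1", "2"} {
		if err := HandleEvent(src, idx, id); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(len(idx), string(idx["1"]))
}
```

The simplicity has a price: every event costs one extra read against Couchbase, so indexing throughput is bounded by how much read load the Couchbase cluster can absorb.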

Since we had previously failed to run CBES autonomously in the Indexing team, we decided to first roll out this simplified design on a less stressed, low-volume CBES owned by the Indexing Meta team. They have successfully lived with that design for about six months.

It was time to move forward and take on the replacement risk for our high-load CBES instance in the Indexing team. We worked on it for three weeks and then started benchmarking. The result: even though this design has a more straightforward implementation, it performs worse because it pushes our Couchbase cluster to its upper resource usage limits. We could have lived with that performance, but that Couchbase instance is the Achilles heel of our team, so this design was not acceptable to us. It was time to accept failure and move on to the next, and hopefully final, solution.

Achilles heel of Indexing Team

Meet the power of Go and DCP

Even though we had failed to convert CBES several times, we were in the middle of an outstanding re-platforming journey that mainly relies on Go, and we have published several Medium articles about this process.

While we were tackling CBES issues, Eray Arslan, a member of our Product tribe, was developing a DCP client in Go to solve the same problem. We discovered each other in the Elasticsearch Guild, which is part of our sharing culture. Our Go knowledge and the CBES problem finally met, and from that point on, the Go CBES solution became our destiny.

What is Go CBES?

Go CBES is our open-source project, a.k.a. Go Elasticsearch Connect Couchbase. It handles batch writing to Elasticsearch and uses the Go DCP Client to handle DCP events. At this point, I should mention that we have been developing these open-source projects at Trendyol with our devoted and hardworking core members: Eray Arslan, Mehmet Sezer, Oğuzhan Yıldırım, Abdulsamet İLERİ, and Caner Patır. Besides them, we are thankful to have several contributors, and we welcome your contributions to our projects (1, 2, 3).

Replatforming Journey

Okay, we had a new journey ahead. We had failed with CBES before, so it was time to make a perfect migration plan and make sure there would be no more failures in the production environment.

1. Implementation

This is the easiest part. We only develop our model mapping and pass it to Go CBES as a function. Our mapper looks like the one below. You can check our examples on GitHub.

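```go
// mapper converts a Couchbase DCP event into Elasticsearch actions.
// couchbase.Event and the document package come from Go CBES.
func mapper(event couchbase.Event) []document.ESActionDocument {
	if event.IsMutated {
		// Created or updated document: index the Couchbase value under the same key.
		e := document.NewIndexAction(event.Key, event.Value, nil)
		return []document.ESActionDocument{e}
	}
	// Any non-mutation event (e.g. a deletion): remove the document from Elasticsearch.
	e := document.NewDeleteAction(event.Key, nil)
	return []document.ESActionDocument{e}
}
```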
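The mapper above indexes the raw Couchbase value as-is. When the search model differs from the stored model, the mapping step is where the translation happens. The sketch below shows that idea with local stand-in types: `Event` mirrors only the fields the mapper above uses, while `Content`, `SearchDoc`, `Op`, and `mapContent` are hypothetical and not part of Go CBES. Unlike the real mapper signature, it also returns an error so that decoding failures are visible.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Event mirrors only the fields used by the mapper above; it is a local
// stand-in, not the Go CBES type.
type Event struct {
	Key, Value []byte
	IsMutated  bool
}

// Content is a hypothetical Couchbase document model.
type Content struct {
	Name     string `json:"name"`
	Brand    string `json:"brand"`
	Internal string `json:"internalNotes"` // stored, but not needed for search
}

// SearchDoc is a hypothetical Elasticsearch model derived from Content.
type SearchDoc struct {
	Name        string `json:"name"`
	Brand       string `json:"brand"`
	NameSuggest string `json:"nameSuggest"`
}

// Op is a simplified index or delete action.
type Op struct {
	Delete bool
	ID     string
	Body   []byte
}

// mapContent decodes the Couchbase value, keeps only searchable fields, and
// derives an extra field, instead of indexing the raw value.
func mapContent(e Event) ([]Op, error) {
	if !e.IsMutated {
		return []Op{{Delete: true, ID: string(e.Key)}}, nil
	}
	var c Content
	if err := json.Unmarshal(e.Value, &c); err != nil {
		return nil, fmt.Errorf("decode %s: %w", e.Key, err)
	}
	body, err := json.Marshal(SearchDoc{
		Name:        c.Name,
		Brand:       c.Brand,
		NameSuggest: strings.ToLower(c.Name),
	})
	if err != nil {
		return nil, err
	}
	return []Op{{ID: string(e.Key), Body: body}}, nil
}

func main() {
	ops, err := mapContent(Event{
		Key:       []byte("content::1"),
		Value:     []byte(`{"name":"Blue Shirt","brand":"Acme","internalNotes":"draft"}`),
		IsMutated: true,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(ops[0].Body))
}
```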

2. Automation Test

This is the indispensable part. Our QA team (Burhan Günaydın, Eda Kaçmaz) develops automation tests for new applications before they are delivered to the production environment. We implement test scenarios that provision Couchbase and Elasticsearch instances and run our automation tests in the QA environment. This step ensures that our domain needs are fulfilled.

3. Load Tests

This is the most enjoyable part. We love running benchmarks and comparing results. Go CBES already has a benchmark in which it processed ~1M messages in 50 seconds, while Java CBES processed the same messages in 80 seconds.
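Converting those benchmark figures into throughput makes the gap concrete. Assuming “~1M” means roughly one million messages:

```latex
\text{Go: } \frac{10^{6}\ \text{msg}}{50\ \text{s}} = 20{,}000\ \text{msg/s},
\qquad
\text{Java: } \frac{10^{6}\ \text{msg}}{80\ \text{s}} = 12{,}500\ \text{msg/s},
\qquad
\frac{20{,}000}{12{,}500} = \frac{80}{50} = 1.6
```

So in that benchmark Go CBES delivers about 1.6 times the throughput of Java CBES. The message count cancels out of the ratio, so the figure does not depend on the exact size of the benchmark.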

Here is the scenario: we already had two identical ES clusters to handle our load. We replaced one of our CBES instances with Go CBES and reproduced our reindexing scenario with 400M content items. Even though Go and Java CBES consumed similar CPU and memory, Go CBES achieved a 3x indexing rate under the heavy reindexing load.

load test results

This promising result gave us the courage to take the next step.

4. Data Detection Tests

Data detection test step

This is the part we were most suspicious about. Even though our unit and automation tests passed and our load tests performed well, we needed to pass one more step before production. As shown in the image above, we have two identical ES instances, each fed by its own CBES. Go and Java CBES follow the same DCP events and write to their respective ES instances. First, we excluded the Go CBES's ES instance from production traffic and then compared the data between the two instances. We created a scheduler job that runs every 15 minutes and checks the same data on both sides. This job let us detect data anomalies. After fixing our mapper function, we ran this job for nearly a week without any problems. This success led us to the next step.

5. Production

This is the most anticipated part. We could finally release Go CBES to the production environment. Even though the whole process gave us confidence, we replaced only one CBES instance with Go CBES at first and ran it that way for nearly three months. During this period, we tackled some metrics problems and fixed them in our open-source project. Recently, we also observed resource usage figures showing that Go CBES is the absolute winner.

production results comparison

Today, we have replaced all Java CBES instances with Go CBES, and we have finally reached our SUCCESS STORY at the end of this long CBES journey.

TL;DR → Conclusion

  • DCP is a powerful and efficient way of implementing an outbox pattern for Couchbase. These are high-performance solutions:
    → Go DCP Elasticsearch
    → Go DCP Kafka
    → Go DCP Couchbase
  • Sometimes, “simple is impossible” to implement. Choosing a naive or native way could be better.
  • Concentrate on your problem and try to solve it with what you have. Sometimes, the hammer in your hand can solve your nail problem. Just don’t let Maslow’s hammer bias you.
  • Make sure you are on the right track.
  • Accept your failures and try again.
  • Do whatever you need to deliver your code to production.
  • Don’t underestimate data variety and high load.
  • Trust the data.
  • Strive to be better every day.

Thanks to Abdulsamet İLERİ, Mert Bulut, Kutlu Araslı, Kerem Can Kabadayı, and the Product Offering Squad for supporting this success story. Thanks for reading this far. If you like our conclusions, come and join us.
