How to archive over 500 million (and still growing) live records

İbrahim Doğan
Published in Trendyol Tech
7 min read · Apr 8, 2022

Greetings, everyone. Together with my teammate Fatih Furkan Has, I have prepared this article about the archiving mechanism we currently use as the Trendyol Refund team. I hope you find it useful, and I wish you a pleasant read.

Introducing our tribe and team

The Trendyol Order tribe is responsible for the order lifecycle: creation, cancellation, refunds, and so on. Each team owns a different domain; these are divided into Core, View, Fraud, International, and Refund.

As the Refund team, we are responsible for the charge domain alongside the cancellation and refund processes. After an order is created, every change that may affect the amounts on the order is managed in the charge domain. We work to ensure that cancellations and refunds are carried out completely, correctly, and quickly, without errors or human intervention, for order transactions that require refunds.

Why do we need an Archiving Mechanism?

To better understand the archiving mechanism we built, it helps to start with the charge domain. Broadly speaking, the charge domain calculates the prices, discounts, and coupon amounts that fall on each of the goods or services in an order. The lifecycle of the order is also kept here, in versioned form: when you place an order, the first version is created, and any subsequent cancellation or refund is kept as a second version in the same document. The whole lifecycle of an order is therefore stored in a single JSON document, and versions are immutable.
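To make this concrete, a charge document might look roughly like the sketch below. This is only an illustration of the versioning idea; the field names are hypothetical, not Trendyol's actual schema.

```go
// Hypothetical shape of a charge document; field names are illustrative.
package charge

// Charge is the single JSON document holding an order's whole lifecycle.
type Charge struct {
	OrderID  string    `json:"orderId"`
	Versions []Version `json:"versions"` // append-only: existing versions are never modified
}

// Version captures the amounts at one point in the order's lifecycle.
type Version struct {
	Number   int     `json:"number"`   // 1 = order created, 2 = cancellation/refund, ...
	Event    string  `json:"event"`    // e.g. "created", "cancelled", "refunded"
	Total    float64 `json:"total"`    // price falling on the goods or services
	Discount float64 `json:"discount"` // discount amount
	Coupon   float64 `json:"coupon"`   // coupon amount
}
```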

Now that we have a general idea, we can look at the data itself and at why we should archive it. Each document in Charge represents an order, and each order is stored in our Couchbase database as a JSON document. When we looked at the change patterns of these documents, we observed that 99% of the changes occur within 60 days. In other words, it would not be wrong to say that data older than 2 months is cold data.

So why did the cold data bother us? At Trendyol, in line with one of our values, “we talk with data,” we often read and interpret historical data and shape our actions accordingly. The first thing that made us think was that our residency ratios on Couchbase were low. The residency ratio shows how much of the data is in memory and how much is on disk; the higher this ratio, the more efficient reads become. The ratio was low because the RAM available to our Couchbase cluster is limited, we have more than 500 million documents, and the count grows faster every day.

The second thing was that we wondered whether this data really needed to reside in our main database. We knew that the growth of data in Couchbase would also increase resource (memory) consumption. After these thoughts, we felt it would make sense to archive the cold data and decided to assess the situation.

The main expectation from our archiving mechanism was to move the documents that we thought were no longer needed in the main database to a secondary database. But there were certain criteria:
1. Archiving must work on live, continuously flowing data.
2. Each document must be archived on its own, variable date.
3. Archived data must be able to move back from the secondary database to the main database in case it needs to change again.

We needed to make sure that the archiving mechanism we were going to build would meet these criteria. So we rolled up our sleeves and started working out which technologies we could use, how they would be integrated, and what the system design would look like.

Technologies Used In The Archiving Mechanism

Couchbase

Couchbase, a NoSQL database system, is used in the charge domain. Briefly, Couchbase is a document (key/value) based database that stores data both in memory and on disk. It has a memory-first approach: data is kept on both memory and disk, but memory has priority. Depending on the amount of RAM in the system and how frequently the data is accessed, we can think of it as pulling data from disk and keeping it in memory. If you want to know more about this database system, which is used very actively at Trendyol, the Couchbase documentation is a good place to start.
Couchbase’s support for connectors thanks to its XDCR feature, and its ability to store documents with a specified TTL, also had an important influence on our design.
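As a minimal sketch of the TTL feature, here is how a document can be stored with an expiry using the Go SDK (gocb v2). The connection string, credentials, bucket, and key are placeholders, and the 60-day expiry simply mirrors the cold-data window mentioned above.

```go
package main

import (
	"time"

	"github.com/couchbase/gocb/v2"
)

func main() {
	// Connect to the cluster; address and credentials are placeholders.
	cluster, err := gocb.Connect("couchbase://localhost", gocb.ClusterOptions{
		Username: "user",
		Password: "pass",
	})
	if err != nil {
		panic(err)
	}

	collection := cluster.Bucket("active-ids").DefaultCollection()

	// Upsert with a TTL: Couchbase removes the document automatically
	// once the expiry passes (~60 days here).
	_, err = collection.Upsert("charge::1234", map[string]string{"id": "1234"},
		&gocb.UpsertOptions{Expiry: 60 * 24 * time.Hour})
	if err != nil {
		panic(err)
	}
}
```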

PostgreSQL

PostgreSQL is an advanced, open-source relational database that supports both SQL (relational) and JSON (non-relational) querying. We chose it to archive the documents we keep in Couchbase: it is a natural fit for archival use, and our Trendyol DBAs have solid know-how on PostgreSQL.

Kafka Connect

Kafka Connect is a tool that provides data integration between Kafka and external systems (databases, HTTP endpoints, etc.). It is used to transfer data from an external system into a Kafka topic, or from a topic into another system.
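For a feel of how this looks in practice, below is an illustrative configuration for a Couchbase source connector that streams change events from a bucket into a Kafka topic. Treat it as a sketch: the exact property keys depend on the connector version, and the bucket, topic, and credentials are placeholders.

```properties
# Sketch of a Couchbase source connector config (names are placeholders;
# property keys may differ between connector versions).
name=active-ids-source
connector.class=com.couchbase.connect.kafka.CouchbaseSourceConnector
tasks.max=2
couchbase.seed.nodes=couchbase-host
couchbase.bucket=active-ids
couchbase.username=user
couchbase.password=pass
couchbase.topic=charge-expired
```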

.NET

.NET, which most of us know, is a free and open-source framework that runs on Linux and macOS as well as Windows. The team's know-how in .NET was the deciding factor in writing the business code with this framework.

Go

Go and Golang can be confusing names, but they refer to the same language. Go being scalable, fast, reliable, and efficient is why we preferred it on the consumer side. We used this language, which we generally use for consumers at Trendyol, for the same purpose in our archiving work. Put more abstractly, it acts as a bridge between our .NET services and our Kafka topics.
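To make the bridge role concrete, here is a minimal sketch of such a consumer, written with the segmentio/kafka-go client. The broker address, topic, consumer group, and Archive API endpoint are all placeholders, not our production values.

```go
package main

import (
	"bytes"
	"context"
	"log"
	"net/http"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Consume from a Kafka topic; broker, topic, and group are placeholders.
	reader := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka:9092"},
		Topic:   "charge-expired",
		GroupID: "archive-listener",
	})
	defer reader.Close()

	for {
		msg, err := reader.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}

		// Forward the event to a hypothetical Archive API endpoint.
		resp, err := http.Post("http://archive-api/archive",
			"application/json", bytes.NewReader(msg.Value))
		if err != nil {
			log.Printf("archive request failed: %v", err)
			continue
		}
		resp.Body.Close()
	}
}
```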

System Design Of The Archiving Mechanism

Now let’s introduce each component in the system:

  • Charge API and Archive API are both REST services written in .NET.
  • Archive Listener is a service written in Go that consumes a Kafka topic and sends requests to the Archive API.
  • Kafka Connector is a service that listens for expiration (delete) events from the Active Ids DB and publishes them to Kafka.
  • Main DB is a Couchbase database.
  • Archive DB is a PostgreSQL database.
  • Active Ids DB is a Couchbase database in which only the Ids are kept.

Why do we need the Active Ids DB? You may rightly ask: if our Main DB is Couchbase, why didn't we put the TTL directly on the insert made there? The reason is that, for expired documents, Couchbase only gives the KEY of the document, not the VALUE. Since we need the value in order to archive the document, that approach was not possible.

Now that all the components have been introduced, we can explain how the system works.

Our system starts with the creation of an order. The Charge API, which listens for the order-create event, creates a charge, adds it to the Main DB, and publishes a charge-create event. Upon this event, the Archive API adds the Id of the order to the Active Ids DB with the TTL we specify.

As long as the TTL has not expired, any request to the Charge API for this record will be handled by the Main DB.

However, things work slightly differently after the TTL expires:

  1. When the TTL expires, that is, when the Id expires, the Active Ids DB on Couchbase emits an expiration event for the expired record.
  2. The Kafka Connector listening for this event publishes the Id of the expired record to the charge-expired Kafka topic.
  3. The Archive Listener listening to the charge-expired topic sends a request to the Archive API for this record.
  4. With this request, the Archive API reads the relevant document from the Main DB, writes it to the Archive DB, and deletes it from the Main DB (see the sketch below).
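A minimal sketch of that move, assuming a gocb v2 collection for the Main DB and a database/sql connection to PostgreSQL with a simple (id, document jsonb) archive table; all names here are illustrative:

```go
package archiver

import (
	"database/sql"
	"encoding/json"

	"github.com/couchbase/gocb/v2"
	_ "github.com/lib/pq" // PostgreSQL driver
)

// archiveCharge moves one expired charge from the Main DB to the Archive DB.
// Table and key names are illustrative, not Trendyol's actual schema.
func archiveCharge(main *gocb.Collection, archive *sql.DB, id string) error {
	// 1. Read the full document from the Main DB (we need the VALUE, not just the KEY).
	res, err := main.Get(id, nil)
	if err != nil {
		return err
	}
	var doc json.RawMessage
	if err := res.Content(&doc); err != nil {
		return err
	}

	// 2. Write it to the Archive DB; a jsonb column keeps the document queryable.
	_, err = archive.Exec(
		`INSERT INTO charge_archive (id, document) VALUES ($1, $2)
		 ON CONFLICT (id) DO UPDATE SET document = EXCLUDED.document`,
		id, string(doc))
	if err != nil {
		return err
	}

	// 3. Delete from the Main DB only after the archive write succeeds.
	_, err = main.Remove(id, nil)
	return err
}
```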

Now, if a read or write request comes to the Charge API for this TTL-expired record, two different situations occur.
For reads, the situation is quite simple:

When a read (query) request comes in for a document whose TTL has run out, that is, an expired document, the Charge API cannot find it in the Main DB, so it sends a request to the Archive API and reads the document from the Archive DB.
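A sketch of that read path, assuming gocb v2 for the Main DB and a hypothetical client interface for the Archive API:

```go
package reader

import (
	"encoding/json"
	"errors"

	"github.com/couchbase/gocb/v2"
)

// ArchiveClient is a stand-in for the real Archive API client.
type ArchiveClient interface {
	GetCharge(id string) (json.RawMessage, error)
}

// getCharge reads from the Main DB first and falls back to the archive
// only when the document is not found there (i.e. it has been archived).
func getCharge(main *gocb.Collection, archive ArchiveClient, id string) (json.RawMessage, error) {
	res, err := main.Get(id, nil)
	if err == nil {
		var doc json.RawMessage
		if cerr := res.Content(&doc); cerr != nil {
			return nil, cerr
		}
		return doc, nil
	}
	if errors.Is(err, gocb.ErrDocumentNotFound) {
		// Not in the Main DB: ask the Archive API, which reads the Archive DB.
		return archive.GetCharge(id)
	}
	return nil, err // any other error is a real failure
}
```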

However, if the Charge API needs to make a change, that is, a write operation on this document, then the document is moved from the Archive DB back to the Main DB:

  1. The Archive API moves the document back to the Main DB and publishes a charge-recycled event to the related Kafka topic. From that point on, the Charge API reads it from the Main DB.
  2. The Archive Listener, listening for the charge-recycled event, deletes the relevant document from the Archive DB and adds its Id back to the Active Ids DB. In this way, the archiving cycle is reset for the record moved back from the Archive DB to the Main DB (a sketch follows below).
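A minimal sketch of that reset, reusing the illustrative charge_archive table from above and assuming the same 60-day TTL:

```go
package recycler

import (
	"database/sql"
	"time"

	"github.com/couchbase/gocb/v2"
)

// resetArchiveCycle handles a charge-recycled event: it removes the stale
// copy from the Archive DB and re-registers the Id with a fresh TTL, so the
// document will be considered for archiving again 60 days after this change.
func resetArchiveCycle(archive *sql.DB, activeIds *gocb.Collection, id string) error {
	// Delete the now-stale copy from the Archive DB.
	if _, err := archive.Exec(`DELETE FROM charge_archive WHERE id = $1`, id); err != nil {
		return err
	}

	// Add the Id back to the Active Ids DB with a fresh expiry.
	_, err := activeIds.Upsert(id, map[string]string{"id": id},
		&gocb.UpsertOptions{Expiry: 60 * 24 * time.Hour})
	return err
}
```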

Conclusion

Thanks to this system, we have reduced the RAM consumption of our Couchbase databases and increased our residency ratio from 8 percent to 30 percent. We have successfully archived more than 500 million records.


