Trendyol Recommendations: How did we migrate our infrastructure from Relational DB to NoSQL?

A journey of scaling for high write throughput

Emre Tanriverdi
Trendyol Tech
4 min read · May 3, 2021


“Technology is the driver, ecommerce is the outcome.”

As the Homepage & Recommendation Team, we have lately been working hard to improve the scaling of Recommendations at Trendyol.

We stopped using a relational database and moved our data source to NoSQL (Couchbase) without losing any essential features. We wanted to share our experience and explain why and how we did it.

Let’s first start with our legacy design:

The Recommendations infrastructure ran on a relational DB, fronted by a 30-minute Couchbase cache. We didn’t have a cache at first, but the Recommendations domain grew larger over time.
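For illustration only (the class and helper names here are assumptions, not our actual code), the legacy read path was essentially a cache-aside lookup against Couchbase with a 30-minute expiry, falling back to the relational DB on a miss:

    import com.couchbase.client.java.Collection
    import com.couchbase.client.java.json.JsonObject
    import com.couchbase.client.java.kv.UpsertOptions
    import java.time.Duration

    // Hypothetical cache-aside reader: try the Couchbase cache first,
    // fall back to the relational DB and repopulate the cache for 30 minutes.
    class CachedRecommendationReader(
        private val cache: Collection,
        private val loadFromRelationalDb: (String) -> JsonObject
    ) {
        fun read(key: String): JsonObject =
            runCatching { cache.get(key).contentAsObject() }
                .getOrElse {
                    val fresh = loadFromRelationalDb(key)
                    cache.upsert(key, fresh, UpsertOptions.upsertOptions().expiry(Duration.ofMinutes(30)))
                    fresh
                }
    }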

As the product count in Trendyol increased, our write frequency grew day by day and our read performance suffered dramatically.

It had come to a point where we couldn’t even drain the lag from the queue on which we listen for write messages. To avoid affecting daily users, we were scaling our write consumers at night.

We actively use a Kafka Lag Checker, as Mert Bulut mentioned here.

And this screenshot shows that we were really struggling to drain our queue:

Kafka Lag Checker’s lag alert messages

Let’s show you what it looks like so you can picture it easily:

Global website for Trendyol

Why did we abandon Relational DB?

  • Couldn’t handle our write throughput (around 140–150M)
  • High response time for read (around 60–70ms, higher than our average)
  • High write volume caused lag in multi-DC replication

But why Couchbase?

  • Prior positive experiences in Trendyol
    (we tried a similar design on other projects)
  • Know-how of team members
  • Low response time (key & value access)
  • Can handle high write throughput
  • Multi-DC replication abilities (XDCR)
  • Rich documentation & community support

New design with Couchbase:

In our new design, we decided to use only Couchbase as our data source. Since we operate internationally, we need to store the culture value, so we used productId:culture as our key.
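To make the key scheme concrete, here is a minimal sketch (the bucket name and ids are made up; the SDK calls are the standard Couchbase Java SDK ones) of building such a key and reading the document behind it:

    import com.couchbase.client.java.Cluster

    // Hypothetical key builder: one recommendation document per product and culture.
    fun recommendationKey(productId: Long, culture: String) = "$productId:$culture"

    fun main() {
        // Connection details are placeholders.
        val cluster = Cluster.connect("couchbase://localhost", "username", "password")
        val collection = cluster.bucket("recommendation").defaultCollection()

        // Read the recommendation document for a product in a given culture.
        val document = collection.get(recommendationKey(123456L, "tr-TR")).contentAsObject()
        println(document)
    }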

The main reason we chose a relational DB at first was the invalidation process. It was easy to listen to invalidations and write them directly into our database.

In the new design, we achieved similar behavior by keeping a Visibility bucket, into which we write product visibility data coming from the invalidation topic.
This bucket tells us whether a product is available for sale.
Since our domain has grown larger, a product generally has more sellers, so we don’t expect a product to have “no seller” as frequently as before.
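A rough sketch of that idea (the message shape and names are assumptions): the consumer of the invalidation topic upserts a small visibility document per product, which the API checks before serving a recommendation.

    import com.couchbase.client.java.Collection
    import com.couchbase.client.java.json.JsonObject

    // Hypothetical shape of an invalidation message: product id plus whether it still has an active seller.
    data class InvalidationMessage(val productId: Long, val hasActiveSeller: Boolean)

    class VisibilityWriter(private val visibilityCollection: Collection) {

        // Each invalidation message becomes one tiny document in the Visibility bucket.
        fun onMessage(message: InvalidationMessage) {
            visibilityCollection.upsert(
                message.productId.toString(),
                JsonObject.create().put("visible", message.hasActiveSeller)
            )
        }
    }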

In Recommendations, we aim to always serve 24 recommendations per product. topProducts holds the 24 active products to serve, sorted by score.

As a fallback, if for any reason some products from the list can no longer be served and we would return fewer than 24 recommendations, we remove those products and fill the gap from allProducts, starting from the highest score. allProducts holds all the recommendations, including topProducts.
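Expressed as code, the fallback is roughly the following (a sketch with hypothetical field names; the visibility check is assumed to come from the Visibility bucket described above): drop recommendations that can no longer be served and top the list back up to 24 from allProducts, highest score first.

    data class Recommendation(val productId: Long, val score: Double)

    const val RECOMMENDATION_COUNT = 24

    // topProducts: the 24 best-scored recommendations kept ready to serve.
    // allProducts: every recommendation for the product, topProducts included.
    // isVisible: lookup against the Visibility bucket.
    fun serve(
        topProducts: List<Recommendation>,
        allProducts: List<Recommendation>,
        isVisible: (Long) -> Boolean
    ): List<Recommendation> {
        val servableTop = topProducts.filter { isVisible(it.productId) }
        if (servableTop.size == RECOMMENDATION_COUNT) return servableTop

        // Fill the gap from allProducts, highest score first, skipping products already picked.
        val alreadyPicked = servableTop.map { it.productId }.toSet()
        val fillers = allProducts
            .filter { it.productId !in alreadyPicked && isVisible(it.productId) }
            .sortedByDescending { it.score }
            .take(RECOMMENDATION_COUNT - servableTop.size)

        return servableTop + fillers
    }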

Also, the writes used to look like:
[delete recommendedProduct1 from product1],
[delete recommendedProduct2 from product1],
[insert recommendedProduct3 into product1]
which resulted in many messages in the queue and many records to process.

In the new design, since the recommendations are stored as a list, we take an up-to-date list from our queue and replace the list in our data source. As a result, the number of messages we process from the queue decreased significantly
(from 140–150M to around 20–30M).
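In code, the write path shrinks to a single upsert per queue message that replaces the whole stored list, instead of row-level inserts and deletes per recommended product. A rough sketch (the message and document shapes are assumptions):

    import com.couchbase.client.java.Collection
    import com.couchbase.client.java.json.JsonArray
    import com.couchbase.client.java.json.JsonObject

    // Hypothetical shape of an up-to-date recommendation list arriving from the queue.
    data class RecommendationListMessage(
        val productId: Long,
        val culture: String,
        val topProducts: List<Long>,
        val allProducts: List<Long>
    )

    class RecommendationWriter(private val collection: Collection) {

        // One message -> one upsert that replaces the whole list stored under productId:culture.
        fun onMessage(message: RecommendationListMessage) {
            val document = JsonObject.create()
                .put("topProducts", JsonArray.from(message.topProducts))
                .put("allProducts", JsonArray.from(message.allProducts))

            collection.upsert("${message.productId}:${message.culture}", document)
        }
    }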

Observations After Deployment

Before:

After:

Response time of an API that depends on recommendation-api, after deployment

And a quiet alert channel :)

Fatih Yılmaz and I wrote this story to lend a helping hand to those who want to do something similar in their own projects.

We hope it was helpful. :)

Thank you for reading! ❤️

Thanks to all our colleagues in the Homepage & Recommendation Team. 🤟
