Expedia Group Technology — Engineering

Inside Expedia’s Migration to ScyllaDB for Change Data Capture

How to migrate to ScyllaDB without downtime

Jean Carlo
Expedia Group Technology

--

A hiker watches the sun setting over a dramatic red rocky mountain landscape.
Photo by Elaine Casap on Unsplash

Introduction

In this post, I’ll walk you through the process of migrating a Cassandra cluster — with over 50 tables and thousands of persistent connections — to a ScyllaDB cluster. I will explain how we came up with a migration strategy based on our constraints, such as zero downtime.
Database migrations are often challenging, with constraints that vary depending on the context. For instance, you might need to migrate from one data source to another, which could involve preprocessing data before the migration. Alternatively, you might migrate within the same datastore but aim to minimize or eliminate any impact on the application side — in other words, ensuring that the service remains unaffected during the migration.
Data migrations can be time-consuming, sometimes taking more than 48 hours. Therefore, it’s crucial to have a reliable process that allows you to pause and resume the migration from where it was left off.

Company and team presentation

Expedia Group™ is a travel technology company that builds a complete platform to facilitate the interaction between partners and travelers. This interaction involves handling more than 70 TB of data, where high availability and low-latency responses are critical. At Expedia Group, the Cerebro team helps solve these challenges by provisioning databases on demand and providing automated operations such as scale up/down, repairs, and backups/restores on top of Amazon Web Services. A wide range of databases is supported, including ScyllaDB, Cassandra, MongoDB, Aurora, and Postgres. The Cerebro team currently maintains about 170 ScyllaDB clusters.

Why migrate from Cassandra to ScyllaDB?

The main reason for migrating from Apache Cassandra (an open-source NoSQL distributed database) to ScyllaDB (also an open-source NoSQL distributed database) was to take advantage of the advanced CDC (Change Data Capture) capabilities that ScyllaDB offers. Change Data Capture allows users to track data updates in their database.

Context: Cassandra limitations

This journey starts with a Cassandra cluster of 15 i3.xlarge nodes called Identity. This cluster was used by many applications to store data for user authentication and sessions. The Identity cluster is critical: if data gets lost or the cluster becomes unresponsive for any reason, a large percentage of Expedia Group applications would effectively be down, and users would not be able to log in to our platforms. In addition, this cluster stores sensitive data, which is why it required TLS for client connections.

Inside this cluster, certain tables need special treatment when they receive updates: those changes must be replicated to other services as part of the use case. This complex interaction between the applications and the data has to be handled in the simplest way possible. Therefore, to avoid dual writes to different datastores, the application team decided to rely on the Change Data Capture feature that Cassandra offers, together with the Debezium connector.

Debezium is a library that runs as a separate process independent of Cassandra, which means it needs to be deployed and monitored on its own, adding another point of failure to the system. The next figure shows an example of this architecture; notice that Debezium is represented as a process running inside the node alongside Cassandra.

Example of Cassandra with CDC and Debezium

Another complication of CDC with Cassandra is that a table with CDC enabled can become blocked for writes if the Debezium connector stops consuming events. When the space defined by cdc_free_space_in_mb fills up with unconsumed events, the table starts to reject writes. This is a serious problem because it can destabilize the cluster.

Choosing ScyllaDB

Considering all the points previously mentioned, the application team decided to migrate the Identity database to ScyllaDB because it offers a straightforward and uncomplicated implementation of CDC. ScyllaDB already provides a Debezium connector called the Scylla CDC Source Connector, which handles the events and allows applications to consume them in a transparent way. This connector also handles schema and topology changes as well as retries, and it is fault tolerant, using checkpoints to save its progress.

ScyllaDB has built-in support for Change Data Capture (CDC).
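As a rough illustration of what this looks like in practice, here is a minimal sketch (using the Java driver from Scala) of enabling CDC on a table and reading change events from the auto-generated CDC log table. The contact point, keyspace, and table names are hypothetical; in our setup the events are consumed by the Scylla CDC Source Connector rather than queried directly.

```scala
import com.datastax.oss.driver.api.core.CqlSession
import java.net.InetSocketAddress

object CdcSketch extends App {
  // Contact point and datacenter name are illustrative.
  val session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("scylla-node-1", 9042))
    .withLocalDatacenter("dc1")
    .build()

  // Enabling CDC makes ScyllaDB maintain a companion log table
  // named <table>_scylla_cdc_log in the same keyspace.
  session.execute("ALTER TABLE identity.sessions WITH cdc = {'enabled': true}")

  // Each row of the log table describes one change; cdc$time is a timeuuid
  // and cdc$operation encodes the type of write (insert, update, delete, ...).
  val rows = session.execute(
    """SELECT "cdc$time", "cdc$operation" FROM identity.sessions_scylla_cdc_log LIMIT 10""")
  rows.forEach(row => println(s"time=${row.getUuid(0)} operation=${row.getByte(1)}"))

  session.close()
}
```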

Migration requirements and challenges

ScyllaDB was clearly the right choice for CDC, but to make it happen we needed to migrate all the data from Cassandra: about 1 TB of uncompressed data which, with replication, amounted to 3 TB of raw data.

Apart from the data size, other constraints had to be considered:

  • The migration must be done without downtime. If the cluster is down, users cannot log in to our applications, which causes a loss of revenue.
  • Applications must keep TLS (Transport Layer Security) connections at all times.
  • No data loss is allowed during the migration. We need to replicate all writes coming into Cassandra to ScyllaDB, to keep both databases consistent until the applications migrate to ScyllaDB.
  • The Cassandra cluster has 15 i3.xlarge nodes and runs version 4.0.1.
  • Applications that use this cluster are highly sensitive to latency.

Migration strategy

Our goal was to have the applications perform dual writes to both databases before starting the data migration.
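Conceptually, the dual writes look like the following sketch, written in Scala against the Java driver (contact points, keyspace, table, and columns are hypothetical; a real implementation would also add error handling and metrics to detect divergence between the two clusters):

```scala
import com.datastax.oss.driver.api.core.CqlSession
import java.net.InetSocketAddress

object DualWrites {
  // One session per cluster; contact points and datacenter names are illustrative.
  private val cassandra = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("cassandra-node-1", 9042))
    .withLocalDatacenter("us-east-1")
    .build()

  private val scylla = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("scylla-node-1", 9042))
    .withLocalDatacenter("us-east-1")
    .build()

  private val insertCql = "INSERT INTO identity.sessions (user_id, token) VALUES (?, ?)"
  private val cassandraInsert = cassandra.prepare(insertCql)
  private val scyllaInsert = scylla.prepare(insertCql)

  // Every write is applied to both clusters so they stay consistent
  // until the applications are switched over to ScyllaDB.
  def saveSession(userId: String, token: String): Unit = {
    cassandra.execute(cassandraInsert.bind(userId, token))
    scylla.execute(scyllaInsert.bind(userId, token))
  }
}
```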

As we began our investigation, we first followed ScyllaDB’s recommendation to use the sstableloader tool, but we also noticed that the ScyllaDB team has developed a Spark-based application called Scylla Migrator.

Based on our challenges and the size of the data to migrate, we evaluated the two options and outlined the following characteristics of each:

Option 1: sstableloader

sstableloader architecture
  • sstableloader is a tool that must be run on separate machines; if run on the Cassandra cluster itself, it impacts the latency of the live cluster.
  • We would first need to copy the SSTable files to the machines where sstableloader runs.
  • If sstableloader fails, we have to restart the migration from the beginning; there is no checkpoint to resume from.
  • We would have to copy the data three times (once from each replica) to be sure the migrated data is complete and consistent.

Option 2: Scylla Migrator

Scylla Migrator in action
  • Scylla Migrator provides a savepoints feature that keeps track of the token ranges already migrated. If the migration fails for any reason, we can restart the job from where it left off.
  • Scylla Migrator reads the data from the live cluster, so there is no need to transfer SSTable files.
  • Instead of copying the data three times, we can scan the tables at the consistency level we want and copy them into ScyllaDB.
  • We can use a Spark cluster to parallelize the job and make the migration faster. Instead of transferring SSTables to multiple machines as with sstableloader, we simply tune the Spark job on a cluster.
  • Scylla Migrator also provides a validator job that is very handy for confirming the completeness of the data migration.

Migration option chosen

Given the sstableloader limitations, we decided to go with option 2: Scylla Migrator.

In the next figure, you can see the whole migration approach, with the applications performing dual writes while Scylla Migrator is running.

Dual writes keeping both clusters synchronized while Scylla Migrator is running

Migration preparation: building the Spark cluster

We deployed a Spark cluster of six i4i.4xlarge nodes and installed Spark 2.4.4 to be able to run Scylla Migrator. We kept the cluster at that size because a larger Spark cluster would have impacted our production Cassandra cluster.

There were 52 tables to migrate, and each table needs a configuration file that is passed to spark-submit to start its migration. This config file contains information about the source and target clusters, their contact points, the table name, and tuning properties such as splitCount. The latter is especially important for large tables because it splits the table into smaller chunks for the transfer.

We prepared an Ansible playbook to install Spark and deploy the config file for each table, which made it easy to redeploy and restart every Spark node whenever we needed to change any configuration on the Spark side.

Migration fine-tuning

1. Spark fine-tuning

The default Spark installation was not enough to successfully run Scylla Migrator. We had to increase the following parameters:

  • spark.driver.memory was increased to 5 GB so that the validator job could run without problems.
  • SPARK_DAEMON_MEMORY was increased to 50 GB because, for some big tables with large partitions, Spark tasks were being terminated with Java out-of-memory errors.

2. Scylla Migrator fine-tuning

We started with 8 SPARK_WORKER_INSTANCES and 8 SPARK_WORKER_CORES per Spark node, which was a good starting point. Increasing the number of workers gives us more connections to the Cassandra cluster, which can have an impact on latency.

Regarding the splitCount property, we initially kept its default value of 256. We later noticed that this value must be increased when transferring large tables.

3. TLS connections to ScyllaDB and Scylla Migrator

We knew that the Cassandra cluster only accepts TLS connections. Scylla Migrator can be configured to use TLS connections, but it does not offer the option to skip certificate validation the way other Cassandra drivers do.

The problem is that our certificates are self-signed and we do not validate them. All the applications skip certificate validation, but with Scylla Migrator we could not use this option.

On the Cassandra side this is easy to solve because Cassandra has an option to accept both TLS and non-TLS connections, which we could enable transparently for the applications. In contrast, ScyllaDB does not offer this option.

As a result, the migration from Cassandra to ScyllaDB had to be done without TLS connections. However, once the migration completed, the applications still needed to use TLS connections. Therefore, we created a second datacenter on the ScyllaDB cluster with only TLS connections enabled, and all applications switched to that datacenter once validation was completed.
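For reference, a minimal sketch of how an application can target the TLS-only datacenter with the Java driver from Scala; the datacenter name, truststore path, and password are hypothetical:

```scala
import com.datastax.oss.driver.api.core.CqlSession
import java.io.FileInputStream
import java.net.InetSocketAddress
import java.security.KeyStore
import javax.net.ssl.{SSLContext, TrustManagerFactory}

object TlsConnection {
  // Build an SSLContext that trusts our self-signed certificate.
  private def sslContext(): SSLContext = {
    val trustStore = KeyStore.getInstance("JKS")
    trustStore.load(new FileInputStream("/etc/ssl/scylla-truststore.jks"), "changeit".toCharArray)
    val tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm)
    tmf.init(trustStore)
    val ctx = SSLContext.getInstance("TLS")
    ctx.init(null, tmf.getTrustManagers, null)
    ctx
  }

  // Point the driver at the TLS-only datacenter created for application traffic.
  val session: CqlSession = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("scylla-tls-node-1", 9042))
    .withLocalDatacenter("dc-tls")
    .withSslContext(sslContext())
    .build()
}
```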

4. Partitions with null values in the clustering key

When running the Spark job to migrate some tables, we found that, for some reason, some partitions had null values in the clustering key. Scylla Migrator’s behavior was to read those rows and try to insert them into ScyllaDB, and ScyllaDB responded with an error saying that null values in the clustering key are not allowed, which is also the rule in Cassandra. We had hoped that Scylla Migrator would simply skip those rows and log them somewhere for further analysis, but this was not the case.

To solve this, we created a simple Scala job that scans the table to find the partition keys of rows with null values in the clustering key. Once the keys were identified, we purged that data from the source and ran the migrator job again, this time successfully.
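That job looked roughly like the sketch below, which uses the spark-cassandra-connector to scan a table and list the partition keys of rows whose clustering column is null. Keyspace, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FindNullClusteringKeys extends App {
  // Spark session pointed at the source Cassandra cluster (contact point is illustrative).
  val spark = SparkSession.builder()
    .appName("find-null-clustering-keys")
    .config("spark.cassandra.connection.host", "cassandra-node-1")
    .getOrCreate()

  // Full scan of the table through the spark-cassandra-connector.
  val table = spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "identity", "table" -> "sessions"))
    .load()

  // Keep only rows whose clustering column is null and list their partition keys
  // so they can be purged from the source before re-running the migration.
  table.filter(col("session_id").isNull)
    .select("user_id")
    .distinct()
    .show(100, truncate = false)

  spark.stop()
}
```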

5. Large tables and large partitions

Initially we deployed a ScyllaDB cluster of nine i4i.xlarge nodes. We estimated that, for the traffic the 15-node Cassandra cluster was handling, a ScyllaDB cluster of nine nodes would be enough.

However, after some tests, we noticed that the Spark jobs for tables larger than 20 GB were causing high write pressure on the ScyllaDB cluster. Tuning the Scylla Migrator parameters fixed this.

The first parameter we changed was splitCount, from 256 to 768 for those tables. This creates more Spark tasks, but it also increases the duration of the data migration.

Increasing the number of Spark worker instances beyond 8 gave us more connections to the clusters but saturated the nodes. So, instead of increasing SPARK_WORKER_INSTANCES, we scaled the ScyllaDB cluster up from 9 to 15 nodes just for the migration and increased SPARK_WORKER_CORES to get more parallel transfers.

Thanks to those changes, we migrated tables larger than 20 GB smoothly. Afterwards, we scaled the cluster back down to nine nodes.

6. CDC Tables

Before the migration, CDC on the tables must be disabled, or you will end up hitting a known blocking issue (see https://github.com/scylladb/scylladb/issues/7251). Once the migration was done, we enabled CDC on the tables and switched over all the applications.
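Disabling and re-enabling CDC is just a pair of schema changes; a hedged sketch with the Java driver from Scala (keyspace and table names are illustrative):

```scala
import com.datastax.oss.driver.api.core.CqlSession
import java.net.InetSocketAddress

object ToggleCdc extends App {
  val session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("scylla-node-1", 9042))
    .withLocalDatacenter("dc1")
    .build()

  // Disable CDC on the target table before running Scylla Migrator...
  session.execute("ALTER TABLE identity.sessions WITH cdc = {'enabled': false}")

  // ...and re-enable it once the migration has finished, before switching
  // the applications and the Scylla CDC Source Connector over.
  session.execute("ALTER TABLE identity.sessions WITH cdc = {'enabled': true}")

  session.close()
}
```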

Data validation

During the validation phase, we were more interested in comparing values than timestamps, because each application was setting its own timestamps on a column in the table. So, we set compareTimestamps to false.

The validator does not scan the whole table; instead, it compares a set of rows depending on the value of failuresToFetch. We increased the default value up to 100,000, depending on the importance and the amount of data of the migrated table.

This job can significantly affect latency on the live cluster, so we had to reduce the number of cores and workers compared to the migration job.

The outcome of the migration was ultimately determined by the Scylla validator reports, which allowed each of the 52 tables to be scanned automatically and checked individually. The migration was considered concluded when the reports indicated zero missing values in the target cluster. Since all the reports showed that no data was missing, we considered the migration successfully finished.

Once the migration was completed, the dual writes were maintained for a few days as a monitoring measure. Subsequently, we migrated the applications’ reads to the new cluster. At that point the applications relied solely on the ScyllaDB cluster, making it crucial to monitor the system closely to identify any issue. Since the validation had confirmed that both clusters held the same data, reads could only be problematic in case of connection timeouts or access privileges.

Fortunately, no issues were encountered when switching the reads to the new cluster, allowing us to finally stop the dual writes and officially conclude the migration.

Lessons learnt

  • Create a new datacenter (DC) on the source cluster to isolate the Scylla Migrator reads from normal production traffic. This especially helps when running the validator job, which can be resource intensive.
  • Tables using collection data types cannot be migrated with their original timestamps. The ScyllaDB team later explained that this is a direct consequence of the CQL protocol and not a limitation of the migrator.
  • It can be very handy to purge data before the migration.
  • Moving to ScyllaDB helped us simplify the CDC process and reduce the number of nodes compared to the previous Cassandra cluster. This reduces our cost on AWS while keeping the same level of high availability as before.
