Data store migrations can be complex, especially if the application cannot afford to have maintenance downtime. We will explore some strategies that can help you navigate the complexity.
In our case, it was a migration from a NoSQL data store (DynamoDB) to a relational data store (MySQL). However, the approaches covered here are generic enough to apply regardless of the source and target data stores.
We set out to migrate our data from one data store to another while keeping the application running seamlessly for our users. I will start off by introducing some of the common migration approaches, and then cover how we went about it.
The simplest approach is an offline migration, i.e. take the application offline, stopping all reads and writes to the data store. This allows the data to be migrated across as a batch operation. Once the batch operation is complete, the application can be pointed at the new data store and brought back up.
The biggest drawback of this approach is the downtime involved which could be anywhere from minutes to days depending on the size of your data. For an application with active users, the Service Level Agreements (SLAs) will generally determine if this is even a feasible option.
An online migration moves records across to the target data store as they are accessed (sometimes called a “dual read” migration). Reads are first attempted from the target data store, with a fallback to the source data store and a copy across to the target. All writes go straight to the target data store.
This adds more complexity with the two data stores “live” at the same time e.g. differing behaviour of the data stores with regards to query performance, transactions etc. It will also mean incurring a performance penalty on the first access as there may be multiple requests to the data stores until data is retrieved. The flowchart below shows the different paths that an online migration may take.
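The read and write paths can be sketched roughly as follows. Plain dicts stand in for the two data stores here, and all names are illustrative rather than real code:

```python
# Hypothetical in-memory stand-ins for the source and target data stores.
source_store = {"user:1": {"name": "Ada"}}
target_store = {}

def read(key):
    """Dual-read path: try the target first, fall back to the source
    and copy the record across to the target on first access."""
    record = target_store.get(key)
    if record is not None:
        return record                  # found in target
    record = source_store.get(key)
    if record is None:
        return None                    # "Record not found" in either store
    target_store[key] = record         # online-migrate on first access
    return record

def write(key, record):
    # All writes go only to the target data store.
    target_store[key] = record
```

Note the extra round trips on a first access: a target-store miss costs a read from each store plus a write to the target before the record is returned.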
Online migrations rely on the records being accessed for them to be migrated. In most cases, this would mean a long tail of records migrating across as they are accessed over a period of time. Additionally, some records may not be actively accessed by the application, but must still be migrated.
This is where running an offline migration in conjunction with the online one can accelerate things. It is important, though, that the offline migration is designed to be idempotent (think upserts instead of inserts) if it runs alongside the online migration. We refer to this as a backfill, which can be executed as a batch job.
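A minimal sketch of such an idempotent backfill, again with dicts standing in for the stores. In SQL the skip-if-present step would typically be an upsert (e.g. `INSERT ... ON DUPLICATE KEY UPDATE` in MySQL):

```python
source_store = {"user:1": {"name": "Ada"}, "user:2": {"name": "Grace"}}
target_store = {"user:1": {"name": "Ada"}}  # already migrated by the online path

def backfill(source, target):
    """Idempotent backfill: copy across every source record that is not
    already in the target. Safe to re-run, and safe to run alongside the
    online migration, since it never clobbers a record the target holds."""
    migrated = 0
    for key, record in source.items():
        if key not in target:
            target[key] = record
            migrated += 1
    return migrated
```

Because re-running it is a no-op for records already copied, the batch job can be restarted after a failure without special recovery logic.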
Monitoring and execution
It is important to have visibility into the migration process, which metrics can provide. We emitted statsd metrics and visualised them in Datadog. The key metrics emitted were:
- hit: emitted when a record is found in the target data store.
- miss: emitted when a record is not found in the target data store and is migrated online.
- backfill: emitted when a record is migrated by the offline batch job.
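The emission points on the read path can be sketched like this. A `collections.Counter` stands in for a real statsd client, and the metric names mirror the list above:

```python
from collections import Counter

metrics = Counter()  # stand-in for a statsd client (e.g. client.incr("hit"))

source_store = {"user:1": {"name": "Ada"}}
target_store = {}

def read_with_metrics(key):
    if key in target_store:
        metrics["hit"] += 1        # record already in the target store
        return target_store[key]
    record = source_store.get(key)
    if record is not None:
        metrics["miss"] += 1       # record online-migrated on this access
        target_store[key] = record
    return record
```

Plotted over time, the miss count falling towards zero (and hits dominating) is exactly the progress signal described below.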
We found these very useful in giving us confidence that things were going to plan. Our progress tracking started off with a lot of misses, quickly tapering down as records were migrated across, with a long tail until the backfill job was executed.
Once the backfill job was run, all data had moved across to the target data store. At this stage, the application relied solely on the target data store, and stopped accessing the source data store. We still left the source data store around for a month (without it being accessed by the application) to verify data integrity. Any issues with missing data in the target data store were expected to be caught by our regular alerting mechanisms, i.e. the “Record not found” scenario in the flowchart.
Some other considerations
Horizontal partitioning
If the data can be horizontally partitioned, it may be beneficial to migrate it in parts, which allows the new data store to be validated and any teething issues to be resolved early on. This is, however, likely to add complexity to the migration process, and needs to be weighed against the benefits.
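One way this might look: run the backfill one partition at a time, validating each slice before moving on. The partition key (region, here) and the validation check are purely illustrative:

```python
# Illustrative: source records grouped by a hypothetical partition key.
source_partitions = {
    "au": {"user:1": {"name": "Ada"}},
    "us": {"user:2": {"name": "Grace"}},
}
target_store = {}

def migrate_partition(partition, target):
    for key, record in partition.items():
        target.setdefault(key, record)  # idempotent, upsert-style copy

def validate(partition, target):
    # Hypothetical check: every record in this slice made it across intact.
    return all(target.get(k) == v for k, v in partition.items())

for name, partition in source_partitions.items():
    migrate_partition(partition, target_store)
    if not validate(partition, target_store):
        raise RuntimeError(f"partition {name} failed validation")
```

Stopping at the first failed slice limits the blast radius of any issue to one partition rather than the whole data set.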
Data Structure changes
If you are moving data between two different types of data stores (e.g. NoSQL to RDBMS), the shape of the data is likely to change. This adds complexity, which can be mitigated by using adapters that translate records between the two shapes.
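For example, a small adapter might flatten a nested NoSQL document into a row matching the relational schema. The field names here are made up for illustration:

```python
def to_row(item):
    """Hypothetical adapter: map a nested document to a flat relational row
    (assumed columns: id, name, city)."""
    return {
        "id": item["pk"],
        "name": item["profile"]["name"],
        # Optional nested fields become NULLable columns.
        "city": item["profile"].get("address", {}).get("city"),
    }
```

Keeping this translation in one adapter means the dual-read and backfill paths share a single, testable definition of the new shape.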
Shared data stores
Data stores shared between applications are generally not a good idea. However, if you happen to be in a situation where the data store is shared by multiple applications, some of the concepts here would still be applicable, but it would require more careful orchestration across those applications.
This is by no means a comprehensive list of all possible scenarios in data store migrations, but I hope the core concepts provide useful building blocks for coming up with something that fits your use case.
The key things are to have visibility into the progress of the migration, and to do some careful planning before jumping into execution.
“Measure twice, cut once”!
Thanks to my teammates Toby Corkindale, Amy Guo, Derrick Cheng, Marcelo Garcia, Patric Cui, and Simon Fenton for making this happen, and to Marius Gherman and Steven McPhillips for reviewing and helping refine this article.