Data Sync Approaches in Application Migration

Nilabh Sagar
Walmart Global Tech Blog
9 min read · Jun 12, 2020

“An application never retires; it transforms into new capabilities.”

Every application evolves over time, and eventually a phase comes when the current environment restricts further growth and demands better and enhanced capabilities. It’s an inevitable event, and the fact that you are reading this probably means it’s time for your application.

Migration is a multi-step process and must be done carefully to avoid unnecessary delays, data loss, bad user experience, high cost, etc. Careful planning, coordination and execution are the minimum expected from the teams involved in a migration. For some applications Lift-and-Shift is the suitable approach, while for others it is essential to re-build the complete application.

Lift-and-Shift

Lift-and-Shift, or Re-host, typically involves shifting your system and associated data from one environment to another, e.g. from on-premises to a cloud environment. The existing code base is ported to the new environment without any changes; essentially it’s just a change of physical servers. This is quite easy, fast and inexpensive, but might not be beneficial because you are not leveraging the capabilities of the new environment to the fullest.

Rebuild

This involves completely re-architecting the application to utilise the full capabilities of the new environment. The application is re-coded in a way that exploits the native capabilities of the new environment for better performance and scalability. As you would have guessed by now, this is complex, time consuming and costly, but in the end it has a high ROI.


So, with this much background, let’s focus on our main intent: data sync approaches, an important aspect of re-building.

Re-building a complex application is a resource-intensive and time-consuming process and cannot be done in one go. The planning involves multiple phases of development and deployment of the new system, which requires keeping the old and new systems in an active-active mode. Running active-active requires that the data be kept in sync between the systems.

For the cases where the new data model + data store is the same as the existing one, the team should look for an existing tool that can keep the data in sync between the two systems before taking a step towards building a sync pipeline. The cases where the new data model + data store is entirely different from the existing one, e.g. SQL to NoSQL, demand a well-thought-out data sync pipeline, which we will focus on below.

It’s important to understand the usage patterns of your application to make an informed choice of data sync approach. Broadly, based on my experience, one can go for either “greedy sync” or “on-demand sync”. Let’s look at both approaches in detail.

Greedy Sync

As the name suggests, sync is performed between the systems as and when a change occurs, by intercepting the original request. The intercepting component could be built as a library or as a sidecar, with an on/off switch feature. Libraries provide a deeper level of integration and less overhead. Both library and sidecar allow us to remove the component without affecting the actual code in cases where sync is to be stopped.

Note: When the sync is from old to new we term it forward sync, and when it is from new to old we term it reverse sync.

At a minimum, the primary responsibilities of this component are to:

  • Perform transformations needed to convert the payload into a compatible format before it is consumed by the target application.
  • Avoid cyclic API calls between the systems. A simple way to achieve this is to have the component set a marker header when it initiates a sync call; on the receiving side the component checks for this header and skips re-syncing such requests (see the sketch after this list).
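
A minimal sketch of the marker-header check, assuming a simple HTTP interceptor built with the requests library; the header name X-Sync-Origin and the helper names are illustrative, not tied to any particular framework.

```python
import requests

SYNC_MARKER = "X-Sync-Origin"  # illustrative header name used to tag sync traffic


def transform_payload(payload):
    # Placeholder transformation; the real mapping depends on the two data models.
    return payload


def sync_to_peer(request_headers, payload, peer_base_url):
    """Forward an intercepted write to the peer system, avoiding cyclic calls."""
    if SYNC_MARKER in request_headers:
        # This request was itself produced by the sync component on the other
        # side; forwarding it again would create a cycle, so stop here.
        return

    headers = {SYNC_MARKER: "old-system"}     # mark the outgoing call as sync traffic
    body = transform_payload(payload)         # convert to the peer's format
    requests.post(f"{peer_base_url}/sync", json=body, headers=headers)
```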

Now that we have identified ways to capture the changes, it’s time to look at ways to exchange these changes between the old and new systems. Depending on the application’s needs this could be done in real time or near real time. Basically, it’s a choice between strong consistency (real time) and eventual consistency (near real time).

Realtime Sync (Strong consistency)

In real time sync, the component built above first updates its own database and, upon success, makes a direct API call to the new or old application. We must remember that in distributed systems failures are inevitable and the system must be resilient enough to handle them. In case of an API call failure, apart from retrying, the system must either fall back to the near real time sync approach (discussed below) or roll back. The failure-handling choice depends on the application’s behaviour.
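
A sketch of the real time path under these assumptions: the local write happens first, the peer is called synchronously with a few retries, and on exhaustion the change is handed to the messaging system (the near real time path) rather than rolling back. save_locally, call_peer_api and enqueue_for_sync are placeholders for your own persistence, HTTP and queue code.

```python
import time


def save_locally(entity):        # placeholder: write to the local database
    pass


def call_peer_api(entity):       # placeholder: synchronous call to the other system
    pass


def enqueue_for_sync(entity):    # placeholder: hand the change to the messaging system
    pass


def sync_real_time(entity, retries=3):
    save_locally(entity)                  # update own database first
    for attempt in range(retries):
        try:
            call_peer_api(entity)         # direct API call to the new/old system
            return True
        except IOError:                   # network/API failure
            time.sleep(2 ** attempt)      # simple exponential backoff between retries
    # All retries failed: degrade to the near real time path instead of failing
    # the user request (rolling back the local write is the other option).
    enqueue_for_sync(entity)
    return False
```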

Near Realtime Sync (Eventual consistency)

In near real time sync, instead of calling the new/old system’s API and waiting for its response before responding to the user, the system adds the request info + payload to a messaging system and returns immediately, thereby greatly improving the response time. The consumers attached to the messaging system perform the payload transformation, API call, retries in case of failure, etc. to sync the data at the other end.
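
A minimal sketch of this flow, using an in-process queue.Queue to stand in for a durable messaging system such as Kafka; transform_payload and call_peer_api are the same kind of placeholders as in the earlier sketches.

```python
import json
import queue
import threading

sync_queue = queue.Queue()   # stand-in for a real messaging system (e.g. Kafka)


def transform_payload(payload):   # placeholder: map to the target data model
    return payload


def call_peer_api(payload):       # placeholder: POST to the other system
    pass


def handle_write(entity):
    # ... update own database here ...
    sync_queue.put(json.dumps(entity))    # enqueue the request info + payload
    return {"status": "accepted"}         # return immediately to the caller


def sync_consumer():
    while True:
        payload = json.loads(sync_queue.get())
        try:
            call_peer_api(transform_payload(payload))   # sync at the other end
        except IOError:
            sync_queue.put(json.dumps(payload))         # naive retry: re-enqueue


threading.Thread(target=sync_consumer, daemon=True).start()
```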

Based on our experience, we suggest building the sync pipeline using a hybrid approach with both real time and near real time capabilities, and managing each endpoint’s behaviour, i.e. real time vs near real time, through configuration for better flexibility. This approach lets you control how sync is performed for individual APIs in a running system just by changing the configuration as and when needed.
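
One way to express that configuration, as an assumed sketch: a per-endpoint mode map consulted by a small dispatcher. The endpoint names are illustrative, and sync_real_time and handle_write refer to the real time and near real time paths sketched above; in practice the map would be loaded from an external configuration source so it can be flipped at runtime.

```python
# Per-endpoint sync behaviour; change the mode here (or in an external config
# store) to switch an API between real time and near real time sync.
SYNC_MODE = {
    "/orders":  "real_time",        # strong consistency for order writes
    "/profile": "near_real_time",   # eventual consistency is fine for profiles
}


def dispatch_sync(endpoint, entity):
    mode = SYNC_MODE.get(endpoint, "near_real_time")   # default to eventual consistency
    if mode == "real_time":
        return sync_real_time(entity)   # synchronous path (earlier sketch)
    return handle_write(entity)         # queue-based path (earlier sketch)
```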

On-demand Sync

Compared to greedy sync, in on-demand sync, instead of pushing changes as and when they happen, the system pulls changes when the data is accessed for the first time. This requires maintaining the user’s last-accessed-system information, e.g. old or new, which helps execute the pull-and-merge step only when the last-accessed system differs from the current one. E.g. if a user’s request comes to the old system and the last-access info says “new”, then the old system will pull and merge changes from the new system; as long as the user sticks to the old system, no further pull and merge will be performed.
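
A sketch of that check, assuming a small last-access store keyed by user; fetch_from_peer, merge_into_local and load_locally are placeholders for the remote fetch, the merge logic and the local read.

```python
last_access = {}   # user_id -> "old" or "new"; in practice a shared, durable store


def fetch_from_peer(entity_id):            # placeholder: read from the other system
    return {}


def merge_into_local(entity_id, remote):   # placeholder: merge the remote copy locally
    pass


def load_locally(entity_id):               # placeholder: read from the local store
    return {}


def read_entity(user_id, entity_id, current_system="old"):
    previous = last_access.get(user_id)
    if previous is not None and previous != current_system:
        # The user's last access was on the other system, so pull its changes
        # and merge them before serving this read.
        merge_into_local(entity_id, fetch_from_peer(entity_id))
    last_access[user_id] = current_system   # no further pulls while the user stays here
    return load_locally(entity_id)
```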

Looking at on-demand sync you might think this is an optimised approach, and indeed it is, as it greatly reduces the number of network calls. However, there are some scenarios in which greedy sync would be the better choice. Consider a scenario in which a downstream email service sends an email for every change in the data and that service is still running in the old environment. Because of this, no email will go out for the changes happening in the new system. Had it been greedy sync, the changes in the new system would have synced to the old one and in turn triggered an email.

Data consistency is an important aspect and must be taken care of, because updates can happen to the same data in the old and new systems simultaneously, which can result in conflicts.

Conflict Resolution

As the sync systems run in different environments, we need a way to serialise concurrent writes to ensure data consistency. We need a system that can provide a strictly monotonically increasing number in a distributed environment. In cases where you are restricted to using only PaaS components, you can go with a component that is globally replicated with acceptable consistency and low latency, e.g. on Azure we can go with Cosmos DB using session consistency + a stored procedure + a single write region. If the above is not a constraint, then the obvious choice is ZooKeeper. It provides a locking mechanism that works across processes and across machines, so that only the holder of the lock is allowed to perform the update, avoiding conflicts.

For our case we will use the sequence number returned by ZooKeeper while creating a child node in “EPHEMERAL_SEQUENTIAL” mode. These child nodes should be created under a path keyed by a unique identifier of the entity, e.g. “employee id”. This means that all clients that want to make changes to the same employee id are given a sequence number in the order ZooKeeper processes their requests. The client with the lowest number gets the lock and proceeds with the execution. The sequence number given to this client is attached to the message sent asynchronously for sync and is processed using the algorithm below to resolve conflicts.
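
A minimal sketch using the kazoo Python client for ZooKeeper; the ensemble address, paths and node names are illustrative, and a production lock would watch the preceding node rather than simply checking whether it holds the lowest sequence number.

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")   # illustrative ensemble
zk.start()


def acquire_sequence(entity_id):
    """Create an ephemeral sequential node under the entity's path and return the
    sequence number ZooKeeper assigned, plus whether this writer holds the lock."""
    base = f"/sync-locks/{entity_id}"
    zk.ensure_path(base)
    node = zk.create(f"{base}/writer-", ephemeral=True, sequence=True)
    seq = int(node.rsplit("-", 1)[1])            # e.g. .../writer-0000000007 -> 7
    lowest = sorted(zk.get_children(base))[0]    # lowest sequence number holds the lock
    return seq, node.endswith(lowest)


# All writers touching the same employee id are ordered by ZooKeeper.
seq, has_lock = acquire_sequence("employee-12345")
```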

Conflict Resolution

Let V be the value of the field F at sequence number t, denoted as

Fₜ = V { i.e. at sequence number t the value of F is V }

So, we can have cases where both systems independently have values for F, as below:

Ft₁ = V₁ { i.e. at sequence number t₁ the value of F is V₁ }

and

Ft₂ = V₂ { i.e. at sequence number t₂ the value of F is V₂ }

Given the above, we can write a conflict function CF as

Conflict Function
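
Working from the description above, a plausible sketch of CF, assuming the write carrying the higher sequence number wins for each field (the winning sequence number is kept alongside the field for the next comparison):

```python
def conflict_function(t1, v1, t2, v2):
    """Resolve two competing writes to the same field F.
    (t, v) pairs are (ZooKeeper sequence number, value); the higher sequence wins."""
    return (t1, v1) if t1 > t2 else (t2, v2)
```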

The above conflict function is applied to each field present in the participating entity for the ongoing operation. The system also preserves the winning t against F for subsequent operations.

Below is an example to illustrate the working of the above conflict function.
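
For instance (a hypothetical illustration): suppose the old system writes F = “HR” with sequence number 7 while the new system independently writes F = “Finance” with sequence number 9. When the sync messages are processed, CF compares the two sequence numbers, keeps “Finance” as the value of F, and stores 9 as the winning sequence number for F; a later write can override it only if it carries a sequence number greater than 9.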

How you plan and migrate your clients/users is also an important aspect of a successful migration. The plan must clearly identify the types of clients/users and how they use the existing system. One should make sure that clients/users are routed to the old or new system in such a way that read-your-writes guarantees are not broken.

In the end, I would like to highlight that the success of a migration depends on the effectiveness of the new system, and one must define and capture various metrics to measure that effectiveness. Having a data comparison pipeline to check the quality of data at regular intervals is an added advantage for measuring the efficiency of the system over time.

One ending thought, specifically for monoliths: don’t try to dismantle every aspect into a microservice in one go; instead, take an incremental approach. Define priorities for each aspect based on the system’s behaviour and, if cost is not a barrier, have a separate team focus on each aspect as per priority. Ensure that all teams collaborate in some form, like regular meetings, document exchange, etc. For teams to work together it’s important to have few dependencies, or at least timely availability of dependencies. To do that, teams should define the contract upfront and publish it whenever the contract changes. Developers should follow good design patterns while writing code to accommodate changes during development. As the complexity involved in dismantling a monolith is huge, requirements are bound to change during development. In such cases good design patterns help keep changes minimal, which saves time and cost.
