The Strangler Pattern in Practice

Five Milestones on the Road to the Cloud

HomeAway’s cloud migration strategy uses change data capture (CDC) streams to unlock data at rest and accelerate our movement to the cloud. Our event streams presently unlock data from over 25 services using over 200 relational tables. This data feeds 400 new data structures in distributed data technologies such as Apache Cassandra, Neo4j, MongoDB, and Elasticsearch. Are we done? Not a chance, but we have a winning pattern. Our goal is to unlock all legacy data.

Migrating to the Cloud

Your architecture, number of services, and amount of data that must be migrated can make your cloud migration daunting. That is why it is so important to have a strategy. I often hear mandates to shut down services by a certain date, but often little thought is given to downstream components, intelligent routing, and strangulation.

In most cases, attempts to move to the cloud either fail entirely or are completed much later than estimated. I think it is important for teams to think through the entirety of the problem. The challenge in moving to the cloud is not as simple as moving your piece of the puzzle. Typically, there is a larger ecosystem that has to remain functional while migrating to the cloud.

The following diagram is a small snippet of a representation of interactions at HomeAway before applying the strangler pattern. It depicts how two services connect at both a service level and a data level.

The light orange dots in the image are services, while all the other dots are dependencies.

When I included this image in a technical presentation, the audience experienced sheer disbelief and astonishment. After my talk, I was told by a few — jokingly — that the image hurt their heads.

In retrospect, I presented this image prematurely. My intent was to illustrate our capability to identify dependencies throughout the stack, but what the audience saw was a demotivating, tangled mess.

Looking at the dependency graph, teams began to ask themselves, “How can my service move to the cloud, if my dependencies are not ready to go?” This is the point at which I think reality set in.

The problem with dependency tracking is that the burden is always on the other team, which quickly devolves into a finger-pointing match. For example, X is waiting on Y, Y is waiting on Z, and so on. In this world, no service moves to the cloud.

The question that I asked myself and my team was, “How do we non-intrusively move services to the cloud ahead of their dependencies?” We began work on a blueprint that includes the deprecation and decoupling of on-premises data center resources.

We spent the better part of last year laying the foundation we will now build upon. The infrastructure, pipelines, and tooling required to accelerate teams to the cloud are now available.

The Strangler Pattern

The most important reason to consider a strangler application over a cut-over rewrite is reduced risk. A strangler can give value steadily and the frequent releases allow you to monitor its progress more carefully. Many people still don’t consider a strangler since they think it will cost more — I’m not convinced about that. Since you can use shorter release cycles with a strangler you can avoid a lot of the unnecessary features that cut over rewrites often generate. — Martin Fowler

Martin Fowler identified the strangler pattern a number of years ago as a way to migrate legacy applications while minimizing risk. We started to look at ways we can leverage this thought process to streamline our cloud migration blueprint. The primary components of the blueprint are the cornerstone of the strangler pattern: event interception and asset capture.

Data: The Key Ingredient

The asset that we are all after is data; therefore, to be successful, one must be able to intercept or subscribe to the system of record (SOR) change stream. There are not many tools in this market. Most CDC tools work only with a particular class of data technology, not across a polyglot environment. HomeAway developed a tool to act as an event interceptor and capture the change data stream from both SQL and NoSQL data platforms and, importantly, to synchronize data between those platforms.

The tool is called DataSync. DataSync is a CDC service that reads changes from the commit/transaction log of multiple data platforms such as SQL Server, MongoDB, Kafka, and Cassandra. These change events are persisted to an internally developed pub/sub event store called Photon, which allows us to write and consume a continuous stream of data mutations. Photon provides bi-directional synchronization capability amongst heterogeneous data platforms, with strong consistency guarantees and exactly-once semantics. For example, you can stream changes from SQL Server to Cassandra or from Cassandra to SQL Server, or you can synchronize Cassandra and MongoDB.
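
DataSync and Photon are internal HomeAway tools, so the details below are assumptions; this is only a minimal sketch of what a log-based CDC pipeline does conceptually. The event fields (`operation`, `key`, `after`, `commit_lsn`) are a hypothetical envelope, and the dictionary sink stands in for a target store such as Cassandra:

```python
from dataclasses import dataclass, field
import time

# Hypothetical shape of a CDC change event, modeled loosely on what a
# log-reading tool like DataSync might emit. Field names are illustrative.
@dataclass
class ChangeEvent:
    source: str        # e.g. "sqlserver.dbo.Reservation" (illustrative)
    operation: str     # "INSERT" | "UPDATE" | "DELETE"
    key: dict          # primary-key columns of the changed row
    after: dict        # row image after the change (ignored for DELETE)
    commit_lsn: int    # log sequence number; preserves source ordering
    ts: float = field(default_factory=time.time)

def apply_to_sink(event: ChangeEvent, sink: dict) -> None:
    """Replay one change event against a key/value sink (stand-in for Cassandra)."""
    k = tuple(sorted(event.key.items()))
    if event.operation == "DELETE":
        sink.pop(k, None)
    else:
        sink[k] = event.after

# Replaying the ordered change stream converges the sink to the source state.
sink: dict = {}
stream = [
    ChangeEvent("sqlserver.dbo.Reservation", "INSERT", {"id": 7},
                {"id": 7, "status": "HELD"}, 101),
    ChangeEvent("sqlserver.dbo.Reservation", "UPDATE", {"id": 7},
                {"id": 7, "status": "BOOKED"}, 102),
]
for ev in stream:
    apply_to_sink(ev, sink)
```

The key property this illustrates is that a consumer replaying the ordered log arrives at the same end state as the source table, which is what lets a cloud data store shadow an on-premises one.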

The last key ingredient is to have a router of some kind redirect traffic to the legacy or new microservice depending on the functionality at hand. My preference here would be to use intelligent routing at the edge layer, but there are multiple ways to achieve the desired result.
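
As a rough sketch of that routing idea (all hostnames and paths here are invented for illustration), the router only needs a table mapping strangled functionality to the new microservice, with everything else falling through to the legacy service:

```python
# Minimal content-router sketch: strangled endpoints go to the new
# microservice; everything else stays on the legacy service.
# Hostnames and paths are illustrative, not HomeAway's actual config.
ROUTES = {
    "/reservations/search": "https://u-service-1.cloud.example.com",
}
LEGACY = "https://legacy-service-1.dc.example.com"

def route(path: str) -> str:
    # Longest-prefix match, so sub-paths follow their strangled parent.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix]
    return LEGACY
```

As more functionality is strangled, entries are added to the table; when the legacy service is fully decommissioned, the fallback disappears and the router itself can be retired.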

With these tools in place, we can move any service to the cloud ahead of its dependencies while maintaining legacy service contracts.

Milestones

There are 5 milestones to complete the strangulation of any given service. For this example, let’s assume there are three monolithic services hitting the same database. Within a single service, “Legacy Service 1,” three microservices are identified.

The following storyboards illustrate the process of strangling “Legacy Service 1.”

Milestone 0

M0- Starting point

Milestone 0 is ground zero and where most legacy services are today.

Image Highlights

  • This is where we start and not where we want to be

Milestone 1

M1 — Focus on getting reads in the cloud

The first milestone is about rearchitecting legacy services as cloud optimized microservices. Once in the cloud, each new microservice can read local data; however, writes will occur in the original data center.

Image Highlights

  • Legacy services dependent on “Legacy Service 1” will read and write in the data center, while “µ Service 1” will perform reads in the cloud
  • Writes from “µ Service 1” will occur asynchronously in the data center
  • DataSync will ensure the data from SQL is synchronized with Cassandra (one direction)
  • The content router should know which requests are routed to “Legacy Service 1” vs “µ Service 1”
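
The read/write split above can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary stands in for the cloud replica (e.g. Cassandra), the queue stands in for the asynchronous write path back to the data center, and `datasync_one_way` plays the role of the one-directional sync:

```python
import queue

# Milestone 1 sketch (illustrative): the cloud microservice reads from its
# local replica, while writes are forwarded asynchronously to the
# data-center system of record; a one-directional sync refreshes the replica.
local_replica = {"listing:42": {"title": "Beach house"}}  # stand-in for Cassandra
dc_writes: "queue.Queue[tuple]" = queue.Queue()           # stand-in for the async write path

def read(key):
    return local_replica.get(key)      # served entirely in the cloud

def write(key, value):
    dc_writes.put((key, value))        # write lands in the data center first...

def datasync_one_way():
    # ...and the CDC stream later applies it back to the cloud replica.
    while not dc_writes.empty():
        key, value = dc_writes.get()
        local_replica[key] = value
```

The important consequence, visible even in this toy version, is that a read issued immediately after a write may return the old value until the sync catches up, which foreshadows the eventual-consistency gotcha later in this post.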

Milestone 2

M2 — Shift focus to getting some writes in the cloud

The second milestone places emphasis on establishing a heterogeneous or homogeneous multi-master setup, so writes in both the data center and the cloud can be synchronized. This milestone is the turning of the tide, because reads and writes can now be served in the cloud. Strangulation cannot occur without completing this stage.

Image Highlights

  • Legacy services dependent on “Legacy Service 1” will read and write in the data center, while “µ Service 1a” will perform reads and writes in the cloud
  • DataSync will ensure the data from SQL is synchronized with Cassandra (bi-directionally)
  • The content router should know which requests are routed to “Legacy Service 1” vs “µ Service 1a”
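
Bi-directional synchronization raises two problems any multi-master CDC setup must solve: stopping a change event from echoing back to the store it originated in, and resolving concurrent writes. The post does not describe how DataSync handles these, so the sketch below uses two common, generic techniques as assumptions: origin tagging to break the loop, and last-writer-wins on a version number:

```python
# Milestone 2 sketch: bi-directional sync between two masters.
# Origin tagging and last-writer-wins are assumed techniques here,
# not a description of DataSync's actual conflict handling.
def sync(event: dict, target: dict, target_name: str) -> None:
    if event["origin"] == target_name:
        return  # event originated here; dropping it breaks the echo loop
    current = target.get(event["key"])
    if current is None or event["version"] > current["version"]:
        target[event["key"]] = {"value": event["value"],
                                "version": event["version"]}

sql_store, cassandra_store = {}, {}
ev = {"origin": "sqlserver", "key": "quote:9", "value": 120, "version": 3}
sync(ev, cassandra_store, "cassandra")  # applied: came from the other master
sync(ev, sql_store, "sqlserver")        # dropped: would echo back to its source
```

Whatever the actual mechanism, some equivalent of both rules has to exist before writes can safely flow in both directions.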

Still Milestone 2…

Still M2…

You will remain in milestone 2 as long as there are services or functionality dependent on the legacy service.

Image Highlights

  • Legacy services dependent on “Legacy Service 1” will read and write in the data center, while “µ Service 1a” and “µ Service 1b” will perform reads and writes in the cloud
  • DataSync will ensure the data from SQL is synchronized with Cassandra (bi-directionally)
  • The content router should know which requests are routed to “Legacy Service 1” vs “µ Service 1a” vs “µ Service 1b”

Milestone 3

M3-Strangle the legacy service

The third milestone places emphasis on iterating the pattern applied in milestone 2, until all functionality for a given service has been rearchitected into cloud optimized microservices.

Image Highlights

  • All functionality of “Legacy Service 1” has been rearchitected as a collection of cloud-optimized microservices, and all reads and writes occur in the cloud
  • DataSync will ensure the data from SQL is synchronized with Cassandra (bi-directionally)
  • The content router should know which requests are routed to “µ Service 1” vs “µ Service 1b” vs “µ Service 1c”; if requests are submitted to “Legacy Service 1” they will be redirected to the appropriate microservice

Milestone 4

M4-The legacy service is fully decommissioned

This is the most important milestone. The fifth and final milestone is about decommissioning strangled services.

Image Highlights

  • “Legacy Service 1” has been deprecated
  • DataSync will ensure the data from Cassandra is synchronized with SQL Server for downstream consumers, such as the data warehouse
  • It is also possible to redirect downstream consumers to a new feed of data to replace their existing process
  • The content router is no longer required, as services will call microservices directly, and attempts to call “Legacy Service 1” will fail

Gotchas

  • The microservice’s domain model has to contain the necessary legacy domain attributes in order to sync bi-directionally
  • Legacy “NOT NULL” columns will need to exist in the new microservice, or have a default value
  • Clients will have to tolerate and plan for eventual consistency
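
The “NOT NULL” gotcha in particular lends itself to a small sketch. When the slimmer microservice model is synced back into the legacy table, every legacy NOT NULL column needs a value; one simple approach (column names here are invented for illustration) is to overlay the new model on a table of defaults:

```python
# Gotcha sketch: mapping a slimmer microservice document back onto a legacy
# row whose columns are NOT NULL. Column names and defaults are illustrative.
LEGACY_DEFAULTS = {
    "created_by": "datasync",  # legacy audit column the new model dropped
    "channel": "WEB",          # NOT NULL in the legacy schema
}

def to_legacy_row(doc: dict) -> dict:
    # Start from defaults so every NOT NULL column has a value,
    # then overlay whatever attributes the microservice model carries.
    row = dict(LEGACY_DEFAULTS)
    row.update(doc)
    return row

row = to_legacy_row({"id": 7, "status": "BOOKED"})
```

Carrying defaults in the sync layer like this keeps the legacy schema satisfied without forcing the new domain model to retain columns it no longer cares about.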

Conclusion

Moving to the cloud can be difficult and having a strategic plan is critical. The key element to our strategy at HomeAway is leveraging CDC for asset capture and event redirection.

One of the biggest challenges companies will face is unwinding their dependencies. Based on my experience, dependencies are typically what prolong cloud initiatives. Unlocking data at rest provides a means to move services and data to the cloud ahead of their dependencies and opens the door for the strangler pattern.

I believe there are 5 milestones to the strangler pattern:

  1. M0 — Starting point
  2. M1 — Reads in the cloud
  3. M2 — Reads & writes in the cloud (multi-master)
  4. M3 — Strangle the legacy service
  5. M4 — Deprecate the legacy service

Note: Each of these phases will have a router or logic that determines which service (legacy or new) to route to for specific functionality.

I believe in learning from and leveraging the thoughts others have shared in the software community. I hope my sharing HomeAway’s cloud migration strategy gives back to that community and encourages dialog about alternative strategies.