How the Strangler Fig Pattern and AWS DMS Helped EDF Modernise a Prepayment Meter Infrastructure Provision System

Jeremy Brooks
Published in EDF Data and Tech · 8 min read · Jul 23, 2024

Image: an ancient face being grown over by a strangler fig plant (by Dominic Trier from Pexels)

Introduction

Migrating legacy systems to the cloud is a critical step for many organisations aiming to modernise their infrastructure and improve operational efficiency. This article documents our journey modernising a 10-year-old system, moving it from private infrastructure to a cloud-native, serverless, event-driven architecture in AWS.

The lessons learnt along the way apply to any medium-to-large enterprise that receives, routes, stores, and processes daily customer data. We aim to share our experiences and strategies, providing valuable guidance for anyone looking to modernise similar systems in the cloud. The main focus of this article is the incremental approach we took and the AWS Database Migration Service (DMS).

About the system being modernised

The system is known in the energy industry as a “Prepayment Meter Infrastructure Provision” (PPMIP). PPMIPs play a crucial role in allocating prepayment transactions to energy suppliers from infrastructure that uses physical tokens or cards. Public electricity suppliers are obligated to provide facilities for allocating these transactions to suppliers, and companies can source PPMIP services from other providers instead of relying solely on their own process. This article describes how EDF modernised its PPMIP system and moved it to the cloud.

Modernising principles we followed

When assessing an application’s potential for modernisation, our key principles are:

  1. Use the AWS 7 Rs: Balancing what to retire, retain, re-host, relocate, re-platform, refactor/re-architect, or re-purchase.
  2. Review operating model and skills: Ensure the ‘new’ version of the system is more supportable — better instrumentation, fault tolerance, team skills aligned, etc.
  3. Serverless first and cloud native: Utilise cloud-native services wherever viable.
  4. DevOps and SRE: High levels of automation, CI/CD, observability, and everything-as-code.
  5. Decompose the application: Assess the system for a re-usable, loosely coupled architecture rather than rebuilding it as-is.
  6. Minimise user impact: Keep interfaces consistent, allowing for phased delivery and parallel running, avoiding “big bang” change.

Known challenges

  1. Moving the data: Migrating large volumes of data and ensuring data integrity, minimal downtime, and seamless transition were critical factors.
  2. Security compliance: The modernisation had to address vulnerabilities and comply with stringent security standards.
  3. Parallel running: To avoid a “big bang” delivery, incremental delivery required running both systems in parallel for a time. Handling “live” data synchronisation and third-party interactions required continuous collaboration and coordination.
  4. Documentation: The system documentation was extensive but old, leaving knowledge gaps that could only be closed through investigation.
  5. Stakeholder management: Continuous open, honest, and inclusive communication with all stakeholders was crucial throughout the process to maintain support and confidence.
  6. Validation: Verifying that a feature operated correctly before replacing its counterpart in the live environment was a “must have”.
  7. Agility: Unforeseen issues arose, so our team had to remain adaptable and flexible to new learnings, adjusting strategy along the way.

Choosing the Strangler Fig Pattern

The Strangler Fig Pattern has its “roots” in the plant species of the same name. The fig slowly grows down and around a large shrub or tree, using it for support. Over time it grows roots into the ground and supports itself, and it can become so large that it starts to kill off the host.

This sounds quite aggressive and parasitic, but the concept is a great one when considering how to replace an existing system with a different one. The idea is to leave the old system untouched whilst building new functionality around it, giving the opportunity to integrate new and old in a symbiotic way. Over time, functionality is replaced until the new system is all that is left and naturally becomes “live”, with the old system decommissioned incrementally.

We wanted to operate in a way that would deliver results fast, but we also wanted to deliver change that was not disruptive to end users. EDF had used the Strangler Fig Pattern before; it was a well-understood approach and suited our needs well.

It helped us focus on delivering subsets of the new system in the smallest increments viable to the business, balancing early wins against areas that needed more time, while continually running both systems in parallel in production.
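
To make the idea concrete, here is a minimal, hypothetical sketch (in Python, not EDF’s production code) of a strangler-style dispatch layer: each capability is routed to either the legacy or the new implementation, and capabilities still in “silent mode” run both paths while the legacy result remains authoritative. The capability names and handlers are illustrative only; in the PPMIP work the equivalent switching points sat largely at the infrastructure level (file routing, database sync, batch jobs), but the principle is the same.

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("strangler")

    def legacy_report(payload: dict) -> dict:
        # Stand-in for a call into the old, on-premises system.
        return {"rows": len(payload.get("records", []))}

    def new_report(payload: dict) -> dict:
        # Stand-in for a call into the new, cloud-based implementation.
        return {"rows": len(payload.get("records", []))}

    HANDLERS = {"reporting": {"legacy": legacy_report, "new": new_report}}
    MIGRATED = set()              # capabilities fully cut over to the new system
    SILENT_MODE = {"reporting"}   # run both paths, return only the legacy result

    def handle(capability: str, payload: dict) -> dict:
        legacy = HANDLERS[capability]["legacy"]
        new = HANDLERS[capability]["new"]
        if capability in MIGRATED:
            return new(payload)            # the new system now owns this capability
        result = legacy(payload)           # the legacy system remains the source of truth
        if capability in SILENT_MODE:
            candidate = new(payload)       # exercise the new path without exposing it
            if candidate != result:
                log.warning("Mismatch for %s: legacy=%s new=%s", capability, result, candidate)
        return result

    print(handle("reporting", {"records": [1, 2, 3]}))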

Why does it matter?

An alternative we considered was to build the new system in its entirety and then plan a “go live” day where we turned off the old and turned on the new, often called a “big bang” approach. This is very common, but it has two critical challenges that we considered unacceptable:

  1. High risk: Replacing an entire system all at once carries significant risks, with potential for prolonged downtime, complex rollbacks, and a higher chance of problems post-release.
  2. Delayed value delivery: Stakeholders have to wait until the entire system modernisation is complete to see any benefits, and this often takes longer than estimated.

We could have followed a hybrid approach, but this just meant smaller versions of the same challenges and potential interoperability issues between systems.

Lifting the hood

This section aims to describe how we approached the modernisation technically, comparing the new and old systems and how we grouped delivery of changes.

Here is a simplified, high-level list of the capabilities that made up the system being modernised:

  • File ingestion and processing
  • Database CRUD operations
  • Database reporting and distribution
  • Scheduling system
  • File share

Increments from old to new

The following is a breakdown, in order of delivery, of how subsets of capability were identified and isolated for incremental change, with a simplified comparison of the old and new technologies. A sketch of the subset 1 file ingestion flow follows the table.

+========+======================+========================================+
| Subset | Old system | New system |
+========+======================+========================================+
| 1 | SFTP batch jobs | SFTP via Lambda + Transfer Family |
+--------+----------------------+----------------------------------------+
| 2 | Server logs/alerts | Dynatrace instrumentation and alerting |
+--------+----------------------+----------------------------------------+
| 3      | Oracle DB            | Aurora Serverless v2 PostgreSQL + DMS  |
+--------+----------------------+----------------------------------------+
| 4 | Java batch jobs | Java Docker + ECS tasks |
+--------+----------------------+----------------------------------------+
| 5 | Java | Java LTS update |
+--------+----------------------+----------------------------------------+
| 6 | Java email | Lambda email + S3 signed URLs + SES |
+--------+----------------------+----------------------------------------+
| 7 | Java batch tasks | Java "read-only" ECS tasks |
+--------+----------------------+----------------------------------------+
| 8 | NFS file share | S3 File Gateway NFS share |
+--------+----------------------+----------------------------------------+
| 9 | Retired Java outputs | Lambdas for new features |
+--------+----------------------+----------------------------------------+
| 10 | Java batch tasks | Remaining Java ECS tasks + disable DMS |
+--------+----------------------+----------------------------------------+
| 11 | Run in "silent mode" | Fully live |
+--------+----------------------+----------------------------------------+
| 12     | Fully decommissioned | Fully independent                      |
+--------+----------------------+----------------------------------------+
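
To make subset 1 concrete, here is a minimal sketch (not EDF’s actual code) of the shape such a function can take: AWS Transfer Family writes inbound SFTP files to an S3 bucket, and an S3 event notification invokes a Lambda that validates each file and moves it to a processed (or rejected) prefix. The bucket layout and prefixes are assumptions for illustration only.

    # Hypothetical sketch of subset 1: AWS Transfer Family delivers SFTP uploads
    # into S3, and an S3 "ObjectCreated" notification invokes this Lambda, which
    # validates each file and moves it under a processed/ or rejected/ prefix.
    import urllib.parse
    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            target = "processed" if body.strip() else "rejected"   # minimal validation: reject empty files

            s3.copy_object(Bucket=bucket,
                           Key=f"{target}/{key}",
                           CopySource={"Bucket": bucket, "Key": key})
            s3.delete_object(Bucket=bucket, Key=key)
        return {"files_handled": len(event["Records"])}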

AWS Database Migration Service (DMS)

Parallel running offered an easy way to validate both systems side by side by maintaining a source of truth in the data coming from the old system. However, this presented a challenge: how to sync data from the on-premises Oracle instance to the cloud Aurora PostgreSQL cluster while still allowing parallel destructive writes to the database.

Data integrity with DMS

Using AWS DMS and the AWS Schema Conversion Tool (SCT), we were able to quickly achieve the following:

  • Schema conversion from Oracle to PostgreSQL with AWS SCT, including Java code containing embedded SQL.
  • Primary data migration from Oracle to PostgreSQL, capturing data changes in real time (change data capture).
  • Secondary periodic full data loads from Oracle to PostgreSQL, capturing the data changes made by destructive writes from the migrated Java tasks, which used the same file inputs as the old system.

AWS DMS documentation: https://docs.aws.amazon.com/dms/latest/userguide/Welcome.html

This gave us an environment that could perform full CRUD operations in parallel while maintaining a single source of truth from the old system’s data. Having a single source of truth provided a way to validate changes before going live.
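
For illustration, a full-load-plus-CDC replication task with DMS data validation enabled can be created via the DMS API roughly as follows. This is a sketch under assumed names: the endpoint and replication instance ARNs, the task identifier, and the “PPMIP” schema name are placeholders, not our real configuration.

    # Illustrative sketch: a DMS task that performs a full load from Oracle and
    # then keeps Aurora PostgreSQL in sync via change data capture (CDC), with
    # DMS data validation turned on. ARNs and the schema name are placeholders.
    import json
    import boto3

    dms = boto3.client("dms")

    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-ppmip-schema",
            "object-locator": {"schema-name": "PPMIP", "table-name": "%"},
            "rule-action": "include",
        }]
    }

    task_settings = {
        "ValidationSettings": {"EnableValidation": True},   # row-by-row comparison of source and target
        "Logging": {"EnableLogging": True},
    }

    dms.create_replication_task(
        ReplicationTaskIdentifier="ppmip-oracle-to-aurora",
        SourceEndpointArn="arn:aws:dms:eu-west-2:111122223333:endpoint:SOURCE",
        TargetEndpointArn="arn:aws:dms:eu-west-2:111122223333:endpoint:TARGET",
        ReplicationInstanceArn="arn:aws:dms:eu-west-2:111122223333:rep:INSTANCE",
        MigrationType="full-load-and-cdc",
        TableMappings=json.dumps(table_mappings),
        ReplicationTaskSettings=json.dumps(task_settings),
    )

Enabling validation lets DMS compare rows between source and target after they are migrated, which is what makes the automatic data validation mentioned in the takeaways practical.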

We plan to do an extended article on how EDF used AWS DMS and SCT.

The new system

The composition achieved by the end of the modernisation process is a balance between decoupling what we could and containerising things we did not want to change. Observability was built in from day one by instrumenting Dynatrace up-front. Where possible we switched from time-based triggers to pure event-based triggers.
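
As a simple illustration of that shift, a scheduled job that polled for new files can be replaced by an S3 event notification that invokes a Lambda the moment an object arrives. The bucket name, prefix, and function ARN below are placeholders, not our real resources.

    # Illustrative sketch: replacing a time-based poll with an event-based trigger
    # by configuring an S3 bucket notification that invokes a Lambda whenever a
    # new object lands under an inbound/ prefix. Names and ARNs are placeholders.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_notification_configuration(
        Bucket="ppmip-inbound-files",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:eu-west-2:111122223333:function:ingest-file",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "inbound/"}]}},
            }]
        },
    )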

The only link back into private infrastructure was a replicated network file share for users, for which we used the Amazon S3 File Gateway (part of AWS Storage Gateway). This involved installing a gateway appliance on premises and syncing back to S3 periodically, and it has since become a pattern reused across EDF.

Here is a simplified view of the key functional components for the new system:

Final architecture — basic functional composition

Final thoughts and key takeaways

Modernising EDF’s PPMIP system and moving it to the cloud was a complex but rewarding endeavour. By sharing our journey and insights, we hope to provide valuable guidance for other organisations undertaking similar projects.

Here are the top learnings:

The bad

  • Java LTS update: Updating Java (or any language) will take longer than anticipated. For us it created anomalies in the data that took time to investigate. An alternative would be to ring-fence any “legacy” code, protecting it from vulnerabilities. We chose supportability over speed here, which was a luxury not every delivery has.
  • Third-party thresholds and parallel running: By running in parallel we doubled the load on some third-party data providers with whom we had agreed thresholds. We worked around this by reaching out to the providers, but it slowed us down. Seek out these limitations as early as possible, or they will have negative consequences.

The good

  • AWS Database Migration Service: This was a key enabler for our parallel running. The challenges are connectivity and schema nuances between different database technologies. Once set up, the service is solid and can automatically validate the data.
  • The Strangler Fig Pattern: This pattern (or mindset) allowed us to plan delivery of value to stakeholders early, minimised disruption and risk, and was simple to validate. It made the final switch-over day a “non-event”, with little left to do other than start tearing down the old system.
  • Decompose and assess: By assessing each capability against the 7 Rs, we found a happy balance between modernisation (e.g. a serverless database) and minimising change (e.g. Java in Docker). It can look like an easy win to migrate “as-is” to the cloud, but this can simply shift problems and potentially cost more to run over time.

Further reading

For further information and to explore additional resources, we recommend reading about the Strangler Fig Pattern and AWS migration strategies. These resources can provide deeper insights and practical advice for successfully migrating legacy systems to the cloud.
