Traps that could derail your replatforming project (and how to avoid them)
We recently replatformed at MPB. I couldn’t help thinking back to similar projects I’ve been involved with in the past.
All were unique projects run by very different organisations, each a valuable learning experience (including this time around). But some challenges seem to crop up so regularly that they clearly form a pattern.
Perhaps that’s why tech troubleshooter Rich Mironov says most of the replatforming efforts he’s seen have failed. His post makes excruciating reading for anyone who has ever attempted such a project.
At MPB, our technology is going through a transformation: from monolith to SOA; from our early-adopter prototype cloud platform to a modern estate on GCP, deployed with Terraform; and with a major-version code upgrade and 100% automated regression test coverage for key user journeys as part of our new CI/CD pipelines.
Sounds simple enough? Consider that we had to change the wheels while the vehicle was moving — that is, to deliver it without service outage or loss of customers’ data.
So for anyone looking to launch this kind of project in 2023, may I humbly present your replatforming sat nav — roadblocks to watch out for, and suggestions for avoiding them.
1. The right way to handle tech debt
You need a plan for your legacy code. Anyone replatforming today is probably faced with monolithic applications: tightly coupled code full of complicated interdependencies, where anything you change might break something somewhere else.
If you’re running a small, simple website, where you understand the details of all the business processes it supports, you might get away with discarding the lot and starting again. More likely you’ll need to keep some parts that interface with other parts of your business.
The right way to deal with this is encapsulation within the legacy codebase — the ‘Strangler Fig’ design pattern. Unpick the existing tangle first, separate it out into self-contained modules, and then lift and shift into your new architecture afterwards. This lowers risk and improves your workflow. It also allows you to get some early quick wins to improve confidence.
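The idea can be sketched in a few lines. Here is a minimal, illustrative stand-in (all names are hypothetical, not MPB's actual code): a facade routes each request to a new handler once that capability has been lifted out, and everything else still falls through to the legacy monolith. Over time the legacy path handles less and less, until it can be retired.

```python
# Minimal sketch of the 'Strangler Fig' pattern. All names are
# illustrative: a facade routes migrated capabilities to new handlers,
# while everything else still reaches the legacy codebase.

def legacy_handler(request):
    # Stand-in for the tangled monolith code path.
    return f"legacy:{request['action']}"

def new_checkout_handler(request):
    # One capability, unpicked and lifted into the new architecture.
    return f"new:{request['action']}"

class StranglerFacade:
    def __init__(self):
        # Maps migrated capabilities to their new handlers.
        self.migrated = {}

    def migrate(self, action, handler):
        self.migrated[action] = handler

    def handle(self, request):
        # Migrated actions go to the new service; the rest fall
        # through to the legacy monolith.
        handler = self.migrated.get(request["action"], legacy_handler)
        return handler(request)

facade = StranglerFacade()
facade.migrate("checkout", new_checkout_handler)

print(facade.handle({"action": "checkout"}))  # served by the new code
print(facade.handle({"action": "returns"}))   # still served by the monolith
```

Each capability you migrate is a self-contained, low-risk change, which is where those early quick wins come from.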
2. Scope, part one
When your project is announced, everyone in the organisation will have requests. At this stage, you must be very clear on what is in scope, and—more so—on what isn’t. Otherwise, you risk facing the fallacy of the open trench.
This is the idea that when the water company digs a trench in your street to fix the pipes, other utilities might as well take the opportunity to update the telecoms, electricity, etc.
Sadly, life isn’t a 1980s beer advert. In reality, the outcome is increased complexity, time and risk. Your trench will stay open for months and people will be even more annoyed.
If you’re working on a performance upgrade, don’t simultaneously change your business rules, content and look-and-feel. Instead, split them out into isolated pieces of work. Today’s CI/CD-based service-oriented architectures allow you to update little and often so that subsequent changes can be implemented more quickly and efficiently.
3. Take control of your guesstimates
A paradox if ever there was one. The more accurate your estimate of delivery dates, the more you have to delay them.
When your local builder estimates three weeks to refit your kitchen, they probably haven’t spent weeks poring over spreadsheets to audit material costs, staff availability, supply lines, legacy fitments, cash flow, conflicts with other projects, etc. The refit will take far longer than this, obviously, but you’ll probably get what you wanted (eventually) at an agreed price that works for both of you.
Equally, when a big developer quotes to build a housing estate, I’d expect a specialist team to have planned it, in exacting detail, over weeks or months. At that scale, uncontrolled costs could quickly undermine a project’s viability.
So, which approach does your organisation need? For major software builds, this isn’t as easy to answer as it looks.
The complexity is enormous and the only people really qualified to give estimates are the same people who will be building the product. The better the estimate, the longer it takes to arrive at. They can’t do both at once.
Meanwhile, your business leaders and sales teams have to talk confidently to investors, stakeholders, suppliers or customers. You have to tell them something, but what?
A lot depends on the size of your operation (and scope), of course, in software as in building.
So first, a radical thought: if you can, limit estimates to the next quarter and abandon long-term forecasts altogether. It’s a brave CEO who will understand that, and it’s an uncomfortable place to be for just about everyone. But with a very complex project, early forecasts are often a waste of time and set you up to fail or cut vital corners. Accurate shorter-range estimates will at least build stakeholders’ confidence in the long run.
Second: create a psychologically safe space for your team to estimate. If a peer, not a boss, asks a developer for a realistic estimate, there’s more chance they’ll answer openly without fear of being judged.
4. Scope, part two
At MPB our priority was to deliver the new platform as quickly as we could, so our team could move on to developing new features for our customers and colleagues, and also remove the cognitive load of maintaining two versions of our codebase in parallel. But we also realised the risk of a big bang delivery.
As the project progressed, we challenged ourselves to dig deep and decouple whatever we could.
For example, the scope initially included the scanning and tracking system for our warehouses. We decided to deliver this ahead of the main delivery.
Although this added some additional work in retrofitting the scanners to our legacy platform, it gave us a great bonus: our teams in the warehouses had vital time to bed in the new processes and tooling. This really reduced complexity and risk at system cutover (and gave them some early benefit too).
I think the wider point here is don’t be afraid to regularly challenge your approach.
5. How much testing?
Today’s complex replatforming projects are beyond the limits of manual testing. Automation is imperative, especially when it comes to vital regression testing.
At MPB we invested early in retraining our manual testers, deploying modern automation tools and creating a battery of scripts to run overnight, every night as we developed the new codebase. Tools such as Selenium simplify this and were used in conjunction with BrowserStack to support cross-browser testing.
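The shape of such a battery is simple enough to sketch. The following is a simplified stand-in (journey names and steps are invented for illustration): a registry of scripted user journeys run as a batch with a pass/fail summary. In practice each journey would drive a real browser via Selenium, with BrowserStack providing the cross-browser matrix.

```python
# Simplified stand-in for a nightly regression battery: register
# scripted user journeys, run them all, report pass/fail per journey.
# In the real thing, each function would drive a browser via Selenium.

REGRESSION_SUITE = {}

def journey(name):
    """Decorator: register a scripted user journey in the suite."""
    def register(fn):
        REGRESSION_SUITE[name] = fn
        return fn
    return register

@journey("search-and-buy")
def search_and_buy():
    # Stand-in for browser steps: open site, search, add to basket, pay.
    return True

@journey("sell-your-kit")
def sell_your_kit():
    # Stand-in for the selling journey.
    return True

def run_nightly():
    """Run every registered journey; return {journey_name: passed}."""
    return {name: bool(fn()) for name, fn in REGRESSION_SUITE.items()}

results = run_nightly()
print(results)
```

Running this every night against the developing codebase means a regression shows up as a named, failing journey the next morning, not as a customer complaint after cutover.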
Performance testing is eminently automatable too, and lets you fine-tune as you go along and right-size your infrastructure for planned business growth.
Finally, we built in advanced monitoring and alerting via Prometheus and Elastic, which takes a service-level objective approach rather than monitoring individual parts of the codebase — measuring whether key user journeys can be successfully completed end to end.
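To make the SLO idea concrete, here is a minimal sketch (thresholds and numbers are illustrative, not MPB's real objectives): instead of alerting on individual components, you record whether each attempted user journey completed end to end, and alert when the success ratio drops below the objective.

```python
# Minimal sketch of service-level-objective checking: health is judged
# by whole user journeys completing, not by individual components.
# The objectives below are illustrative, not real production numbers.

def journey_slo_met(outcomes, objective=0.995):
    """outcomes: list of booleans, one per attempted user journey
    (True = completed end to end). Returns True when the success
    ratio meets or exceeds the objective."""
    if not outcomes:
        return True  # no traffic in the window, nothing to alert on
    success_ratio = sum(outcomes) / len(outcomes)
    return success_ratio >= objective

# 1,000 checkout journeys with 3 failures: 99.7% success.
checkouts = [True] * 997 + [False] * 3
print(journey_slo_met(checkouts, objective=0.999))  # False: breaches a 99.9% SLO
print(journey_slo_met(checkouts, objective=0.995))  # True: within a 99.5% SLO
```

The same three failures either page someone or don't, depending purely on the objective the business has agreed to, which is exactly the conversation about risk appetite described below.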
Are bugs a thing of the past? Clearly not. Automation can’t do everything and the perfect is the never-shipped. At least today’s continuous deployment systems mean things never have to be wrong for long.
You do need your organisation’s leaders to be clear and open about their attitude to risk. If you can talk in terms of specific scenarios and their impact, you’ll get enough of an idea to plan the right level of testing. You can also consider whether to accelerate delivery by delivering low-traffic features in beta.
6. Sense of history
Nine times out of ten, the best solution for transactional continuity is … not to have any.
Give your customers plenty of notice, let them download their transactional data and email them again when they need to set up a new account in the new system.
Your mailing list database will get a useful update and you haven’t wasted anguished hours porting a complex web of structured information into a new environment. Hooray!
It doesn’t work so well if you’re a bank, of course.
At MPB we too wanted to avoid disrupting our customers’ experience, so we opted to maintain transactional continuity.
Because we were changing our base functionality, we had to not only move the data but restructure it too, all without significant service downtime.
That isn’t a common pattern for an ecommerce (or recommerce) platform. It’s complex, risky and near-impossible to reverse once started. It was a huge overhead, and our migration script needed extensive tuning, but we did achieve the continuity we were looking for, which was a great win for our customers and our internal data analysts and business planners.
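The heart of that kind of migration is a transform applied to every legacy record while the old store stays live. A minimal sketch, with entirely invented field names standing in for the real schemas, looks like this:

```python
# Minimal sketch of restructuring data during migration: each legacy
# record is transformed into the new schema by the migration script.
# All field names here are invented for illustration.

def migrate_order(legacy):
    """Transform one legacy order row into the (hypothetical) new schema:
    split a single name field, and store money as integer pence
    instead of a float."""
    first, _, last = legacy["customer_name"].partition(" ")
    return {
        "order_id": legacy["id"],
        "customer": {"first_name": first, "last_name": last},
        "total_pence": int(round(legacy["total_gbp"] * 100)),
    }

legacy_row = {"id": 42, "customer_name": "Ada Lovelace", "total_gbp": 199.99}
print(migrate_order(legacy_row))
```

The tuning work comes from running a transform like this over millions of rows, in batches, while the live system keeps writing, and from every record that doesn't quite fit the pattern the transform assumes.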
7. What if …?
Shortly before go-live, once you’ve drawn up detailed launch plans and made sure everyone knows them, it’s time for a ‘What if?’ meeting. This is where you develop step-by-step solutions for as many eventualities as you can imagine, from ‘what if someone’s ill?’ to ‘what if we can’t book parcel couriers?’.
I find the best time to do this is a day or two before launch, over tea and biscuits, in a relaxing space. It’s much harder to adapt to unexpected hitches when you’re live and people are stressed — lizard brains rarely produce optimal solutions but have no problem following procedures.
This meeting should result in better preparation and confidence overall, though it does have the side effect of making everyone individually slightly more worried (because they’ve added everyone else’s fears to their own).
8. Right place, right time
People are more likely to have confidence in a delivery if they can immediately see you’re there for them. So draw up your rota for launch week, from early until late, so that the team works in shifts at the office, covering your main business hours in all key markets.
These shifts can be intense, so consider shortening them — you don’t want tired people working on urgent fixes at the end of a 12-hour day.
It can also be helpful to have a nominated senior person on the business side to act as a familiar point of contact for people in the wider operation.
9. Monitor morale
We all have different motivations and blockers, but most of us will sooner or later become disaffected by an unsustainable pace of working.
It’s possible to pull long hours for a while. Sometimes it’s unavoidable, even enjoyable.
But if that becomes the norm it starts to harm productivity. Stressed people make bad decisions and forget. Of course, that’s a problem that will solve itself eventually to the detriment of the business: people will leave, starting with the most employable.
It’s not a big shift to set clear expectations, give people agency, notice their efforts and avoid overpromising on their behalf. Commit to work at an infinitely sustainable pace from the outset and your team will go the distance.
Sometimes it’s hard to see the light at the end of the tunnel mid-project. So, to finish, here’s a list of positive things that came out of the MPB replatforming:
- We can focus on which features are the priority for the business, rather than being driven by fear of platform instability
- Descoped items didn’t go away, but are now much, much easier to deliver on our new pipelines and detangled codebase
- Multiple daily deployments, tested but easy to roll back in case of issues
- Everyone working on the same codebase, organised around capabilities not projects, with Communities of Technical Practice to align on cross-cutting concerns
- Greatly improved performance, scalability and uptime
The customer benefits of the new platform are an outstanding story … but that’s another blog post.
Sophie Davies-Patrick is Chief Technology Officer at MPB, the world’s largest platform to buy, sell and trade used photography and videography kit. https://www.mpb.com