Grow now, pay later? Clearing “tech debt” for a faster FT.com
I lead a fast-moving, agile team of about 50 developers & 5 designers at FT.com, most split into 10 delivery teams, each with a product owner.
The FT.com website has grown steadily over the past year, in both features and complexity.
We’ve tried different approaches to help us tame this growing beast.
I’ll explain four different approaches we’ve tried, using the example of website performance.
Here’s a chart:
When a first-time user visits the front page with many of the new features included, it takes about 8 seconds to fully load (in tech jargon: note how the CPU is maxed out at 100% for 8 seconds). In other words: it’s incredibly slow!
If we exclude some of the features, it speeds up again:
Woo-hoo! It now loads in a snappy 1.4 seconds instead of 8.
So does it have to be either/or? Is it possible to keep growing and improving the site without sacrificing speed?
How do we get the team to focus on and improve shared concerns like this?
1. Make performance the responsibility of individual delivery teams
One approach we’ve used is to ask each team to look after performance in its own delivery area. There are tools to help with that, such as feature flags for turning things on and off, monitoring with SpeedCurve, Slack alerts, automated testing, and rejecting new releases that degrade performance.
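To make the last of those tools concrete, here is a minimal sketch of a check that rejects a release when it degrades performance. It is not our actual pipeline: the `LoadMetrics` type, the `release_allowed` function and the 10% tolerance are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadMetrics:
    """Hypothetical summary of one synthetic page-load test run."""
    load_seconds: float      # time until the page is fully loaded
    cpu_busy_seconds: float  # time the CPU spent busy during the load


def release_allowed(baseline: LoadMetrics, candidate: LoadMetrics,
                    tolerance: float = 0.10) -> bool:
    """Allow a release only if no metric is more than `tolerance` worse
    than the baseline (0.10 means up to 10% slower is accepted)."""
    limit = 1.0 + tolerance
    return (candidate.load_seconds <= baseline.load_seconds * limit
            and candidate.cpu_busy_seconds <= baseline.cpu_busy_seconds * limit)


# Illustrative numbers only: a fast baseline against a slightly slower
# release and a much slower one.
baseline = LoadMetrics(load_seconds=1.4, cpu_busy_seconds=1.0)
slightly_slower = LoadMetrics(load_seconds=1.5, cpu_busy_seconds=1.05)
much_slower = LoadMetrics(load_seconds=8.0, cpu_busy_seconds=8.0)

print(release_allowed(baseline, slightly_slower))
print(release_allowed(baseline, much_slower))
```

A check like this would run as part of automated testing; the hard part is choosing a tolerance that catches real regressions without blocking every experiment.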
Each team can draw on its deep understanding of its own domain and the needs of its stakeholders to arrive at performant solutions.
A downside of this approach is that it hinders the agile process, where we try out ideas in the quickest & simplest way, rather than in more elaborate ways that need more time to perfect. It may actually be quicker to use a slow third-party library with many bells & whistles we can’t turn off than to “roll our own”.
In our experience, this approach tends not to work well when a performance issue spans several teams. There can be timing conflicts, e.g. a performance issue in a component owned by another team who, due to a large backlog, can only address the problem months later. Budgeting can also be tricky: how do you distribute the performance “budget” (tech: time with the CPU) fairly at the team level, rather than at the FT.com level?
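One naive way to answer the budgeting question is to split a site-wide budget in proportion to agreed weights. The sketch below is only an illustration of that idea, not something we settled on: the `split_budget` function, the team names, the weights and the 1,400 ms total are all hypothetical.

```python
from typing import Dict


def split_budget(total_ms: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split a site-wide CPU-time budget between teams, proportionally
    to each team's weight."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return {team: total_ms * weight / total_weight
            for team, weight in weights.items()}


# Hypothetical teams and weights: ads gets twice the share of the others.
budgets = split_budget(1400, {"ads": 2, "personalisation": 1, "articles": 1})
print(budgets)
```

The arithmetic is the easy part. The fairness question lives in the weights, which still have to be negotiated at the FT.com level.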
2. Set up a dedicated performance team
We also tried setting up a team focused solely on performance. This team looked at the site and all the features as a whole, and zeroed in on performance bottlenecks, cutting across multiple features.
This team, including its product owner, only needed to worry about one thing: achieving performance improvements. But they had less understanding of the technical, design or business decisions behind each feature, e.g. why does the dynamic thing need to be at the top?
Other teams across the organisation often just didn’t have the bandwidth to work with the performance team, e.g. to change analytics tools used by marketing.
So we tried something different:
3. Run a tech debt month (or quarter) every year
We agreed to set aside a dedicated chunk of time where everyone would put new ideas/features on hold, and concentrate on clearing the website’s accumulated “tech debt”, thereby significantly improving its performance.
One advantage of this dedicated approach is that it creates the kind of environment where everyone is more receptive to “killing” features, on the understanding that it’s all for the greater good.
Product owners also have the space & support to make technological changes possible, e.g. negotiating with advertisers to introduce a faster format for advertisements, or finding an alternative combination of personalisation that is faster & won’t hurt revenue.
There are those who thrive on clearing tech debt, and those who don’t (preferring to stick to routines, comfort zones, business-as-usual, “you can’t teach an old dog new tricks”, yada yada). Breaking out of this rut requires careful planning, sensitive handling & smart, inclusive leadership.
4. Bug duty rota
We’ve tried to replicate a dedicated team (#2) and a tech debt month (#3) in the form of a bug duty rota: a weekly rotation of two dedicated developers who work on bugs and other issues as they come in.
It became a popular first port of call for customer & editorial support teams, which freed delivery teams to focus on developing new ideas.
Despite its popularity, as FT.com grows in features & complexity, bug duty developers can only take on performance fixes that are simple and quick, and performance fixes rarely are.
What works best?
What would you choose? How about a combination of two or all of the four approaches?
As for me, I really like the speed and agility of delivery teams where anyone, whatever their level of skill, can develop & test ideas in production. Tidying up performance can be left for later, when you have worked out which ideas will stay and which will go.
Useful groundwork prior to clearing tech debt can include analysing performance issues and identifying the longer-term technical, design & business goals. Time & space must then be found to kill unwieldy, non-essential features and to improve performance across the whole website.
Paying off tech debt is a valuable investment: it results in substantially better performance and builds mutual understanding and support between technical, design and product teams, which makes us all better off.