Why and how we changed the way we support FT.com

Tatiana Stantonian
FT Product & Technology
4 min read · May 13, 2020

A smaller and more consistent team that focuses on triage, urgent issues, and recurring pain points.

Every day, FT.com publishes over 100 articles, and about 25 developers work on maintaining and improving the site. One of our teams is dedicated to dealing with bugs and incidents reported for FT.com, and is nicknamed “Ops Cops”.

Product teams used to take on the Ops Cops role for a week at a time

Until last August, Ops Cops didn’t have any permanent members and its duties were handled in turn by the various product teams within FT.com, who would interrupt their usual mission work for a week to focus on incoming bugs and incidents. The developers would be in charge of responding to any reports, as well as tracking and resolving them. Every week, there would be a handover meeting to discuss any unfinished tickets with the next product team that was about to take on the role of Ops Cops. That system had the advantage of being quite easy to manage, with each team taking on this role every 5 or 6 weeks.

However, it also had drawbacks:

  • Bugs would go to Ops Cops regardless of which FT.com app they belonged to. That meant the developers who knew the affected part of the codebase, and who had recently changed it, were not usually the ones fixing the bug. As a result, teams were not actively aware of the technical debt they were creating.
  • Developers also spent time doing admin work such as creating Trello cards, responding to people reporting the incident, and updating them on the status of their issue, creating interruptions and distractions from the technical work required to solve them.
  • There was a lack of consistency and attention to recurring issues, due to the fact that no one spent more than a week on Ops Cops at a time.
  • During shifts there was a disconnect within product teams: Product Managers and Delivery Managers were not involved in Ops Cops work, yet their developers were not available to progress product work.
  • And finally, team sizes were uneven, which meant that some weeks three engineers would be on Ops Cops and other weeks up to seven, and there wasn’t always enough work for that many people.

We created two permanent Ops Cops positions, assisted by engineers rotating in

Last summer, we decided to change how Ops Cops works in two important ways:

  • Ops Cops now has a dedicated Delivery Manager and Tech Lead and two engineers on rotation, coming from different product teams.
  • Ops Cops started delegating non-urgent issues to product teams. This was a two-step process: initially, only bugs on applications that product teams were actively working on or familiar with were delegated, and Ops Cops dealt with any bugs on ‘unowned’ apps. Now, we are moving towards there being no ‘unowned’ apps; all apps will be owned by a product team, so all non-urgent bugs are progressively being delegated to product teams too.

Bugs are tackled more efficiently and pain points are addressed

Because product teams deal with the defects that occur on their own applications, they have context to solve them more quickly than Ops Cops would, and the learnings derived can inform their future changes. And although that does add to their workload, they also have more people to tackle it as they don’t need to dedicate all their developers (up to seven people) to Ops Cops for a full week at a time every 5 or 6 weeks. In the new system, devs from product teams only spend about one week per quarter on Ops Cops, where it used to be twice as much.
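As a rough back-of-the-envelope check of those figures, here is a small sketch. It is illustrative only: it assumes a 13-week quarter and, as a simplification, that all of the roughly 25 developers mentioned earlier share the two rotating slots equally. Neither assumption is stated as fact above.

```python
# Rough estimate of weeks per quarter a developer spends on Ops Cops.
# Assumptions (not from the article): a 13-week quarter, and an even
# rotation across all ~25 developers in the new system.

WEEKS_PER_QUARTER = 13


def weeks_on_duty_per_quarter(cycle_length_weeks: float) -> float:
    """Weeks of duty per quarter if one duty week comes round every cycle."""
    return WEEKS_PER_QUARTER / cycle_length_weeks


# Old system: each product team took one week in every 5 or 6.
old_low = weeks_on_duty_per_quarter(6)
old_high = weeks_on_duty_per_quarter(5)

# New system: two engineers rotate in each week.
DEVELOPERS = 25      # "about 25 developers", taken as the rotation pool
ROTATING_SLOTS = 2   # "two engineers on rotation"
new_cycle = DEVELOPERS / ROTATING_SLOTS
new_weeks = weeks_on_duty_per_quarter(new_cycle)

print(f"old: {old_low:.1f} to {old_high:.1f} weeks per quarter")
print(f"new: {new_weeks:.1f} weeks per quarter")
```

Under these assumptions the old rotation works out at a little over two weeks per quarter and the new one at about one, which is in line with the "twice as much" estimate.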

Having permanent members has also helped share context from week to week with developers on rotation, prioritise issues consistently, and set OKRs to address pain points and recurring bugs. Since moving to this system, Ops Cops has spent time tackling constantly red healthchecks and the noisiest alerts. We’ve built a dashboard that lets us see immediately if there is an error spike. We’ve also built a tool that lets the product support team self-serve epaper subscription cancellations. Each cancellation used to be a very small task that a developer had to do manually, with requests coming in almost daily.

Image of a monitoring screen in FT’s offices
Ops Cops’ monitoring screen. Remember offices?

Devs can also focus on their technical work and avoid needless interruptions as the delivery manager is the first port of call for reporters of issues — they acknowledge the issue has been recorded, and create a JIRA card to track progress. All developers have to do is to pick up cards from the JIRA board and document their findings.

We’ve had very good feedback from the developers on Ops Cops rotation, who have overwhelmingly said they prefer this system. The consistency brought by having a permanent tech lead and delivery manager has also been appreciated in the wider business, for example by the operations and reliability team and various other support teams, as they now have a ‘go to’ person who represents Ops Cops for FT.com.

Next step: delegating all low impact bugs to product teams

At the beginning of July, Ops Cops will be moving to a system where all low impact bugs will be tackled by a product team. This means a Product Manager, who is familiar with the users of said product, will be able to prioritise these issues alongside their mission work. They can also prioritise the work to reduce tech debt on applications that have a lot of issues.

Ops Cops will remain the first port of call for all FT.com support requests and will be responsible for assessing the impact of issues and delegating them to the relevant product team if it’s a low impact bug. We will be immediately available to tackle major and high impact issues such as outages or serious usability issues, which cannot wait.
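The triage rule described above can be sketched in a few lines. This is an illustration of the decision only, not a real tool: the impact levels, the `APP_OWNERS` map, and the app and team names are all hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    LOW = "low"
    HIGH = "high"
    MAJOR = "major"


@dataclass
class Issue:
    summary: str
    app: str
    impact: Impact


# Hypothetical ownership map: which product team owns which app.
APP_OWNERS = {
    "app-one": "team-a",
    "app-two": "team-b",
}

OPS_COPS = "ops-cops"


def route(issue: Issue) -> str:
    """Return the team that should pick up the issue.

    Low impact bugs go to the product team that owns the app.
    Everything else (outages, serious usability issues) stays with
    Ops Cops, as does any low impact bug on an app with no owner yet.
    """
    if issue.impact is Impact.LOW:
        return APP_OWNERS.get(issue.app, OPS_COPS)
    return OPS_COPS


print(route(Issue("Typo in footer", "app-one", Impact.LOW)))
print(route(Issue("Page not loading", "app-one", Impact.MAJOR)))
```

The fallback to Ops Cops for unowned apps reflects the transition period; once every app has an owner, that branch should never be hit.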
