cybernetic meadows: how a bot helps engineers at the FT

Jon Nangle
Oct 15

One of the great things about working in Technology at the Financial Times is that engineers are empowered. Our technology teams are given a lot of leeway to decide for themselves how they work, the tools they use and, to some extent at least, the projects that they work on.

The group that I work in is called Engineering Enablement, and a big focus for us is trying to make life easier for other engineers in FT Technology. We don’t want people to spend a second longer than they have to (or want to!) in dealing with the low-level, nitty-gritty detail of getting their products up and running. Our job is to find the bumps in the road, then come along with our metaphorical road-roller to smooth them over.

A road roller crushing empty cans at the indietracks festival.
Not the metaphorical kind: the famous can crusher at indietracks festival — much missed during the pandemic and hopefully returning some day soon.

The things we do

One of the more important services that we look after is DNS. Once upon a time, engineers at the FT had to get into the weeds of DNS on a regular basis. This was not very easy to do. People would have to figure out how to get access to our DNS provider, learn how to use the user interface or APIs, and then figure out what to do if something went wrong.

That all changed a couple of years ago when we switched our DNS provider to Amazon Route 53. At that point, we took the opportunity to wire up our DNS records to a GitHub repository. (We did this with the help of GitHub’s amazing octodns toolkit.) Now, FT engineers can make changes to our DNS records using tooling and workflows that they already know how to use: they raise a pull request on GitHub.

Once the PR gets approved, it goes into a CI pipeline, we run some tests, magic happens, and the DNS changes go live on Route 53 a couple of minutes later. We even make a record of the change for them via our fantastic Change API, so that everybody knows what’s happened.

And when it went live, our engineers seemed to like it.

Lovely! Job done. Well, almost.

Tired of waiting for you

Recently, we ran a survey to ask our engineers what they liked about our engineering process, what could be better, and what was a massive pain. We got a wide variety of responses, but one of the common themes was:

“I really don’t like having to wait for approvals to happen when I make DNS changes”

And who can blame them? If you’re sat waiting for a DNS change to go through before you can make progress with your next piece of work, even a few minutes can be frustrating.

We decided as a team to see if we could make this a bit better. We used the GitHub API to analyze the last 500 pull requests that were raised in our DNS GitHub repository, which covered about six months’ worth of changes. We looked at the types of DNS changes that were being made, and we examined the approval workflow for each change to see if there were any common patterns.
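To give a flavour of that kind of analysis, here is a minimal sketch in Python. It is not the script we actually ran. It assumes each pull request has already been fetched as a dictionary shaped like the GitHub REST API's single pull request payload, which includes `merged`, `comments` and `review_comments` fields. The function names and the sample data are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PullRequestSummary:
    number: int
    merged: bool
    comments: int          # general conversation comments on the PR
    review_comments: int   # inline comments left on the diff


def summarise(pr: dict) -> PullRequestSummary:
    """Pick out the fields we care about from one pull request payload."""
    return PullRequestSummary(
        number=pr["number"],
        merged=bool(pr.get("merged")),
        comments=pr.get("comments", 0),
        review_comments=pr.get("review_comments", 0),
    )


def share_without_discussion(prs: list[dict]) -> float:
    """Fraction of merged PRs that attracted no comments at all."""
    merged = [s for s in map(summarise, prs) if s.merged]
    if not merged:
        return 0.0
    quiet = [s for s in merged if s.comments + s.review_comments == 0]
    return len(quiet) / len(merged)


# Made-up sample data, not real FT pull requests.
sample = [
    {"number": 1, "merged": True, "comments": 0, "review_comments": 0},
    {"number": 2, "merged": True, "comments": 3, "review_comments": 1},
    {"number": 3, "merged": True, "comments": 0, "review_comments": 0},
    {"number": 4, "merged": False, "comments": 2, "review_comments": 0},
]

print(f"{share_without_discussion(sample):.0%} of merged PRs had no discussion")
```

Counting comments is only a rough proxy for "discussion": a fuller analysis would probably also look at the review events on each PR.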

And there were! We found that a majority of pull requests got approved with no discussion — none at all. We just hit the Approve button. They were straightforward enough that there wasn’t very much to discuss — they were standard DNS additions, or modifications to existing records where the intent was clear.

Increasing the time-to-live on a DNS record: not a controversial topic

In other words, business as usual. Wouldn’t it be nice if we could have a robot to approve those types of changes, so that engineers didn’t have to wait? We thought it would. So we wrote one!

Robot rock

Enter dns-approval-bot. This bot consists of a set of rules and actions, all wrapped up behind an API. We call the API from our DNS CI pipeline, and it runs a set of rules against the PR, using the GitHub REST API. These rules look at the proposed changes in the PR, and then they ask questions such as:

  • is this PR adding a new DNS record, of a straightforward type, that we’ve seen lots of times before?
  • is it making a change to one of our critical DNS records?
  • is it carrying out some simple chore, such as adjusting the time-to-live on an existing DNS record, or doing a straight swap from one cloud service endpoint to another?

Based on the answers to those questions, it carries out one or more actions. These can be things like:

  • approving the PR, with no further ado
  • adding a comment to the PR with a helpful message about what to do next
  • adding an FT staff member or technology team to the reviewer list for the PR, so that they can take a look at it and hit the approve button if they’re happy with the changes
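The rules-and-actions idea can be sketched in a few lines of Python. This is an illustration only, not the bot's real code: the `RecordChange` and `Action` types, the rule functions, the record names and the team names are all hypothetical, and the email check is deliberately crude.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RecordChange:
    name: str              # e.g. "www.example.com"
    record_type: str       # e.g. "CNAME"
    kind: str              # "add", "modify" or "delete"
    ttl_only: bool = False  # True when only the time-to-live changed


@dataclass(frozen=True)
class Action:
    kind: str         # "approve", "comment" or "request_review"
    detail: str = ""  # reviewer name or comment text


Rule = Callable[[RecordChange], Optional[Action]]

# Hypothetical configuration, for illustration only.
CRITICAL_RECORDS = {"www.example.com"}
SIMPLE_TYPES = {"A", "AAAA", "CNAME"}


def critical_record(change: RecordChange) -> Optional[Action]:
    if change.name in CRITICAL_RECORDS:
        return Action("request_review", "dns-owners")
    return None


def email_record(change: RecordChange) -> Optional[Action]:
    looks_like_email = (
        change.record_type == "MX"
        or change.name.startswith("_dmarc.")
        or "._domainkey." in change.name
    )
    return Action("request_review", "cyber-security") if looks_like_email else None


def ttl_chore(change: RecordChange) -> Optional[Action]:
    if change.kind == "modify" and change.ttl_only:
        return Action("approve")
    return None


def simple_addition(change: RecordChange) -> Optional[Action]:
    if change.kind == "add" and change.record_type in SIMPLE_TYPES:
        return Action("approve")
    return None


# Cautious rules come first, so they win over the auto-approve rules.
RULES: list[Rule] = [critical_record, email_record, ttl_chore, simple_addition]
FALLBACK = Action("comment", "This change needs a human reviewer.")


def decide(changes: list[RecordChange]) -> list[Action]:
    """Approve only if every change is approvable; otherwise escalate."""
    actions = []
    for change in changes:
        matches = (rule(change) for rule in RULES)
        actions.append(next((a for a in matches if a is not None), FALLBACK))
    if actions and all(a.kind == "approve" for a in actions):
        return [Action("approve")]
    # Drop approvals and de-duplicate the rest, keeping their order.
    return list(dict.fromkeys(a for a in actions if a.kind != "approve"))


print(decide([RecordChange("new-app.example.com", "CNAME", "add")]))
print(decide([RecordChange("example.com", "MX", "modify")]))
```

The design choice worth noting in this sketch is that a single change that fails to qualify blocks auto-approval for the whole PR, which errs on the side of asking a human.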

Here’s dns-approval-bot doing its thing on a real PR. It has noticed that we’re making a DNS change that relates to email, so it has added the FT’s Cyber Security team as a reviewer on the pull request.

It’s important to note that even though this PR isn’t being auto-approved, dns-approval-bot is still playing an important role. In the past, we’d have been reliant on engineers to remember to obtain the correct approvals from our Cyber Security team for this type of change. Now we don’t have to worry — if Cyber Security needs to see a DNS change, the system will make sure that it happens.
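For the curious, both of these actions map onto documented GitHub REST API endpoints: creating a review with the `APPROVE` event, and requesting reviewers on a pull request. The sketch below only builds the HTTP requests, using Python's standard library, and does not send them. The repository name, team slug and token are placeholders, and this is an assumption about how such a bot could call GitHub rather than a description of how dns-approval-bot is written.

```python
from __future__ import annotations

import json
from urllib.request import Request

API_ROOT = "https://api.github.com"


def _post(path: str, token: str, payload: dict) -> Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return Request(
        f"{API_ROOT}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def approve_request(repo: str, number: int, token: str) -> Request:
    """Request that submits an approving review on a pull request."""
    return _post(f"/repos/{repo}/pulls/{number}/reviews", token, {"event": "APPROVE"})


def team_review_request(repo: str, number: int, token: str, team_slugs: list[str]) -> Request:
    """Request that adds one or more teams as reviewers on a pull request."""
    return _post(
        f"/repos/{repo}/pulls/{number}/requested_reviewers",
        token,
        {"team_reviewers": team_slugs},
    )


# Placeholder values: "example-org/dns" and "cyber-security" are hypothetical.
req = team_review_request("example-org/dns", 42, "<token>", ["cyber-security"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would send it. Keeping request construction separate from sending makes the behaviour easy to check without touching a real repository.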

In the future

We’ve started small with just a few rules and actions, but we have plans to add many more. We think that there’s lots of scope for using other FT APIs and information sources to help dns-approval-bot to make decisions. For example, we could examine logs in Splunk to see if a particular DNS record is heavily used — if it is, then that’s a good sign that it’s probably more important than most, and that we should tread carefully before approving any modifications or deletions that might affect it.
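A usage-based rule along those lines might look something like the following sketch. It assumes the query counts have already been pulled out of the logs into a plain dictionary. The function name, the threshold and the sample numbers are all hypothetical, and no Splunk API is called here.

```python
from __future__ import annotations


def records_needing_care(
    changed_records: list[str],
    query_counts: dict[str, int],
    threshold: int = 10_000,
) -> list[str]:
    """Return the changed records whose recent query count meets the threshold.

    Records with no entry in query_counts are treated as unused.
    """
    return [r for r in changed_records if query_counts.get(r, 0) >= threshold]


# Made-up numbers standing in for the result of a log search.
counts = {"www.example.com": 250_000, "old-test.example.com": 3}
changed = ["www.example.com", "old-test.example.com", "brand-new.example.com"]

print(records_needing_care(changed, counts))
```

Anything this returns would be a candidate for a human review rather than an automatic approval.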

We are also lucky enough to have an amazing knowledge database called biz-ops at the FT, which stores information about our systems, products, teams, business capabilities and much more besides. There are lots of ways that we could use this information — for example, we could find out the teams that might be interested in being informed about a particular DNS change, or we might want to obtain a list of the systems that could potentially be affected.

We’ll keep plugging away at it! For now, we hope that these incremental improvements will help to make our teams that little bit more self-sufficient and give back some of the time that they spend chasing up approvals, so that it can be used for more useful things.

FT Product & Technology
