BBC iPlayer & BBC Sounds — Availability and Transparency, our journey so far

David Andrade
BBC Product & Technology
Aug 28, 2019

It was early December 2018. Like most teams operating services that have a busy holiday period, we started discussing what we needed to do to ensure a great experience for both BBC iPlayer and BBC Sounds users.

Arguably it was too late to have the conversation, but overall our teams were well prepared and the holiday period was fairly smooth: users were able to enjoy the services without outages or significant issues.

Whilst the conversation highlighted how much our teams care about their own services, it also highlighted that as a bigger team, working across multiple sites and developing our products on multiple platforms, we could be better prepared overall.

After a few discussions, we decided to focus on two things. First, ensuring a great service during this period. Second, making sure that we would evolve our approach, our processes and practices, not just on a team basis but at a product level. All of these have to be just as world-class as our products themselves.

Fast forward to January and the easy thing would have been to forget about it and have a very similar conversation in November or December 2019. We didn’t do that. Instead, we started straight away on what we initially called “Christmas 2019 Operational Readiness”, which ended up evolving into “99.95% Availability and Operational Transparency”. This was the start of what is now a weekly, sometimes daily, conversation. Just today, while walking past, I saw two people in our team actively discussing what we need to do to achieve 99.95% availability for specific services and what that means for the overall product target. It is worth noting that a particularly interesting point has been defining availability itself, which looks a lot simpler than it is.

But for us, the goals are clear: first, to make sure that both iPlayer and Sounds are as available as possible to users (the 99.95% goal), and second, to be as transparent with the audience (our users) as we can be without exposing sensitive information (the operational transparency goal).
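To make the 99.95% figure concrete, here is a minimal Python sketch of the arithmetic behind a target like this. It is an illustration only, not how our teams define or measure availability: the function names are hypothetical, and the product-level calculation assumes that a product is down whenever any one of its dependencies is down and that failures are independent, which real systems rarely satisfy.

```python
from math import prod


def allowed_downtime_minutes(availability: float, period_days: float) -> float:
    """Downtime budget, in minutes, for an availability target over a period.

    `availability` is a fraction between 0 and 1 (0.9995 for 99.95%).
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * (1.0 - availability)


def serial_availability(dependencies: list[float]) -> float:
    """Availability of a product that needs every dependency to be up.

    Assumption (illustrative only): dependencies fail independently, so the
    combined availability is the product of the individual availabilities.
    """
    return prod(dependencies)


if __name__ == "__main__":
    target = 0.9995  # the 99.95% goal

    # Downtime budget over a 365-day year and over a 30-day month.
    print(f"Per year:  {allowed_downtime_minutes(target, 365):.1f} minutes")
    print(f"Per month: {allowed_downtime_minutes(target, 30):.1f} minutes")

    # Three hypothetical services in a chain, each meeting the target.
    combined = serial_availability([target, target, target])
    print(f"Product-level availability: {combined:.4%}")
```

Under these assumptions, 99.95% leaves roughly 262.8 minutes (about 4 hours 23 minutes) of downtime in a 365-day year, or about 21.6 minutes in a 30-day month. Three chained services that each hit 99.95% give a product-level figure of about 99.85%, which is one reason per-service targets and the overall product target have to be discussed together.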

Why did we end up here? As our thinking evolved, we went back to our DNA. The BBC has always provided a world-class service to the audience; our broadcast services are the best in the world and are used by millions every day. We believe that our mission to “inform, educate and entertain” can only be achieved these days with digital services that are at least as good as the best there is.

Executing on this, however, is a lot more complex than it may seem. Software products that serve millions of users every day are usually complex by nature, have lots of dependencies and are continually being changed by their teams. All of that makes driving overall improvements very tricky. We haven’t shied away from it though, and decided to split our efforts into three main areas:

  • Understand our current position: what does availability really mean for each of our products, how can we accurately measure it, report and ultimately improve on it? How can we be transparent with our users and tell them what is happening at any given time?
  • Standardise tooling where appropriate: how do we ensure that teams have the autonomy to continue monitoring and improving their services whilst getting more alignment and standardising where it makes sense? All of this across multiple platforms like web, TV, mobile and our backend services.
  • Improve our practices: what practices do we already have in place and which ones are we missing? How do we improve where we are not up to the level we want to be?

But how did we actually start? As simply as it gets, really: with a shared document that we asked the team to contribute to. The first step was to understand what we were already doing well and which areas we needed to improve. A lot of people added their ideas and it ended up growing into a few different documents linked to each other.

Once we felt that we had a more structured story to tell (kind of!), we started giving small presentations on this to raise awareness with the wider team. That’s when more questions came, making us think harder about what we had set ourselves up to do and how we would get there…

Now we are more than halfway through the year and there are some key things that we are improving:

  • Started measuring and reporting on our uptime/downtime to share with the teams in a way they can easily understand and relate to. Teams were already aware when their services had issues but not necessarily when others did. Shared awareness is a good thing if you are part of a big team and your application has a lot of dependencies.
  • Asked everyone which tools they really use for monitoring and alerting, and started a fresh requirements-gathering exercise and a discussion about our ideal operating model (that autonomy vs alignment thing). Over time, these things grow organically, so reviewing them is proving an insightful exercise.
  • Shared with other parts of the business what we are doing, to start aligning our efforts and better prepare ourselves: when is the next big marketing campaign or editorial release happening?
  • We have also shared this with the other departments and all our documents are open to everyone in the BBC, making sure that other people are in the loop and can share their own ideas.
  • Created new forums for discussion and awareness, like monthly operational reviews. We want teams to learn their own lessons, but in subjects like this it is even better that we learn from each other by sharing what happened, lessons learned, etc. (no-blame culture!)
  • We are also exploring the easiest way to tell our users how our products are performing in a way that is easy to understand. It is not necessarily a straightforward task, but it is something that clearly speaks to our transparency values.
  • This is now also part of our core strategy for both products, making it a cross-discipline effort and something that everyone can work together on.

The truth is, the more we discuss this, the more it feels like this is only the start of the journey. There aren’t right answers or a playbook that we can follow. Yes, of course, we can build on our teams' expertise and look at how other companies do it, but the more time and effort we invest in making our products better, the more we realise that this is our own journey. Like any other organisation, we have our own challenges to overcome in this space and, given the progress so far, I have no doubt that the quality of service we provide will only continue to improve.

To be continued…

P.S.: You can help out on this journey, join us at bbc.in/hiring
