BBC Online — A year with serverless

It’s been a little over a year since I published my last two blog posts, in which I outlined the process we went through to choose the technology for BBC Online and the steps we took to optimise serverless for our use. Recently, my colleague Graeme published a blog post on the organisational challenges we’ve faced in delivering this project.

The BBC’s office in MediaCity, Salford, one of many bases where the BBC’s Software Engineering teams are located.

In this post I reflect on our progress so far and some of the interesting challenges we face while building the BBC’s critical digital services. If you’re a technology builder interested in tackling the biggest of challenges, we’re always looking for people like you to join our adventure.

Over the last twelve months our engineering teams have been reimagining and innovating on our products. In doing so, they have moved from our older technology platforms onto our new shared serverless platform, ‘WebCore’. We’re about 30% of the way through the move, yet the benefits of a serverless platform are already starting to show.

Pages on WebCore (left), pages under development (middle) and pages we’re going to start development on (right). Block size represents number of page views across a typical month.

During this time we’ve had no operational incidents (loss of service to our audience, or inability to update our pages) caused by the underlying technology. This isn’t luck: serverless has handled many of the performance and operational challenges for us, letting us focus on other parts of our platform and resulting in a higher-quality, more dependable platform.

WebCore highlights for February 2022: 2.3 billion requests served, 1,500 concurrent serverless function executions, 3.3 billion serverless functions invoked and 22,000 platform builds to pre-production.

Using serverless takes away the need to spend time engineering the fundamentals of your platform, instead allowing you to focus on the value you deliver to your customers. So what have we been doing with that time?

Personalisation / Relevancy

The BBC achieves its high levels of reliability and performance by relying on caching: internally within our systems, at the edge of our network and via CDNs. The result is fast-loading pages that can be stored at the edge. Yet in a world where relevant content is key, we need to adapt our services to deliver more tailored and personalised content to our audiences. That means more variation, and more load.

We spent a number of months preparing the platform to handle personalised requests end-to-end across the stack. This has now launched for some pages. For example, if you’re signed in, are located within UK territory and visit the BBC homepage you’ll get a page customised for you.
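Personalised pages can’t be shared by edge caches or CDNs the way other pages can. A minimal sketch of that decision, assuming personalisation applies only to signed-in users in the UK (as in the homepage example) and an illustrative 30-second TTL for shared pages; the function and type names are hypothetical, not WebCore’s actual code:

```python
from dataclasses import dataclass

# Assumed TTL for non-personalised pages (illustrative only).
SHARED_TTL_SECONDS = 30


@dataclass(frozen=True)
class Request:
    signed_in: bool
    country: str  # ISO 3166-1 alpha-2 code, e.g. "GB" for the UK


def is_personalised(req: Request) -> bool:
    """Hypothetical rule: personalise only for signed-in users in the UK."""
    return req.signed_in and req.country == "GB"


def cache_control(req: Request) -> str:
    """Return a Cache-Control header value for the rendered page."""
    if is_personalised(req):
        # Rendered for one user: shared caches and CDNs must not store it.
        return "private, no-store"
    # Same page for everyone: edge caches and CDNs may store and share it.
    return f"public, max-age={SHARED_TTL_SECONDS}"


print(cache_control(Request(signed_in=True, country="GB")))   # private, no-store
print(cache_control(Request(signed_in=False, country="GB")))  # public, max-age=30
```

Every personalised response bypasses the shared caches, which is why growing the personalised share of traffic increases load on the rendering stack.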

The page takes around 500ms to render and be delivered to the audience. In that time we invoke around 30 functions. Around 150ms is spent running React to render the content to HTML; the rest is spent fetching or processing data.
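The post doesn’t detail how those ~30 invocations are scheduled. Assuming the data fetches are independent of one another, one way to keep them inside the roughly 350ms left after rendering is to issue them concurrently, so the wait is bounded by the slowest call rather than the sum of all calls. A minimal sketch, where `fetch` and the 50ms delays are hypothetical stand-ins for real function calls:

```python
import asyncio
import time


async def fetch(source: str, delay: float) -> str:
    """Hypothetical stand-in for invoking one data-fetching function."""
    await asyncio.sleep(delay)
    return source


async def fetch_all(sources: dict[str, float]) -> list[str]:
    # Start every fetch at once; gather preserves the input order of results.
    return await asyncio.gather(*(fetch(name, d) for name, d in sources.items()))


sources = {f"source-{i}": 0.05 for i in range(30)}  # 30 calls of ~50ms each
start = time.perf_counter()
results = asyncio.run(fetch_all(sources))
elapsed = time.perf_counter() - start
# Expect roughly 50ms in total rather than the ~1,500ms a sequential loop would take.
print(f"{len(results)} results in {elapsed * 1000:.0f} ms")
```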

In February 2022, 145 million requests were personalised, representing 6% of WebCore’s traffic.
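As a rough cross-check of scale, these two figures imply total WebCore traffic for the month of around 2.4 billion requests (the 6% is presumably rounded, so treat this as approximate):

```latex
\text{total requests} \approx \frac{145 \times 10^{6}}{0.06} \approx 2.4 \times 10^{9}
```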

Our goal: over the next 12 months we aim to personalise almost every page in some way — making it relevant for every user on every request.

Experimentation

Delivering value and improving the audience experience is at the heart of what we do, yet our previous platforms relied so heavily on caching for performance that we were unable to run experiments on our pages. A/B/n testing allows us to improve the audience experience by understanding what users want to interact with on a page.

Experiments mean more variations of a page. Consider an experiment with 6 variations of the global navigation for the website (the bar listing Home, News, Sport, Weather etc.). As soon as that experiment was activated, the number of pages we’d render would instantly increase by a factor of 6.

The team that builds the global navigation is separate from the teams building the news story pages and the news onward journeys (the right-hand sidebar on story pages). Imagine if all three teams started experimenting at the same time, each with N variations: there would be N × N × N variations of the page! As builders of the platform running this, we have no control over who runs an experiment, or with how many variations. We just have to be ready!
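To make the multiplication concrete: if k teams each run an experiment with N variations on the same page, the page can be rendered N^k different ways. A minimal sketch (the function name is hypothetical):

```python
from math import prod


def page_variants(variations_per_experiment: list[int]) -> int:
    """Number of distinct renderings of one page under overlapping experiments.

    Each independent experiment multiplies the variant count; with no
    experiments running, prod([]) is 1, i.e. a single version of the page.
    """
    return prod(variations_per_experiment)


print(page_variants([6]))        # 6: one 6-way navigation experiment
print(page_variants([6, 6, 6]))  # 216: three teams each running 6 variations
```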

We’re in the late stages of development on this piece of work, and once it is active we’ll be able to support experimentation across the platform at any scale.
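The post doesn’t say how requests are allocated to variants. One common approach that needs no stored per-user state, which suits stateless functions, is to hash a user identifier together with the experiment identifier into a bucket. In this sketch the identifiers, the variant names and the choice of SHA-256 are all assumptions:

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str, variants: list[str]) -> str:
    """Deterministically map a user to one variant of an experiment.

    Hashing (experiment_id, user_id) means the same user always sees the same
    variant, while different experiments bucket users independently.
    """
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Hypothetical 6-way experiment on the global navigation.
nav_variants = ["control", "a", "b", "c", "d", "e"]
print(assign_variant("user-123", "global-nav-2022", nav_variants))
```

Because the hash is keyed on the experiment ID, overlapping experiments from different teams split users independently of one another, which is exactly what produces the multiplying variant counts described above.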

Our goal: experiment on every new feature we release, to ensure it’s valuable to our audience.

Speed of updates

During the working day, the BBC News and Sport websites receive around 3 editorial changes per second. These include changes to our news and sport articles, indexes, videos and short-form posts (excluding non-editorial updates such as sports event scores).

For pages that aren’t personalised (personalised pages are rendered specifically for you on each request), we want the content to be as fresh as possible. Previous platforms relied on long cache times (30 seconds or more) and multiple layers of caching to handle the load and provide the resilience required.

Serverless has unlocked the ability for us to process more, reliably. This covers not only the execution environment where we transform our data and render our web pages, but also the serverless Redis we use, which can scale to meet the demands of our workload.

A graph showing the impact of our changes on editorial update times.

Over the last 8 months we’ve gradually reduced cache times across the stack, so that the time for an editorial update to reach our audience has dropped from 2½ minutes to 30 seconds.
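One reason layered caching slows editorial updates: when caches are stacked and each expires independently, the worst-case delay before an update reaches the audience is roughly the sum of the layers’ TTLs. A minimal sketch, with illustrative TTL values that are assumptions rather than the BBC’s actual configuration:

```python
def worst_case_staleness(ttls_seconds: list[float]) -> float:
    """Upper bound on how stale content can be behind stacked caches.

    Assumes each layer expires independently and refills from the layer
    below. A layer can pick up a copy that is already almost as old as the
    lower layer's TTL, then hold it for its own full TTL, so the worst cases
    add up. Rendering and propagation time are ignored.
    """
    return sum(ttls_seconds)


# Illustrative TTLs only, not the BBC's cache configuration.
before = [60, 60, 30]  # three stacked caches with long TTLs
after = [10, 10, 10]   # the same layers with much shorter TTLs
print(worst_case_staleness(before))  # 150 seconds, i.e. 2.5 minutes
print(worst_case_staleness(after))   # 30 seconds
```

Under this assumption, shortening any single layer helps, but reaching a target like 5 seconds means every layer in the path has to become short-lived (or be invalidated on update).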

We think we can improve this further!

Our goal: in the next 12 months, get editorial updates to under 5 seconds.

Build times

Enabling our engineering teams to deploy often and safely is key to our success, so we’ve taken the time to move away from our traditional CI/CD pipeline. Historically, our builds ran on Jenkins on a small number of compute instances. Now we use serverless CI/CD pipelines, which let us scale our build and deployment process with the load and demand on the system.

It’s been a change in mindset: rather than running builds sequentially, we run every build step and test as concurrently as possible.
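Running steps concurrently means the wall-clock build time is bounded by the longest chain of dependent steps (the critical path) rather than the sum of all steps. A minimal sketch; the step names, durations and dependencies are invented for illustration and are not our real pipeline:

```python
from functools import cache

# Hypothetical pipeline: step -> (duration in seconds, steps it depends on).
Pipeline = dict[str, tuple[int, list[str]]]

STEPS: Pipeline = {
    "install": (60, []),
    "lint": (30, ["install"]),
    "unit-tests": (120, ["install"]),
    "build": (90, ["install"]),
    "integration-tests": (150, ["build"]),
    "deploy": (30, ["lint", "unit-tests", "integration-tests"]),
}


def sequential_time(steps: Pipeline) -> int:
    """Run one step after another: the total is the sum of all durations."""
    return sum(duration for duration, _ in steps.values())


def concurrent_time(steps: Pipeline) -> int:
    """Start each step as soon as its dependencies finish: the critical path."""

    @cache
    def finish(step: str) -> int:
        duration, deps = steps[step]
        return duration + max((finish(d) for d in deps), default=0)

    return max(finish(s) for s in steps)


print(sequential_time(STEPS))  # 480
print(concurrent_time(STEPS))  # 330: install -> build -> integration-tests -> deploy
```

Once steps run concurrently, further gains come from shortening the critical path itself, for example by splitting the longest test suite into shards that run in parallel.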

Thanks to using serverless for our CI/CD process, our build times are now around 5 minutes, down from 24 minutes at the start of the project.

Our goal: reduce our build times to below two minutes!

Only a few minor issues

While our experience of using serverless has been excellent, there have been a few minor issues with the underlying technology along the way.

One of the primary issues we’ve faced was noisy neighbours. Most serverless platforms are shared: the provider’s resources are allocated to multiple customers, with each customer’s data securely sandboxed and isolated from the others.

This is optimal from the provider’s point of view, as they can maximise the efficiency of their service, keeping carbon footprints and costs down.

Our serverless execution is time-sensitive, as a user requesting a web page has to wait for the Lambda function to respond, so we keep a close eye on response times. We started to notice an increase in our function duration on the hour and quarter hour.

After ruling out internal causes (e.g. cache clean-ups, delays in services providing data, increased traffic from our content systems), we determined this was an underlying platform issue.

A graph showing average duration of our presentation Lambda.

The graph shows increased latency on the hour and quarter hour, caused by the underlying serverless infrastructure.
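One way to confirm a wall-clock pattern like this is to bucket function durations by minute of the hour and compare the quarter-hour minutes with the rest. The sketch below uses synthetic data and a hypothetical function name; it is not the tooling the team actually used:

```python
from datetime import datetime, timedelta
from statistics import mean


def quarter_hour_ratio(samples: list[tuple[datetime, float]]) -> float:
    """Mean duration at :00/:15/:30/:45 divided by the mean at other minutes.

    A ratio well above 1.0 suggests latency tied to the wall clock rather
    than to our own traffic, as with the on-the-hour spikes described above.
    """
    aligned = [d for t, d in samples if t.minute % 15 == 0]
    other = [d for t, d in samples if t.minute % 15 != 0]
    return mean(aligned) / mean(other)


# Synthetic data (not BBC measurements): 100ms baseline, 180ms at quarter hours.
start = datetime(2022, 2, 1)
samples = [
    (start + timedelta(minutes=m), 180.0 if m % 15 == 0 else 100.0)
    for m in range(24 * 60)
]
print(f"{quarter_hour_ratio(samples):.2f}")  # 1.80
```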

Working with our service provider, we were able to identify the issue as being a noisy neighbour problem, caused by other customers having ‘on the hour’ workloads.

The provider’s serverless team then made improvements to address the problem, and those improvements benefit everyone using their serverless product.

Summing Up

As the BBC’s audience continues to grow globally, we continue to push the boundaries of what we can achieve with technology. Serverless has proven to be a powerful tool for growing the digital offering of one of the world’s largest websites, from content delivery through to our build tooling.

Personally, I’m very proud of the work that goes on within the BBC. There is an amazing software engineering community, filled with talented people. Recent events have shown the critical nature of the service we provide; being part of a team that helps keep the world informed, educated and entertained is a privilege.

We’re always looking for talented people to take on the goals we’ve set ourselves. If you want to work as part of an amazing team on a world-class product, take a look at our roles.


Building the best BBC products, platforms and services for audiences in the UK and around the world

Johnathan Ishmael

Lead Technical Architect at @BBC. Electric car enthusiast, runner, gamer, geocacher and wannabe baker.
