How we deliver BBC Web Pages to the Internet
In this post Neil Craig, Lead Technical Architect in the BBC Online Technology Group, gives an overview of how the BBC serves the majority of its web pages to the public.
To explain some of the methods we employ (and why we don’t use some others), it helps to have a rough idea of the scale at which BBC Online operates. Over the last few months, we’ve served (approximately) the following volumes:
Our peak-to-mean ratio is usually between 2:1 and 3:1, but large-scale events can see larger deviations from the mean. It’s worth noting that we don’t serve more web page assets largely because of in-browser caching; without it we’d likely be serving an order of magnitude more. The above data doesn’t include the API calls which many of our pages make to supporting services.
We have audiences in over 230 countries worldwide with around 75–80% of our traffic being served to the UK. The USA & Canada are typically second and third in terms of volume served with mainland European countries, Australia & New Zealand, India and Nigeria thereafter.
BBC “Products” and shared hostnames
BBC Online appears to be a single website since (for the most part) it’s presented to users on either www.bbc.co.uk or www.bbc.com. In reality though, BBC Online is actually made up of many individual websites which are specific to BBC “Products” e.g. News, Sport, Weather, Radio, Children’s etc. Each Product is assigned one or more namespaces (a path prefix such as /news, /sport and so on) over which the Product team have almost complete autonomy — they can build, manage and host their website(s) as suits them best. For consistency, we have some shared components which provide common functions such as the BBC header, search bar, footer and configurations to all our Product websites.
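As a rough illustration of how a namespace can map a request to its owning Product, here is a minimal sketch. The table contents and the function name `product_for_path` are assumptions made for this example, not our actual configuration.

```python
from typing import Optional

# Hypothetical namespace table: path prefix -> owning Product.
NAMESPACES = {
    "/news": "News",
    "/sport": "Sport",
    "/weather": "Weather",
}


def product_for_path(path: str) -> Optional[str]:
    """Return the Product that owns `path`, preferring the longest prefix."""
    best = None
    for prefix in NAMESPACES:
        # Match whole path segments only, so "/newsround" is not
        # treated as part of the "/news" namespace.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return NAMESPACES[best] if best is not None else None
```

Matching on whole path segments keeps one Product's namespace from accidentally capturing requests that merely share a leading string with it.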
Delivery of content
Most of our page assets are served from sub-domains of bbci.co.uk (rather than bbc.co.uk or bbc.com). This is a conscious decision which has both security and performance benefits. We mandate that cookies are never set on bbci.co.uk, whereas bbc.co.uk and bbc.com are allowed to contain cookies. Since no cookies are sent with requests to bbci.co.uk, there is less chance of cookie data leakage and also the HTTP requests and responses contain fewer bytes of data, which results in increased performance.
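To see why a separate domain keeps cookies out of asset requests, consider a simplified version of the domain-matching rule browsers apply (RFC 6265). This sketch ignores IP addresses and public-suffix handling, and the hostname `assets.bbci.co.uk` is a hypothetical example.

```python
def domain_match(host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match: would a cookie scoped to
    `cookie_domain` be sent on a request to `host`?"""
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    # The host must equal the cookie domain or be a sub-domain of it.
    return host == cookie_domain or host.endswith("." + cookie_domain)
```

A cookie scoped to bbc.co.uk matches www.bbc.co.uk but not any host under bbci.co.uk, since bbci.co.uk is a different registered domain rather than a sub-domain.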
Content types and delivery methods
Requests for web pages from users in the UK & Europe are served direct from our UK origin (see “UK origin” below) under normal circumstances, though we maintain the option to switch traffic to a CDN and we test this regularly with live traffic. Requests for web pages from all other geographies are always served via a CDN (see “CDN” below).
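The routing rule above can be sketched as a small decision function. This only illustrates the logic and says nothing about the mechanism that actually steers traffic; the country set is hypothetical and abbreviated, and `force_cdn` stands in for the option to switch UK & European traffic to a CDN.

```python
# Hypothetical, abbreviated set of ISO country codes treated as UK & Europe.
UK_AND_EUROPE = {"GB", "IE", "FR", "DE"}


def delivery_path(country_code: str, force_cdn: bool = False) -> str:
    """Decide where a web page request is served from."""
    if force_cdn:
        # The switch to CDN delivery has been made for all traffic.
        return "cdn"
    if country_code.upper() in UK_AND_EUROPE:
        return "uk-origin"
    # All other geographies are always served via a CDN.
    return "cdn"
```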
Web page assets (excluding audio/video media)
Our non-A/V web page assets (images, scripts, stylesheets etc.) are mainly served from a CDN (for all geographies, UK included). There are some exceptions to this, but CDN delivery is the most common configuration. Origins for these assets may differ depending on the service or Product but mostly, we use online object storage such as AWS S3, which scales and integrates well with our common workflows and deployment strategies.
API and Service calls
Many of our web pages make requests to background services in order to provide specific functionality. The origins for these APIs and services are Product-specific but increasingly they are components deployed to AWS, often using a common pattern of an ELB backed by EC2 instances. Product teams are responsible for ensuring their services can scale to meet demand as well as for the development and management of the service and its infrastructure.
UK origin
Our UK origin consists of multiple active datacentres which each contain resilient, high-capacity HTTP load balancers. These load balancers perform TLS termination, path-based routing, caching and high-availability load balancing for our Product website origins. As mentioned previously, Product websites may be hosted via our on-premise platform, or may be off-premise, for example on AWS.
We prefer that both HTTPS and X.509 client certificate authentication are used for connections between our load balancers and Product websites, and mandate this for certain sensitive classes of data.
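For readers unfamiliar with client certificate authentication, here is a minimal sketch of how a Product website could require it, using Python’s standard `ssl` module. The file names in the comments are hypothetical, and this is not a description of our load balancer or Product configurations.

```python
import ssl


def make_origin_server_context() -> ssl.SSLContext:
    """Build a server-side TLS context that rejects any client which
    does not present a certificate signed by a trusted CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED
    # In a real deployment the server's own certificate and the CA that
    # signs the load balancers' client certificates would be loaded here
    # (hypothetical file names):
    # ctx.load_cert_chain("server.pem", "server.key")
    # ctx.load_verify_locations("load-balancer-ca.pem")
    return ctx
```

With `CERT_REQUIRED` set on a server context, the TLS handshake fails unless the connecting client presents a certificate that validates against the loaded CA certificates.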
CDN
As noted above, we use CDNs for non-UK/EU web page delivery and also for the majority of our web page assets for all geographies.
CDNs provide local-to-the-user TLS termination, content caching and persistent connections (to save the latency of creating new connections with TLS handshakes for each request) to our origin, all of which aid performance.
We have to terminate inbound (from the end user) TLS/HTTPS connections on the CDN edge since the CDN needs to be able to read the request details in order to either serve the requested object from cache or request it from origin. Connections from CDN to origin are always HTTPS where the original request is made over HTTPS, which maintains a secure request chain. We’re increasingly using HTTPS requests for web pages and assets and we’ll look to enforce HTTPS to origin in all cases in the coming months. A request to origin also includes an authentication token, where necessary, to allow the origin to assert that a request came from a known, expected and authorised entity.
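One common way to build this kind of token is an HMAC over part of the request, keyed with a secret shared between the CDN and the origin. The sketch below assumes that approach purely for illustration; it is not a description of the scheme we use, and the secret and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret shared between the CDN and the origin.
SHARED_SECRET = b"example-secret"


def sign_request(path: str, secret: bytes = SHARED_SECRET) -> str:
    """CDN side: derive a token from the request path."""
    return hmac.new(secret, path.encode("utf-8"), hashlib.sha256).hexdigest()


def origin_accepts(path: str, token: str, secret: bytes = SHARED_SECRET) -> bool:
    """Origin side: recompute the token and compare in constant time."""
    expected = sign_request(path, secret)
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

A production scheme would normally also bind the token to a timestamp or expiry so that a captured token cannot be replayed indefinitely.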
BBC Online is constantly evolving and there are plenty of items on our technology roadmap. Some of the changes currently in progress include:
- Migrating our remaining HTTP Product websites to HTTPS (this will enable HTTP/2, which we’ve recently finished deploying across our shared estate)
- Migrating the remainder of our HTTP media delivery to HTTPS
- Removing/reducing dependencies on Adobe Flash (we’ve decreased our usage of Flash in media by around 75% over the last 12 months)
- Further optimising our TLS configurations for performance and security
- Further optimising performance for our global audience
We’ll follow up with blog posts on these items in due course to fill you in on the details.
If there are elements you’d like to know more about or something you’d like clarified, please leave a comment or send me a message on Twitter @tdp_org.