Optimising BBC Online’s Code Splitting Strategy

Matt Williams
BBC Product & Technology
11 min read · May 30, 2022

Modern JavaScript websites are the result of complex code bases, and the code base which powers BBC Online is no exception. There are many dependencies and frameworks that we rely upon in order to provide the functionality that our end-users expect, as well as code written in-house to cater to the needs of our website and its users; being able to click a button to follow a topic, search for content and play videos all require JavaScript code to be run by the browser.

For a website as large and with as many seemingly disparate needs as the BBC’s various pages, the amount of code required can grow rapidly. The code needed to render a news article - with images, social media embeds, videos and onward journeys - has only small elements of commonality with the code required to power complex sports data tables - with team badges, functionality to hide or show columns based on the width of the user’s screen, displays of a team’s form, and live in-page updates for ongoing events.

As such, it has long been the practice of the JavaScript community to implement code splitting. Rather than bundling all of our JavaScript and dependencies into a single large file to be downloaded by the user, we can split our code into separate files. This provides advantages on two fronts: downloading multiple smaller files can be more efficient, as devices can start those downloads in parallel (though going too far and downloading too many small files can cause problems of its own!), and the way we choose to split our files can mean that some only need to be downloaded on certain pages. It is this second benefit which can be the most powerful: if a large piece of code on your site is only needed for a single page, then forcing users to download that code on every page is wasteful, and can cause problems for users who have limits on the amount of data they can download on their network.

We have been using code splitting within the Presentation Layer, which powers large portions of BBC Online, for a number of years now. However, we had realised that the platform had outgrown the existing strategy we had in place for code splitting. So first, what was that strategy?

We had decided to create a small number of core bundles, which would be shared across the website and required for every page. This strategy meant that although the user would need to download around 5 relatively large files on an initial page load, those files would then be cached for future visits, or for when the user visited another page on the site. We make use of contenthashes in our bundle file names — hashes computed at build time that change only when the content of a file changes — meaning that when a code change is deployed, only the bundles affected need to be re-downloaded, and the cache can be relied upon for the unchanged bundles. These core bundles consisted of:

  • our third-party dependencies (commonly referred to as the vendor bundle)
  • the code for our rendering framework
  • our GEL packages — components that are centrally owned, designed to be highly reusable, and adhere to our Global Experience Language guidelines
  • additional components and utilities that either have more specific use-cases or are not yet fully reusable — referred to as our website bundle
  • common utilities and libraries used across the website — such as our compliance library, used to ensure that we are not setting cookies that the user has not agreed to — referred to as our base-website bundle
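A strategy like this is typically expressed through Webpack’s `optimization.splitChunks.cacheGroups`, with `[contenthash]` in the output filename providing the caching behaviour described above. The sketch below is illustrative only — the paths and group definitions are assumptions, not the Presentation Layer’s actual configuration:

```javascript
// Simplified sketch of a Webpack config with named core bundles and
// contenthash-based filenames. Paths and names are illustrative
// assumptions, not the real Presentation Layer configuration.
const config = {
  output: {
    // [contenthash] changes only when a bundle's content changes, so a
    // deployment invalidates the cache for the changed bundles alone.
    filename: '[name].[contenthash].js',
  },
  optimization: {
    splitChunks: {
      chunks: 'all',
      cacheGroups: {
        // All third-party code in one shared vendor bundle.
        defaultVendors: {
          test: /[\\/]node_modules[\\/]/,
          name: 'defaultVendors',
        },
        // Centrally-owned, highly reusable GEL components.
        gel: {
          test: /[\\/]packages[\\/]gel[\\/]/, // hypothetical path
          name: 'gel',
        },
      },
    },
  },
};

module.exports = config;
```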

In addition, the Presentation Layer has the concept of Containers. Containers are isolated slices of the experience that are data-aware, and combine together multiple components into a single experience. Our navigation bar is a container, as is the core view of a news article, whilst the lists of onward journeys are a separate container again. Each of these containers has a high level of separation of concerns and is not allowed to depend directly on the others, and so each container gets its own separate bundle file. This is what allows us to avoid delivering too much unused code to the browser, only providing container-level code on pages that use that container.
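The mechanism behind per-container bundles is the dynamic `import()` boundary: Webpack emits a separate chunk for each one, fetched only when the container is actually needed. The sketch below shows the general shape of a lazy, load-once pattern; the caching helper and container names are illustrative assumptions, not our real module layout:

```javascript
// Load a container's bundle on demand, at most once per page.
// In a real Webpack build, the importer would be something like
// () => import('./containers/navigation'), and each import() callsite
// becomes its own chunk file. This registry is hypothetical.
const loadedContainers = new Map();

function loadContainer(name, importer) {
  if (!loadedContainers.has(name)) {
    // importer() returns a Promise for the container's module,
    // triggering the chunk download the first time only.
    loadedContainers.set(name, importer());
  }
  return loadedContainers.get(name);
}
```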

So what went wrong with this approach?

First of all, this approach was not wrong in any sense. At the time we implemented it, it was the most suitable solution. However, in the time since, the number of pages powered by the Presentation Layer has grown; the number of new experiences being developed for the audience has increased, and the complexity of our code base has increased with them. This resulted in our core bundles growing to the point where our bundling library was reporting their combined size as over 1MB. Whilst these files were compressed before being sent to users, we were in essence requiring users to download over 1MB worth of JavaScript for every page. Much of this was code used on only a small number of pages — as portions of the Bitesize and Children’s websites began to move over to the new platform, the games library was added, which was only needed on games pages; as Sport pages began to move over, components specifically tailored to showing match data between teams or league tables were added, which were only needed on those pages.

If we take a look at the visual representation of our code-split bundles, provided by our build tool Webpack, we can readily see the effect that this has had on our core bundles:

A visual representation of our code split bundles, with sizes relative to one-another

In short, we were no longer optimised for reducing unused JavaScript being downloaded by end users, and we needed to take a second look at our strategy.
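A treemap visualisation like the one described above can be produced by the build tooling. The post doesn’t name the exact tool, so as an assumption, one commonly used option is webpack-bundle-analyzer, added as a plugin:

```javascript
// Sketch: generate a static treemap report of every emitted bundle,
// with sizes relative to one another. Assumes the webpack-bundle-analyzer
// package is installed alongside Webpack.
const { BundleAnalyzerPlugin } = require('webpack-bundle-analyzer');

module.exports = {
  plugins: [
    new BundleAnalyzerPlugin({
      analyzerMode: 'static', // write an HTML report rather than serving one
      reportFilename: 'bundle-report.html',
      openAnalyzer: false,
    }),
  ],
};
```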

If you search around the internet for solutions to this problem, you will quickly find that there is no general guideline; every website is different, and the ideal strategy relies on the specific needs of the page and the dependencies being loaded. So we had to do some investigation of our own in order to decide how to proceed. Our first target, however, was reducing the size of the largest bundle: defaultVendors.

If we take a look at the visual representation above, we can easily see that there are three very large categories dominating this bundle: @bbc with the games-atlas package, react-dom and @optimizely are very clearly the largest components here. Our website is React-based, which means that react-dom is necessary on all of our pages, and so it wouldn’t make sense to remove it from this bundle, but games-atlas and @optimizely are not used on every page, and so they could easily be removed.

Initially, we tried moving both of these into another shared bundle: optionalVendors. If you take a look at the visual representation of that, however, you may be able to see straight away what the problem with that approach was:

If any page needs to make use of the @optimizely packages, it must still download games-atlas, whether it needs it or not. So whilst pages that use neither of these packages would be aided by this change, it wasn’t ideal.
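Expressed as a Webpack cache group, the coupling is easy to see: both packages match a single group’s `test`, so they always travel together. The regex below is an illustrative assumption of how such a group might look, not our actual config:

```javascript
// Sketch of the first attempt: one shared "optionalVendors" group.
// Any page that imports either package ends up downloading the code
// for both, because they are emitted into the same bundle file.
const optionalVendors = {
  test: /[\\/]node_modules[\\/](@optimizely|@bbc[\\/]games-atlas)[\\/]/,
  name: 'optionalVendors',
  chunks: 'all',
};
```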

Our next step was to create separate bundles for each third party dependency that we had determined could be removed from the defaultVendors bundle. This would achieve the aim of only downloading this third-party code when it was needed, and give a benefit to the largest number of pages. In addition to creating separate bundles for games-atlas and @optimizely, we decided to also create a separate bundle for date-fns, and its related date-fns-tz — these libraries are known to be quite large, and whilst we use them on most pages server-side for formatting date strings, their use client-side is on a fairly limited number of pages.

A visual representation of all our third-party code bundles after splitting them out

This ultimately allowed us to trim around 80KB (gzipped) from the download size of our defaultVendors bundle — a saving of around 44% of its original size.
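As a sketch, the per-dependency split described above might be expressed as one cache group per package; the helper function, priority values and patterns here are illustrative assumptions rather than our exact configuration:

```javascript
// One async cache group per heavyweight dependency, so each package is
// only downloaded by pages that actually import it client-side.
const perPackageGroup = (name, pattern) => ({
  test: new RegExp(`[\\\\/]node_modules[\\\\/]${pattern}[\\\\/]`),
  name,
  chunks: 'async', // keep this code out of the initial page load
  priority: 30,    // beat the generic defaultVendors group
});

const cacheGroups = {
  gamesAtlas: perPackageGroup('games-atlas', '@bbc[\\\\/]games-atlas'),
  optimizely: perPackageGroup('optimizely', '@optimizely'),
  // date-fns and date-fns-tz share one group, as they are used together.
  dateFns: perPackageGroup('date-fns', 'date-fns(-tz)?'),
};
```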

There are some small oddities with this approach, which you will see in the images later in this post. By defining these separate bundles and marking them as async bundles, we indicate to Webpack that it should fill them with code that is not needed on initial page load and can be downloaded as-needed. However, some code — particularly in the @optimizely packages — is needed on initial page load. As such, Webpack has split these packages in two: the code that can be downloaded asynchronously is in its own bundle, whilst the parts that are needed on initial page load have become part of another bundle. We still have an overall saving in the amount of JavaScript being sent to users, however, so this was an acceptable compromise to us.

The next target was our second-largest bundle: website. As noted previously, this bundle contains the majority of our components that are either very focused in their use-cases or aren’t quite shareable enough yet to be moved into the GEL bundle. It also includes some utilities used within those components and across some of our containers, page-level layouts, and non-visual parts of the page experience such as setting dynamic page metadata. The single largest category within it, however, is our UI components.

Our initial approach was to just exclude these components from the website bundle group, and allow Webpack to do the heavy lifting of deciding where to perform grouping and splitting of the code. However, we found that this approach wasn’t ideal — other core bundles started to absorb some of these components, and not necessarily the most commonly reused ones either. Other times, components ended up being placed into the container bundles. Overall, it felt like we needed to exert a bit more deliberate control.

The next step was to give each component its own bundle, in the same manner as our containers. Whilst this was easy enough to implement, it went a bit too far in the other direction; some of our UI components are very small as far as their JavaScript code goes, and it felt unnecessary to have a separate bundle for such a small amount of code. This approach worked better than the previous one, but it needed a bit of extra nuance.

Webpack provides the ability to set additional rules to control when to actually create a new bundle. Up until now, we had been setting rules entirely on a path-based approach; now, however, we started to make use of the size-based rules, which also required making use of its priority rules. By setting a minimum size requirement for a new bundle to be created, we could control when to create a new component bundle and when not to. And by setting the path-based rule on our website bundle to be permissive enough to allow components to pass the rule, but with a lower priority than the individual component bundles, those components that fell short of the minimum size rule would fall back into the website bundle.

This approach gave us the best of both worlds: those components which were comparatively large in size could get their own bundle (along with the contenthash-based caching benefits mentioned earlier), and would only be downloaded on pages that needed them, whilst the smaller components could remain a part of the core website bundle, downloaded and ready to use on each page. We felt this struck the best balance between not wanting too few, excessively large bundle files to download, versus having too many very small bundle files.
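Put together, the size and priority rules look roughly like the sketch below; the paths, the ~20KB threshold and the priority values are illustrative assumptions rather than our exact configuration:

```javascript
// Size- and priority-based cache groups: large components get their own
// bundle; smaller ones fall back into the shared website bundle.
const cacheGroups = {
  components: {
    test: /[\\/]src[\\/]components[\\/]/, // hypothetical path
    minSize: 20000,  // only create a bundle if it would exceed ~20KB
    priority: 20,    // considered before the website group below
    chunks: 'all',
  },
  website: {
    // Permissive enough to match the components too...
    test: /[\\/]src[\\/]/,
    name: 'website',
    minSize: 0,      // ...happy to absorb even tiny modules...
    priority: 10,    // ...but only once the components group has passed
    chunks: 'all',
  },
};
```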

A visual representation of our code splitting strategy after the changes

All of these changes have clearly had an impact on the raw size of the core JavaScript bundles we’re delivering to users, but how do we know that will have a positive effect on our end users? We did attempt to run these changes on a non-production environment for a time and compare some performance metrics with our live website, but we quickly realised that we were not comparing like-for-like; our live website benefits from a robust caching infrastructure, which can have a knock-on effect on performance metrics, and this led to our non-production environment showing significantly worse metrics than we expected.

Instead, we decided to merge in our bundling changes and deploy them to the live website, and monitor the relative performance from before the change.

This change has been in place for 3 weeks now, and has seen the following results:

  • The number of requests for JavaScript files has risen by around 21%, but the total size of the JavaScript downloaded has fallen for all page types — down 14% on average
  • JavaScript Total Blocking Time has fallen by 27% on average, and Time to Interactive has fallen by 6% on average
  • The measure of how long the browser spends on the single longest JavaScript task has fallen by 16%, and the time on all JavaScript ‘long tasks’ as a whole has fallen by 20%
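Total Blocking Time, mentioned above, is defined as the sum of each main-thread task’s duration beyond the 50ms ‘long task’ threshold, which is why shipping less up-front JavaScript moves it so directly. A minimal sketch of the calculation:

```javascript
// Total Blocking Time: for every task longer than the 50ms "long task"
// threshold, the excess over 50ms counts as blocking time.
function totalBlockingTime(taskDurationsMs, thresholdMs = 50) {
  return taskDurationsMs
    .filter((duration) => duration > thresholdMs)
    .reduce((blocked, duration) => blocked + (duration - thresholdMs), 0);
}
```

For example, tasks of 30ms, 120ms and 70ms give a TBT of 90ms: the 30ms task is under the threshold, while the other two contribute 70ms and 20ms respectively.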

One of our pages that makes use of more custom components than others has actually seen the largest improvements in some of these metrics — whilst the number of JavaScript files being downloaded has gone up by 43%, the total size of those files has still fallen by 15%; Total Blocking Time has fallen by 31% and Time to Interactive has fallen by 14%. Given that Sports pages are some of those with the most custom components and are also some of the pages which need to feel the most responsive to their users, this is quite a significant benefit.

This likely still isn’t the end of our code-splitting journey; whilst we have arrived at a more optimal code splitting strategy for our current use-cases, we are working on an ever-evolving and growing platform. We fully expect to need to review this again in future as even more pages join our platform, and as new use-cases evolve. There will likely never be a perfect code splitting strategy, but by reviewing ours on a regular basis, and getting it as good as it can be with the information we have at the time, we can keep our focus on providing the most performant experience for our audience.
