Scaling: Part 3 — Minimising scaling headaches

Jonathan Parkin
BBC Product & Technology
6 min read · Feb 1, 2022

In part one of this blog trilogy we introduced some scaling terminology. In part two we looked at whether it's worth scaling your system at all, how delays in scaling can make it difficult to respond to sudden spikes of user traffic, and how caching can reduce the amount you need to scale, and we touched on using managed services to let someone else take the reins on scaling. In this final part we're going to look at how structuring our systems around an asynchronous, static publishing approach can help us minimise our scaling headaches.

What is static publishing?

Static publishing is most often used to describe a method of publishing websites, though the principles can be applied more widely. In a static publishing approach queries are calculated in advance and published to a location as static files for retrieval on user request, rather than dynamically calculating those results when asked for them. This is a bit like caching the results of calculations for a while to reduce the work we do, but goes further as we don’t wait for a user’s request before working out the result, and we try to make sure we only calculate the answer once per data change.
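As a minimal sketch, assuming a made-up schedule page for illustration (the function names and file layout here are hypothetical, not any real BBC system), the pattern is to do the expensive work once when the data changes, leaving only a file read for request time:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch: all names here are invented for illustration.
def publish(schedule: list[dict], out_dir: Path) -> Path:
    """Do the expensive work once, ahead of any user request."""
    rendered = json.dumps({"episodes": schedule, "count": len(schedule)})
    out_file = out_dir / "schedule.json"
    out_file.write_text(rendered)  # the static file, ready to serve
    return out_file

def serve(out_file: Path) -> str:
    """Per-request work is just picking up the file and returning it."""
    return out_file.read_text()

out_dir = Path(tempfile.mkdtemp())
published = publish([{"title": "Episode 1"}], out_dir)
print(serve(published))
```

The key point is that `publish` runs on data change, not on user request; the same idea scales from a single file to a whole site.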

When a user makes a request all the processing has been done, all that’s left is to take the file and pass it back to the user. Just picking up a file and returning it is a computationally light-weight operation that requires little set-up, so we can vertically scale down to use small machines and horizontally scale out with minimal start-up delays. From a scaling perspective we even benefit from requests being quicker to serve, as the shorter a request is the sooner it frees up the resources that were dealing with it, allowing them to be allocated to the next request.

If you are generating a static, unpersonalised website you can pass your statically generated files to a Content Delivery Network (CDN) to serve for you, letting them take care of scaling. CDNs put a lot of effort into scaling and responsiveness — even doing things like paying ISPs to host their equipment so it’s as close to end-users as possible, so using a CDN makes a lot of sense when operating at scale. However you don’t need to be using a CDN, or to be serving a public website, or even to be generating the final piece of data returned to an end-user, to benefit from a static publishing approach.

Other benefits

Software systems will often need to call other systems to work out the answer to a question, and those might call other systems. Imagine that you have a call tree where one system calls two systems, and each of those also calls two systems that call two systems.

[Diagram: a call tree of four levels, with 1, 2, 4 and 8 nodes at each level]

If there are no caches involved, every time you make a request to the top-level system you're calling a total of 15 systems, and transferring data from the fourth layer back to the third, then to the second, then back to the first. Communicating over a network takes time, so reducing the number of levels you go through — the call tree depth — means that you'll return the answer from your top-level system more quickly (particularly if those calls cannot be done in parallel). That's before we factor in the time taken to perform processing at each node in the tree, which we also save if the result has already been calculated. Having results already calculated makes things faster for your users.
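The call count follows from the tree shape alone. A small helper (assuming a uniform fan-out of two, as in the diagram) confirms that one uncached top-level request touches 15 systems:

```python
# Total systems touched by one request in a uniform call tree:
# one node at the top, fan_out at the next level, and so on down.
def total_calls(depth: int, fan_out: int = 2) -> int:
    return sum(fan_out ** level for level in range(depth))

print(total_calls(4))  # 1 + 2 + 4 + 8 = 15 systems per request
```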

There can also be cost benefits. Cost estimation can be nuanced, but if you are performing a high number of reads when compared to the number of updates you’re performing, there are often cost savings from calculating once and serving many times:

  • Data storage is generally cheap in comparison to data processing.
  • If you’re not calling other systems during processing of a user’s request, those systems do not have to scale with user traffic.
  • Data transfer costs between systems are reduced if you make fewer calls.

It's important to pay attention to both the number of items you're pre-processing and their frequency of update. If we go back to our example call tree, imagine that our fourth level of systems each return one of ten items. The third-level systems could then be returning one of 100 combinations of items, the second level one of 10,000 combinations, and the top level one of 100,000,000 combinations. It still might make sense to pre-calculate all 100,000,000 combinations, but it becomes increasingly important how often they are accessed compared to how often they update. It may be that in this scenario we're best off statically publishing the third level of systems, letting the levels above still do dynamic calculations.
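The growth of those combination counts comes straight from the number of leaf systems beneath each node (assuming ten possible items per leaf, as above):

```python
# Each system's output is one combination of the outputs of the leaf
# systems beneath it, so the count is items_per_leaf ** leaves_beneath.
def combinations(leaves_beneath: int, items_per_leaf: int = 10) -> int:
    return items_per_leaf ** leaves_beneath

# Nodes in the example tree sit above 1, 2, 4 or 8 leaves.
print([combinations(n) for n in (1, 2, 4, 8)])
# → [10, 100, 10000, 100000000]
```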

Dealing with updates

Our statically published files might be updated on some sort of schedule, or as a result of other user or system activity. All files might be updated together, or just the pieces that have changed — often referred to as the delta, after the mathematical symbol for change. Processing everything is reasonably straightforward: you list out all the combinations of items and make the calls needed to calculate them, possibly through the same systems you'd be using to serve requests if you weren't doing static publishing. You then write those results to your statically published files.
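As a toy illustration of the "process everything" path (the item lists and file naming are invented for this example), you enumerate every combination, calculate each result, and write it out:

```python
import itertools
import json
import tempfile
from pathlib import Path

# Hypothetical full republish: every combination is calculated once
# and written out as a static file.
def republish_all(items_a: list[str], items_b: list[str], out_dir: Path) -> int:
    for a, b in itertools.product(items_a, items_b):
        result = {"pair": [a, b]}  # stand-in for the real processing
        (out_dir / f"{a}-{b}.json").write_text(json.dumps(result))
    return len(items_a) * len(items_b)

out_dir = Path(tempfile.mkdtemp())
count = republish_all(["news", "sport"], ["uk", "world"], out_dir)
print(count)  # 4 files published
```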

For deltas you can sometimes calculate what's changed based on timestamps, but often you'll be receiving some sort of change notification message. Examples would be a message from an internal system telling us that a new episode of a radio show has been published, or a message from an audience member saying how far they are through a programme on iPlayer, so we can help them resume where they left off if they don't finish it. In a synchronous request/response approach you'd deal with these updates as they came in. You could still do that with static publishing, but you may be better off taking an asynchronous approach: having your system say "thank you for the update" and putting it on a queue to process later.

Having a queue of updates to process, rather than handling them as they come in, is beneficial for scaling. The system accepting updates and saying thank you is not doing a lot of work, so it should scale well. If there's a sudden increase in updates due to a traffic spike you don't need to scale out quickly for it — the queue just gets longer — so scaling delays in your queue processing are less of an issue. You might choose not to scale your queue processing at all, if you're happy your backlog of new items will be processed quickly enough. There can also be efficiencies from dealing with batches of items rather than one at a time, reducing the overall amount of work we need to do. If you're updating on a schedule you could even scale all the systems needed to process updates according to that schedule, now that they're not linked directly to user behaviour.
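In sketch form, with an in-memory deque standing in for whatever real queueing service you would use (all names here are invented; a production system would use something like a managed message queue), the "thank you, queued" pattern looks like:

```python
from collections import deque

updates: deque = deque()  # stand-in for a managed queue service

def accept_update(message: dict) -> str:
    """Cheap to serve, so it scales well: just enqueue and acknowledge."""
    updates.append(message)
    return "thank you for the update"

def process_batch(max_items: int = 100) -> int:
    """Drain a batch later, decoupled from the incoming traffic spike."""
    batch = [updates.popleft() for _ in range(min(max_items, len(updates)))]
    # ... recalculate and republish the affected static files here ...
    return len(batch)

for i in range(5):
    accept_update({"episode_id": i})
print(process_batch())  # drains all 5 queued updates in one pass
```

Batching falls out naturally here: one `process_batch` call can republish for many updates at once, rather than doing a full round of work per message.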

We should make sure to stay aware of how long it takes us to process our updates. Our statically published files are out of date until the updates for them have been processed, which means we might be giving out-of-date information back to our users. There's normally a product requirement around data freshness that says how long we can serve stale data for.

Sensitive data

If the data being processed into static files contains sensitive data (e.g. Personally Identifiable Information, or PII), we need to take care that it is handled, stored and accessed securely. If we take a static publishing approach, pre-processing the data and storing a copy of it, we have increased the number of places that need additional security and scrutiny.

Beyond this there may be legal requirements that need to be met, such as around removal of a user’s data on their request. Having more copies of the data makes this more difficult to do, and integrating supporting mechanisms for the removal of data from our published store of files may need to be considered as part of system design — particularly with respect to how upstream systems and business processes inform us of such a need.

Conclusion

Here we’ve shown how a static publishing approach can help us minimise the scaling issues we need to handle, by doing work before our users ask us to. We’ve looked at how this can result in a faster, cheaper service, and touched on a couple of trade-offs that need to be considered when taking this approach.

Static publishing is something I find Cloud Engineering recommending more and more as a way to mitigate scaling issues, particularly where more traditional caching approaches aren’t successfully addressing issues with the volume of user requests we get to iPlayer or Sounds. Maybe it can help your organisation too.

Happy scaling!
