Scaling: Part 1 — Mental models

Published in BBC Product & Technology, Jan 19, 2022

In the BBC Cloud Engineering team our purpose is to help other teams get the most out of cloud technologies. Our workload is quite varied, ranging from the nuances of applying a given technology to more general themes like monitoring, availability, performance, security, and cost. One of the topics that comes up time and again is scaling: treading the line between having enough capacity to satisfy user needs and having so much capacity that our servers are sat around doing nothing.

In this three-part series I’ll be talking about scaling.

In this part I’ll set the scene by talking about a few metaphors I use to introduce people to some scaling concepts.
In part two I’ll go into a bit more depth and talk about some things to think about when scaling.
In the final part I’ll describe an approach that some systems can take to sidestep some scaling concerns.

What are we trying to do? Fairy tales and aliens

Having grown up on fairy tales, stories about space ships, and other adventures, my favourite metaphor to describe why we scale is the Goldilocks zone. In the fairy tale Goldilocks and the Three Bears, a girl called Goldilocks goes into the home of some bears while they’re out for a stroll, and decides to eat their porridge. The first bowl is far too hot, the second bowl is far too cold, but the third bowl is just right.

So where do the aliens come in? Well, one of the most important things for life as we know it to exist is liquid water, so to find alien life we might look for solar systems containing planets that could have liquid water. They can’t be too close to their sun — that’s too hot — and they can’t be too far away — that’s too cold. Those planets have to be in the middle, in the area where the conditions are just right. One name for this area is the Goldilocks zone.

Software systems have a Goldilocks zone too. We control the resources that they run on, like the processing power or the amount of memory available. If we under-provision systems by giving them too few resources they “run hot” trying to cope with all the work: responses get slow, requests fail, and the system might fall over entirely. At the other extreme, if we over-provision systems by giving them too many resources they end up sitting idle and “cold”, costing us money we could put to better use somewhere else. We want our systems to run in the zone between these extremes: not too hot, not too cold, but just right.

Vertical scaling: The size of your boat

In the 1975 classic Jaws the town of Amity Island is having trouble with a man-eating shark. As our heroes are out looking for it police chief Martin Brody catches sight of the huge monster for the first time. Backing away, he turns to the ship captain and utters the famous line: “You’re going to need a bigger boat.”

Chief Brody had one big problem. It might not have come up often, but when it did come up he needed a big boat to deal with it. As earlier film scenes show, smaller things — like an inflatable lilo — just won’t cut it. There’s a minimum size you can get away with.

Picture of a shark — The author of Jaws, Peter Benchley, went on to campaign for the protection of sharks following the impact of the film.

The same’s true for software systems. If you need to handle requests to process 10 GB files, but your system runs on a computer with only 5 GB of RAM, it’s unlikely to handle them as well as a computer with 20 GB of RAM would. But bigger computers cost more, so to stay in our Goldilocks zone we need to get the right size of machine. When developers talk about vertical scaling or rightsizing they’re talking about changing the size of the underlying machine to be appropriate for the task at hand: picking the right size of boat.
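As a rough sketch of rightsizing, the snippet below picks the cheapest machine from a catalogue that still has enough memory for the job. The machine names, sizes, prices, and the `headroom` safety factor are all invented for illustration; they are not real cloud offerings or recommended values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MachineSize:
    name: str           # hypothetical label, not a real instance type
    ram_gb: int
    hourly_cost: float  # made-up price in arbitrary units


# Invented catalogue, for illustration only.
CATALOGUE = [
    MachineSize("lilo", 5, 0.10),
    MachineSize("dinghy", 10, 0.20),
    MachineSize("fishing-boat", 20, 0.40),
    MachineSize("trawler", 40, 0.80),
]


def pick_machine(required_ram_gb: float, headroom: float = 1.5) -> Optional[MachineSize]:
    """Return the cheapest machine with enough RAM, or None if none is big enough.

    `headroom` is an assumed safety factor so the machine is not run at its limit.
    """
    needed = required_ram_gb * headroom
    big_enough = [m for m in CATALOGUE if m.ram_gb >= needed]
    return min(big_enough, key=lambda m: m.hourly_cost, default=None)


choice = pick_machine(10)  # a job that needs about 10 GB of RAM
print(choice.name if choice else "nothing big enough")
```

With these made-up numbers the 10 GB job lands on the 20 GB machine: the 5 GB and 10 GB options are too small once headroom is included, and the 40 GB option would cost more than it needs to.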

Horizontal scaling: Supermarket lines

When you’re at the supermarket in a rush it seems inevitable that there’s a long line at the one till that’s open, while there are another four or five tills without a shop assistant. It’s frustrating, but you probably don’t blame the one shop assistant who’s beeping things across the checkout, or wonder if they’d be quicker if they had more arms. The question on your mind is why the shop isn’t opening some of the other checkouts, to get you through faster.

What you’re looking for is horizontal scaling, adding more capacity by adding more copies of your system. There’s a limit to what one shop assistant can manage, but you can get more done with more of them. So at periods of high demand a supermarket will have more checkouts running, and at periods of low demand they will save on wages by reducing the number.

This model is great for keeping us in our Goldilocks zone. If we get busy and our machines start to work so hard that we’re running the risk of problems, we scale out by adding more machines. If we’re quiet and our machines start to take it too easy, we see that we’re over-provisioned and take some away again. Cloud providers offer load balancing facilities that direct requests to the right machines and keep the requests spread out fairly evenly among them, so they’re all working about as hard as each other.
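One simple way to spread requests evenly is round-robin: hand each new request to the next machine in turn. The sketch below is only an illustration of that idea, with hypothetical machine and request names. It is not a description of how any particular cloud provider does it, and real load balancers may use other strategies.

```python
from collections import Counter
from itertools import cycle


def spread_requests(requests, machines):
    """Assign each request to the next machine in turn (round-robin)."""
    assignment = {}
    next_machine = cycle(machines)
    for request in requests:
        assignment[request] = next(next_machine)
    return assignment


machines = ["machine-a", "machine-b", "machine-c"]  # hypothetical names
requests = [f"request-{i}" for i in range(10)]

assignment = spread_requests(requests, machines)
load = Counter(assignment.values())
print(load)  # no machine ends up with more than one request above any other
```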

Horizontal scaling tends to be used over vertical scaling because it’s virtually uncapped. Vertical scaling ultimately hits a limit when you’re running your system on the biggest, fastest, most expensive computer there is. However, if that single massive computer has 100 units of capacity and each individual task requires 1 unit of capacity, then having 101 relatively cheap 1-unit computers has the potential to outperform it. Remember, though, that there is a limit to how small you can go: Chief Brody wouldn’t have done well against that shark even with an entire flotilla of lilos!
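The arithmetic here can be sketched in a few lines. The function below is a simplification that assumes identical machines and demand measured in the same abstract "units of capacity" as the example above; the function and parameter names are my own.

```python
import math


def machines_needed(total_demand: float, machine_capacity: float, largest_task: float) -> int:
    """Number of identical machines needed to cover `total_demand`.

    Raises ValueError if one machine cannot fit the largest single task,
    because adding more undersized machines does not help (the flotilla of lilos).
    """
    if machine_capacity < largest_task:
        raise ValueError("each machine is too small for the largest task")
    return math.ceil(total_demand / machine_capacity)


print(machines_needed(total_demand=101, machine_capacity=1, largest_task=1))    # 101 small machines
print(machines_needed(total_demand=101, machine_capacity=100, largest_task=1))  # 2 massive machines
```

Demand of 101 units is just beyond the single 100-unit computer, but 101 one-unit machines can cover it, provided each task really does fit on a one-unit machine.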

Dynamic scaling: Keeping comfortably warm

If you get too cold you can put on a jumper or a coat, and when you’re too hot you might take them off again. Maybe it’s a really hot day and you change into shorts to cool down, but switch back to jeans as the temperature drops. This is an example of a negative feedback loop, where the state of the system — too hot or too cold — prompts action to correct it, and to bring things back to a happy medium — just right.

We can’t just go about this as fast as we like though. It takes time to warm up even after you’ve put clothes on, and time to cool down when you’ve taken them off. If you don’t allow some pause time between changes you’ll soon find yourself either naked or wearing all the clothes in the house!

These sorts of negative feedback loops — including the pause time after changes — are used to control a lot of scaling of cloud software systems. Some systems will have nice, regular usage patterns that can have their scaling adjusted based on a rigid clock schedule, but many will want to flex based on user activity to keep them in their Goldilocks zone.
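To make the pause time concrete, here is a minimal sketch of a feedback loop with a cooldown. The class name, the one-machine step size, and the 300-second cooldown are assumptions chosen for the example, not settings from any real autoscaling service.

```python
class CooldownScaler:
    """Adjusts a machine count one step at a time, pausing after each change."""

    def __init__(self, machines: int, cooldown_seconds: float):
        self.machines = machines
        self.cooldown_seconds = cooldown_seconds
        self._last_change = None  # time of the most recent scaling action

    def adjust(self, now: float, too_hot: bool, too_cold: bool) -> int:
        """Apply at most one scaling step, unless we are still in the pause time."""
        in_cooldown = (
            self._last_change is not None
            and now - self._last_change < self.cooldown_seconds
        )
        if in_cooldown:
            return self.machines
        if too_hot:
            self.machines += 1
            self._last_change = now
        elif too_cold and self.machines > 1:  # always keep at least one machine
            self.machines -= 1
            self._last_change = now
        return self.machines


scaler = CooldownScaler(machines=2, cooldown_seconds=300)
# The system reports "too hot" once a minute for ten minutes.
history = [scaler.adjust(now=minute * 60, too_hot=True, too_cold=False) for minute in range(10)]
print(history)  # [3, 3, 3, 3, 3, 4, 4, 4, 4, 4]
```

Without the cooldown the same ten readings would have added ten machines, which is the software equivalent of wearing all the clothes in the house.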

These negative feedback loops work well with horizontal scaling. Think back to our supermarket lines. If we have a lot of checkouts with only one person, or no people, in the line we can probably keep our customers happy even if we shut one or two. We scale in, reducing the number of checkouts that are open and saving some money on shop assistant wages. On the other hand, if lines are getting long, with ten or twenty people queuing up, then we probably want to scale out, increasing the number of checkouts that are open to improve the customer experience. To give our customers a good experience without breaking the bank we just need to pick a system metric (like line length), set some limits for it (too long, too short), and set up scaling actions that add or remove system capacity (open checkouts) when those limits are breached.
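The metric, limits, and actions described above can be sketched as a single decision function. The thresholds of 1 and 10 people are taken loosely from the supermarket example; averaging the line lengths is my own simplifying assumption, and a real system might look at the longest line or a percentile instead.

```python
def scaling_action(line_lengths, too_short=1, too_long=10):
    """Decide what to do based on the average line length across open checkouts.

    Returns "scale_out", "scale_in", or "hold". Thresholds are illustrative.
    """
    if not line_lengths:
        return "scale_out"  # nothing is open, so open a checkout
    average = sum(line_lengths) / len(line_lengths)
    if average >= too_long:
        return "scale_out"
    if average <= too_short and len(line_lengths) > 1:
        return "scale_in"   # never close the last open checkout
    return "hold"


print(scaling_action([0, 1, 0, 1]))   # quiet: scale_in
print(scaling_action([12, 15, 20]))   # busy: scale_out
print(scaling_action([4, 5, 3]))      # just right: hold
```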

When putting these feedback loops in place remember that there are many aspects of computer performance to think about. For example, a cache for web traffic may be more likely to use up all its memory or networking resources than it is to reach 100% CPU usage. Scaling that cache based on its CPU usage could lead to insufficient network resources, and problems using the service. Consider at least network, memory, and disk capacity, as well as CPU, as possible limiting factors when scaling your systems.
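A small sketch of that idea: rather than assuming CPU is the limiting factor, compare how close each resource is to its limit and scale on whichever is busiest. The utilisation figures below are invented to resemble the web cache described above.

```python
def busiest_resource(utilisation):
    """Return the (resource, fraction used) pair that is closest to its limit."""
    return max(utilisation.items(), key=lambda item: item[1])


# Made-up readings for a web cache: CPU is nearly idle, memory is nearly full.
cache_metrics = {"cpu": 0.15, "memory": 0.92, "network": 0.78, "disk": 0.30}

resource, used = busiest_resource(cache_metrics)
print(f"Scale on {resource}: {used:.0%} used")  # Scale on memory: 92% used
```

A scaling rule watching only the CPU figure here would see 15% and conclude the cache was over-provisioned, even though memory is almost exhausted.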

Next time

In this first part we’ve introduced the basics of horizontal and vertical scaling, and the use of scaling actions as part of a negative feedback loop for controlling scaling out (to add capacity) and scaling in (to remove it). In the next part of the series we’ll dig deeper, and look at some things to be considered while scaling, and in the final part we’ll look at an approach that can help us bypass some of the headaches scaling brings with it.