How Things Scale: An Introduction To Designing Real-World Scalable Software

Ricardo Bedin
8 min read · Dec 2, 2022


The intention behind this post is to expose some fundamentals of software system design — with an extra emphasis on the design aspect.

It answers a question, via a real-case example, that many engineers ask themselves: How do I even start designing a system?

I have found that too many resources online focus on the tools and on the “getting your hands dirty” approach.

While mastering the tools is important, we are, more often than we like to admit, skipping the core piece of the puzzle: the high-level thinking about how things operate, even before we try to code them in our so-called programming languages.

At its heart, “the art of programming is the art of organizing complexity” (Farley, David. Modern Software Engineering).

Organizing complexity should come before you start typing any code (or maybe even thinking about computer systems at all!).

Thus, we will dive into a real-world example of how certain design concepts can make a system scale, and identify what those concepts are along the way.

Buckle up!

Let me give you a little bit of context on Creator Now.

At the heart of our business, we need to gamify information about creators’ YouTube channels in order to deliver them the best learning and growing experiences for their creator journey.

That means that, for each newly registered user, we need to rely heavily on YouTube’s API to extract information about their channel and creator journey.

We want to make sure this information is kept up-to-date as frequently as possible. After all, there is no point in maintaining a product where the core piece of information that drives the user experience is lagging behind the “real-world” reality.

This is our key challenge:

How can we scale if we have to rely on a third-party API with limits that are out of our control?

First, let’s understand how the YouTube API works and where the problem truly lies using a real-world analogy.

Imagine the YouTube API is a big office that holds the data for all channels — and all things YouTube-related.

Every time anyone on the internet wants to check information about a specific channel, they go there. In short, they all need to send someone to this building to ask for that piece of information.

The flow is quite simple: someone walks in through the front door and presents their credentials to the gatekeeper. Once they are allowed in, they are served by a happy YouTube employee:

  • Hello, sir; what is the info that you desire?
  • Can you please check the latest information on the CatsAreAmazing channel?
  • Of course, let me check our files! <pulls the data for the channel, and hands back a one-page report of it>
  • Thank you, good sir!

I hope the YouTube offices are newer than this!

Quite a happy ending, isn’t it?

But YouTube has experienced an overwhelming amount of people asking for stuff.

Since everyone is trying to get a lot of information all the time, they decided to put some rules in place (this does not reflect real-world rules of YouTube’s API):

  1. You can ask for information on at most 50 channels/videos per visit to the office
  2. You can enter the office at most 10 times per minute, and at most 10,000 times per day

That leaves us with an inevitable bottleneck: we can only ask for so much in a given day. Even worse, we also have to pace how much information we request per minute.

“Nooooooo! I really wanted to get all the information about all the channels in the world!”

Okay, so we gotta be smart about this.

Let’s do some quick math to find the maximum number of channels we can request information for per minute (and per day).

If we can go into the office 10 times/minute, and each time we can request 50 channels: 10 x 50 = 500 channels/min. Doing similar math, we realize we can request 10,000 x 50 = 500,000 channels/day.

If you expand this a little further, you will realize we can use up our entire quota in 16 hours and 40 minutes: that is how long it takes, requesting 500 channels/min, to blow through our 500k channels/day allowance.
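The arithmetic above can be sanity-checked in a few lines of Python. The limits here are the hypothetical ones from our imaginary rules, not YouTube’s real quotas:

```python
# Hypothetical quota limits from our imaginary YouTube-office rules
CHANNELS_PER_VISIT = 50
VISITS_PER_MINUTE = 10
VISITS_PER_DAY = 10_000

channels_per_minute = VISITS_PER_MINUTE * CHANNELS_PER_VISIT  # 500
channels_per_day = VISITS_PER_DAY * CHANNELS_PER_VISIT        # 500,000

# At full speed, the daily visit quota runs out in:
minutes_to_exhaust = VISITS_PER_DAY / VISITS_PER_MINUTE       # 1,000 minutes
hours, minutes = divmod(minutes_to_exhaust, 60)

print(f"{channels_per_minute} channels/min, {channels_per_day} channels/day")
print(f"Quota exhausted after {hours:.0f}h {minutes:.0f}min at full speed")
```

Running this prints 500 channels/min, 500,000 channels/day, and a quota exhausted after 16h 40min, matching the numbers above.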

So, the golden question here is:

How can we most efficiently keep our app updated with the latest information on our users (even if we have more than 500k users)?

First, let’s establish some basic ground rules for ourselves that will make us save some trips to the YouTube office and make sure we follow their rules. Let’s break the responsibilities for this major task into smaller tasks that are executed by different departments at our company:

  1. Let’s call our department that is responsible for reaching out to YouTube’s office the Scraper.
  2. We do not need to request new channel information if we already got information for it in the last 24h (we are okay with a 24h delay to update the channels).
  3. This means we need to remember the last information about a channel somehow and the time we retrieved it — let’s call this our Cabinet.
  4. We probably want to have more than one person able to go to the office: since YouTube allows 10 visits per minute, we can optimize our process by hiring 10 people for this job (let’s call them Runners)
  5. We need to have some sort of control over how many visits we make to their office in a given minute and on a given day (maybe we can write it down somewhere? Let’s call it the Runners’ Log).
  6. That also means that each time our runners want to go off and grab new channel info, we need to double-check that they are not risking a wasted trip because we would exceed our minute/daily quota. This centralized control center will be called the Runners' Department.

This is the initial draft of our workflow so far:

The initial draft of our workflow

Now, what is so special about this flow, besides the fact that it gives us an initial solution to (most of) our issues?

It provides clear Separation of Concerns:

  • The Scraper only cares about getting info for a YouTube channel (no matter how)
  • The Cabinet only cares about storing/retrieving info for a channel
  • The Runners' Department only cares about handling the runs to the YouTube office
  • The Runners' Log only cares about storing/retrieving info for the runs
  • The Assigned Runner only cares about doing the run itself

And, the most awesome part of all: the App does not care about any of this! It just goes to the Scraper and says, “Hey, I need the info for channel XYZ!”.

This is what the concepts of Modularization and Loose Coupling are all about.

Let’s expand a little on Loose Coupling and how this makes our system a lot more scalable. In order to understand it, we need to understand why interfaces (contracts) between modules are important.

Let’s take the interaction between the Scraper and the Cabinet, for example.

They both need to agree on how they are going to talk to each other: is the Cabinet going to receive requests by phone call? By SMS? They also need to agree on how the Cabinet is going to return the info to the Scraper: is it a folder with files? An attachment via email?

All that matters is that they have an agreement: an interface, a contract.

Why?

Once that line is established, it does not matter how complex the internal structure of the Cabinet is: for all the Scraper cares, it can be a one-person operation or a five-story building full of staff.

As long as the Cabinet can reliably comply with their contract, it’s all good.

This allows us and our development team to independently work on each one of these modules, making whatever decisions are best for its internal performance.

We can parallelize the work on each one of them because, in the end, all that matters is that they know how to talk to each other.
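To make the idea of a contract concrete, here is one way to express the Scraper–Cabinet agreement in Python, using a `typing.Protocol` as the interface. The names `CabinetContract` and `InMemoryCabinet` are illustrative, not from any real codebase:

```python
from typing import Optional, Protocol

class CabinetContract(Protocol):
    """The agreement: what the Scraper may ask of any Cabinet."""
    def get(self, channel_id: str) -> Optional[dict]: ...
    def put(self, channel_id: str, info: dict) -> None: ...

class InMemoryCabinet:
    """One possible implementation; a database-backed one could replace it."""
    def __init__(self) -> None:
        self._files: dict = {}

    def get(self, channel_id: str) -> Optional[dict]:
        return self._files.get(channel_id)

    def put(self, channel_id: str, info: dict) -> None:
        self._files[channel_id] = info

def describe(cabinet: CabinetContract, channel_id: str) -> str:
    """The Scraper side: it only knows the contract, not the implementation."""
    info = cabinet.get(channel_id)
    return info["title"] if info else "unknown"
```

Any object satisfying `CabinetContract` can be swapped in without touching the Scraper’s code — which is exactly the independence the article is describing.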

This concept is applied even in our day-to-day lives: anytime you order something online, do you usually care who the carrier is? Do you know how internally a calculator operates to hand you back the results? Do you even care?

What a beauty, huh?

What if we have more requests than we can handle?

Shoot, that still does not solve one of our core problems: we will build up a backlog that the Runners’ Department might not be able to clear in time!

Uh-oh. That does not look like fun.

It seems we need some sort of system to maintain a backlog of these operations and handle them in the most efficient fashion we can come up with, leaving no request hanging indefinitely!

Let’s come up with some rules for scaling the number of requests to the Runners’ Department.

Some things (rules) that we can come up with are:

  1. First come, first served: we treat the requests like a queue. This guarantees that, at some point, the request to get information for a specific channel will be handled. We just cannot guarantee when.
  2. Prioritize channels that we do not have data for (new channels in our app). This might be a clever strategy: a channel we have no data on at all means that the user is probably unable to use the app. i.e. Having outdated data is better than having no data at all!
  3. Make sure that we do not have duplicate requests in our backlog: if there is already a pending task for a specific channel, we can just discard further requests to it.

We can probably combine all these since they are not mutually exclusive.

Let’s go through an example to see if that all makes sense.

We started our day with a clean slate — no requests pending on the backlog.

Suddenly, we have an influx of 7,000 requests: the app is desperate for info on 7k channels! Given the logic we outlined above, we:

  1. Clean out any duplicate channel requests that might be in this batch
  2. Check which channels we do not have past info for; move these channels to the front of the queue
  3. Store the queue in our Runners’ Log
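The three steps above can be sketched as a single function (names are illustrative):

```python
def build_queue(requested_channels, known_channels):
    """Turn a raw batch of requests into the queue the Runners' Log stores."""
    # Step 1: clean out duplicate channel requests, preserving arrival order
    seen, deduped = set(), []
    for channel in requested_channels:
        if channel not in seen:
            seen.add(channel)
            deduped.append(channel)
    # Step 2: channels we have no past info for move to the front of the queue
    new = [c for c in deduped if c not in known_channels]
    refresh = [c for c in deduped if c in known_channels]
    # Step 3: this combined list is what we store in the Runners' Log
    return new + refresh
```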

Whoa! Wait a second, boss; our initial draft did not have this queue or anything like that!

Here is the beauty of modularization again: this is an internal change to how the Runners' Department operates.

The Scraper will still operate the very same way: it will keep asking for info about channels (like the spoiled child it is!). It is the Runners' Department’s problem to store and handle this queue internally.

Remember: all the other modules care about are contracts and interfaces!

Phew!

Alright, pal, that was a lot — but hopefully, it made as much sense to you as it did to me while writing it.

You should now have a very good understanding of why modularization, loose coupling and abstractions matter.

You now also know how we can, even before starting to type code, think about designing a system that can independently scale.

Next time you start to design a system, try to answer the fundamental question of how you can build such independent, loosely coupled modules before you rush into typing code.

Otherwise, you might end up developing tomorrow's legacy code that no one understands and/or wants to deal with!
