System Design — Capacity Estimation

Caique Ribeiro Rodrigues
5 min read · Dec 18, 2023


An interesting additional feature to make your system design richer.

System design is a topic that shows up on the path of every developer who’s moving (or at least trying to move) to the next level. It’s not just about coding: you’ve got to design the infrastructure and the integrations between systems at a higher level. You should be an architect, not just an executor.

In a system design, nothing can be left to chance. When choosing a database, you need to justify the reasons and the tradeoffs. Chose AWS EKS instead of plain EC2? Explain yourself. Using a Lambda function? Well, just explain it. Everything should be thought through realistically because this stage is crucial for your system’s performance: it determines whether the system can handle all the requests it needs to and whether its cost will be as low as possible (you know, in the end, it’s all about the money).

One detail in this beautiful practice that often goes unnoticed is capacity estimation. Since it’s not a mandatory topic in smaller companies or entry-level job interviews, capacity estimation is sometimes treated like a distant cousin you only hear about occasionally.

These calculations estimate the number of requests, bandwidth, throughput, and storage over a specific period. They play a crucial role in several later modeling decisions:

  • Choosing the database and storage size
  • Clustering and/or load balancing
  • Caching

The key in this process is not to aim for complete accuracy but to use guidelines to get close and calculate values more rapidly.

Scientific Notation

In capacity estimates, it’s common to use scientific notation. Working with calculations in this format makes the process much simpler and quicker than dealing with thousands of zeros in a high-pressure situation. Remember: Precision here is unnecessary; approximation is more than enough.

Some examples of useful notations to keep in mind are:
1,000 = 10^3
10,000 = 10^4
100,000 = 10^5
7,000 = 7*10^3
95,000 = 95*10^3 (or 9.5*10^4)

Moreover, some conversions work like magic:
KB → MB = KB/10^3
MB → GB = MB/10^3
GB → TB = GB/10^3
MB → TB = MB/10^6

To make it easier, remember a trick: for a 1 followed by zeros, the exponent is the number of zeros. 1,000 is 10³, for example.
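If you want to check the notation and conversion tricks above outside of an interview, here is a minimal Python sketch. The helper names `to_scientific` and `convert` are my own, made up for illustration, and the code assumes positive whole numbers and decimal units (1 MB = 10³ KB), as in the conversions listed earlier.

```python
def to_scientific(n: int) -> tuple[float, int]:
    """Return (mantissa, exponent) so that n == mantissa * 10**exponent.

    Assumes n is a positive integer, which is enough for rough estimates.
    """
    exponent = len(str(n)) - 1      # digits after the leading one
    return n / 10 ** exponent, exponent


def convert(value: float, steps: int) -> float:
    """Move `steps` units up: KB -> MB is 1 step, MB -> TB is 2 steps."""
    return value / 10 ** (3 * steps)


print(to_scientific(95_000))        # mantissa and exponent for 95,000
print(convert(2_500, 1))            # 2,500 KB expressed in MB
```

Here `to_scientific(95_000)` gives a mantissa of 9.5 and an exponent of 4, matching the 9.5*10^4 example in the list.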

Most common calculations

In a system design, a lot of calculations may be necessary to achieve your goals. Still, a few of them appear almost every time and will probably be the only ones you need 90% of the time.

Requests per second

(DAU * requests per user per day) / 10⁵

Let’s delve deeper into this.

DAU means Daily Active Users, in other words, how many users access the platform in a day. Multiplying it by the number of requests each user makes gives the total requests per day. 10⁵ is just the approximate number of seconds in a day.

Here’s an example:
1,000,000 users (DAU)
5 requests per day (each user)
5,000,000 requests per day

First of all, let’s calculate the seconds in a day
24 (hours) * 60 (minutes) * 60 (seconds) = 86,400 seconds. Rounding up, you can just consider 100,000 seconds. In scientific notation, that is 10⁵.

Applying the formula
Now we know we have 10⁵ seconds in a day and 5,000,000 requests per day. As I said before, this kind of calculation is way easier with every element in scientific notation, so you just have to convert 5,000,000. One million in scientific notation is 10⁶, so all you have to do is multiply it by 5 to get 5*10⁶.

With it in hand, calculate the formula:
5*10⁶ / 10⁵
5*10^(6-5)
5*10
50 requests per second (RPS).

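Using the formula, you can estimate that your API will receive about 50 RPS on average. With the exact 86,400 seconds the result would be about 58 RPS, so the rounded figure is close enough for an estimate.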
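The same calculation can be sketched in a few lines of Python. The function name `requests_per_second` and the constants are my own, chosen for illustration; the numbers are the ones from the example (1,000,000 users, 5 requests each). It also computes the result with the exact 86,400 seconds so you can see how much the rounding costs.

```python
SECONDS_PER_DAY_EXACT = 24 * 60 * 60    # 86,400
SECONDS_PER_DAY_ROUNDED = 10 ** 5       # the rounded value used in estimates


def requests_per_second(dau, requests_per_user,
                        seconds_per_day=SECONDS_PER_DAY_ROUNDED):
    """Average requests per second for a given number of daily active users."""
    return dau * requests_per_user / seconds_per_day


rounded = requests_per_second(1_000_000, 5)
exact = requests_per_second(1_000_000, 5, SECONDS_PER_DAY_EXACT)
print(f"rounded: {rounded:.0f} RPS, exact: {exact:.1f} RPS")
# rounded: 50 RPS, exact: 57.9 RPS
```

Keep in mind this is an average: real traffic has peaks, so it is common to leave some headroom on top of this number.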

Bandwidth per second

Bandwidth is the amount of data transferred over a period of time. Knowing the bandwidth per second is crucial when thinking about how to design your system. Calculating it is very simple:

Requests per second (RPS) * Request size

Of course, it is impossible to predict the exact size of every request in your system, but calculating one (normally the most important in the context) is enough. Remember, you just have to show that you are not designing by guesswork.

Calculating a request size requires knowing the size of the JSON/XML/plain-text payload you are receiving, plus some more complex parts of the HTTP request, so for now let’s just assume each request is 50 KB.

50 (RPS) * 50 KB = 2500 KB per second (2500 KB/s)

You can convert KB to MB using one of the tricks I showed at the beginning. 2,500 in scientific notation is 2.5 * 10³:

2.5 * 10³ / 10³
2.5 * 10⁰
2.5 * 1 ⇒ 2.5 MB/s
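As a quick check, here is the bandwidth estimate as a small Python sketch. The function name `bandwidth_kb_per_second` is my own, and the 50 KB request size is the same assumption made above, not a measured value.

```python
def bandwidth_kb_per_second(rps, request_size_kb):
    """Data transferred per second, in KB: requests per second times request size."""
    return rps * request_size_kb


kb_per_second = bandwidth_kb_per_second(50, 50)   # 50 RPS, 50 KB per request
mb_per_second = kb_per_second / 10 ** 3           # KB -> MB, decimal units
print(f"{kb_per_second} KB/s = {mb_per_second} MB/s")
# 2500 KB/s = 2.5 MB/s
```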

Storage per second

The last of the most common calculations is storage. I will show it per second, but in many cases you will need it per year or over five years.

To calculate it, you just have to multiply the number of writes per second by the size of each request. Remember the RPS we’ve just calculated? Let’s use it now. Imagine that, for those 50 RPS, the read-to-write ratio is 9:1 (9 reads for every write). Doing some quick math, we’ll have:

writes = 50 RPS / 10 (10 percent) = 5 RPS
reads = 45 RPS (50 total - 5 writes)

Ok, now assume each write request stores 50 KB in the database and apply everything in the following formula.

Write RPS * Request size

5 writes/s * 50 KB = 250 KB/s, or 0.25 MB/s

You just calculated the storage needed per second. On its own it is not that useful, but you can scale it to other periods, such as a day or a month.

For example, storage per day:

storage per second * 10⁵
0.25 MB/s * 10⁵ s = 25,000 MB
25 GB per day
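To see the whole storage estimate in one place, here is a minimal Python sketch. The function names `write_rps` and `storage_mb` are my own, and the 9:1 ratio and 50 KB write size are the assumptions from the example. The yearly figure is an extra step I added by multiplying by 365 days; it assumes the load stays constant.

```python
SECONDS_PER_DAY = 10 ** 5   # rounded, as in the RPS calculation


def write_rps(total_rps, reads, writes):
    """Share of the total RPS that are writes, given a reads:writes ratio."""
    return total_rps * writes / (reads + writes)


def storage_mb(writes_per_second, write_size_kb, seconds):
    """Storage in MB accumulated over `seconds` (decimal units)."""
    return writes_per_second * write_size_kb * seconds / 10 ** 3


writes = write_rps(50, reads=9, writes=1)                  # 5 writes/s
per_second_mb = storage_mb(writes, 50, 1)                  # 0.25 MB
per_day_gb = storage_mb(writes, 50, SECONDS_PER_DAY) / 10 ** 3
per_year_tb = per_day_gb * 365 / 10 ** 3                   # assumes constant load
print(f"{per_second_mb} MB/s, {per_day_gb} GB/day, {per_year_tb} TB/year")
# 0.25 MB/s, 25.0 GB/day, 9.125 TB/year
```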

Good! This is all the knowledge you need to improve your system designs. Do not forget that system design is more about whether you can ask good questions and think about saving resources than about building a complex architecture.

Hope you like it! See you soon.
