System Design Fundamentals — part 1

prudhvi reddy
6 min read · Jun 27, 2020


poster credits to Mohammad Fayaz

Why System Design?

Let us first understand why we need to design a system beforehand at all. After all, we can simply create a server on a hosting service like Heroku, GoDaddy, or DigitalOcean and host our application there, and it will take care of serving our users.

It's an amazing story so far: you hosted your first application or website and used some service to get a URL like prudhvireddy.js.org.

But here’s a plot twist to it.

Note: for this blog, let's ignore cyber attacks and assume only genuine users are trying to access resources on the internet. We will talk about how to handle cyber attacks in another blog.

What if a million users directly open your website to download your resume?

Any idea what happens? We never cared, because that's very unlikely to happen with our personal sites, and even if it does, our server just stops responding and we can bring it back up once the traffic clears.

But for services like Amazon and Flipkart, heavy traffic happens all day, every day, and they must keep serving their users to keep their business running. What happens if a user is adding items to the cart and an item goes missing because the server couldn't respond at that moment?

Users won't trust our service next time and will go for alternatives, because our service is not reliable.

To talk about and solve these kinds of problems, we need to study system design.

Designing large-scale systems is not an easy job because many factors are involved. In this system design series, we will learn how to do it together.

  • I will take you through the different pieces of a system that you have to work with when designing a large-scale system
  • I will show you how to connect those pieces in practice
  • and we will learn what trade-offs to make.

Let's start with the subject already.

In this first episode, I will talk about the key components we should look at when designing a system. If someone shows you a system and asks you to talk about it, you can judge it on these properties. They are what make a system good for a given set of requirements.

Key components of a distributed system

Scalability

Scalability is the ability of a system to handle a growing amount of work by adding resources, without losing the speed it provided before.

Imagine you own an e-commerce clothing site that usually gets 100–200 shoppers per day, and you maintain just a little more server capacity than that traffic needs.

Now it's suddenly festival time, traffic on your site is rising very fast, and the server is responding slower than usual, which ends up making users angry.

What will you do? Hire more customer support people to tell users to wait a little while and teach them about server traffic?

Well, I guess renting one more server, or some more RAM and processing power, is less costly and better for your business in this case.

You have two ways to do that:

a) Horizontal scaling:

Add more servers alongside the current one and let them take a share of the incoming requests, so that users are served at the same speed as before, or faster.

Analogy: imagine you lift 5 kg every day and are suddenly asked to lift 15 kg at the same speed. Your two best friends come to the rescue by each taking a share of the weight.
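To make the idea of sharing requests concrete, here is a minimal sketch of one simple way to spread work across servers: round-robin. The server names and the distribute function are made up for illustration, and real systems usually put a load balancer in front to do this job.

```python
from collections import Counter
from itertools import cycle

# Hypothetical server names, for illustration only.
servers = ["server-1", "server-2", "server-3"]

def distribute(request_ids, servers):
    """Assign each request to the next server in round-robin order."""
    rotation = cycle(servers)
    return {request_id: next(rotation) for request_id in request_ids}

# 15 incoming requests shared between 3 servers.
assignments = distribute(range(15), servers)
load = Counter(assignments.values())
print(load)  # each server ends up with 5 requests
```

Just like the three friends each lifting 5 kg, every server here handles a third of the traffic.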

b) Vertical scaling:

In vertical scaling, you add more power to the current server, such as more RAM, a faster processor, or more storage, so that it is able to serve your users.

Analogy: imagine you lift 5 kg every day and are suddenly asked to lift 15 kg at the same speed. You go and train for some days to build your strength, then come back and lift 15 kg.

If you noticed, there's a downside to this approach: downtime. While you are training, or while the server's capacity is being upgraded, your system is not in a working state.

During that time your users will miss the service, which slows down the business.

Reliability

A system is considered reliable if it keeps working even when one or several components fail.

It's like this: while lifting 15 kg your leg gets injured, but you must still be able to do the assigned work, or you are fired.

Well, that sounds a little sadistic, but the point is not that he continues with an injured leg. The system should work such that the injured leg can be replaced with a healthy one immediately.

This is where redundancy comes in. We maintain a spare component in the same state as the current one so that it can take over easily. A well-designed reliable system will most likely not fail to serve.

Redundancy has a cost, but it's not as costly as losing customers because of a failure in the system. Often a failing system won't just stop responding; worse things can happen, like giving away products for free.
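Here is a small sketch of how a redundant component can take over when the main one fails. The Replica class, the handle_with_failover function, and the "add-to-cart" request are all hypothetical names for this example; a real setup would also need to keep the standby's data in sync, which is not shown.

```python
class Replica:
    """A hypothetical component that is either healthy or failed."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"

def handle_with_failover(replicas, request):
    """Try each replica in order and return the first successful response."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue  # this replica failed, so try the next one
    raise RuntimeError("all replicas failed")

primary = Replica("primary", healthy=False)  # the injured leg
standby = Replica("standby")                 # the healthy replacement
result = handle_with_failover([primary, standby], "add-to-cart")
print(result)  # standby handled add-to-cart
```

The user still gets a response even though the primary is down, which is the whole point of paying for the extra component.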

Availability

Availability is the fraction of time a system remains operational and able to perform its required function over a specific period.
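Availability is often quoted as a percentage of uptime. As a rough sketch, assuming we simply know the hours the system was up and down in a period (the function name and the numbers are made up for illustration):

```python
def availability_percent(uptime_hours, downtime_hours):
    """Return the percentage of the period during which the system was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("the period must be longer than zero hours")
    return 100.0 * uptime_hours / total

hours_in_year = 365 * 24  # 8760 hours
downtime = 8.76           # assumed total downtime over the year, in hours
yearly = availability_percent(hours_in_year - downtime, downtime)
print(round(yearly, 3))   # 99.9
```

So a system that is down for under nine hours in a whole year is available about 99.9% of the time.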

If a system is reliable, then it is available; but if a system is available, it may or may not be reliable. Availability only talks about being available.

Analogy (available but not reliable):

service: There's a person available to lift your 15 kg.
owner: Are you sure he will be able to lift it?
service: Well, I'm not sure about that.

Analogy (available and reliable):

service: We have a highly trained person here to lift your weight.
owner: Are you sure he can lift it?
service: Yes, we tested him under all conditions and he passed every test. Even if something goes wrong, he will manage to complete his task as best he can.

Efficiency

System efficiency can be observed through these two factors:

  1. response time, also called latency (lower is better)
  2. throughput: data delivered per unit time.

How fast you are able to carry a weight from one place to another, and how much weight you carry.

Replace “weight” with “data” and “you” with “server”.
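Both factors can be measured with a simple timer. The sketch below uses a fake handler that just sleeps for a millisecond, and it counts throughput in requests per second instead of bytes to keep things short. The names measure and fake_handler are assumptions for this example.

```python
import time

def measure(handler, requests):
    """Return (average latency in seconds, throughput in requests per second)."""
    if not requests:
        raise ValueError("need at least one request to measure")
    latencies = []
    start = time.perf_counter()
    for request in requests:
        began = time.perf_counter()
        handler(request)
        latencies.append(time.perf_counter() - began)
    elapsed = time.perf_counter() - start
    return sum(latencies) / len(latencies), len(requests) / elapsed

def fake_handler(request):
    """Stand-in for real work: pretend each request takes about 1 ms."""
    time.sleep(0.001)

avg_latency, throughput = measure(fake_handler, list(range(20)))
print(f"average latency: {avg_latency * 1000:.2f} ms")
print(f"throughput: {throughput:.0f} requests/second")
```

The exact numbers depend on your machine, so treat the output as a rough reading and not a benchmark.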

You can observe how YouTube and Netflix stream videos to us quickly even when our internet connection is slow: they automatically adjust the data size (the video quality) they send depending on our internet speed.

That's a very efficient system: we get to watch a good-quality video without buffering.

If it's fast but low quality, it's not efficient.
If the quality is good but it keeps buffering, it's not efficient.

High quality at high speed is efficient,
whereas a low-quality, buffering video is not.
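The quality adjustment can be pictured as picking the best video quality that the viewer's connection can carry. This is only a toy sketch: the quality labels and bandwidth numbers are invented for illustration and are not the values YouTube or Netflix actually use.

```python
# Hypothetical quality ladder: (label, bandwidth needed in Mbps), best first.
QUALITY_LADDER = [("1080p", 5.0), ("720p", 2.5), ("480p", 1.0), ("240p", 0.4)]

def pick_quality(bandwidth_mbps, ladder=QUALITY_LADDER):
    """Return the highest quality whose bandwidth need fits the connection."""
    for label, required_mbps in ladder:
        if bandwidth_mbps >= required_mbps:
            return label
    # Connection is slower than every rung, so fall back to the lowest quality.
    return ladder[-1][0]

print(pick_quality(3.0))  # 720p
print(pick_quality(0.2))  # 240p
```

A viewer on a 3 Mbps connection gets 720p without buffering, instead of a 1080p stream that keeps stalling.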

Maintainability

Maintainability is the ability to make changes to the system with ease. Changes can be new features or bug fixes.

For example, some systems have alerting built in: if a service fails, we are notified automatically, so we can fix it as fast as possible and let users know that we will be down for a while.

In other cases, it's the users and the media who report to us that there is a failure in our system.
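A bare-bones version of such an alert can be sketched as a loop over health checks. Everything here is hypothetical: the service names, the check functions, and the notify callback, which in a real system might send an email or a chat message.

```python
def check_services(health_checks, notify):
    """Run every health check and notify about each service that fails."""
    failed = []
    for name, check in health_checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that crashes counts as a failure
        if not ok:
            failed.append(name)
            notify(f"ALERT: {name} failed its health check")
    return failed

# Hypothetical services: the cart is healthy, payments is not.
health_checks = {"cart": lambda: True, "payments": lambda: False}
alerts = []
failed = check_services(health_checks, alerts.append)
print(failed)  # ['payments']
print(alerts)  # ['ALERT: payments failed its health check']
```

Running something like this on a schedule means we hear about the failure from our own system first.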

Analogy: how well, easily, and quickly an actor can change his body type. Ignoring pure nepotism, an actor will only get roles if he is able to change his body type and personality to fit the role offered as fast as possible.

Thanks for reading this blog. I hope you learned some system design vocabulary from it. If you loved it, give the blog some 👏s. Please drop a comment in case anything is unclear.

In the next part, we will see what hashing, polling, events, load balancers, sockets, proxy servers, and caching are.

Follow me to get a notification for the next part.

Update: read the next part here: https://medium.com/@prudhvir3ddy/system-design-fundamentals-part-2-fe6ddb61fa37?sk=9e14d1bb963f9b618ff467c481dee7ea
