First steps with System Design

The first encounter 👋

Sumanjali
7 min read · Jun 26, 2020

You are visiting amazon.com during a sale, a time when traffic to the site is way more than normal. But the site works as fast as it does every day! Okay, that is because Amazon has so many server machines and uses them to serve you. But do they keep all these machines in use every day even when there is no festive sale? How do they manage all the traffic that comes their way? How do they serve all your requests? Lots of questions… All the answers lie in System Design.

System Design is the phase of software engineering in which we lay out the architecture, define interfaces and modules, and make design choices for handling data so that the project's needs are met.

System Design interviews have become vital and common for all software development roles. They mostly deal with web application architectures. This article covers a few points that should give you a basic idea of the topic and help you start preparing for these interviews.

Let us start with this sample objective to understand the concepts of System Design.

Bright sunny day! 🌇

Uma reads through a popular news website and finds so many tragic stories. She is disappointed, and so are many other people. This is what happens every day! She notices that there are small inspiring stories happening around everywhere, going unnoticed by people. So, Uma is now on a mission to develop a website where she can share these untold stories by getting news from peers around the world, spreading happiness and positivity.

You are asked to help this sweet lady to design the architecture and make this website accessible to people around the world.

Sounds great! So, what next?

Understanding the requirements 💡

Let us analyze the problem and make the objective clearer and more specific.

  • What is the type of data in the stories? (Text and Image)
  • What can be the length of each story? (A maximum of 400 words and 1 image)
  • How are the stories stored in the database? (As HTML pages)
  • How are the stories presented to the user? (Index pages with hyperlinks to articles for different categories and locations)
  • In which languages is the data made available? (Only English)
  • Search functions for stories, filters? (No… This is going to be a basic website. Users can scroll down to get older stories)
  • Archives? (Stories can be archived if they get too old… But there is no need to archive data initially)
  • How many hits are expected per hour? (We shall prepare for ten thousand, initially)
  • Where do we expect most of our traffic from? (India, Australia, and US)
  • How many stories can we expect per day? (Collectively 10 from all the peers who get the stories)
  • Do we store images with different resolutions, what can be the sizes? (No… We store a single image of size around 25 KB)

Now that the objective has become more specific, we shall proceed with making estimates.

Making estimates 📝

  • 400 words make a file of approximately 2.5 KB, and the image takes at most 25 KB. So each story totals around 35 KB, allowing for a few style elements.
  • The index pages, with all their hyperlinks, a few thumbnails, and CSS, can come to about 80 KB since we are keeping things light.
  • There will be 10 stories per day, so over one year we will need approximately 10 × 365 × 35 KB = 127,750 KB, which is around 128 MB. This is small, so we can store all the stories on a single disk.
  • Pages served will be at most 100 KB per request.
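The arithmetic behind these estimates can be sketched in a few lines of Python. This is only a rough calculation using the figures stated above (35 KB per story, 10 stories a day, pages of at most 100 KB, ten thousand hits per hour). The function names are made up for illustration, and 1 KB is treated as 1,000 bytes.

```python
# Back-of-the-envelope estimates using the figures from the requirements.
# Assumption: decimal units (1 KB = 1,000 bytes, 1 Mb = 1,000 kb).

STORY_KB = 35            # text + image + a few style elements, per story
STORIES_PER_DAY = 10
DAYS_PER_YEAR = 365
PAGE_KB = 100            # upper bound on the size of one served page
HITS_PER_HOUR = 10_000


def yearly_storage_kb(story_kb=STORY_KB,
                      stories_per_day=STORIES_PER_DAY,
                      days=DAYS_PER_YEAR):
    """Disk space needed to keep one year of stories, in KB."""
    return story_kb * stories_per_day * days


def hourly_egress_kb(page_kb=PAGE_KB, hits_per_hour=HITS_PER_HOUR):
    """Worst-case data sent out per hour, in KB."""
    return page_kb * hits_per_hour


def average_egress_mbps(page_kb=PAGE_KB, hits_per_hour=HITS_PER_HOUR):
    """Average outgoing bandwidth in megabits per second."""
    kilobits_per_hour = hourly_egress_kb(page_kb, hits_per_hour) * 8
    return kilobits_per_hour / 3600 / 1000


if __name__ == "__main__":
    print("Storage per year (KB):", yearly_storage_kb())
    print("Egress per hour (KB):", hourly_egress_kb())
    print("Average egress (Mbps):", round(average_egress_mbps(), 2))
```

With these inputs, a year of stories comes to 127,750 KB (about 128 MB), and the site sends out at most about 1 GB per hour, an average of roughly 2.2 Mbps.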

Standard terms and related design decisions 📔

  • Bandwidth: A measure of how much data can be transferred across a path per unit of time. Most users have at least 2 Mbps, so transferring pages won’t be a big issue.
  • CPU, RAM: Let us assume that the virtual machine configuration is similar to an n1-standard machine on Google Compute Engine (GCE).
  • Latency: The round-trip time from request to response. It depends partly on how quickly the server responds, which in turn depends on how many requests it is handling. With an initial load of ten thousand hits per hour, and assuming one GCE VM can handle 250 concurrent users, 4 standard VMs give us capacity for about 1,000 concurrent users, which leaves a comfortable margin.
  • Protocols: HTTP over TCP suits this application best.
  • API Endpoints: These matter a great deal when designing interfaces and modules, but here we consider none because the site is effectively static and we are just serving HTML pages.
  • Hosting: Hosting a website simply means placing its resources on a server that is connected to the internet and so accessible to users. We choose a cloud service provider such as Google Cloud, AWS, or Azure because of their popularity and reliability.
  • Zones: VMs are created in a datacentre zone that can serve all the locations our traffic comes from. Considering criteria like cost, we choose a US zone.
  • Virtual Machines: The site’s files are stored on the disks attached to the virtual machines. Each VM runs a server program such as nginx. We also assign a static IP to each VM so that it keeps the same address every time it starts.
  • DNS: The Domain Name System, where we reserve a name, an identity, for our website. This matters because users remember the site name rather than the many IP addresses behind it. So even if a VM goes down and we bring up another VM behind the same name, users still get a smooth experience.
  • DNS Service Providers: Companies like NameCheap and GoDaddy provide DNS services such as selling domain names, adding A records that point to a server VM, and DNS load balancing.
  • Load Balancing: Since we have four VMs, we add each IP as a separate A record in the DNS, and requests are sent to the VMs in a round-robin manner.
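The VM count and the round-robin idea can be illustrated with a small Python sketch. It is a simulation, not how a DNS provider actually implements load balancing. The peak of 1,000 concurrent users is an assumption chosen so that the result matches the four VMs above, the IP addresses are placeholders from a range reserved for documentation, and the function names are hypothetical.

```python
import math
from itertools import cycle, islice

# Placeholder static IPs for the four VMs (203.0.113.0/24 is reserved
# for documentation; these are not real servers).
VM_IPS = ["203.0.113.10", "203.0.113.11", "203.0.113.12", "203.0.113.13"]


def vms_needed(peak_concurrent_users, users_per_vm):
    """Smallest number of VMs that covers the peak, rounding up."""
    return math.ceil(peak_concurrent_users / users_per_vm)


def round_robin(addresses, n_requests):
    """Return the address chosen for each of n_requests, taking the
    addresses in order and starting over after the last one."""
    return list(islice(cycle(addresses), n_requests))


if __name__ == "__main__":
    # Assumed peak of 1,000 concurrent users, 250 users per VM.
    print("VMs needed:", vms_needed(1000, 250))
    for i, ip in enumerate(round_robin(VM_IPS, 6), start=1):
        print(f"request {i} -> {ip}")
```

Here six requests go to the four VMs in order and then wrap around to the first two again. In practice DNS round robin is only approximate, because resolvers and browsers cache the answers they receive.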

Rough design implementation so far 💻

  • We create VMs and run the server.
  • Allocate static IP, allow HTTP traffic, set up the firewall.
  • Get a domain name.
  • Refer to the IPs using DNS services.
  • Set up the DNS load balancer.
  • We can also configure the HTTP load balancer.
  • We scale up manually when more traffic is predicted or when requests are not being served fast enough.

Extending the objective 🌏

Now that the website is running well, we focus on issues like increased serving time for regions far away from the server zone.

Hits have increased from many regions of the world, so there are more requests, the VMs need to be scaled more often, and managing requests becomes more important.

Using a Content Delivery Network (CDN) solves the problem of delivering content to distant locations. CDN servers act as cache servers for a network location and deliver requested data to that region in less time.

Image: how a CDN serves requests. [Source: Cloudflare]

A website can use a browser cache, data cache, output cache, or distributed cache. For this site, a distributed cache provided by a CDN works well because we have many visitors from different geographical areas, though we will also cache data in the browser.

CDNs keep a copy of the server’s data, refreshed at regular intervals, and serve it for requests from their region. Because the copies are distributed, this keeps the data safe and also serves users fast. The original VMs now receive requests mainly from these cache servers, so the number of requests they serve is far lower than the total number of user requests. This keeps VM traffic and VM costs in check, and it also reduces the serving time for clients in all regions.
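The refresh-after-a-while behaviour described here can be sketched as a tiny time-to-live (TTL) cache in Python. This is a minimal illustration, assuming a single cache server and a fixed TTL. The `EdgeCache` class, its parameters, and the `origin` function are invented for this example and do not correspond to any CDN’s real API.

```python
import time


class EdgeCache:
    """A toy cache server: serves a stored copy of a page until it is
    older than ttl_seconds, then fetches a fresh copy from the origin."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin   # function: path -> page body
        self._ttl = ttl_seconds
        self._clock = clock               # injectable so it can be tested
        self._store = {}                  # path -> (time stored, body)
        self.origin_hits = 0              # requests that reached the VMs

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]               # fresh copy: origin not contacted
        body = self._fetch(path)          # missing or stale: go to origin
        self.origin_hits += 1
        self._store[path] = (now, body)
        return body


if __name__ == "__main__":
    def origin(path):
        # Stand-in for one of the VMs serving an HTML page.
        return f"<html>story at {path}</html>"

    cache = EdgeCache(origin, ttl_seconds=60)
    for _ in range(1000):
        cache.get("/stories/1")
    print("user requests: 1000, origin requests:", cache.origin_hits)
```

In this run only the first of the thousand requests reaches the origin, which is the effect described above: the VMs see a small fraction of the traffic the users generate.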

The same sequence happens whenever we visit a website: a connection is established, our router asks DNS for the address, DNS points us to a VM, the VM finds the content, and the content is sent back as the response. If a cached copy is available, it is used instead.

These are a few of the important basic concepts in System Design. We understand them better when we implement them alongside reading. So here are a few resources to get started with.

Where to start?

Having always been averse to Computer Networks, and after running away from these concepts for a long time, I found my interest while completing the free System Design Micro Experience (ME) from Crio.Do, thanks to its learning-by-doing approach. This is where I could practise and understand the basic concepts, which grew my interest. The modules are simple and well structured, so I highly recommend you try out the ME.

Crio’s System Design ME Description

Also, the resources in this GitHub repo are a great source of reference to prepare for the interview.

References 📚

Here are the major references used for learning; other sources are cited in the article as hyperlinks.

[1] https://www.cloudflare.com/learning/cdn/what-is-a-cdn/
[2] https://crio.do/detail/ME_QPREP_SD
[3] https://www.ironistic.com/four-major-caching-types-and-their-differences/
[4] https://www.hostinger.in/tutorials/what-is-cdn
[5] https://en.wikipedia.org/wiki/Systems_design

Happy learning😄
