Cracking The Code: Scalable System Design

Saumyaranjan Parida
16 min read · Aug 31, 2023

In the past two decades, substantial progress has occurred in the realm of large-scale web applications, reshaping our perspectives on software development. Consider the everyday applications and services we rely on, such as Facebook, Instagram, and Twitter; they represent prime examples of scalable systems. These platforms serve billions of users globally, necessitating designs capable of seamlessly managing vast volumes of traffic and data. It is within this context that the significance of system design truly shines.

Fundamentals of System Design (Horizontal and Vertical Scaling)

System design is the process of defining the architecture, interfaces, and data for a system that satisfies specific requirements. It meets the needs of your business or organization through coherent and efficient systems. Scalability refers to an application’s ability to handle an increased workload without sacrificing latency.

Horizontal scaling, or scaling out, means adding more machines to the existing resource pool, which increases the computational power of the system as a whole. Vertical scaling, or scaling up, means adding more power (CPU, RAM, storage) to an existing server.

Load Balancer

A load balancer is responsible for distributing traffic among multiple web servers. The load balancer is the client’s only point of contact, so clients never have to worry about connecting to individual servers.

The load balancer connects to the servers using private IPs. A private IP is an IP address that is reachable only by servers in the same network, which adds an extra layer of security.

Load balancers can be configured to use various algorithms to determine how to distribute traffic across the different nodes in a system. Some common algorithms include round-robin, least connections, and weighted round-robin. The choice of algorithm depends on the specific needs and characteristics of the system and the workload being handled.
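
To make these concrete, here is a minimal, illustrative sketch of two such algorithms in Python (the class and server names are made up, not taken from any real load balancer):

```python
import itertools

class RoundRobinBalancer:
    """Cycles through servers in order, one request at a time."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Picks the server currently handling the fewest connections."""
    def __init__(self, servers):
        self.connections = {s: 0 for s in servers}

    def pick(self):
        server = min(self.connections, key=self.connections.get)
        self.connections[server] += 1
        return server

    def release(self, server):
        self.connections[server] -= 1

balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([balancer.pick() for _ in range(5)])  # servers repeat in round-robin order
```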

Microservices

The microservice architecture enables an organization to deliver large, complex applications rapidly, frequently, reliably and sustainably.

Microservice architecture is an architectural style that structures an application as a set of loosely coupled services. It divides a large application into a collection of separate, modular services that can be independently developed, deployed, and maintained. Because the application is broken down into independent services, every service has its own logic and codebase, and teams can ship changes to one service faster and more reliably than they could to a single monolithic application. These services communicate with one another through Application Programming Interfaces (APIs).
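
As a rough illustration, a single microservice might expose its logic over an HTTP API like this (a minimal sketch using Flask; the "orders" service and its data are invented for the example):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# A hypothetical "orders" microservice: it owns its own logic and data,
# and other services talk to it only through this HTTP API.
ORDERS = {1: {"id": 1, "item": "book", "status": "shipped"}}

@app.route("/orders/<int:order_id>")
def get_order(order_id):
    order = ORDERS.get(order_id)
    if order is None:
        return jsonify(error="not found"), 404
    return jsonify(order)

if __name__ == "__main__":
    app.run(port=5001)
```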

Proxy Servers

A proxy server, or forward proxy, acts as a channel between a user and the internet. It separates the end-user from the website they’re browsing. Proxy servers not only forward user requests but also provide many benefits, such as:

  • Improved security
  • Improved privacy
  • Access to blocked resources
  • Control of the internet usage of employees and children
  • Cache data to speed up requests

Figure: communication between two computers (Alice and Bob) connected through a third computer acting as a proxy server. This protects Alice’s privacy, as Bob only knows about the proxy and cannot identify or contact Alice directly.

Whenever a user requests a resource from an end server, the traffic flows through the proxy server on its way to that address. Likewise, when the response comes back, it flows through the same proxy server, which then forwards it to the user. This improves privacy, security, and performance in the process.
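
For instance, routing an HTTP request through a forward proxy from Python might look like this (a sketch assuming the requests library; the proxy address is a placeholder):

```python
import requests

# All traffic to example.com flows through the proxy at 10.0.0.5:8080;
# the end server only ever sees the proxy's address, not ours.
proxies = {
    "http": "http://10.0.0.5:8080",   # hypothetical proxy host
    "https": "http://10.0.0.5:8080",
}

response = requests.get("https://example.com", proxies=proxies, timeout=5)
print(response.status_code)
```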

CAP Theorem

The CAP theorem is a fundamental theorem within the field of system design. It states that a distributed system can only provide two of three properties simultaneously: consistency, availability, and partition tolerance. The theorem formalizes the tradeoff between consistency and availability when there’s a partition.

Consistency

In a consistent system, all nodes see the same data simultaneously. If we perform a read operation on a consistent system, it should return the value of the most recent write operation. The read should cause all nodes to return the same data. All users see the same data at the same time, regardless of the node they connect to. When data is written to a single node, it is then replicated across the other nodes in the system.

Availability

When availability is present in a distributed system, it means that the system remains operational all of the time. Every request will get a response regardless of the individual state of the nodes. This means that the system will operate even if there are multiple nodes down. Unlike a consistent system, there’s no guarantee that the response will be the most recent write operation.

Partition Tolerance

When a distributed system encounters a partition, it means that there’s a break in communication between nodes. If a system is partition-tolerant, the system does not fail, regardless of whether messages are dropped or delayed between nodes within the system. To have partition tolerance, the system must replicate records across combinations of nodes and networks.

Storage

Data is at the center of every system. When designing a system, we need to consider how we’re going to store our data. There are various storage techniques that we can implement depending on the needs of our system.

File storage is a hierarchical storage methodology. With this method, data is stored in files, and the files are organized into folders nested within a directory hierarchy. This storage method works well only for limited amounts of data, primarily structured data.

Block storage is a data storage technique where data is broken down into blocks of equal sizes, and each individual block is given a unique identifier for easy accessibility. These blocks are stored in physical storage. As opposed to adhering to a fixed path, blocks can be stored anywhere in the system, making more efficient use of the resources.

Object storage is designed to handle large amounts of unstructured data. It is the preferred data storage method for data archiving and data backups because it offers dynamic scalability. Object storage isn’t directly accessible to an operating system; communication happens through RESTful APIs at the application level. This type of storage provides immense flexibility and value, because backups, unstructured data, and log files are important to any system. If you’re designing a system with large datasets, object storage would work well for your organization.
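
As an illustration, here is what talking to an object store through its API can look like (a sketch assuming AWS S3 via the boto3 client; the bucket and key names are made up):

```python
import boto3

# Object storage is accessed through an API, not a filesystem path.
s3 = boto3.client("s3")

# Store an object: a flat key in a bucket, not a file in a directory tree.
s3.put_object(Bucket="my-backups", Key="logs/2023-08-31.log",
              Body=b"backup contents")

# Retrieve it later by the same key.
obj = s3.get_object(Bucket="my-backups", Key="logs/2023-08-31.log")
print(obj["Body"].read())
```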

Some Important Terminologies To Learn

Redundant Array of Independent Disks (RAID):

RAID, short for Redundant Array of Independent Disks, is a computer storage technology that provides fault tolerance for storage media (hard disks) through redundancy, implemented either in software or in a dedicated hardware RAID controller. The technology divides or replicates data across multiple separate hard drives. RAID is designed to improve data reliability and/or the I/O performance of disk storage.

There are several key concepts in RAID:
1. Mirroring (copying the same data to more than one hard disk)
2. Striping (splitting data into blocks that are spread across multiple hard disks)
3. Error correction, where redundant (parity) data is stored so that problems can be detected and possibly corrected (commonly referred to as fault tolerance)
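
To make the first two ideas concrete, here is a toy sketch in Python (purely illustrative; a real RAID controller works at the block-device level, not in application code):

```python
def stripe(data: bytes, num_disks: int, chunk: int = 4):
    """RAID-0-style striping: distribute fixed-size chunks across disks round-robin."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % num_disks].extend(data[i:i + chunk])
    return disks

def mirror(data: bytes, num_disks: int):
    """RAID-1-style mirroring: every disk holds a full copy of the data."""
    return [bytearray(data) for _ in range(num_disks)]

print(stripe(b"ABCDEFGHIJKL", 3))  # chunks spread across 3 "disks"
print(mirror(b"ABCDEFGHIJKL", 2))  # 2 identical copies
```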

Message Queue:

A message queue is a queue that routes messages from a source to a destination, or from the sender to the receiver. It follows the FIFO (first in first out) policy. The message that is sent first is delivered first. Message queues facilitate asynchronous behavior, which allows modules to communicate with each other in the background without hindering primary tasks. They also facilitate cross-module communication and provide temporary storage for messages until they are processed and consumed by the consumer.

Let’s take an example of how YouTube handles video uploading so smoothly.

  1. When a user uploads a video to YouTube, the website stores the video in a temporary storage location and adds a message to a message queue indicating that a new video has been uploaded.
  2. The message queue routes the message to a separate component or set of components responsible for processing the video. This could include tasks such as transcoding the video into different formats, generating thumbnail images, and extracting metadata from the video.
  3. As the processing tasks are completed, the component updates the status of the video and adds additional messages to the queue indicating the next steps that need to be taken. For example, a message might be added to the queue to indicate that the video is ready to be made public, or that it needs to be reviewed by a moderator before it can be published.
  4. The message queue routes the messages to the appropriate components, which handle the tasks and update the status of the video as necessary.

This approach allows YouTube to handle the video upload and processing process asynchronously, which can improve the performance and scalability of the system. It also allows the different components of the system to be developed and updated independently, which can improve the maintainability of the system.
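
Here is a minimal sketch of the same producer/consumer pattern using Python’s standard-library queue (the message fields are invented; a real system would use a dedicated broker rather than an in-process queue):

```python
import queue
import threading

upload_queue = queue.Queue()  # FIFO: first message in, first out

def worker():
    """Consumer: processes upload messages in the background."""
    while True:
        message = upload_queue.get()
        if message is None:          # sentinel value used to stop the worker
            break
        print(f"transcoding video {message['video_id']}...")
        upload_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# Producer: the upload handler just enqueues a message and returns at once.
upload_queue.put({"video_id": 42, "status": "uploaded"})
upload_queue.join()   # wait for the consumer to finish processing
upload_queue.put(None)
```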

Kafka:

Apache Kafka started in 2011 as a messaging system for LinkedIn but has since grown to become a popular distributed event streaming platform. The platform is capable of handling trillions of records per day. Kafka is a distributed system consisting of servers and clients that communicate through a TCP network protocol. The system allows us to read, write, store, and process events. Kafka is primarily used for building data pipelines and implementing streaming solutions.
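
A minimal sketch of producing and consuming events with Kafka (assuming the third-party kafka-python client, a broker at localhost:9092, and a made-up topic name):

```python
from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

# Producer: write an event to a topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("video-uploads", b'{"video_id": 42}')
producer.flush()

# Consumer: read events back from the same topic.
consumer = KafkaConsumer("video-uploads",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")
for record in consumer:
    print(record.value)
    break
```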

While Kafka is a popular message queue option, there are other widely used options as well, such as RabbitMQ and Amazon SQS.

Database

Now that our system can handle sufficient load, we need to take care of our database as well. We can’t just run the database on each application server and scale those servers out, because doing so leads to data inconsistency. For example:

User A connects to Server 1 and User B connects to Server 2. All the work they do is updated on their respective servers. Now suppose User A’s network fluctuates and they reconnect to another server, say Server 2. Server 2 will not be able to serve the user, because all of their data was stored on Server 1. To solve this issue, we need to keep the database outside the application servers.

Keeping the database outside the application servers also lets us scale it independently.

Replication

Database replication is the process of copying data from one database to another, in order to maintain an up-to-date copy of the data on multiple servers.

Database replication is often used to improve the performance and availability of a database system by distributing the workload across multiple servers and providing a backup copy of the data in case of a server failure. It can also be used to ensure data consistency between different locations or to allow users to access the data from multiple locations.

There are several different types of database replication, including master-slave replication, peer-to-peer replication, and multi-master replication.

  • In master-slave replication, one database server (the master) is responsible for writing data, while one or more other servers (the slaves) receive copies of the data and serve read requests. This is a simple and efficient way to set up database replication, but it does not allow the slaves to write data.
  • In peer-to-peer replication, all servers can both read and write data and are responsible for replicating data to other servers. This allows for more flexibility and can improve performance for write-heavy workloads, but it can also be more complex to set up and manage.
  • In multi-master replication, multiple servers can write data and are responsible for replicating data to each other. This allows for even more flexibility and can improve performance for write-heavy workloads, but it is also more complex to set up and manage, and it requires careful conflict resolution to ensure data consistency.

In master-slave replication, the master database generally supports only write operations. A slave database gets copies of the data from the master and supports only read operations. All data-modifying commands like insert, delete, or update must be sent to the master database.
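
A hedged sketch of how an application might route queries under this scheme (master and slaves stand in for ordinary DB-API connections; this is illustrative, not a real driver feature):

```python
import random

class ReplicatedDatabase:
    """Illustrative router: writes go to the master, reads go to a slave."""

    def __init__(self, master, slaves):
        self.master = master      # the single writer
        self.slaves = slaves      # read-only copies of the data

    def execute(self, sql, params=()):
        # Data-modifying commands must go to the master...
        if sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return self.master.execute(sql, params)
        # ...while reads are spread across the replicas.
        return random.choice(self.slaves).execute(sql, params)
```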

Generally, all writes are done by the master. But what if the master goes down?

If the master database goes down, one of the slaves can be promoted to master and serve its purpose. Typically the slaves hold an election (vote) among themselves to decide which one becomes the new master.

Database Sharding

Sharding plays an important role in large scale architectures. Database sharding is a database design technique that involves splitting a large database into smaller, more manageable pieces called shards. Each shard is stored on a separate server.

Sharding makes use of a shard key to distribute data across shards.

The main advantage of database sharding is that it allows a database system to scale horizontally, meaning that it can handle a larger volume of data and requests by adding more servers to the database cluster. This can improve the performance and availability of the database system, especially for high-traffic applications or data sets that are too large to fit on a single server.

Figure: routing a database request to a specific shard based on its shard key.

Let’s understand sharding with an example:

Imagine that you have a large database that stores customer information for an e-commerce website. The database is growing rapidly and is starting to become slow and difficult to manage. To improve the performance and scalability of the database, you decide to implement database sharding.

With database sharding, you split the large database into smaller pieces called shards. Each shard is stored on a separate server, and the database system distributes the data across the shards based on a sharding key. For example, you might use the customer’s location as the sharding key, so that all the customers from a particular region are stored on the same shard.

Now, when a user accesses the database to retrieve or update customer information, the database system uses the sharding key to determine which shard the data is stored on. It then sends the request to the appropriate shard, which processes the request and returns the data to the user.
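
Here is a minimal sketch of shard routing in Python (hash-based rather than location-based, with placeholder shard names):

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # stand-ins for shard servers

def pick_shard(shard_key: str) -> str:
    """Hash the shard key and map it onto one of the shards."""
    digest = hashlib.md5(shard_key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every request for this customer is routed to the same shard.
print(pick_shard("customer-1042"))   # e.g. 'shard-1'
print(pick_shard("customer-7"))      # a different customer may land elsewhere
```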

Which Database to Choose

Choosing a database depends on many factors, such as:

  • Performance
  • Scalability
  • Availability

Relational Database Management System (RDBMS):

An RDBMS is a type of database management system that organizes data into structured tables with rows and columns. Each row represents a record, and each column represents a field or attribute of the record. RDBMS enforces a predefined schema to maintain data integrity and consistency.

Example: Let’s consider a simple example of an RDBMS for a library. You might have two tables: Books and Authors. The Books table could have columns like ISBN, Title, and AuthorID, and the Authors table could have columns like AuthorID, Name, and BirthYear. The AuthorID in the Books table would be a foreign key referencing the Authors table.
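
The same schema, sketched with Python’s built-in sqlite3 module (the sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (
        AuthorID  INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        BirthYear INTEGER
    );
    CREATE TABLE Books (
        ISBN     TEXT PRIMARY KEY,
        Title    TEXT NOT NULL,
        AuthorID INTEGER REFERENCES Authors(AuthorID)  -- foreign key
    );
""")
conn.execute("INSERT INTO Authors VALUES (1, 'George Orwell', 1903)")
conn.execute("INSERT INTO Books VALUES ('978-0451524935', '1984', 1)")

# Join the two tables through the foreign key.
for row in conn.execute("""SELECT b.Title, a.Name
                           FROM Books b JOIN Authors a USING (AuthorID)"""):
    print(row)   # ('1984', 'George Orwell')
```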

Use Cases: RDBMS are suitable for structured and highly normalized data, where relationships between entities need to be maintained.

Where are RDBMS used? Here are some examples:

  • E-commerce websites: Managing products, orders, customers, and inventory.
  • Financial systems: Tracking transactions, accounts, and balances.
  • Human resources systems: Storing employee information, departments, and roles.

Non-Relational Database (NoSQL):

NoSQL databases are designed to handle unstructured or semi-structured data. They offer flexible schemas and are horizontally scalable, allowing for easier distribution and handling of massive amounts of data.

Example: Consider a NoSQL database like MongoDB for a social media platform. In MongoDB, you might have a collection named Posts that stores posts by users. Each post can have varying fields, and you're not constrained to a fixed schema. Different posts might have different attributes based on user preferences.
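
A short sketch using the pymongo driver (assuming a local MongoDB instance; the database, collection, and field names are made up):

```python
from pymongo import MongoClient  # pip install pymongo

client = MongoClient("mongodb://localhost:27017")
posts = client.social_app.Posts   # database and collection names are invented

# Two documents in the same collection with different shapes -- no fixed schema.
posts.insert_one({"user": "alice", "text": "hello world", "likes": 3})
posts.insert_one({"user": "bob", "video_url": "https://example.com/v/1",
                  "tags": ["travel", "food"]})

for post in posts.find({"user": "alice"}):
    print(post)
```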

NoSQL databases are appropriate for scenarios where:

  • Data is heterogeneous and doesn’t fit neatly into a tabular structure.
  • Scalability is crucial due to high data volume or traffic.
  • Rapid development and iteration are required.

Where are NoSQL databases used? Here are some examples:

  • Big data applications: Storing and analyzing large volumes of data in real-time.
  • Content management systems: Handling diverse content types and user-generated content.
  • Internet of Things (IoT): Collecting and managing data from various sensors and devices.

Cache

Caching is a technique used to improve the performance of a system by storing frequently accessed data in a temporary storage area, called a cache, so that it can be quickly retrieved without the need to access the original data source.

Accessing data from disk is costly, so to improve latency we can use in-memory stores, since reads from memory are much faster than reads from disk.

Here are some stats of reading from memory and disk:

  • Reading from RAM: 10–100 nanoseconds
  • Reading from an SSD: 50–100 microseconds
  • Reading from a hard drive: 5–10 milliseconds

Some common cache eviction policies include:

LRU cache: Keeps the most recently used keys and removes the least recently used (LRU) keys.

LFU cache: Keeps frequently used keys and removes the least frequently used (LFU) keys.

Volatile-LRU: Removes the least recently used keys among those that have an expiration set.
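
An LRU cache is easy to sketch with Python’s OrderedDict (a toy version for illustration; production stores like Redis implement approximations of LRU):

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)         # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used key

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2); cache.put("c", 3)  # "a" is evicted
print(cache.get("a"), cache.get("b"))  # None 2
```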

Figure: take data from the database server if it is not present in the cache.

To improve response time when accessing data, we introduce a caching layer. For every read request, we check whether the data exists in the cache: if yes, we fetch it from the cache; if not, we fetch it from the database, update the cache with that data, and return the data to the user.
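
This read path is commonly called the cache-aside pattern. A minimal sketch (the dictionary and get_from_db function are stand-ins for a real cache and database client):

```python
cache = {}  # stand-in for an in-memory store such as Redis

def get_from_db(key):
    """Stand-in for a (slow) database lookup."""
    return f"value-for-{key}"

def read(key):
    # 1. Check the cache first.
    if key in cache:
        return cache[key]            # cache hit: fast path
    # 2. On a miss, fall back to the database...
    value = get_from_db(key)
    # 3. ...populate the cache, then return the data to the user.
    cache[key] = value
    return value

print(read("user:42"))  # miss: hits the database, fills the cache
print(read("user:42"))  # hit: served from memory
```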

However, we should not cache all of our data, as that would put a huge load on memory and slow things down. There are a few considerations for using a cache:

  1. Cache size: The size of the cache can impact its effectiveness. A larger cache can store more data and may be more effective at reducing the number of accesses to the underlying data source, but it can also consume more resources and may not be practical if the cache is too large.
  2. Cache eviction policy: A cache eviction policy determines how the cache handles the replacement of old data with new data when the cache is full. Different policies have different trade-offs in terms of performance and efficiency. For example, a least recently used (LRU) policy will remove the least recently accessed data from the cache, while a first in, first out (FIFO) policy will remove the oldest data.
  3. Cache expiration: Cache expiration determines how long data is kept in the cache before it is considered stale and is removed. Setting a shorter expiration time can improve the freshness of the data, but it may also increase the number of accesses to the underlying data source.
  4. Cache consistency: Ensuring cache consistency is important to ensure that the data in the cache is accurate and up-to-date. This may involve invalidating the cache when the underlying data is updated, or using techniques such as cache invalidation or cache stampede protection to ensure that the cache is consistent with the underlying data.
  5. Cache efficiency: The efficiency of the cache can impact the overall performance of the system. It is important to consider the overhead of maintaining and accessing the cache, and to ensure that the benefits of the cache outweigh the costs.
  6. Cache usage patterns: Understanding the usage patterns of the cache can help to optimize its configuration and improve its effectiveness. This may involve analyzing the hit rate (the percentage of cache hits vs. misses) and the cache miss rate, and identifying the most frequently accessed data to optimize the cache layout and eviction policy.

CDN

A content delivery network (CDN) is a system of distributed servers that are used to deliver web content to users based on their geographic location. The goal of a CDN is to improve the performance and availability of web content by reducing the distance that the content has to travel and by providing multiple copies of the content that can be accessed from different locations.

CDNs are used to deliver a wide range of web content, including static assets (such as images and videos), dynamic content (such as HTML pages and APIs), and streaming content (such as audio and video streams).

They are commonly used by websites and applications that need to deliver high-bandwidth content to a large number of users around the world.

Consider these things when you use a CDN:

Cache Management: The cache expiration period dictates how long content remains in the CDN cache before requiring a refresh from the origin server. Opting for a shorter expiration improves content freshness but can augment origin server load and CDN expenses.
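
Expiration is typically controlled from the origin with standard HTTP caching headers. A small sketch using Flask (the route and the one-hour value are arbitrary choices):

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/images/product.jpg")
def product_image():
    resp = make_response(b"...image bytes...")
    # Tell the CDN (and browsers) they may cache this response for one
    # hour (3600 s) before revalidating with the origin server.
    resp.headers["Cache-Control"] = "public, max-age=3600"
    return resp
```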

Geographical Reach: Enhance your application’s performance by minimizing the distance data needs to travel to reach users. Prioritize a CDN with an extensive geographical presence, tailored to your user base’s locations.

Cost Efficiency: CDN services might prove costly, particularly for high-traffic sites. Perform a thorough cost-benefit analysis to ensure that the advantages of using a CDN outweigh the associated expenses.

Origin Server Load: An efficient CDN reduces load on your origin server by serving cached content directly to users. Evaluate how effectively your CDN handles the load and keeps your origin server stable.

Content Purging Mechanisms: Consider how you can purge or invalidate cached content when updates are made on the origin server. Fast and reliable content purging mechanisms are crucial to ensure users receive the latest content.

Example: Imagine you have an e-commerce website that sells products. You’ve integrated a CDN to deliver product images quickly to users. Let’s say you just updated the image of a popular product to reflect a new color option. However, due to the caching mechanism of the CDN, users who have previously accessed the product page might still see the old image for a while, even if they refresh the page.

Here’s where content purging comes into play:

  1. Manual Purging: You, as the website administrator, can manually initiate content purging for the specific product image. This means you’re instructing the CDN to remove the cached image and fetch the latest version from the origin server.
  2. Automated Purging: Some CDNs offer mechanisms for automated purging. For instance, you might have set up a rule that triggers content purging whenever a product’s information is updated in your database. This ensures that when you change the product image in your database, the CDN automatically purges the cache associated with that image.

So, when a user visits the product page after the content has been purged, the CDN will retrieve the updated image from the origin server and deliver it to the user, ensuring they see the latest product image.

Read System Design Patterns From Here:

Will upload this Soon….
