System design part 4: Relational Database Management System

SAKSHI CHHABRA
Published in InterviewNoodle
7 min read · Jan 17, 2022

ref: https://s7280.pcdn.co/wp-content/uploads/2016/06/database-blue.png

A database is an organized collection of data, typically stored electronically on a computer. A database is usually accessed through a database management system (DBMS), e.g. MySQL, which allows end users to access and modify one or more databases.

In this and upcoming posts in the series, we are going to discuss different types of databases (e.g. RDBMS, NoSQL), their advantages, disadvantages, and properties.

In this post, we are going to cover the Relational Database Management System (RDBMS) and techniques to scale a relational database.

Introduction to Relational Database

ref: https://resagratia.com/2020/08/relational-database-management-system-sql/

A relational database is a type of database that organizes data into tables, and relationships can be established among the tables in the database. An RDBMS stores data in a row-based table structure that connects related data elements. Structured Query Language (SQL) is used to fetch the desired data from the database.

Compared to a simple (e.g. file-based) DBMS, an RDBMS can typically store more data and handle more simultaneous requests.

Properties of RDBMS

ref: https://www.bmc.com/blogs/acid-atomic-consistent-isolated-durable/

Data inserted into an RDBMS needs to follow the ACID properties (described below) to maintain consistency throughout the database. A sequence of database operations that is treated as a single logical unit of work is called a transaction.

  • Atomicity: A transaction consists of multiple database operations. Atomicity ensures that each transaction is treated as a single unit, which either completely succeeds or completely fails. If any operation fails, the entire transaction fails and the database is left unchanged. Atomicity is guaranteed even in cases of power failures, system crashes, and errors. For example: to transfer money from account A to account B, money first needs to be withdrawn from A and then added to B. Performing these operations atomically guarantees that if either operation fails, money is neither debited nor credited and the database stays consistent.
  • Consistency: Consistency ensures that a transaction can only change the database from one valid state to another, and that all data written to the database follows the defined rules.
  • Isolation: Transactions are frequently executed concurrently. Isolation ensures that concurrent execution of transactions leaves the database in the same state as if the transactions had been executed sequentially. The main goal of isolation is concurrency control.
  • Durability: Durability ensures that once a transaction is committed, it remains committed even in the event of a system failure. This is typically achieved by recording all committed transactions in non-volatile storage.
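The atomicity guarantee in the money-transfer example can be demonstrated with any SQL database. Here is a minimal sketch using Python's built-in sqlite3 module (the table, account names, and balances are made up for illustration); the `with conn:` block opens a transaction that commits only if every statement succeeds and rolls back otherwise:

```python
import sqlite3

# In-memory database as a minimal stand-in for a real RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts atomically: both updates commit, or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

transfer(conn, "A", "B", 30)   # succeeds: A=70, B=80
transfer(conn, "A", "B", 500)  # fails: rolled back, balances unchanged
```

The second call shows the "completely fails" half of atomicity: the debit from A is undone by the rollback, so the database never ends up in a half-transferred state.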

Scaling up databases:

With millions of users accessing the database and expecting results in milliseconds, we need to scale databases to serve millions of user requests simultaneously without performance degradation.

To scale a database, we can either add more resources (CPU, RAM, storage) to the current machine (vertical scaling) or add more database servers to the existing system (horizontal scaling). There's a limit to how far we can scale vertically due to hardware and cost limitations, so when handling millions of users, developers tend to switch to horizontal scaling at a certain point.

Techniques to scale up a relational database:

(i) Master-Slave replication:

System consisting of one Master and two Slaves

Master-slave replication allows data to be replicated from one database (the master) to one or more other databases (the slaves). The master is responsible for logging updates, and the slaves replay those logs to replicate the data. To improve performance, all writes and updates are handled by the master, while reads are distributed among the slaves. This model improves system performance by dedicating the master to writes while increasing read throughput by leveraging the slaves. If the master fails, the system continues in read-only mode until one of the slaves is promoted to master, or a new master is added to the system, to handle writes and updates.
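The read/write split described above is usually implemented in application code or a proxy. A minimal sketch of the routing logic, where the node names are placeholders standing in for real database connections:

```python
import itertools

class ReadWriteRouter:
    """Send all writes to the master; spread reads round-robin over the slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

    def __init__(self, master, slaves):
        self.master = master
        self._readers = itertools.cycle(slaves)  # round-robin over replicas

    def route(self, sql):
        """Return the node that should run this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master       # writes always hit the master
        return next(self._readers)   # reads rotate across slaves

router = ReadWriteRouter("master", ["slave-1", "slave-2"])
router.route("UPDATE users SET name = 'x'")  # -> 'master'
router.route("SELECT * FROM users")          # -> 'slave-1'
router.route("SELECT * FROM users")          # -> 'slave-2'
```

Real systems inspect statements more carefully (e.g. `SELECT ... FOR UPDATE` must go to the master), but the principle is the same: one writer, many readers.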

Advantages:

  • The read load is spread among multiple slaves, improving overall performance. More slaves can be added to improve performance further.
  • Analytics applications can read from the slaves without affecting the master.
  • A backup of the database is readily available without affecting the performance of the master.

Disadvantages:

  • All writes are handled by the master alone, which could degrade performance under heavy simultaneous write load.
  • Additional logic needs to be implemented to promote a slave to a master.
  • Adding another slave increases the load on the master, since the binary logs have to be read and copied to each slave.
  • There is a potential for data loss if the master fails before newly written data has been replicated to the other nodes.

(ii) Master-Master replication:

System consisting of two Masters

Both masters perform read and write operations, but they need to coordinate with each other on writes. If either master goes down, the system continues to serve both reads and writes.

Advantages:

  • The write load is shared among the master nodes.
  • Applications can read from both masters.
  • Easy failover handling: if either master fails, the system continues to serve both reads and writes.

Disadvantages:

  • Conflicts between concurrent writes need to be handled.
  • We need a load balancer to distribute writes, or application logic to determine which master to write to.
  • Most master-master systems are either loosely consistent (violating ACID) or suffer increased write latency due to synchronization.
  • There is a potential for data loss if a master fails before newly written data has been replicated to the other nodes.
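One common (if lossy) way to resolve the write conflicts mentioned above is "last write wins": each replicated write carries a timestamp, and the newest version survives. A tiny sketch, with made-up timestamps and values:

```python
def last_write_wins(local, remote):
    """Resolve a concurrent write conflict by keeping the newest version.

    Each version is a (timestamp, value) pair, as replicated between masters."""
    return local if local[0] >= remote[0] else remote

a = (1700000005.0, "alice@new.example")  # written on master A (later)
b = (1700000002.0, "alice@old.example")  # written on master B (earlier)
last_write_wins(a, b)  # -> the master-A write survives
```

Last write wins silently discards the losing write, which is exactly the loose consistency the disadvantage list warns about; alternatives such as vector clocks or application-level merging trade simplicity for safety.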

(iii) Federation:

Splitting database into Order, Payments and Profiles

Federation (or functional partitioning) splits up the database by function. For example, instead of one monolithic database, we could have three databases, each dedicated to handling orders, payments, or profiles, resulting in less read and write traffic on each database and hence better overall performance. With no single master serializing all writes, we can write in parallel, thus increasing throughput.
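In application code, federation boils down to a lookup from function to database. A minimal sketch (the hostnames are hypothetical):

```python
class FederatedRouter:
    """Pick the database that owns a given function (group of tables)."""

    def __init__(self):
        # Hypothetical hostnames; each function lives on its own database.
        self.databases = {
            "orders":   "orders-db",
            "payments": "payments-db",
            "profiles": "profiles-db",
        }

    def db_for(self, function):
        try:
            return self.databases[function]
        except KeyError:
            raise ValueError(f"no database owns function {function!r}")

router = FederatedRouter()
router.db_for("payments")  # -> 'payments-db'
```

The error branch matters in practice: a query touching a function no database owns (or spanning two of them) is exactly the cross-database join problem listed under the disadvantages.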

Advantages:

  • Faster reads and writes, since a separate database serves each function.
  • Decreased latency, since reads and writes to different functions happen in parallel rather than serially.

Disadvantages:

  • Federation isn’t effective if the schema has a huge number of functions (or tables).
  • Performing a join across data from two databases is more complex.
  • Federation adds more hardware and software complexity.
  • Application logic needs to be updated to determine which database handles reads and writes for a given function.

(iv) Sharding:

Horizontal partitioning of database based on continents

Database sharding is the horizontal partitioning of data in a database or search engine. Sharding distributes data across different databases such that each database only has to manage a subset of the data. For example: partitioning data based on users’ geographic location or the first letter of their last name.

Similar to federation, sharding results in less traffic, fewer reads and writes, less replication, and smaller indexes per database, giving better performance, lower latency, and more cache hits. As with federation, there is no centralized master for writes, so writes can be done in parallel, improving throughput.

We can even combine sharding with master-master or master-slave replication within each shard to avoid a single point of failure.
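The heart of sharding is a function that maps a shard key to a shard. A minimal sketch of hash-based routing (the shard count and keys are illustrative); a stable hash like MD5 is used because, unlike Python’s built-in `hash()`, it gives the same answer in every process:

```python
import hashlib

N_SHARDS = 4  # hypothetical shard count

def shard_for(key: str) -> int:
    """Map a shard key (e.g. a user id) to a shard index via a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % N_SHARDS

# The same key always routes to the same shard:
shard_for("user:42") == shard_for("user:42")
```

Note the weakness of plain modulo routing: changing `N_SHARDS` remaps most keys, which is one source of the rebalancing pain discussed under the disadvantages.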

Advantages:

  • High availability: If one database goes down, the others continue to operate. However, it’s good practice to replicate data to avoid potential data loss.
  • Faster querying: Less traffic per database results in faster queries.
  • Increased write throughput: With no master serializing writes, writes can be done in parallel, which increases write throughput.

Disadvantages:

  • Uneven load: One shard could receive more traffic than the others (a “hot spot”), degrading its performance. Consistent hashing is a technique used to spread large loads more evenly across multiple databases, but rebalancing shards still brings additional complexity and can require significant downtime.
  • Increased complexity: Sharding adds more hardware and additional complexity.
  • Less support: There are tons of resources out there regarding traditional RDBMS setups, but there isn’t much support for sharding. If you are implementing sharding, you are largely on your own.
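The consistent hashing mentioned under “uneven load” can be sketched as a hash ring with virtual nodes: shards and keys are hashed onto the same circle, each key belongs to the first shard clockwise from it, and adding a shard only relocates the keys in its slice of the ring. A simplified illustration (shard names and the vnode count are arbitrary):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes: adding or removing a shard only
    relocates the keys that fall in its slice of the ring."""

    def __init__(self, shards, vnodes=100):
        # Each shard appears at `vnodes` positions to smooth out the load.
        self._ring = sorted(
            (self._hash(f"{shard}#{v}"), shard)
            for shard in shards
            for v in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(key):
        # MD5 gives a stable ring position, identical across processes.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key):
        # First virtual node clockwise from the key's position (wrapping around).
        i = bisect.bisect(self._positions, self._hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
ring.shard_for("user:42")  # deterministic; the same key -> the same shard
```

Compared with modulo routing, growing the ring from three shards to four moves only roughly a quarter of the keys instead of most of them, which is why rebalancing is cheaper, though still not free.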

(v) Denormalization:

In most systems, reads heavily outnumber writes, often by 100:1. A read requiring a complex database join can take a significant amount of time. Denormalization attempts to improve read performance at the expense of write performance by storing redundant copies of data in multiple tables to avoid expensive joins. Some RDBMSs, such as PostgreSQL and Oracle, support materialized views, which handle the task of storing redundant data and keeping it consistent.
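To make the trade-off concrete, here is a small sqlite3 sketch (the tables and names are invented): the posts table carries a redundant author_name column so the read path skips the join, while the write path must update both tables to keep the copies consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- author_name duplicates users.name so reads can skip the join
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER,
                        title TEXT, author_name TEXT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO posts VALUES (10, 1, 'Scaling RDBMS', 'alice');
""")

# Read path: no join needed thanks to the redundant column.
row = conn.execute(
    "SELECT title, author_name FROM posts WHERE id = 10").fetchone()
# row == ('Scaling RDBMS', 'alice')

# Write path: the cost of denormalization is that a rename touches both tables.
with conn:  # one transaction, so the copies can never drift apart
    conn.execute("UPDATE users SET name = 'alicia' WHERE id = 1")
    conn.execute("UPDATE posts SET author_name = 'alicia' WHERE user_id = 1")
```

Wrapping the two updates in one transaction is what keeps the redundant copies consistent; forgetting the second update is the classic denormalization bug.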

Advantages:

  • When data is distributed using techniques like federation or sharding, join operations become more complex. Denormalization bypasses the need for expensive joins, resulting in lower latency and faster reads.

Disadvantages:

  • Redundant data takes more storage space.
  • Under a substantial write load, a denormalized database could perform worse.
  • The database design becomes more complex, since redundant data must be kept consistent.

There isn’t any hard-and-fast rule on which technique you should use to scale a database; every technique has its own merits and demerits. Choose the technique that best suits your business logic.

The next blog will cover NoSQL databases.

Thank you for checking out my blog! Make sure to follow me for the entire system design interview prep series.


Master's student in Computer Science at the University of Florida. I love to write and help others, so here I am.