Zero To Million Users (Part 4) : Database Replication

Published in Level Up Coding

6 min read · Apr 16, 2023

Now that load is balanced (Part 3), what about protection against data loss? Let’s dive into a more fun and thought-provoking part of the system: database replication.

Table of Contents

Story of a Database Replication Hero
How Does “Database Replication” Look In Practice
Final System After Integrating Data Replication
Interview Questions To Understand “Database Replication” better

Story of a Database Replication Hero

In a land of magical creatures, there was a busy library that housed all the world’s knowledge. The library was managed by a wise old owl named Oliver. One day, a mischievous fairy accidentally cast a spell that caused the library to lose all its books!

Oliver, realizing the need for a backup, decided to implement database replication. He created three identical copies of the library in different parts of the world, with the same books and information in each copy. The copies were magically connected, so any changes made in one copy would be automatically reflected in the others.

This proved to be a wise decision, as a mischievous gnome later snuck into the library and stole some books. But thanks to database replication, Oliver was able to quickly recover the lost books from the other copies, ensuring that the library remained complete and accessible to all its visitors.

The library became even more famous, attracting curious creatures from far and wide who marveled at the magic of database replication. Oliver, the wise owl, was celebrated for his foresight in implementing such a reliable system. And the library continued to thrive, safeguarding knowledge for generations to come, all thanks to the power of database replication.

How Does “Database Replication” Look In Practice

Database replication is a process in which data from a primary database, also known as the master, is copied to one or more secondary databases, known as slaves, in near real-time. This allows for redundancy and high availability, as well as the ability to offload read operations from the master database, improving performance and scalability.

In a master-slave replication scenario:

The master database is responsible for handling all write operations (inserts, updates, and deletes). When a write is performed on the master, the change is recorded in its transaction log, also called the replication log.
The slave databases periodically fetch new changes from the replication log and apply them to their own copies of the data, bringing them up to date with the master. This way the slaves hold a replica of the master's data and can serve read operations.
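The log-and-fetch mechanism described above can be sketched in a few lines of plain Python. This is a toy in-memory model, not how any particular database implements it: the `Master` and `Slave` classes, the `pull` method, and the `applied` counter are names invented for this illustration.

```python
class Master:
    """Accepts writes and records each one in an append-only replication log."""

    def __init__(self):
        self.data = {}
        self.log = []  # list of (key, value) entries, in commit order

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Keeps a copy of the data by replaying the master's log from its last position."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def pull(self, master):
        # Fetch only the entries that arrived since the previous pull
        new_entries = master.log[self.applied:]
        for key, value in new_entries:
            self.data[key] = value
        self.applied += len(new_entries)
        return len(new_entries)


master = Master()
slave = Slave()

master.write("order:1", "pending")
master.write("order:1", "paid")

lag_before = len(master.log) - slave.applied  # entries the slave has not seen yet
slave.pull(master)
print(lag_before, slave.data)
```

Because the slave remembers how far it has read, each pull transfers only the new entries, and the gap between the log length and `applied` is a simple measure of replication lag.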

Here’s an example of how database replication might work in practice in a master-slave setup:

A user submits a new order on an e-commerce website, and the request is sent to the master database.
The master database processes the request, updates the order information in its own database, and records the changes in its replication log.
The slave databases, which are constantly monitoring the replication log, fetch the changes from the master and apply them to their own databases, making them consistent with the master.
When a user requests to view their order details, the request is directed to one of the slave databases.
The slave database, which has the updated order information from the master, retrieves the data and sends it back to the user, offloading the read operation from the master and improving performance.
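The order flow above amounts to a routing rule: writes go to the master, reads go to a slave. Here is a minimal sketch of that rule, using plain dictionaries in place of real databases. The `ReplicatedStore` class and its method names are hypothetical, and the `replicate` step copies everything at once purely for brevity.

```python
import itertools


class ReplicatedStore:
    """Routes writes to the master and spreads reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master    # dict standing in for the master database
        self.slaves = slaves    # list of dicts standing in for slave databases
        self._next_slave = itertools.cycle(range(len(slaves)))

    def place_order(self, order_id, details):
        # The write goes to the master only
        self.master[order_id] = details

    def replicate(self):
        # Slaves catch up with the master (copied wholesale here for brevity)
        for slave in self.slaves:
            slave.update(self.master)

    def get_order(self, order_id):
        # The read is served by one of the slaves, chosen in round-robin order
        index = next(self._next_slave)
        return index, self.slaves[index].get(order_id)


store = ReplicatedStore(master={}, slaves=[{}, {}])
store.place_order("order-42", {"item": "book", "qty": 1})

stale = store.get_order("order-42")   # read before replication has happened
store.replicate()
fresh = store.get_order("order-42")   # read after the slaves caught up
print(stale, fresh)
```

The first read returns nothing because the slave has not yet received the order, which is the replication lag a real system has to account for when it reads from slaves.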

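Here’s a minimal example that mimics master-slave replication using Python and Redis, a popular in-memory data store. It assumes two separate Redis servers on localhost: a master on port 6379 and a slave on port 6380 (the second port is an assumption made for this sketch). Redis has replication built in (the REPLICAOF command), and that is what you would use in practice; the polling loop below only illustrates the idea of a slave copying changes from the master.

# Master (writer)

import redis

# Connect to the master Redis instance
master = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)

# Set a key-value pair on the master
master.set('key', 'value')

# Slave (naive copier, for illustration only)

import time
import redis

# Connect to both instances: the master to read from, the slave to write to
master = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)
slave = redis.Redis(host='localhost', port=6380, db=0, decode_responses=True)

# Continuously poll the master and copy its data to the slave
while True:
    # There is no replication log here, so every key is re-read on each pass
    for key in master.scan_iter():
        value = master.get(key)  # works for string values only
        if value is not None:
            slave.set(key, value)

    # Sleep for a while before polling again
    time.sleep(1)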

Final System After Integrating Data Replication

Interview Questions To Understand “Database Replication” better

1. What is the difference between master-slave and master-master replication in databases?

Answer: In master-slave replication, there is one primary database (master) that accepts write operations, and one or more secondary databases (slaves) that replicate changes from the master and serve read operations. In master-master replication, multiple databases act as both master and slave, so reads and writes can go to any of them. Master-master replication can provide better load balancing and fault tolerance, but it adds complexity because conflicting writes made on different masters must be resolved.

2. What if the master fails in a master-slave architecture?

Answer: In that case, one of the slaves is promoted to act as the new master. Promotion can be complicated because the slave might not be up to date, so writes that had not yet been replicated may be missing. Here are a few setups that avoid depending on a single master:

Multi-master replication: changes made to any of the master nodes are replicated to the other master nodes, keeping data consistent across the system.
Circular replication: each node in the circular setup acts as both a master and a slave, so each node can accept and process both writes and reads. Changes made on one node are replicated to the other nodes in the loop, creating a distributed and interconnected replication network.
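Choosing which slave to promote can be sketched as picking the one that has applied the most of the failed master's replication log. The `Slave` dataclass, the `applied_position` field, and the `choose_new_master` function below are hypothetical names for this illustration; real failover tools also handle election, fencing, and client redirection, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    name: str
    applied_position: int  # last replication-log position this slave has applied


def choose_new_master(slaves, master_last_position):
    """Pick the most up-to-date slave and report how many writes it is missing."""
    if not slaves:
        raise ValueError("no slave available to promote")
    candidate = max(slaves, key=lambda s: s.applied_position)
    missing_writes = master_last_position - candidate.applied_position
    return candidate, missing_writes


slaves = [Slave("slave-a", 118), Slave("slave-b", 120), Slave("slave-c", 97)]
new_master, missing = choose_new_master(slaves, master_last_position=123)
print(new_master.name, missing)
```

Even the best candidate here is a few writes behind the failed master, which is exactly why promotion can lose data unless those writes are recovered some other way.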

3. What are the factors to consider when choosing a replication strategy for a database?

Answer: Factors to consider when choosing a replication strategy include:

Data consistency requirements
Performance and scalability needs
Fault tolerance and high availability requirements
Network and infrastructure considerations
Complexity and management overhead
Conflict resolution mechanisms
Recovery and backup strategies

4. What are the common challenges in database replication and how can they be addressed?

Answer: Common challenges in database replication include:

Data consistency and synchronization issues
Replication lag and latency
Conflict resolution and handling concurrent updates
Network failures and connectivity issues
Load balancing and performance optimization
Monitoring and management of multiple copies

These challenges can be addressed through techniques such as implementing appropriate consistency models, optimizing network configurations, using conflict resolution algorithms, and implementing robust monitoring and management tools.

5. What is the role of a replication factor in distributed databases?

Answer: Replication factor refers to the number of copies of data that are maintained in a distributed database. It determines the level of fault tolerance and data availability. A higher replication factor provides better fault tolerance, as data can be retrieved from other copies in case of failures. However, it also increases storage and management overhead, as well as the complexity of handling concurrent updates and consistency.
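A rough back-of-the-envelope calculation shows both sides of this trade-off. The function below is a simplification for illustration: it assumes every copy is stored in full and that nodes fail independently with the same probability, which real clusters only approximate. The function name and the example numbers are invented for this sketch.

```python
def replication_tradeoff(replication_factor, data_size_gb, node_failure_probability):
    """Return (total storage in GB, probability that every copy is lost)."""
    total_storage_gb = replication_factor * data_size_gb
    # Assumes copies fail independently, which real clusters only approximate
    all_copies_lost = node_failure_probability ** replication_factor
    return total_storage_gb, all_copies_lost


for rf in (1, 2, 3):
    storage, loss = replication_tradeoff(rf, data_size_gb=100, node_failure_probability=0.01)
    print(f"RF={rf}: {storage} GB stored, P(all copies lost)={loss:.0e}")
```

Storage grows linearly with the replication factor, while the chance of losing every copy shrinks exponentially under these assumptions.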

6. What is the CAP theorem and how does it relate to database replication?

Answer: The CAP theorem (Consistency, Availability, and Partition tolerance) states that a distributed database system cannot guarantee all three properties at the same time. Since network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability. Database replication is closely tied to this trade-off. For example, synchronous replication prioritizes consistency: a write is acknowledged only after the replicas confirm it, which adds latency and can make writes unavailable when a replica is unreachable. Asynchronous replication prioritizes availability and low latency, but replicas may serve stale data and recent writes can be lost if the master fails.
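The behaviour of synchronous and asynchronous replication during a partition can be illustrated with a toy model. The `Master` and `Slave` classes, the `reachable` flag, and the `try_write` helper are hypothetical and exist only for this sketch; no real database exposes this exact interface.

```python
class Slave:
    def __init__(self):
        self.data = {}
        self.reachable = True

    def apply(self, key, value):
        if not self.reachable:
            raise ConnectionError("slave unreachable")
        self.data[key] = value


class Master:
    def __init__(self, slave, synchronous):
        self.data = {}
        self.slave = slave
        self.synchronous = synchronous
        self.pending = []  # changes not yet delivered (asynchronous mode only)

    def write(self, key, value):
        if self.synchronous:
            # Acknowledge only after the slave has the change; fails during a partition
            self.slave.apply(key, value)
            self.data[key] = value
        else:
            # Acknowledge immediately and ship the change later
            self.data[key] = value
            self.pending.append((key, value))
        return "ok"


def try_write(master):
    try:
        return master.write("balance", 100)
    except ConnectionError:
        return "rejected"


partitioned = Slave()
partitioned.reachable = False  # simulate a network partition

sync_master = Master(partitioned, synchronous=True)
async_master = Master(partitioned, synchronous=False)

sync_result = try_write(sync_master)     # stays consistent, but the write is refused
async_result = try_write(async_master)   # stays available, but the slave is now stale
print(sync_result, async_result, partitioned.data)
```

The synchronous master refuses the write rather than let the copies diverge, while the asynchronous master accepts it and leaves the slave behind until the change is delivered.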

7. How can you ensure data consistency in a distributed database with replication?

Answer: Data consistency in a distributed database with replication can be ensured through techniques such as:

Implementing a strong consistency model where all copies of the database are updated in the same order.
Using consensus algorithms like Paxos or Raft to agree on the order of updates across all copies.
Using distributed transaction protocols that enforce ACID properties across multiple copies.
Implementing conflict resolution mechanisms to handle concurrent updates and resolve conflicts.
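To see why agreeing on the order of updates matters, consider this small sketch. It does not implement Paxos or Raft; it only shows what goes wrong without an agreed order. The `apply_updates` function and the `set`/`add` operations are invented for this illustration.

```python
def apply_updates(updates):
    """Apply (key, operation, amount) updates in the given order and return the final state."""
    state = {}
    for key, operation, amount in updates:
        if operation == "set":
            state[key] = amount
        elif operation == "add":
            state[key] = state.get(key, 0) + amount
    return state


agreed_order = [("stock", "set", 10), ("stock", "add", 5)]

# Two copies that apply the agreed order end up identical
copy_a = apply_updates(agreed_order)
copy_b = apply_updates(agreed_order)

# A copy that applies the same updates in a different order diverges
copy_c = apply_updates(list(reversed(agreed_order)))
print(copy_a, copy_b, copy_c)
```

All three copies received the same updates, yet the one that applied them in a different order holds a different value, which is the divergence that ordering guarantees are meant to prevent.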

8. What is conflict resolution in database replication and how does it work?

Answer: Conflict resolution in database replication refers to the process of resolving conflicts that arise when the same data is updated on different copies of a database at roughly the same time, before the copies have exchanged their changes. The copies then disagree on the value of that data. Conflict resolution mechanisms typically compare the conflicting updates and apply a resolution strategy, such as keeping the update with the latest timestamp, prioritizing one copy's update over the other, or merging the conflicting updates.
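Two of these strategies, timestamp-based selection and merging, can be sketched as follows. The update format (a dictionary with `node`, `timestamp`, and `value` keys) and both function names are assumptions made for this example, and the tie-break on node id is one possible convention among several.

```python
def last_write_wins(update_a, update_b):
    """Keep the update with the later timestamp; ties go to the higher node id."""
    return max(update_a, update_b, key=lambda u: (u["timestamp"], u["node"]))


def merge_sets(update_a, update_b):
    """Merge two conflicting set-valued updates by keeping every element."""
    return sorted(set(update_a["value"]) | set(update_b["value"]))


# Two masters changed the same shopping cart before replicating to each other
from_node_1 = {"node": 1, "timestamp": 1000, "value": ["book"]}
from_node_2 = {"node": 2, "timestamp": 1005, "value": ["pen"]}

winner = last_write_wins(from_node_1, from_node_2)
merged = merge_sets(from_node_1, from_node_2)
print(winner["value"], merged)
```

Last-write-wins is simple but silently discards the losing update, whereas merging keeps both changes at the cost of needing a merge rule that makes sense for the data type.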

That’s a wrap! Follow for more exciting articles coming up in this series.


Written by SDE Story Crafting