Mastering the Modern Database: An Introduction to Data Partitioning & Sharding — Part 2

Atindra Ghosh
16 min read · Dec 3, 2023

This article series introduces and explains the concepts of data partitioning and sharding. Part 2 is an overview of sharding and its related concepts.

Sharding: Diving into Distributed Data

With database partitioning covered, the next strategy to examine is “sharding.” While partitioning divides a database into manageable chunks, sharding takes this a step further by distributing those chunks across multiple servers or clusters. This horizontal distribution is not just about managing data size; it also aims to improve the performance, availability, and fault tolerance of large-scale database systems.

What is Sharding?

Sharding is a method of splitting a single logical dataset and storing it across multiple databases. Each of these databases, called a “shard,” manages only a fraction of the data, so an individual read or write works against a smaller dataset. Shards can be spread across multiple server instances, so that the load (both data storage and query processing) is distributed rather than concentrated on one machine.
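To make the idea of "one logical dataset, many databases" concrete, here is a minimal sketch in Python. It assumes hash-based routing (a CRC32 checksum of the key, modulo the shard count), which is only one of several possible routing schemes, and it uses plain in-memory dictionaries as stand-ins for separate database servers. The class name `ShardedStore` and its methods are hypothetical, chosen for illustration.

```python
import zlib


class ShardedStore:
    """One logical key-value dataset split across several in-memory 'databases'."""

    def __init__(self, num_shards):
        # Each dict stands in for an independent database server (assumption).
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # CRC32 is stable across processes, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        # A write touches exactly one shard.
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        # A read by key is routed to the same shard the write went to.
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

# Each shard holds only a fraction of the 1000 records.
shard_sizes = [len(shard) for shard in store.shards]
print(shard_sizes)
print(store.get("user:42"))
```

The caller still sees a single `put`/`get` interface, while every operation lands on just one shard. A real deployment would also have to handle adding or removing shards, which simple modulo routing does not do gracefully.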

In the example below, user data are distributed into shards based on their location. This is a simplified example of how sharding might be implemented in a real-world scenario, typically using more complex…
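A minimal sketch of location-based sharding in Python might look like the following. The region codes, shard names, and user records are all hypothetical, and each shard is just an in-memory list standing in for a separate database.

```python
# Hypothetical mapping from a user's region to the shard that stores it.
REGION_TO_SHARD = {
    "US": "shard_us",
    "EU": "shard_eu",
    "ASIA": "shard_asia",
}

# Illustrative user records (not real data).
users = [
    {"id": 1, "name": "Alice", "region": "US"},
    {"id": 2, "name": "Bruno", "region": "EU"},
    {"id": 3, "name": "Carol", "region": "US"},
    {"id": 4, "name": "Daiki", "region": "ASIA"},
]


def shard_for_user(user):
    """Return the shard name for a user, based on the user's region."""
    region = user["region"]
    if region not in REGION_TO_SHARD:
        raise ValueError(f"no shard configured for region {region!r}")
    return REGION_TO_SHARD[region]


# Each list stands in for a separate database server (assumption).
shards = {name: [] for name in REGION_TO_SHARD.values()}
for user in users:
    shards[shard_for_user(user)].append(user)

for name, rows in shards.items():
    print(name, [row["id"] for row in rows])
```

Here the shard key is the user's region, so a query scoped to one region only needs to contact one shard. The trade-off of this choice is that shard sizes follow the geographic distribution of users, which may be uneven.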

