Introduction to Database Partitioning/Sharding: NoSQL and SQL databases

Sandeep Verma
4 min read · Oct 2, 2019


Breaking large datasets into smaller ones, and distributing both the data and the query load across those pieces, is a prerequisite for high scalability.

If a dataset becomes too large for a single node, or when high throughput is required, a single node can no longer suffice. We need to partition/shard such datasets into smaller chunks, and each partition can then act as a database of its own. A large dataset can thus be spread across many smaller partitions/shards, each of which can independently execute queries or run programs. This way, large executions can be parallelized across nodes (partitions/shards).

Partition strategy for a key-value data model

The purpose behind partitioning is to spread data so that execution can be distributed across nodes. Along with partitioning, if we can ensure that every partition node takes a fair share of the load, then, at least in theory, 5 nodes should be able to handle 5 times as much data and 5 times the read and write throughput of a single node.

If sharding is unfair, a single node might take all the load while other nodes sit idle, which defeats the purpose of sharding/partitioning. A good partitioning strategy should avoid such hot spots.

There are two broad ways by which we partition/shard data:

Partition by key-range

Partition by key range divides data into partitions based on contiguous ranges of keys. For example, an organization's employee records could be divided based on the first letter of each employee's name, so A-C, D-F, G-K, L-P, Q-Z is one way to split the whole dataset into 5 parts. Since we know the boundaries of each partition, we can query a particular partition directly if we know the name of an employee.

Ranges are not necessarily evenly spaced. In the above example, partition Q-Z covers 10 letters of the alphabet while partition A-C covers only three. The reason behind such a division is to allocate roughly the same amount of data to each range.

Since more names start with letters between A and C than between Q and Z, this split should result in a near-equal distribution of data across partitions (though this is debatable). The boundaries must be chosen by the administrator to keep the distribution balanced.

HBase, Bigtable, RethinkDB, and earlier versions of MongoDB use such a partitioning strategy.

One of the biggest benefits of such partitioning is efficient range queries. Suppose I need to find all people whose names start with letters between R and S; I only need to send the query to the single partition whose range covers R-S (the Q-Z partition above).
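As a rough illustration (not tied to any particular database; the boundary values and function names are made up for this sketch), here is how a router might map a key to one of the five partitions above, and why a range query only has to touch the partitions that overlap the requested range:

```python
import bisect

# Upper-bound letter of each range partition: A-C, D-F, G-K, L-P, Q-Z
BOUNDS = ["C", "F", "K", "P", "Z"]
PARTITIONS = ["A-C", "D-F", "G-K", "L-P", "Q-Z"]

def partition_for_name(name: str) -> str:
    """Route a single key (an employee name) to the partition covering it."""
    first_letter = name[0].upper()
    return PARTITIONS[bisect.bisect_left(BOUNDS, first_letter)]

def partitions_for_range(start: str, end: str) -> list:
    """A range query only needs the partitions overlapping [start, end]."""
    lo = bisect.bisect_left(BOUNDS, start.upper())
    hi = bisect.bisect_left(BOUNDS, end.upper())
    return PARTITIONS[lo:hi + 1]

print(partition_for_name("Riya"))       # Q-Z
print(partitions_for_range("R", "S"))   # ['Q-Z'] -- a single partition
```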

Another example where key-range partitioning can be employed is when the access pattern is based on a timestamp. Consider an e-commerce application in which transaction data must be captured and monitored for auditing purposes. If we use a timestamp as the key (year-month-day-hour-minute-second), then we can read all data for a given day or hour, since everything will lie in a single partition (if partitions are based on day or hour).
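As a tiny illustrative sketch (the date format and function name are assumptions, not from any specific system), a per-day partition key can be derived from the transaction timestamp like this:

```python
from datetime import datetime, timezone

def day_partition_key(unix_ts: float) -> str:
    """Derive the key of the per-day partition from a Unix timestamp."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%Y-%m-%d")

# Every transaction from the same day lands in the same partition,
# so "all data for 2019-10-02" is a single-partition read.
print(day_partition_key(1570000000))  # 2019-10-02
```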

However, the downside of key-range partitioning is that it can lead to hot spots. In the example above, what if in the future most employees have names starting with letters A-C and hardly any starting with Q-Z?

Similarly, with date-based partitioning, all data for a particular day goes to a single partition node (if we have one partition per day).

To dodge this issue, we could prefix each timestamp key with a partition name (something like “partA_15600600606”, where ‘partA’ is a partition and ‘15600600606’ is the timestamp). This way the application can spread data across different partitions, so the load is distributed and hot spots can be avoided. However, the application now has to send separate queries to each partition in order to fetch the data for a given time window.
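A rough sketch of that idea, assuming a hypothetical key-value store client with put and range operations (the client API here is invented for illustration):

```python
import random

PREFIXES = [f"part{i:02d}" for i in range(100)]  # spread load over 100 partitions

def write_event(store, timestamp: str, payload: dict) -> None:
    """Prepend a random partition prefix so writes for the same time
    window no longer pile up on a single partition."""
    key = f"{random.choice(PREFIXES)}_{timestamp}"
    store.put(key, payload)

def read_window(store, start_ts: str, end_ts: str) -> list:
    """Reads now have to fan out: query every prefix and merge the results."""
    results = []
    for prefix in PREFIXES:
        results.extend(store.range(f"{prefix}_{start_ts}", f"{prefix}_{end_ts}"))
    return results
```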

Partition by key hash

The key-range partitioning strategy is quite prone to hot spots, hence many distributed datastores employ another partitioning strategy: the hash value of the data's key is used to determine the partition. A good hash function can distribute data uniformly across multiple partitions.
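In its simplest form this looks like the sketch below (an illustration, not any particular database's implementation): hash the key with a stable hash function and take the result modulo the number of partitions.

```python
import hashlib

NUM_PARTITIONS = 8

def hash_partition(key: str) -> int:
    """Use a stable hash (MD5 here) so every node computes the same
    partition for the same key; Python's built-in hash() is randomized
    per process and therefore unsuitable."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

print(hash_partition("user_42"))  # same answer on every node, every run
```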

Cassandra, MongoDB, and Voldemort are databases that employ a key-hash-based strategy.

Hash functions used for partitioning do not need to be cryptographically strong, but they must map the same key to the same hash value everywhere and spread keys evenly. Java's Object.hashCode() is usually avoided for this purpose because the same key can produce different hash values in different processes.

Each partition is then assigned a range of key hashes (rather than a range of keys), and every key whose hash falls within a partition's range is stored on that partition. Hash ranges can be evenly spaced, or chosen by a strategy such as consistent hashing.
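Below is a minimal consistent-hashing sketch (simplified; real systems such as Cassandra add virtual nodes and replication on top of this idea): each node owns the arc of the hash ring up to its position, so a key belongs to the first node clockwise from the key's hash.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    """Place each node on a hash ring; a key is owned by the first node
    at or after the key's hash position (wrapping around at the end)."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(node), node) for node in nodes)
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("order:1001"))  # adding/removing a node only moves
                                    # the keys on the adjacent arc
```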

Sadly, by hashing the key we lose a nice property of key-range partitioning: efficient range queries. Adjacent keys are now scattered across different partitions instead of sitting next to each other in a single partition.

Hashing the key does reduce hot spots, but it doesn't eliminate them completely. If reads and writes are all for the same key, all requests still end up on the same partition.

For example, on Instagram a celebrity can have millions of followers. If this celebrity posts something on his/her account, and the post is stored using a hash-key-based partitioning strategy (keyed on the celebrity's user id), then there could be millions of writes (view-count updates, comments, etc.) and reads (a read query for each follower) coming from those millions of followers, all landing on one partition.

There are many ways an application can handle such events. For writes, it can add a random two-digit number in front of the key (so as to spread the load across 100 partitions), so that writes for the same hot key (the celebrity) are distributed to different partitions.

However, the application now has to do additional work when reading those writes, as the query has to fan out to all 100 partitions. Many data systems employ such strategies for hot keys to compensate for skewed workloads.
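A rough sketch of this hot-key workaround (the store client and its put/get calls are hypothetical, used only to show the shape of the reads and writes):

```python
import random

def write_for_hot_key(store, celebrity_id: str, event: dict) -> None:
    """Split one hot key into up to 100 sub-keys by prepending a
    random two-digit number, spreading writes across partitions."""
    salted_key = f"{random.randint(0, 99):02d}_{celebrity_id}"
    store.put(salted_key, event)

def read_for_hot_key(store, celebrity_id: str) -> list:
    """Reads must now gather from all 100 sub-keys and merge the results."""
    events = []
    for i in range(100):
        events.extend(store.get(f"{i:02d}_{celebrity_id}"))
    return events
```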

Don’t forget to clap if you enjoyed reading this article!
