System Design — Buzzwords to Revise Before Going for an Interview — Part 3

Contesttrandder, published in InterviewNoodle, Dec 8, 2021


Sharding

Sharding means splitting a large database into smaller, more manageable parts called shards. Each shard holds a subset of the rows and typically lives on its own server.
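To make the idea concrete, here is a minimal sketch of how a key might be routed to a shard. It assumes hash-based sharding with a fixed number of shards. The helper name `shard_for_key` and the choice of CRC32 are illustrative assumptions, not something the article prescribes. CRC32 is used instead of Python's built-in `hash()` because string hashing in Python is randomized per process, so `hash()` would route the same key differently across runs.

```python
import zlib

def shard_for_key(key: str, num_shards: int) -> int:
    """Return the index of the shard that stores `key` (hypothetical helper)."""
    # A stable hash (CRC32) means the same key always maps to the same shard.
    return zlib.crc32(key.encode("utf-8")) % num_shards

for user_id in ["user-1", "user-2", "user-3", "user-4"]:
    print(user_id, "-> shard", shard_for_key(user_id, 4))
```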

Example of sharding

Advantages of sharding

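  1. Increased storage capacity: the data is spread across several machines, so total capacity is no longer limited by a single server's disk.
  2. Faster queries: querying a smaller database is faster because there is less data to search.
  3. Easier scaling: the database can be scaled horizontally across independent servers, each with its own CPU, memory, and disk.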
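The sketch below is a toy illustration of the first two advantages, under stated assumptions. Each shard is modelled as a separate in-memory dict standing in for an independent server. A lookup only consults the single shard that owns the key. The class `ShardedStore` and its methods are hypothetical names used for illustration only.

```python
import zlib
from typing import Dict, List, Optional

class ShardedStore:
    """Toy key-value store split across independent shards (hypothetical class)."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate database server with its own storage.
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def _shard(self, key: str) -> Dict[str, str]:
        # Stable hash so the same key always lands on the same shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key: str, value: str) -> None:
        self._shard(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        # Only one shard is consulted, so each lookup searches a fraction of the data.
        return self._shard(key).get(key)

store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user-{i}", f"profile-{i}")

print([len(shard) for shard in store.shards])  # exact counts depend on the hash
print(store.get("user-42"))                    # profile-42
```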

Disadvantages of sharding

  1. Resharding the data: if the database grows rapidly, a shard may no longer be able to hold more data. Because data is rarely distributed perfectly evenly, some shards also fill up before others. In either case the data has to be rehashed and redistributed across the databases, which could mean:

a) Introducing new shards into the system.

b) Finding a new hashing function to avoid uneven distribution of data among shards.
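The following sketch shows why resharding is expensive when shards are chosen with a simple `hash(key) % N` scheme, which is assumed here for illustration. With uniformly distributed hashes, a key keeps its shard when going from 4 to 5 shards only if `h mod 4` equals `h mod 5`. That happens for 4 out of every 20 hash values, so roughly 80% of keys are expected to move. The helper names are hypothetical.

```python
import zlib

def shard_for_key(key: str, num_shards: int) -> int:
    # Hypothetical modulo-based routing with a stable hash.
    return zlib.crc32(key.encode("utf-8")) % num_shards

def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys
        if shard_for_key(k, old_shards) != shard_for_key(k, new_shards)
    )
    return moved / len(keys)

keys = [f"user-{i}" for i in range(10_000)]
print(f"{fraction_moved(keys, 4, 5):.0%} of keys move when going from 4 to 5 shards")
```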

2. Celebrity problem:

This is also called the hotspot key problem: excessive access to a specific shard can overload its server. Imagine that Shah Rukh Khan's or Justin Bieber's records live in a certain shard. That database is queried again and again, which slows down the whole system. So we might need to allocate a dedicated shard for each celebrity!
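Here is a minimal sketch of the "dedicated shard per celebrity" idea. It assumes the hot keys are known in advance. Known celebrity keys are looked up in an override table first. Every other key falls back to ordinary hash-based routing. The key names, shard numbers, and `DEDICATED_SHARDS` table are hypothetical.

```python
import zlib

NUM_REGULAR_SHARDS = 4

# Hypothetical override table: known hot ("celebrity") keys get their own shard.
# Dedicated shards are numbered after the regular ones.
DEDICATED_SHARDS = {
    "shah_rukh_khan": NUM_REGULAR_SHARDS,      # shard 4
    "justin_bieber": NUM_REGULAR_SHARDS + 1,   # shard 5
}

def shard_for_key(key: str) -> int:
    """Route hot keys to a dedicated shard; hash-shard everything else."""
    if key in DEDICATED_SHARDS:
        return DEDICATED_SHARDS[key]
    return zlib.crc32(key.encode("utf-8")) % NUM_REGULAR_SHARDS

print(shard_for_key("justin_bieber"))  # 5
print(shard_for_key("ordinary_user"))  # one of the regular shards, 0 to 3
```

The override table keeps a celebrity's heavy traffic away from the shards serving ordinary users. The trade-off is that this table must be maintained as new hot keys appear.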

3. Join and Denormalization

Once a database is split into smaller shards, it becomes harder to perform join operations (natural join, outer join, etc.) across the shards. Note that I specifically mean relational databases here. So we perform an operation called denormalization: related data is stored together in the same table, so that queries can be answered from that single table instead of by joining many smaller tables. Those smaller tables are the result of normalization.
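The sketch below contrasts a cross-shard "join" with a denormalized read, using a hypothetical schema. In it, users are sharded by `user_id` and orders by `order_id`, and plain dicts stand in for the per-shard tables. In the normalized version, the application must read the order and then fetch the user, possibly from a different shard. In the denormalized version, the user's name is copied into the order row at write time, so one shard read is enough.

```python
import zlib

NUM_SHARDS = 3

def shard_of(key: str) -> int:
    # Stable hash so a key always maps to the same shard.
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

# Hypothetical schema: users sharded by user_id, orders sharded by order_id.
user_shards = [{} for _ in range(NUM_SHARDS)]
order_shards = [{} for _ in range(NUM_SHARDS)]

def add_user(user_id: str, name: str) -> None:
    user_shards[shard_of(user_id)][user_id] = {"user_id": user_id, "name": name}

def add_order(order_id: str, user_id: str, denormalize: bool) -> None:
    row = {"order_id": order_id, "user_id": user_id}
    if denormalize:
        # Copy the user's name into the order row at write time.
        row["user_name"] = user_shards[shard_of(user_id)][user_id]["name"]
    order_shards[shard_of(order_id)][order_id] = row

def order_with_name_normalized(order_id: str) -> dict:
    # Application-side "join": read the order, then read the user,
    # which may live on a different shard (an extra round trip).
    order = order_shards[shard_of(order_id)][order_id]
    user = user_shards[shard_of(order["user_id"])][order["user_id"]]
    return {**order, "user_name": user["name"]}

def order_with_name_denormalized(order_id: str) -> dict:
    # Single-shard read: the name is already stored in the order row.
    return order_shards[shard_of(order_id)][order_id]

add_user("u1", "Asha")
add_order("o1", "u1", denormalize=False)
add_order("o2", "u1", denormalize=True)
print(order_with_name_normalized("o1"))    # {'order_id': 'o1', 'user_id': 'u1', 'user_name': 'Asha'}
print(order_with_name_denormalized("o2"))  # {'order_id': 'o2', 'user_id': 'u1', 'user_name': 'Asha'}
```

The cost of denormalization is duplicated data. If a user's name changes, every order row that copied it also has to be updated.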

A few concluding points on databases

  1. Memory is fast to access while disk reads are slow, so try to avoid reading from disk as much as possible.
  2. The shard key (the column that decides which shard a row goes to) should be chosen based on the use case. If sharding is done on an arbitrary column, queries will often need “joins” across shards, which leads to significant delays in the system.
  3. Consistent hashing is a technique that mitigates one of the disadvantages of sharding discussed in this article: resharding the data. I will discuss consistent hashing in an upcoming article.
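As a brief preview, here is a simplified consistent-hash ring. It has no virtual nodes and is meant as an illustrative sketch rather than a production design. Shards and keys are hashed onto the same circular space. Each key belongs to the first shard at or after its position, wrapping around at the end. When a shard is added, the only keys that move are those that now fall to the new shard. This is unlike `hash % N`, where most keys move. The class and shard names are hypothetical. Without virtual nodes, the share of keys each shard receives can be quite uneven.

```python
import bisect
import zlib

def _hash(value: str) -> int:
    return zlib.crc32(value.encode("utf-8"))

class ConsistentHashRing:
    """Simplified consistent-hash ring without virtual nodes (hypothetical)."""

    def __init__(self, shards):
        # Ring entries are (position, shard_name), kept sorted by position.
        self._ring = sorted((_hash(s), s) for s in shards)

    def add_shard(self, shard: str) -> None:
        bisect.insort(self._ring, (_hash(shard), shard))

    def shard_for_key(self, key: str) -> str:
        # First shard at or after the key's position, wrapping to the start.
        positions = [pos for pos, _ in self._ring]
        i = bisect.bisect_left(positions, _hash(key)) % len(self._ring)
        return self._ring[i][1]

keys = [f"user-{i}" for i in range(10_000)]
ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
before = {k: ring.shard_for_key(k) for k in keys}
ring.add_shard("shard-e")
moved = [k for k in keys if ring.shard_for_key(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all of them to shard-e")
```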

Please share your thoughts in the comments and suggest any changes to this article. Feel free to share it and follow me!

Peace! :)


I am a tech enthusiast who loves reading about tech and solving problems. The boundless range of things we can learn and implement is what excites me every day.