Database Sharding Pattern for Scaling Microservices Database Architecture

--

In this article, we are going to talk about Dabase Design Patterns of Microservices architecture which is The Database Sharding Pattern. As you know that we learned practices and patterns and add them into our design toolbox. And we will use these pattern and practices when designing e-commerce microservice architecture.

By the end of the article, you will learn where and when to apply Database Sharding Pattern into Microservices Database Architecture with designing e-commerce application system.

Step by Step Design Architectures w/ Course

I have just published a new course — Design Microservices Architecture with Patterns & Principles.

In this course, we’re going to learn how to Design Microservices Architecture with using Design Patterns, Principles and the Best Practices. We will start with designing Monolithic to Event-Driven Microservices step by step and together using the right architecture design patterns and techniques.

Database Sharding Pattern

we are going to talk about Database Sharding Pattern. So we will add Database Sharding Pattern into our toolbox that we can use when scaling our microservices architecture.

Let me start with what is sharding ?
Sharding is “a small piece or part”, meaning to be a small piece or part of something. From this point of view, database sharding is the separation of the data into unique small pieces of database. So this small pieces is called shards or chunks.

So in order to improve scalability when storing and accessing large volumes of data in microservices architectures, we should divide a databases into a set of horizontal partitions or shards. Each shard has the same schema, but holds its own distinct subset of the data.

So with this horizontal partitioning, shardings enable to scale the system with adding new shards according to storage needs. It can reduce contention and improve performance by balancing the workload across shards. Also with sharding, we can gain cloud power with locating physical servers closely which users can access the data performantly.

Databases dividing into shards with partition keys. These partition keys provide to decide which data should be placed in each shard. The partition key should be static. It shouldn’t affect by the data changes.

Tinder — Database Sharding Pattern

If we give an example, Tinder is very good example of database sharding pattern. Tinder is one of the most popular apps in the world for those who want to meet new people. It allows you to match and meet other people who use the application near you (around 160km) based on location.

The main requirements in this application is to find people near you very quickly and offer choices that meet the criteria what you set. So how tinder match peoples who are near to each other ?

Of course, Tinder segments users based on their location. This is called GeoSharding, that is, location-based database sharding.

You might think that you are sharding by dividing the world map into boxes with their locations and matches them in only into that box locations in the world. You can see example geo sharding of the world in the image.

As you can see that, We learned what partitioning and sharding are and how they can be used. So now lets examine Cassandra no-sql databases which is automatically includes database sharding and scaling features.

Cassandra No-Sql Database — Peer-to-Peer Distributed Wide Column Database

Cassandra is a distributed database from Apache Foundation. It is highly scalable and designed to manage very large amounts of structured data. It provides high availability with no single point of failure.

So, Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across many different located servers, providing high availability with no single point of failure. It is a type of NoSQL database.

Apache Cassandra features;

Elastic scalability — It is scalable, fault-tolerant, and consistent.
Cassandra has no single point of failure
It is a column-oriented database.
Flexible data storage
Easy data distribution — Cassandra provides the flexibility to distribute data

Cassandra Architecture

Cassandra has peer-to-peer distributed system across the nodes in a cluster. Cassandra has a Master-Master architecture in terms of its structure. So every Cassandra Node has the same role and works as a Master. We can also call this as a Master-Less architecture.

Why Cassandra

Cassandra has an auto-sharding feature like many NoSQL databases. Data Sharding helps keep data divided among nodes. In this way, both the disk occupancy of the servers are optimized and the workload between the nodes is easily distributed.

Partition Key

All of the columns that will decide on which node the data should be kept are called “Partition Key”.

In the image, the Partition Key is set to Sensor# and Date. The data will be stored in nodes determined by inserting the value of these two columns with hash function. This will becomes Partition Keys.

As you will remember from the CAP Theorem, either Availability or Consistency must be selected in a distributed system. At this point, Most of microservice architectures decided that Availability was more important to them and sacrifice for strict consistent and embrace to Eventual Consistency would be sufficient.

As you can see that, we understand Cassandra architecture and understand that why its used widely in microservices architectures.

Design the Architecture — Database Sharding Pattern with Cassandra

we are going to design the architecture step by step. Iterate the arch design one by one as per requirements. If we apply these Database Sharding pattern into our e-commerce microservices architecture, you can find the below design of that.

So this will illustrates scaling databases different servers with using database sharding pattern.

Main features are;

  • Peer-to-Peer Distributed Wide Column Database
  • Master-Master (Master-less) architecture
  • It is highly scalable and designed to manage very large amounts of structured data.
  • It provides high availability with no single point of failure.

Lets choose the databases. Now, of course Cassandra is the best option for our database as per high scalability and high availability.

Elastic scalability — It is scalable, fault-tolerant, and consistent.
Easy data distribution — Cassandra provides the flexibility to distribute data

And of course according to CAP Theorem, we sacrificed to strict consistency in here, and its good to follow eventually consistent in our e-commerce domain with microservices architecture.

As you can see that we have understand database architectures on microservices. What’s Next ? we will see how to manage querying different microservices.

So we should evolve our architecture with applying other Microservices Data Patterns in order to accommodate business adaptations faster time-to-market and handle larger requests.

Step by Step Design Architectures w/ Course

I have just published a new course — Design Microservices Architecture with Patterns & Principles.

In this course, we’re going to learn how to Design Microservices Architecture with using Design Patterns, Principles and the Best Practices. We will start with designing Monolithic to Event-Driven Microservices step by step and together using the right architecture design patterns and techniques.

--

--

Mehmet Ozkaya
Design Microservices Architecture with Patterns & Principles

Software Architect | Udemy Instructor | AWS Community Builder | Cloud-Native and Serverless Event-driven Microservices https://github.com/mehmetozkaya