Sriram Uppuluri
Walmart Global Tech Blog
5 min read · Jul 17, 2020


Deep Dive into Azure Cosmos PartitionKey and Partitions

How to Effectively Model Azure Cosmos

Do you want to build an application with these features?

  1. Highly scalable (read and write throughput)
  2. Maintainable data consistency
  3. Consistent business continuity during regional outages/disaster recovery

Then Cosmos DB is a perfect fit for your application. But you need a deep-dive understanding of how Cosmos works so you can model your application accordingly, and then you can achieve the desired throughput.

Partition Key Pre-requisites

A partition key is a document attribute with high cardinality, that is, a wide range of possible values.

Choosing the right partition key at design time is important. Once you select a partition key and ingest data, you cannot change it without migrating the data to a new container with a different partition key. For heavy, read-intensive applications, you can typically choose the partition key based on the filter used by the majority of read calls.

For example, let’s say you’re building an application to host an organization’s user base. For that, you can use userId as the partition key. All the documents or data related to a user will be mapped to the userId partition key.

PartitionKey Example
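As a minimal sketch of the example above, a user's documents carry the userId attribute that the container is partitioned on. The document shape and field names here are hypothetical:

```python
import json

# Hypothetical user document; "userId" is both a document field and
# the value of the container's partition key path (e.g. /userId).
user_document = {
    "id": "order-1001",   # unique within its logical partition
    "userId": "user-42",  # partition key value
    "type": "order",
    "total": 59.99,
}

# Every document that shares this userId maps to the same logical partition.
partition_key_value = user_document["userId"]
print(json.dumps(user_document))
```

Reads filtered by userId can then be routed to a single partition instead of fanning out across all of them.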

What is a Synthetic Partition Key

Typically, a partition key is a JSON (JavaScript Object Notation) attribute within your document, used to distribute your data across multiple partitions. However, in some scenarios you will need to model the partition key using two or more attributes in your document. This is called a synthetic partition key.

For example, if you’re modelling multi tenant application, it is recommended to use tenant-Id as part of the partition key and use a synthetic partition key. You can also have alternative usecases where two or more attributes derive the unique data.

PartitionKey versus Synthetic PartitionKey
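As a minimal sketch of the idea above, a synthetic partition key can be built by concatenating two attributes into a single value stored on the document. The helper name and document fields here are assumptions for illustration, not a Cosmos SDK API:

```python
# Hypothetical helper that combines tenantId and userId into a single
# synthetic partition key value stored on the document.
def make_synthetic_key(tenant_id: str, user_id: str) -> str:
    return f"{tenant_id}-{user_id}"

doc = {"id": "doc-1", "tenantId": "acme", "userId": "user-42"}

# Store the combined value in a dedicated attribute and configure the
# container's partition key path to point at it (e.g. /partitionKey).
doc["partitionKey"] = make_synthetic_key(doc["tenantId"], doc["userId"])
print(doc["partitionKey"])  # acme-user-42
```

Because the combined value has higher cardinality than either attribute alone, it spreads data across more logical partitions.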

Cosmos DB Partition

Now that we have a detailed understanding of the partition key and various modeling scenarios, let's take a deep dive into how Cosmos does partitioning.

Cosmos scales each individual container by storing data in partitions, and this approach lets you scale both storage and throughput. There are two types of Cosmos partitions:

  1. Logical Partition
  2. Physical Partition

The following covers the significance and usage of each of these partitions.

Logical Partition

A logical partition contains the set of documents that share the same partition key value. A logical partition has a 20 GB storage limit. Be sure to model your partition key so that all related documents share the same partition key. In the use case above (where we used userId as the partition key), the multiple documents related to a single user are grouped in one logical partition.

Logical Partition
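The grouping described above can be modeled as a simple group-by on the partition key value; the document shapes below are assumptions for this sketch:

```python
from collections import defaultdict

# Illustrative documents; field names are assumptions for this sketch.
documents = [
    {"id": "a", "userId": "user-1"},
    {"id": "b", "userId": "user-1"},
    {"id": "c", "userId": "user-2"},
]

# Grouping by partition key value models how documents fall into
# logical partitions: one logical partition per distinct userId.
logical_partitions = defaultdict(list)
for doc in documents:
    logical_partitions[doc["userId"]].append(doc)

print(len(logical_partitions))  # 2 logical partitions
```

All of user-1's documents end up in one logical partition, so a query filtered on that userId touches only that partition.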

Physical Partition

A physical partition is the actual storage for documents. To start, your container may have a single physical partition. Each physical partition can store up to 50 GB of data and provide throughput up to 10,000 request units per second (RU/s). One or more logical partitions map to one physical partition. Cosmos automatically creates a new physical partition as your data grows beyond 50 GB. The partition key's value is hashed to determine which physical partition it goes to. The system may split an existing physical partition so that it can hash partition keys across all available physical partitions; data belonging to a single logical partition always stays together on one physical partition. Adding a new physical partition should not impact your application's availability.

Range of Partition Addresses Hashing
Logical versus Physical partition
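The hash-based routing described above can be sketched as follows. This is illustrative only: the hash function and partition-count handling here are assumptions, not the actual algorithm Cosmos uses internally:

```python
import hashlib

def physical_partition_for(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to a physical partition index by hashing.

    Illustrative only: Cosmos uses its own internal hash and key ranges,
    not this exact scheme.
    """
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# The same partition key value always routes to the same physical
# partition, so a whole logical partition stays together.
p1 = physical_partition_for("user-42", 3)
p2 = physical_partition_for("user-42", 3)
print(p1 == p2)  # True
```

This determinism is what guarantees that a logical partition is never split across physical partitions.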

Throughput considerations

Cosmos scales the container's throughput across the physical partitions. A request unit (RU) is the measure of throughput. You can provision RUs (based on your expected throughput) at either the database or the container level. Provisioned RUs are divided equally among the physical partitions.

How Database Throughput Provision Works

Throughput provisioned at the database level is shared across all of the containers created in that database. Think of it as a pool shared across all containers. Cosmos limits a shared-throughput database to 25 containers. The minimum throughput required for a shared database is 1,000 RUs.

Once the database creates x physical partitions (based on the provisioned RUs) and containers are created, the system shares every container's data across those x physical partitions at the database level.

For example., let’s say there are three physical partitions and five containers. All five containers will share these three partitions. Now, let’s say two containers end up growing and require more throughput. In that case, the system creates a new set of three physical partitions, and only these two containers will have additional partitions allocated. Now each container will be mapped to different physical partitions created at the database level. So, if one container needs full throughput, then it can use the provisioned RUs completely, and the other container may end up having rate limit/throttling.

Provisioning Database Throughput

How Container Throughput Provision Works

Throughput provisioned at the container level is dedicated, that is, exclusively reserved for that container. This means you will always receive, and be charged for, the provisioned throughput whether you use it or not. The minimum throughput required for a container is 400 RUs. It's good practice to start with fewer RUs and increase them only when you need to scale your throughput.

The container-provisioned throughput is always distributed across the physical partitions. Let's say you have five physical partitions and provision 10,000 RU/s; in this case, each physical partition gets 2,000 RU/s allocated. If most calls target partition key data mapped to one partition, you can expect rate limiting once that partition's 2,000 RU/s are consumed, even though you have 10,000 RU/s allocated at the container.
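The per-partition share works out as simple division; the function below is just a sketch of that arithmetic, not a Cosmos API:

```python
def rus_per_partition(provisioned_rus: float, physical_partitions: int) -> float:
    """Provisioned RUs divide evenly across a container's physical partitions."""
    return provisioned_rus / physical_partitions

# 10,000 RU/s spread over 5 physical partitions -> 2,000 RU/s each.
# A hot partition is throttled once it consumes its own share, even
# if the other partitions are idle.
print(rus_per_partition(10_000, 5))  # 2000.0
```

This is why a skewed partition key can throttle you well below your total provisioned throughput.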

Note: Once you have provisioned throughput, you cannot switch between shared database throughput and dedicated container throughput. The only way to do so is to create a new database/container and move your data.

Provisioning Dedicated Container Throughput

Considerations when Choosing between Database and Container Throughput
