5 Key Elements of Azure Cosmos DB

Chia Li Yun
Published in Geek Culture
5 min read · Nov 4, 2021

Let’s get started with an introduction to Azure Cosmos DB, Microsoft Azure’s fully managed NoSQL database.

Photo by Scott Graham on Unsplash

Azure Cosmos DB offers the following benefits, among others:

  1. Guaranteed speed at any scale
  2. Mission-critical ready (ensures 99.999% availability and enterprise-level security)
  3. Fully managed and cost-effective (automatic scaling and end-to-end database management)

You can read more about these and other benefits here.

What are the capacity modes available for Azure Cosmos DB?

Serverless
Consumption-based: you are charged only for the Request Units (RUs) consumed by your database operations and for the storage you use. This may be a good fit for prototypes or small applications whose usage you cannot predict yet.

Provisioned throughput
You can provision throughput at either the database level or the container level. You are billed for the amount you provisioned even if you do not use it. This mode is recommended if your application places a stable, high load on the database.
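To make the "billed even if you did not use it" point concrete, here is a minimal sketch that computes how much of a provisioned capacity was actually consumed. The function name and the sample numbers are my own assumptions for illustration, and no real Azure prices are involved.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def provisioned_utilization(provisioned_ru_per_s, consumed_rus, days):
    """Fraction of the provisioned request units that were actually consumed."""
    capacity = provisioned_ru_per_s * SECONDS_PER_DAY * days
    return consumed_rus / capacity


# Hypothetical workloads, both provisioned at 1,000 RU/s for 30 days.
steady = provisioned_utilization(1_000, 2_073_600_000, days=30)
spiky = provisioned_utilization(1_000, 129_600_000, days=30)

print(f"steady workload uses {steady:.0%} of what was provisioned")
print(f"spiky workload uses {spiky:.0%} of what was provisioned")
```

A workload with low utilization pays mostly for idle capacity under provisioned throughput, which is the situation where serverless is worth a look.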

Please note that the storage and rate limits may differ between the two modes. I have highlighted some of the key ones below.

Still unsure about which mode to choose? Check out this example by Microsoft.

What is a Partition Key?

Azure Cosmos DB needs a way to “cut” our data for scaling purposes, and it uses the partition key you have chosen to do so.

The partition key tells Azure Cosmos DB how to group data into logical partitions: all items in the same logical partition have the same partition key value. For example, in a supermarket scenario, a container holds all the types of items the supermarket sells. Each item has an itemId and an itemType (e.g. fruits, vegetables or drinks). If itemType is chosen as the partition key, all fruit items will be in one logical partition, all vegetable items will be in another logical partition, and so on.
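The grouping in the supermarket example can be sketched in a few lines of plain Python. This is only an in-memory illustration of the idea, not how Azure Cosmos DB is implemented, and the sample items and the function name are made up.

```python
from collections import defaultdict

# Hypothetical supermarket items; the field names follow the example above.
items = [
    {"itemId": "1", "itemType": "fruits", "name": "apple"},
    {"itemId": "2", "itemType": "fruits", "name": "banana"},
    {"itemId": "3", "itemType": "vegetables", "name": "carrot"},
    {"itemId": "4", "itemType": "drinks", "name": "cola"},
]


def group_into_logical_partitions(items, partition_key):
    """Group items by the value of their partition key property."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[partition_key]].append(item)
    return dict(partitions)


logical_partitions = group_into_logical_partitions(items, "itemType")
for key_value, members in logical_partitions.items():
    print(key_value, [member["name"] for member in members])
```

Each distinct itemType value ends up as its own group, which is the role a logical partition plays in a container.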

You have to specify the partition key upfront when you provision the container, and it cannot be changed afterwards. Changing it requires a data migration, which is a topic for another day. Your choice of partition key affects the performance of your Azure Cosmos DB container. Hence, it is of the utmost importance that you understand the factors to consider when choosing a partition key before you embark on your Azure Cosmos DB journey. Don’t worry, I will be sharing another article on the key factors one has to consider!

More about Logical Partitions

A logical partition consists of a set of items that have the same partition key value. Each item has an id that has to be unique only within its logical partition. Using the same supermarket example, you can have one item with id = 1, itemType = fruits and another item with id = 1, itemType = vegetables.
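Here is a toy sketch of that uniqueness rule. `ToyContainer` is a hypothetical in-memory stand-in that I wrote for this illustration; it is not the Azure SDK and only mimics the "id is unique per logical partition" behaviour.

```python
class ToyContainer:
    """In-memory stand-in for a container; NOT the Azure SDK."""

    def __init__(self, partition_key):
        self.partition_key = partition_key
        self._items = {}  # (partition key value, id) -> item

    def create_item(self, item):
        key = (item[self.partition_key], item["id"])
        if key in self._items:
            raise ValueError(
                f"id {item['id']!r} already exists in partition {key[0]!r}"
            )
        self._items[key] = item


container = ToyContainer(partition_key="itemType")
container.create_item({"id": "1", "itemType": "fruits"})
container.create_item({"id": "1", "itemType": "vegetables"})  # different partition, accepted

try:
    container.create_item({"id": "1", "itemType": "fruits"})  # same partition, same id
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(duplicate_rejected)
```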

A database transaction is scoped to a single logical partition. Hence, if your application requires bulk operations, or you need to run stored procedures or triggers, you have to carefully consider which items need to be processed together and try to group them in the same logical partition (this is one of the factors to consider when choosing a partition key).
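A quick way to reason about this is to check whether a batch of items shares a single partition key value. The helper below is a hypothetical sketch of that check in plain Python; it does not call any Azure Cosmos DB API.

```python
def shares_one_logical_partition(items, partition_key):
    """Return True if every item has the same partition key value."""
    return len({item[partition_key] for item in items}) <= 1


same_partition = [
    {"id": "1", "itemType": "fruits"},
    {"id": "2", "itemType": "fruits"},
]
mixed_partitions = [
    {"id": "1", "itemType": "fruits"},
    {"id": "1", "itemType": "vegetables"},
]

print(shares_one_logical_partition(same_partition, "itemType"))
print(shares_one_logical_partition(mixed_partitions, "itemType"))
```

Only the first batch could be handled in one transaction under the rule described above; the second one spans two logical partitions.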

While there is no limit to the number of logical partitions in a container, each logical partition has a maximum storage size of 20 GB (this is another factor to consider).

More about Physical Partitions

You may have come across the term physical partition before, though often only in passing. A physical partition is the actual physical storage on the Azure Cosmos DB servers. A container is scaled by distributing its data (logical partitions) and throughput across physical partitions. Two factors determine the number of physical partitions:

  1. Total data storage. A physical partition has a maximum storage of 50 GB.
  2. Amount of throughput provisioned. A physical partition has a maximum throughput of 10,000 RU per second. Since each logical partition is mapped to one physical partition, a logical partition has the same throughput limit as well.
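Putting the two limits together gives a rough lower bound on the number of physical partitions. The sketch below is my own back-of-the-envelope estimate under the limits quoted above; the function name is hypothetical and the actual number is decided by Azure Cosmos DB internally, so it may be higher.

```python
import math

MAX_STORAGE_GB = 50    # per physical partition, as stated above
MAX_RU_PER_S = 10_000  # per physical partition, as stated above


def estimate_min_physical_partitions(total_storage_gb, provisioned_ru_per_s):
    """Rough lower bound: whichever limit needs more partitions wins."""
    by_storage = math.ceil(total_storage_gb / MAX_STORAGE_GB)
    by_throughput = math.ceil(provisioned_ru_per_s / MAX_RU_PER_S)
    return max(1, by_storage, by_throughput)


# 120 GB needs at least 3 partitions for storage; 8,000 RU/s fits in 1.
print(estimate_min_physical_partitions(120, 8_000))
# 30 GB fits in 1 partition; 45,000 RU/s needs at least 5.
print(estimate_min_physical_partitions(30, 45_000))
```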

Just like logical partitions, there is no limit to the number of physical partitions a container can have. Because Azure Cosmos DB is a managed database, we do not need to worry about physical partitions: they are an internal implementation detail controlled by Azure Cosmos DB itself. All we have to care about is the partition key, since it is used to distribute the data and throughput. A poor choice of partition key can lead to a hot partition, where most requests are directed to a small subset of partitions. This results in inefficient use of provisioned throughput (if that is the plan you chose), rate limiting and, of course, higher cost.
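One simple way to spot a potential hot partition is to look at how requests are spread across partition key values. The sketch below works on a made-up list of request keys, and the 50% threshold is an arbitrary assumption for illustration rather than anything Azure Cosmos DB defines.

```python
from collections import Counter


def request_share_by_key(request_keys):
    """Map each partition key value to its fraction of all requests."""
    counts = Counter(request_keys)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def hot_keys(request_keys, threshold=0.5):
    """Return key values that receive more than `threshold` of the traffic."""
    shares = request_share_by_key(request_keys)
    return [key for key, share in shares.items() if share > threshold]


# Hypothetical traffic: most requests target the "drinks" partition.
requests = ["drinks"] * 80 + ["fruits"] * 12 + ["vegetables"] * 8
print(request_share_by_key(requests))
print(hot_keys(requests))
```

If one key value dominates like this, the partitions holding the other values sit mostly idle while the busy one risks being rate-limited.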

Limitations

Apart from the standard maximum storage sizes for logical and physical partitions that I shared in the earlier sections, here are some other limits that may differ based on the capacity mode you have chosen.

Per-request limits

You may find out more about the other limits here.

Summary

Overview of Azure Cosmos DB Components (Shared throughput by container)

The diagram above summarises the different components of Azure Cosmos DB and how they are connected. It shows the mapping of logical partitions to physical partitions in the case of shared throughput by container. However, if you set throughput on a database instead, the mapping might differ, as a physical partition can be shared across different containers.

This article has covered Azure Cosmos DB at a very high level, and I hope this core information can get you started with the database design for your application. Thank you for reading, and stay tuned for the next article about the choice of partition key! If you like this article or find it helpful, I would appreciate it if you could give it a clap or a like to show some support and love.

