Introduction to Azure Cosmos DB

In this article, I will give a brief overview of Azure Cosmos DB and introduce its common concepts.

Azure Cosmos DB is getting very popular, and Entity Framework Core is going to support it starting with version 3.0; see https://docs.microsoft.com/en-us/ef/core/providers/#cosmos-db.

So, what is Azure Cosmos DB? It is a planet-scale document database, an evolution of Azure DocumentDB.

When you log in to the Azure portal, you can easily create an Azure Cosmos DB database and one or more collections in that database. You can think of a collection as a table in a relational database, but without a schema.

Pricing

Before we start, let's talk about pricing. Your usage is priced based on Request Units (RUs). You can think of RUs per second as the currency for throughput.

  • Writes cost more than reads
  • Queries on indexed data cost less
  • The size of the data affects RUs, and therefore price, as well
  • The latency option affects pricing (we will talk about it later)
  • The indexing policy also affects pricing

It is better to calculate your needs in advance so that you will not see any surprises on your Azure bill. You can use Microsoft's capacity planner for DocumentDB (since renamed to Cosmos DB): https://www.documentdb.com/capacityplanner.

Another good thing is that each response from Azure Cosmos DB reports the RUs that request consumed, so you can decide whether to stop your requests for a while or increase the RU limit of your collection in the Azure portal.
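As a minimal sketch using the Microsoft.Azure.DocumentDB SDK mentioned at the end of this article (the account endpoint, key and a pre-created Travel/Hotels collection are placeholder assumptions), reading the RU charge from a response looks like this:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

class RuChargeDemo
{
    static async Task Main()
    {
        // Placeholders: use your own account endpoint and key.
        var client = new DocumentClient(
            new Uri("https://<your-account>.documents.azure.com:443/"), "<your-key>");

        var response = await client.CreateDocumentAsync(
            UriFactory.CreateDocumentCollectionUri("Travel", "Hotels"),
            new { id = Guid.NewGuid().ToString(), name = "Villa Borghese", city = "Rome" });

        // Every response reports how many RUs the request consumed.
        Console.WriteLine($"This write consumed {response.RequestCharge} RUs");
    }
}
```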

Multi API support

Azure Cosmos DB supports five types of APIs:

  • SQL API (JSON)
  • MongoDB API (BSON)
  • Gremlin API (graph)
  • Table API (key-value)
  • Cassandra API (columnar)

Although Azure Cosmos DB offers five different API models, all of them are stored in the same underlying format, ARS (atom-record-sequence).

It also supports stored procedures, user-defined functions and triggers. Isn't that great? 😉

Horizontal partitioning

One of the powerful sides of Azure Cosmos DB is that it can partition your data automatically. For Cosmos DB to do that, you need to define a partition key for your collection at creation time.

Let's say that you have a database named Travel with a collection named Hotels. Here is a sample hotel document for the Hotels collection:

{
  "id": "b318aeb0-4b0c-4ef0-8d4b-ddd10f502033",
  "name": "Villa Borghese",
  "country": "Italy",
  "city": "Rome"
}

If you select city as the partition key for the Hotels collection, Cosmos DB will automatically partition your collection as your data grows.

It is crucial to select the right partition key here; Cosmos DB will handle the rest.
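As a hedged sketch (the account endpoint, key and throughput value are placeholders), creating the Hotels collection with /city as its partition key through the Microsoft.Azure.DocumentDB SDK could look like this:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class PartitionedCollectionDemo
{
    static async Task Main()
    {
        var client = new DocumentClient(
            new Uri("https://<your-account>.documents.azure.com:443/"), "<your-key>");

        var collection = new DocumentCollection { Id = "Hotels" };
        // Partition key paths use JSON path syntax, e.g. "/city".
        collection.PartitionKey.Paths.Add("/city");

        await client.CreateDocumentCollectionIfNotExistsAsync(
            UriFactory.CreateDatabaseUri("Travel"),
            collection,
            new RequestOptions { OfferThroughput = 400 }); // provisioned RU/s
    }
}
```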

In some cases, some of your partitions might receive many more requests (especially writes) than the others. Such partitions are called hot partitions. In that case, it is better to create another collection just for that specific partition and give it a different partition key.

For example, assume that the city of Rome receives a very high volume of hotel-creation requests. In that case, you can create another collection (Hotels_Rome) and use district as the partition key for this new collection.

For a multi-tenant app, TenantId can be used as the partition key for most of the collections.

Cross Partition Queries

Such a great benefit comes with a disadvantage, of course. It is recommended not to run cross-partition queries because they are slower and cost more: Cosmos DB has to run your query on more than one partition, merge the results and return them to you.

If you still need to run a cross-partition query, you need to enable it explicitly in your FeedOptions:

var option = new FeedOptions { EnableCrossPartitionQuery = true };
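For example, a query filtering on country (which is not the partition key, city) has to fan out across partitions. A sketch, assuming an initialized DocumentClient named client and a hypothetical Hotel POCO with the same fields as the sample document:

```csharp
using System.Linq;
using Microsoft.Azure.Documents.Client;

var options = new FeedOptions { EnableCrossPartitionQuery = true };

// country is not the partition key, so this query spans partitions.
var italianHotels = client.CreateDocumentQuery<Hotel>(
        UriFactory.CreateDocumentCollectionUri("Travel", "Hotels"),
        "SELECT * FROM h WHERE h.country = 'Italy'",
        options)
    .ToList();
```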

Replication (Globally distributing data)

Azure Cosmos DB automatically replicates your data to the regions you select. But why should we replicate the data? The answer is simple: performance. If your app is closer to its data, it will retrieve the data faster and your users will have a better experience with your app.

Here is a screenshot of Azure Cosmos DB’s replication screen:

Here you can easily select where to replicate your data.

There are two types of regions in Azure Cosmos DB: write regions and read regions. As you can easily guess, data can be written to write regions and read from both write and read regions.

You can also define fail-over priorities, so Azure can fail over to the next region when a region becomes unavailable.
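On the client side, an app can also express its region preferences through the SDK's ConnectionPolicy; the SDK then directs reads to the first available region in the list. A sketch (region names and credentials are placeholders):

```csharp
using System;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

var policy = new ConnectionPolicy();
// Reads go to the first reachable region in this list.
policy.PreferredLocations.Add(LocationNames.WestEurope);   // primary preference
policy.PreferredLocations.Add(LocationNames.NorthEurope);  // fallback

var client = new DocumentClient(
    new Uri("https://<your-account>.documents.azure.com:443/"), "<your-key>", policy);
```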

Consistency

Replication comes with a choice of consistency. When one instance of your app writes data to a write region, Azure needs to replicate this data to the other regions.

Azure Cosmos DB offers five consistency levels, which means you need to choose how Azure should replicate your data between your Azure Cosmos DB regions. Let's see what those consistency levels are:

Strong

In this model, there are no dirty reads: when a piece of data is updated, everybody keeps reading the old value until the update has been replicated to all regions. This is the slowest option.

Bounded Staleness

In this option, you define a period of time or a number of updates as the staleness bound for your data. You can say, for example, no dirty reads for data older than 1 minute, or no dirty reads for data that lags by more than 5 updates. When you set the time option to 0, it behaves exactly like the Strong consistency option.

Session

In this option, dirty reads are not possible for the writer, but they are possible for other readers. This is the default option. So, if you are the one writing the data, you can always read your own writes; others may read stale data for a while.

Consistent Prefix

In this option, dirty reads are possible, but they always arrive in order. So, if a piece of data is updated with the values 1, 2, 3 in that order, readers always see the updates in this order; no one will see the value 3 before 2.

Eventual

In this option, dirty reads are possible and there is no guarantee of order. So, if a piece of data is updated with the values 1, 2, 3 in order, a reader can see value 3 before seeing value 2. But this is the fastest option.
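On the client side, the desired consistency level can be requested when creating the DocumentClient. A sketch with placeholder credentials (note that the requested level cannot be stronger than the account's default):

```csharp
using System;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

var client = new DocumentClient(
    new Uri("https://<your-account>.documents.azure.com:443/"),
    "<your-key>",
    connectionPolicy: null,
    desiredConsistencyLevel: ConsistencyLevel.Session); // Session is the default
```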

Here is a commonly used diagram showing the consistency options of Azure Cosmos DB:

Resource Model

Azure Cosmos DB adds additional system fields to your documents. Here you can see a sample document model:

If you don't set a value for the id field, Azure Cosmos DB will automatically assign a GUID value to it.
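For illustration, the sample hotel document from above, as stored by Azure Cosmos DB, would look roughly like this (the underscore-prefixed fields are system-generated; their values here are placeholders):

```json
{
  "id": "b318aeb0-4b0c-4ef0-8d4b-ddd10f502033",
  "name": "Villa Borghese",
  "country": "Italy",
  "city": "Rome",
  "_rid": "...",
  "_self": "...",
  "_etag": "...",
  "_attachments": "attachments/",
  "_ts": 1554897600
}
```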

Migrations

For migrating your existing database to Azure Cosmos DB, there is an open-source tool, which you can find at https://azure.microsoft.com/en-us/updates/documentdb-data-migration-tool/. This tool doesn't offer much, but it understands the column names of your SQL Server tables or views and converts them to hierarchical data in Azure Cosmos DB.

Assume that we have a view like below:

SELECT Name, Country AS "Address.Country", City AS "Address.City" FROM Hotels

The migrator tool can convert the result of this query (your view) to something like the following:
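Roughly, each row becomes a document in which the dotted column aliases turn into nested objects; a hypothetical row for our sample hotel would map to:

```json
{
  "Name": "Villa Borghese",
  "Address": {
    "Country": "Italy",
    "City": "Rome"
  }
}
```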

Azure Cosmos DB Emulator

For me, one of the most impressive things about Azure Cosmos DB is that it has a local emulator. You can install it and emulate Azure Cosmos DB on your own computer.

You can download the emulator at https://aka.ms/cosmosdb-emulator. By using this emulator, you can even measure the cost of your queries and estimate the cost of your Azure Cosmos DB usage.
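Pointing the SDK at the emulator only requires changing the endpoint and key. A sketch (the emulator shows its fixed, well-known key in its data explorer; it appears here as a placeholder):

```csharp
using System;
using Microsoft.Azure.Documents.Client;

// The emulator listens on https://localhost:8081 by default.
var client = new DocumentClient(new Uri("https://localhost:8081/"), "<emulator-key>");
```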

Nuget Packages

Currently, Cosmos DB support for Entity Framework Core is not released yet. Until it is released, you can use the packages below to connect to and use Azure Cosmos DB from your C# application.

For Full .NET Framework
https://www.nuget.org/packages/Microsoft.Azure.DocumentDB/

For .NET Core
https://www.nuget.org/packages/Microsoft.Azure.DocumentDB.Core/