Azure Cosmos DB RU’s Consumption

Mohammad Shaved
5 min readOct 9, 2020

All the operations we perform on Azure Cosmos DB, we spend something called Request Units (RU’s). whether you’re writing an item to Cosmos DB, or reading an item from Cosmos DB, you’re spending RU’s and you will be charged as per the RU consumption. So it is very important to understand how does RU’s consumption being calculated for each operations which we perform on Azure Cosmos DB. Number of RU’s exhausted is directly proportional to CPU utilisation and size of data retrieved. Let’s compare few operations on azure cosmos DB and their cost.

By default azure Cosmos DB create index on each field in database, so if you query a collection with any field name the response is very fast and number of RU’s exhausted are very less , but if you try to do contains search in Cosmos DB then it does not use indexing and scan all the documents in collection hence number of RU’s exhausted are higher because it has to scan all the documents.

Let’s see few examples and try to understand how RU’s calculation work in cosmos DB. I am using mongo API to query documents in azure cosmos DB.

First I will show, how exact match search work in Azure cosmos DB. Here I am querying a document with a field name.

db.getCollection('employee').find({"name":"test"})db.runCommand({"getLastRequestStatistics": 1})
{
"_t" : "GetRequestStatisticsResponse",
"ok" : 1,
"CommandName" : "OP_QUERY",
"RequestCharge" : 5.98,
"RequestDurationInMilliSeconds" : NumberLong(8)
}

If you see in above example , I am directly querying document using exact match hence it is very fast and number of RU’s exhausted are very less because here azure uses indexing and quickly retrieve the document therefore CPU utilisation is very less.

Now let’s assume in my db I have 50000 documents which contains “test” in a name field, now what if I change the above query to do a contain search and try to get first 10 documents where name contains “test”, let’s see in below example.

db.getCollection('employee').find({ "$or" : [{ "name" : { "$regex" : ".*test.*", "$options" : "i" } }] }).limit(10)db.runCommand({"getLastRequestStatistics": 1})
{
"_t" : "GetRequestStatisticsResponse",
"ok" : 1,
"CommandName" : "OP_QUERY",
"RequestCharge" : 49.97,
"RequestDurationInMilliSeconds" : NumberLong(138)
}

So if you see number of RU’s exhausted are increasing as it has started scanning the documents in collection and after scanning few documents it found 10 documents which are matching with the criteria and then it gave the result. So here it does not traverse all the documents in collection but what if I try to get the last 10 records which match the criteria lets see an example below

db.getCollection('employee').find({ "$or" : [{ "name" : { "$regex" : ".*test.*", "$options" : "i" } }] }) .skip(49990).limit(10)db.runCommand({"getLastRequestStatistics": 1})
{
"_t" : "GetRequestStatisticsResponse",
"ok" : 1,
"CommandName" : "OP_QUERY",
"RequestCharge" : 3843.22,
"RequestDurationInMilliSeconds" : NumberLong(33282)
}

Now if you see in above example query has to scan all the documents, hence the number RU’s exhausted are very high.

Now lets analyse few other operations to get the count of documents. Here I am counting the number of records by doing exact match, then in that case number RU’s exhausted will be very less because it will use indexing to find the number of documents.

db.getCollection('employee').count({"name":"abc"})db.runCommand({"getLastRequestStatistics": 1})
{
"_t" : "GetRequestStatisticsResponse",
"ok" : 1,
"CommandName" : "count",
"RequestCharge" : 10.32,
"RequestDurationInMilliSeconds" : NumberLong(8)
}

Let’s see how to get the count of documents with contains search in cosmos DB. To get the count of documents, we can use count or aggregate methods. let’s see which method we should use in which scenario.

If I want to get a count of record where “name” field contains some value then we can use below queries.

1. Search query using aggregate method

db.getCollection('employee').aggregate([{"$match" : { "$or" : [{ "name" : { "$regex" : ".*test.*", "$options" : "i" } }] }}, {"$count": "totalRecords"}])db.runCommand({"getLastRequestStatistics": 1})
{
"_t" : "GetRequestStatisticsResponse",
"ok" : 1,
"CommandName" : "aggregate",
"RequestCharge" : 6782.96,
"RequestDurationInMilliSeconds" : NumberLong(8160)
}

2. Search query using count method

db.getCollection('employee').count({ "$or" : [{ "name" : { "$regex" : ".*test.*", "$options" : "i" } }] })db.runCommand({"getLastRequestStatistics": 1})
{
"_t" : "GetRequestStatisticsResponse",
"ok" : 1,
"CommandName" : "count",
"RequestCharge" : 3741.31,
"RequestDurationInMilliSeconds" : NumberLong(10607)
}

So if you see here number RU’s exhausted in aggregate method are way higher than count method. Hence for a simple search we should always use count method and aggregate method should be used for complex operations where we have to do some grouping operations.

But if you see the number of RU’s exhausted here are still higher in case of count method as well because it scan all the documents in collection to get the desired count.

Now let’s try to analyse how sorting work in azure cosmos DB and what are the RU’s consumption in case of ascending and descending sort, by default azure stores documents in ascending order on each index so when we do a contain search and try to sort in ascending order then it does not exhaust much RU’s for sorting, lets see an example below.

db.getCollection('employee').find({ "$or" : [{ "name" : { "$regex" : ".*test.*", "$options" : "i" } }] }).sort({"name" : 1}).limit(10)db.runCommand({"getLastRequestStatistics": 1})
{
"_t" : "GetRequestStatisticsResponse",
"ok" : 1,
"CommandName" : "OP_QUERY",
"RequestCharge" : 23.56,
"RequestDurationInMilliSeconds" : NumberLong(2715)
}

If you see in above example I have sorted on a “name” field in ascending order, “1” means ascending here. Hence number of RU’s exhausted are very less. But If I try to sort in descending order then it has to scan all the documents again to produce the result in descending order, example below.

db.getCollection('employee').find({ "$or" : [{ "name" : { "$regex" : ".*test.*", "$options" : "i" } }] }).sort({"name" : -1}).limit(10)db.runCommand({"getLastRequestStatistics": 1})
{
"_t" : "GetRequestStatisticsResponse",
"ok" : 1,
"CommandName" : "OP_QUERY",
"RequestCharge" : 3086.16,
"RequestDurationInMilliSeconds" : NumberLong(2715)
}

So if you see in above example I have passed “-1” in sort function which means descending order and you can see the number of RU’s exhausted in this case are very high again.

So by now we have understand that by default azure cosmos DB creates index on each field and if we try to query by doing a exact match then it works very smoothly and number of RU’s exhausted are very less but if we do a contains search in any operation then in that case azure cosmos DB does not use indexing and it has to scan all the documents resulting high number of RU’s exhausted based on how many documents are present in your collection. If you have million number of records then in that case RU’s exhausted will be too high and the response time will be very high. Even Microsoft never recommends to do contains search on azure cosmos DB. So the solution to this problem is azure cognitive search where you can do a contains search with very fast response.

--

--