Azure Cosmos DB — when does a point read cost more than a query?

Michał Smereczyński
3 min read · Mar 24, 2023


The Azure Cosmos DB documentation mentions that:

Read operations in Azure Cosmos DB are typically ordered from fastest/most efficient to slower/less efficient in terms of RU consumption as follows:

- Point reads (key/value lookup on a single item ID and partition key).
- Query with a filter clause within a single partition key.
- Query without an equality or range filter clause on any property.
- Query without filters.
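
To make these four categories concrete, here is a minimal sketch of one call per category with the Python SDK (azure-cosmos). The container and property names are hypothetical, and it assumes a container partitioned on /currentBoardId:

def example_reads(container):
    # 1. Point read: id + partition key, no query engine involved.
    container.read_item(item="some-id", partition_key="some-pk")

    # 2. Query with a filter clause within a single partition key.
    list(container.query_items(
        query="SELECT * FROM c WHERE c.isActive = true",
        partition_key="some-pk",
    ))

    # 3. Query with no equality or range filter (e.g. a CONTAINS
    #    predicate), which cannot be scoped to a single partition.
    list(container.query_items(
        query="SELECT * FROM c WHERE CONTAINS(c.currentBoardId, '1476')",
        enable_cross_partition_query=True,
    ))

    # 4. Query without any filter: a full scan of the container.
    list(container.query_items(
        query="SELECT * FROM c",
        enable_cross_partition_query=True,
    ))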

The only factor affecting the RU charge of a point read (besides the consistency level used) is the size of the item retrieved. The following table shows the RU cost of point reads for items that are 1 KB and 100 KB in size.

| Item size | RU cost of one point read |
| --- | --- |
| 1 KB | 1 RU |
| 100 KB | 10 RUs |

Because point reads (key/value lookups on the item ID) are the most efficient kind of read, you should make sure your item ID has a meaningful value so you can fetch your items with a point read (instead of a query) when possible.
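
For example (with hypothetical names), if a board's natural key is stored as the item id at write time, the later lookup can be a point read instead of a query:

board_id = "147654876797"
container.upsert_item({
    "id": board_id,              # meaningful id: the board's natural key
    "currentBoardId": board_id,  # partition key
    "isActive": True,
})

# ...which can later be fetched with a point read:
item = container.read_item(item=board_id, partition_key=board_id)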

The bad news is that, in practice, this does not hold for large documents. Let's test it with a single, large document (I used an online JSON generator):

{
  "isActive": true,
  "items": [
    {
      "boardId": "147654876797",
      "boardItems": [
        "[Path,{applyMatrix:true,data:{id:14711106},segments:[[[-353.5,-205.5],[0,0],[0.66667,1]],[[-351.5,-202.5],[-0.38006,-1.14018],[0.31623,0.94868]],[[-351.5,-199.5],[0,-1],[0,3.66667]],[[-351.5,-188.5],[0.19272,-3.6616],[-0.54226,10.30286]],[[-355.5,-158.5],[1.76614,-10.30246],[-0.91684,5.34826]],[[-358.5,-142.5],[1.20258,-5.29134],[-0.46722,2.05576]],[[-360.5,-136.5],[0.5547,-2.0339],[-0.44721,1.63978]],[[-361.5,-131.5],[0.21082,-1.68655],[-0.16538,1.32304]],[[-361.5,-127.5],[0,-1.33333],[0,1]],[[-361.5,-124.5],[0,-1],[0,0.33333]],[[-361.5,-123.5],[-0.29814,-0.14907],[0.59628,0.29814]],[[-359.5,-123.5],[-0.66667,0],[2,0]],[[-353.5,-123.5],[-1.99099,-0.18962],[5.02153,0.47824]],[[-338.5,-121.5],[-5.03554,-0.29621],[11.98391,0.70494]],[[-302.5,-120.5],[-11.99261,-0.53698],[10.34445,0.46318]],[[-271.5,-118.5],[-10.33763,-0.5964],[6.9963,0.40363]],[[-250.5,-117.5],[-6.98878,-0.51769],[2.02205,0.14978]],[[-244.5,-116.5],[-2.01193,-0.25149],[0.66152,0.08269]],[[-242.5,-116.5],[-0.4714,0.4714],[0.4714,-0.4714]],[[-242.5,-118.5],[0.11926,0.65591],[-0.54975,-3.02361]],[[-244.5,-127.5],[0.74536,2.98142],[-0.92178,-3.68711]],[[-247.5,-138.5],[0.77002,3.72176],[-1.2324,-5.95661]],[[-250.5,-156.5],[0.92493,6.01203],[-0.40864,-2.65617]],[[-251.5,-164.5],[0.19147,2.68059],[-0.14249,-1.99492]],[[-251.5,-170.5],[0,2],[0,-1.66667]],[[-251.5,-175.5],[-0.15089,1.65982],[0.18357,-2.01926]],[[-250.5,-181.5],[-0.30831,2.00401],[0.3584,-2.32961]],[[-249.5,-188.5],[-0.19574,2.34888],[0.13841,-1.66091]],[[-249.5,-193.5],[0,1.66667],[0,-0.33333]],[[-249.5,-194.5],[0,0.33333],[0,-0.33333]],[[-249.5,-195.5],[0.2357,0.2357],[-0.2357,-0.2357]],[[-250.5,-195.5],[0.33333,0],[-0.33333,0]],[[-251.5,-195.5],[0.33168,-0.03317],[-3.00348,0.30035]],[[-260.5,-194.5],[3.01535,-0.13706],[-4.32886,0.19677]],[[-273.5,-194.5],[4.33333,0],[-6.33333,0]],[[-292.5,-194.5],[6.33333,0],[-4.66667,0]],[[-306.5,-194.5],[4.66226,-0.20271],[-3.01561,0.13111]],[[-315.5,-193.5],[3.01079,-0.21506],[-1.66243,0.11875]],[[-320.5,-193.5],[1.66667,0],[-2,0]],[[-326.5,-193.5],[2,0],[-8.33333,0]],[[-351.5,-193.5],[8.33333,0],[0,0]]],strokeColor:[0.26667,0.26667,0.26667],strokeWidth:4}]"
      ],
      "undoedItems": [],
      "viewPosition": "{ x: 0, y: -290 }",
      "active": true
    }
  ],
  "currentBoardId": "147654876797"
}

Of course, the real test file contains many more "items": around 1.9 MB in total.

The partition key is “currentBoardId”.
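
For reproducibility, below is a minimal sketch of how such a test item could be loaded. The endpoint, key, database, and container names are hypothetical, and it inflates the base document by repeating the board entry until it approaches ~1.9 MB (Cosmos DB caps item size at 2 MB):

import json
import uuid

from azure.cosmos import CosmosClient

# Hypothetical connection details; replace with your own.
client = CosmosClient("https://myaccount.documents.azure.com:443/", "<account-key>")
container = client.get_database_client("mydb").get_container_client("boards")

with open("board.json") as f:
    doc = json.load(f)

# Duplicate the single board entry until the serialized document
# approaches ~1.9 MB.
board = doc["items"][0]
while len(json.dumps(doc)) < 1_900_000:
    doc["items"].append(dict(board))

doc["id"] = str(uuid.uuid4())  # the id we will point-read later
container.upsert_item(doc)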

If we want to query Cosmos DB for the entire document the correct way, we should use a filter clause on a single partition key and point at a unique document (by its ID). So the query should look like this:

QUERY = 'SELECT * FROM boards WHERE boards.currentBoardId = "147654876797" AND boards.id = "5b79ab31-5a63-422f-97aa-922dbcbba1ea"'

results = container.query_items(
    query=QUERY, enable_cross_partition_query=True, populate_query_metrics=True
)

# query_items is lazy: consume the iterator so the request actually executes
items = list(results)

Having consumed the results above, we can extract the request charge for this query:

request_charge = container.client_connection.last_response_headers["x-ms-request-charge"]

print(request_charge)
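
Since this header lookup repeats for every measurement, it may be worth wrapping it in a small helper. This is just a convenience sketch around the same x-ms-request-charge header:

def last_request_charge(container) -> float:
    # RU charge of the most recent operation issued through this container.
    return float(
        container.client_connection.last_response_headers["x-ms-request-charge"]
    )

print(last_request_charge(container))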

The request charge for this query is: 35.2

If we believe what the documentation says, the point read for the same document should be less expensive than the query above. Let's try:

existing_item = container.read_item(
item="5b79ab31-5a63-422f-97aa-922dbcbba1ea",
partition_key="147654876797"
)

request_charge = container.client_connection.last_response_headers["x-ms-request-charge"]

print(request_charge)

The request charge for the point read is: 291.8 (more than 8 times as expensive).

The problem is known to Microsoft. I received an answer about this case from Azure Support (30 days ago):

Apologies for the delay, as I was working with the product team on your issue. They said that there is a discrepancy under the current charging policy, where a point read will cost more than the query, especially for large documents. They also said that there is a regression in the charging policy which is going to be corrected soon.

So, if you are working with large documents in Cosmos DB, you should probably run a cost assessment and compare point-read charges against equivalent queries.
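
As a starting point, a sketch like the one below compares the two access paths for one of your own documents. The parameterized query is an assumed equivalent of the point read, and the inner helper mirrors the header lookup used earlier:

def compare_read_costs(container, item_id: str, pk: str):
    def charge() -> float:
        return float(
            container.client_connection.last_response_headers["x-ms-request-charge"]
        )

    # Point read: direct key/value lookup by id and partition key.
    container.read_item(item=item_id, partition_key=pk)
    point_ru = charge()

    # Equivalent single-partition query for the same document.
    list(container.query_items(
        query="SELECT * FROM c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": item_id}],
        partition_key=pk,
    ))
    query_ru = charge()

    return point_ru, query_ru

point_ru, query_ru = compare_read_costs(
    container, "5b79ab31-5a63-422f-97aa-922dbcbba1ea", "147654876797"
)
print(f"point read: {point_ru} RU, query: {query_ru} RU")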
