AZ Lamps #22 — Python or .NET for Cosmos DB?

Artem Mikulich
AZ Lamps by Artem Mikulich
2 min read · Aug 5, 2023

I have already written several articles about Azure Cosmos DB, and I still find the database appealing for many scenarios. But recently, I got to look at it from an unusual angle.

The task was to migrate a decent amount of data (millions of records) from one Cosmos DB container to another, slightly modifying the documents on the way through. It sounded like a small console app in Python would work like a charm, so I decided to try it. Indeed, it took little time to write a 30-line PoC that did the trick on a small dataset (spoiler: a C# equivalent took around 60 lines).
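For illustration, a PoC along these lines might look like the sketch below. It is not the original script: the `transform` step, the `migrated` flag, and every placeholder in angle brackets are hypothetical, and it assumes the `azure-cosmos` package is installed for the part that talks to the database.

```python
from typing import Any, Callable, Dict, Iterable

Document = Dict[str, Any]

# Cosmos DB system properties; assumption: they can be dropped because
# the service regenerates them when the document is written again.
SYSTEM_FIELDS = ("_rid", "_self", "_etag", "_attachments", "_ts")


def transform(doc: Document) -> Document:
    """Hypothetical modification step: strip system fields and tag the document."""
    clean = {k: v for k, v in doc.items() if k not in SYSTEM_FIELDS}
    clean["migrated"] = True
    return clean


def migrate(items: Iterable[Document], upsert: Callable[[Document], Any]) -> int:
    """Stream documents from the source, transform each one, write it to the target."""
    count = 0
    for doc in items:
        upsert(transform(doc))
        count += 1
    return count


def run_against_cosmos() -> None:
    """Wire the helpers to real containers (all names below are placeholders)."""
    # Imported here so the helpers above stay usable without the SDK installed.
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    database = client.get_database_client("<database>")
    source = database.get_container_client("<source-container>")
    target = database.get_container_client("<target-container>")
    print(migrate(source.read_all_items(), target.upsert_item))
```

Keeping `migrate` independent of the SDK means it can be tried on a plain list of dictionaries before pointing `run_against_cosmos()` at a real account.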

The first “what a shame” moment came when I connected to the actual database container instead of an emulator: it turned out that the Python SDK did not support containers with Hierarchical Partition Keys. To clarify, Hierarchical Partition Keys is a new Cosmos DB feature that lets you distribute data more efficiently and, ultimately, improve performance. I’d hoped to play with it during the migration, but at the time that was possible with the .NET SDK only.

I had to disable Hierarchical Partition Keys on the new container and move on. The next problem didn’t keep me waiting for long. If a SELECT statement returns a huge number of rows (as in my case), Cosmos DB adds a continuation token to the response. The token is essentially a link to the next page of results, so it lets you read the whole dataset page by page. Long story short, the Python SDK does support the continuation token, but only for queries scoped to a single partition. For example, if your partition key is CompanyId, you can page through data only within one company. I thought, “OK, not a big deal, let’s grab a list of partition key values first.” All I needed was a query as simple as this:

SELECT c.PartitionKey FROM c GROUP BY c.PartitionKey

And you know what? It didn’t work either, as the Python SDK didn’t support GROUP BY. After that, I had to abandon my idea and return to the .NET SDK.
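To make the paging discussion concrete, here is a minimal sketch of reading one partition page by page with a continuation token, plus a client-side substitute for the GROUP BY query. It is not what the migration ended up using. The helper names are hypothetical, `container` is assumed to be an `azure-cosmos` container client, and the deduplication idea assumes that a plain cross-partition projection is supported by your SDK version and affordable for your dataset.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Document = Dict[str, Any]


def read_page(container: Any, partition_key: str, token: Optional[str] = None,
              page_size: int = 100) -> Tuple[List[Document], Optional[str]]:
    """Fetch one page from a single partition; return it with the next token."""
    pager = container.query_items(
        query="SELECT * FROM c",
        partition_key=partition_key,   # keeps the query inside one partition
        max_item_count=page_size,
    ).by_page(continuation_token=token)
    page = list(next(pager))
    return page, pager.continuation_token


def read_partition(container: Any, partition_key: str,
                   page_size: int = 100) -> Iterator[List[Document]]:
    """Yield every page of one partition by following continuation tokens."""
    token: Optional[str] = None
    while True:
        page, token = read_page(container, partition_key, token, page_size)
        yield page
        if not token:
            break


def distinct_values(values: Iterable[Any]) -> List[Any]:
    """Deduplicate on the client, preserving first-seen order."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Possible usage (assumption: the partition key property is named PartitionKey):
# keys = distinct_values(container.query_items(
#     query="SELECT VALUE c.PartitionKey FROM c",
#     enable_cross_partition_query=True,
# ))
# for key in keys:
#     for page in read_partition(container, key):
#         ...
```

The projection still scans every document, so it trades request units for simplicity. Whether that is acceptable depends on the size of the container.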

I mentioned earlier that Cosmos DB has taken a huge step forward over the last year with many new features (Hierarchical Partition Keys, Burst Capacity, Partition Merge, etc.). Evidently, Microsoft was laser-focused on .NET clients, and as a result the Cosmos DB SDK for Python now lags badly behind its .NET counterpart. I hope this gap closes sooner rather than later, but for now, I can only suggest checking the limitations if you are considering Python + Cosmos DB.

Artem Mikulich
AZ Lamps by Artem Mikulich

I am a solution architect focused on Azure Cloud. My goal is to unlock business potential by eliminating technological barriers.