In a previous article, we reviewed how to set up logs and troubleshoot Cosmos DB issues using Azure Log Analytics.
Azure Cosmos DB. The power of Logs. Troubleshooting.
You can read the full story by navigating to the link above. In a nutshell, I pushed 31 million invoices into Cosmos DB, with a total size reaching 47 GB. The database had containers with different indexing and logical partition key configurations. The test wasn't intended as a cost analysis, but an unexpected spike in charges on my subscription resulted in this article.
So here we are now.
For reference, a code snippet that I used for data ingestion (utilizing the parallelism of the Cosmos DB .NET SDK):
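The original snippet isn't reproduced here, so below is a hedged reconstruction of that kind of parallel ingestion with the .NET SDK. The endpoint, key, database/container names, and the `AccountNumber` partition key are placeholders, not the original test's values:

```csharp
// Hedged reconstruction of the ingestion approach; names and keys are placeholders.
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class InvoiceIngestion
{
    public static async Task RunAsync(IEnumerable<Invoice> invoices)
    {
        // AllowBulkExecution lets the SDK group concurrent point operations
        // into batches per partition — this is what drives the parallelism.
        using var client = new CosmosClient(
            "https://<account>.documents.azure.com:443/",
            "<account-key>",
            new CosmosClientOptions { AllowBulkExecution = true });

        Container container = client.GetContainer("InvoiceDb", "Invoices");

        var tasks = new List<Task>();
        foreach (Invoice invoice in invoices)
        {
            // Fire all writes concurrently; the SDK batches them under the hood.
            tasks.Add(container.CreateItemAsync(
                invoice, new PartitionKey(invoice.AccountNumber)));
        }
        await Task.WhenAll(tasks);
    }
}
```

With `AllowBulkExecution` enabled, awaiting many `CreateItemAsync` calls at once is the idiomatic way to saturate provisioned throughput.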
And the Invoice class itself:
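The exact schema from the original test isn't shown here, so this is an illustrative sketch of what such an Invoice document could look like:

```csharp
// Hedged sketch of the Invoice document; fields are illustrative, not the
// original test's schema.
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public class Invoice
{
    [JsonProperty("id")]               // Cosmos DB requires a lowercase "id"
    public string Id { get; set; }

    public string AccountNumber { get; set; }  // hypothetical partition key
    public DateTime IssuedOn { get; set; }
    public decimal Total { get; set; }
    public List<InvoiceLine> Lines { get; set; }
}

public class InvoiceLine
{
    public string Description { get; set; }
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
}
```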
Identifying the source
Low provisioned throughput on the account kept the database charges in a reasonable range: the cost for Cosmos DB data ingestion was $35. Azure Log Analytics, at the same time, cost a surprising $50.
That gives a good reason to investigate why costs were so high and how to reduce them without compromising the depth of logging for the system.
First, we need to correlate the volume of logs stored with each gigabyte of Cosmos DB data. The Kusto query language is a perfect tool for getting this information.
Combining the Usage source for Azure Logs [Quantity] with AzureMetrics [DataUsage] for Cosmos DB gives the required result:
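The original query isn't reproduced here; a sketch along these lines, built on the standard `Usage` and `AzureMetrics` schemas, compares the two volumes (treat the time window and unit conversions as assumptions):

```kusto
// Hedged sketch: logs ingested vs. Cosmos DB data size, in MB.
let LogsMB = toscalar(
    Usage
    | where TimeGenerated > ago(7d)
    | summarize sum(Quantity));            // Usage.Quantity is reported in MB
let DataMB = toscalar(
    AzureMetrics
    | where MetricName == "DataUsage"      // Cosmos DB data size, in bytes
    | summarize max(Maximum) / exp2(20));
print Source = "Logs", MB = LogsMB
| union (print Source = "Cosmos DB data", MB = DataMB)
| render piechart
```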
After plotting results to a pie chart, it becomes clear that we have a log overdose:
Every gigabyte of data inserted into the Cosmos DB system was generating roughly 300 megabytes of logs. Excessive for a simple invoicing system.
After checking official pricing from Microsoft:
With Pay-As-You-Go pricing, you are billed per gigabyte (GB) of data ingested into the Log Analytics workspace.
The cost per GB of ingested data is $2.76. There is also a free allowance of 5 GB per billing account per month — a nice addition.
With the current logging setup, inserting new invoices into the system costs an extra $0.0016 per thousand documents ($50 of logs for 31 million inserts).
The last thing to check before moving on to cost optimization: prove that the cost overhead is related to document insertion and not to document size.
Reducing the number of data points per invoice proved that the impact is even higher: the ratio between document data and logs reached 64% data to 36% logs. Mapping the result to the actual test data ingested into the database gives roughly 1 gigabyte of logs per 1M documents.
Optimizing the costs
We identified that excessive logs contribute a significant part of our Cosmos DB data ingestion costs.
The obvious solution is to reduce the log size. For that, let's go back to the diagnostics configuration of Cosmos DB. As described in the previous article, we enabled all logs without getting into the specifics of each category and metric.
Let’s review the purpose of each category:
- DataPlaneRequests — logs requests to the SQL API account in Azure Cosmos DB.
- MongoRequests, CassandraRequests, GremlinRequests, TableApiRequests — requests to the corresponding database API endpoints.
- QueryRuntimeStatistics — logs SQL API query statistics.
- PartitionKeyStatistics, PartitionKeyRUConsumption — log data and throughput statistics for partition keys in the container.
- ControlPlaneRequests — changes to the account and databases. Useful for audit purposes.
- Requests — Cosmos DB metric data which is also automatically collected in Azure Metrics.
We are not sending Requests anywhere outside of the Log Analytics workspace, so we can trade Azure Metrics for cost improvements. Also, there is no use for statistics from any endpoints except the SQL API.
Updating the log streaming configuration and re-running the ingestion didn't show much improvement. Unfortunately, turning off these logs doesn't resolve the issue, as they have a minimal impact on the log size.
We still have to remove the logs that have a significant impact on the size. After a series of tests with 1M documents (total size close to 1.5 gigabytes), it was clear that the logs have the following split:
- DataPlaneRequests — 75% of data logs
- PartitionKeyStatistics, PartitionKeyRUConsumption, and metrics (Requests) — 25%
Since we are not querying data in the current scenario, QueryRuntimeStatistics is irrelevant and won't have an impact on the result.
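A per-category breakdown like the one above can be produced directly from the workspace. This is a sketch using the standard `AzureDiagnostics` table and its `_BilledSize` column; treat it as an assumption-laden starting point rather than the exact query from the tests:

```kusto
// Hedged sketch: share of Cosmos DB log volume per diagnostic category.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB"
| summarize SizeMB = sum(_BilledSize) / exp2(20) by Category
| render piechart
```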
After excluding DataPlaneRequests from the log stream, costs dropped. A query to confirm the effect of the change:
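The original confirmation query isn't reproduced here; one hedged way to visualize the drop is to chart daily ingestion volume from the `Usage` table across the date of the configuration change:

```kusto
// Hedged sketch: daily Log Analytics ingestion, before vs. after excluding
// DataPlaneRequests from the diagnostic settings.
Usage
| where TimeGenerated > ago(14d)
| where DataType in ("AzureDiagnostics", "AzureMetrics")
| summarize IngestedMB = sum(Quantity) by bin(TimeGenerated, 1d)
| render columnchart
```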
Resulting in the following chart (I couldn’t resist the temptation to improve it):
Bingo! Simple switch — huge savings.
For a write-heavy scenario, Cosmos DB logging through Azure Log Analytics can generate an extra $0.0016 per thousand documents.
Excluding DataPlaneRequests from the log stream can reduce log costs by up to 75%.
Feel free to explore my articles on real-life Cosmos DB troubleshooting and tuning:
Thank you for reading! Share, comment, and have fun exploring exciting technologies.