Good read! It's good to know what Google's solution for fast analytics is.
A few things, though:
- 4 TB is not much by today's standards; my 7-day serverless project generated 1 TB/day of telemetry.
- $5 / TB / query? That sounds like a lot.
On a day-to-day basis, my bug investigations and monitoring would cost me hundreds of dollars per day.
- Writing the right query is an iterative process: it takes several queries to fine-tune it and get what you are looking for. For a single question that I want to answer from my own data, I might pay $50–80. That's way too much.
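To make the arithmetic behind that estimate concrete, here is a rough sketch in Python. It assumes every query scans the whole week of telemetry (7 TB at 1 TB/day), which is a worst case: a query that touches fewer columns or a shorter time range would scan less. The function names and the iteration count are only for illustration.

```python
# Back-of-the-envelope cost of per-TB-scanned pricing.
# Assumption: every query scans the full data set (worst case).

PRICE_PER_TB_USD = 5.0       # the "$5 / TB / query" figure discussed above
TELEMETRY_TB_PER_DAY = 1.0   # telemetry volume of the 7-day project
DAYS_OF_DATA = 7


def query_cost(tb_scanned: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Cost in USD of one query that scans `tb_scanned` terabytes."""
    return tb_scanned * price_per_tb


def investigation_cost(tb_scanned: float, iterations: int) -> float:
    """Cost in USD of refining one question over `iterations` queries."""
    return iterations * query_cost(tb_scanned)


full_scan_tb = TELEMETRY_TB_PER_DAY * DAYS_OF_DATA  # 7 TB in total

print(query_cost(full_scan_tb))             # 35.0 for a single full scan
print(investigation_cost(full_scan_tb, 2))  # 70.0 after just two attempts
```

Under these assumptions, two attempts at one question already land inside the $50–80 range, and a handful of such questions per day reaches hundreds of dollars.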
I'd like to present Microsoft's solution.
First, the language: it is called KQL, and it reads beautifully, like this:
requests
| where timestamp > ago(7d)
| summarize count(), dcount(userId), percentile(duration, 95) by bin(timestamp, 1h)
| render timechart
- Get all data from the requests table, of the past week
- Aggregate the following: count of requests, distinct count of users, percentile-95 duration by 1-hour buckets
- Render a timechart
(That's the simplest query I could write.)
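For readers who don't know KQL, the filter and aggregation steps above can be sketched in plain Python. This is only an illustration of what the query computes, not how the engine runs it: the row layout (dicts with `timestamp`, `userId`, `duration`) and the helper names are assumptions, the chart rendering is left out, and the values here are exact, whereas KQL's `dcount` and `percentile` return estimates.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def percentile_nearest_rank(values, p):
    """Exact nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_hourly(rows, now, window=timedelta(days=7)):
    """Filter to the last `window`, then aggregate per 1-hour bucket."""
    cutoff = now - window
    buckets = defaultdict(list)
    for row in rows:
        ts = row["timestamp"]
        if ts > cutoff:  # where timestamp > ago(7d)
            # bin(timestamp, 1h): truncate to the start of the hour
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(row)

    summary = {}
    for hour in sorted(buckets):
        group = buckets[hour]
        summary[hour] = {
            "count": len(group),                                # count()
            "dcount_userId": len({r["userId"] for r in group}),  # dcount(userId)
            "p95_duration": percentile_nearest_rank(
                [r["duration"] for r in group], 95),            # percentile(duration, 95)
        }
    return summary


# Tiny made-up data set, only to show the shape of the result.
now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
rows = [
    {"timestamp": now - timedelta(minutes=90), "userId": "a", "duration": 120},
    {"timestamp": now - timedelta(minutes=80), "userId": "b", "duration": 300},
    {"timestamp": now - timedelta(minutes=70), "userId": "a", "duration": 80},
    {"timestamp": now - timedelta(days=8), "userId": "c", "duration": 50},  # too old
]
summary = summarize_hourly(rows, now)
for hour, stats in summary.items():
    print(hour.isoformat(), stats)
```

The point of the comparison is how much bookkeeping the three-line KQL pipeline hides.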
Second, the volumes (4 TB is really not much these days): the engine behind this query language has served a customer whose ingestion rate was tens of terabytes per day.
Under today's pricing, you don't pay per query; you pay only for ingestion.