How to right-size your flat-rate and flex slots reservations in BigQuery
Translating the slot usage graph into workload management
I am working with a non-profit organization that uses BigQuery for their data analysis. They use on-demand pricing, but have hit the point at which flat rate pricing will start to make sense (typically $10k monthly, although if your usage is more repeatable, it can be as low as $2k monthly). How should they go about making the change? What is a good strategy to follow?
All this information is in the BigQuery documentation, but it is in several different places. So, I’m consolidating it here and linking out to some other great articles on the topic.
When does it make sense?
With on-demand pricing, you get 2000 slots per project. So, you should consider moving from on-demand pricing to flat-rate pricing only if any of these apply:
- Your flat rate reservation will be more than 2000 slots, so there won’t be any slow down.
- You are not concerned by the slowdown in queries, perhaps because these are reporting workloads.
- You have the ability to add flex slots. For example, if your large queries are daily reports that use 5000 slots at peak, you may be able to reserve 500 slots and buy 4500 flex slots for an hour every day. You will run these large daily reporting queries during that hour.
Assume that your average hourly slot usage looks like this:
- Find the baseline. Here (see the Thursday graph, below), we seem to need 500 slots during daytime, going down to zero at night. We could buy zero flat-rate and 500 flex slots for 16 hours. Alternately, we can buy 500 flat-rate slots. Do the math, we will be better off with a flat rate reservation of 500 slots.
2. Before you make a long-term commitment to buy 500 slots for a year or more, try it out by doing 500 monthly flex slots. This way, you are not out of pocket too much if it turns out that the flat rate model doesn’t work for any of your projects.
3. Find the query that causes the peaks. Let’s say that you find that it uses 3000 slots (see Tuesday’s peak, below). Buy 2500 flex slots, run this expensive query, and release the slots. You could also schedule the flex slots purchase.
4. There are sporadic peaks (the bumps everyday, see below). Find out what causes them. Let’s say it turns out that these are queries created by data scientists interactively exploring data. We want to give these data scientists fast access. The 500 slot limit will annoy them. So, create an on-demand project and assign the data scientists to it. But add cost controls to that project to avoid the data scientists mistakenly running a 1 PB query.
5. It turns out that the increased baseline on some days (eg. baseline on Monday, see below, is more than 500) is driven by a public-facing dashboard. There are 1000s of queries, but we are not as concerned about query performance — it’s not a problem if these queries take 3s instead of 1s to execute. Put the dashboard usage in the original 500 flat-rate slot reservation. Increase the reservation to 800 slots and give priority for these 300 slots to the dashboard. When the dashboard is not using these 300 slots, they will help speed up everything else.
Here is great background reading on the various moving parts (pricing, reservations, workload management, slot usage monitoring) that informs the strategy above: