Using BigQuery Flex Slots to run machine learning workloads more efficiently

Create Flex Slots, run the query, then delete the flex slots

Lak Lakshmanan
Google Cloud - Community
5 min read · May 17, 2020


There are, broadly, two pricing plans for BigQuery, depending on whether you value predictable cost or efficiency more:

  1. Flat-rate pricing is predictable. You buy a certain number of slots and pay the same amount regardless of how many queries you submit. This is great for enterprises that want predictable costs. The drawback is that if you buy 1000 slots, and have a spiky workload that usually uses 500 slots and sometimes needs 1500 slots, slots go unused when you are using only 500 and queries run slower when it would have been nicer to have 1500 slots.
  2. On-demand pricing is efficient both in terms of price and in terms of performance. You pay for the amount of data processed by your queries. This is the original cloud promise: pay for exactly what you use. It is perfect for spiky workloads since you never pay for unused slots, and your queries use all the compute power available at the time they run. However, even though you can set up cost controls, per-day limits on price, etc. to limit your exposure, many enterprises don’t like this because (a) it is hard to budget for something that varies month-to-month, and (b) cost controls and daily limits add friction.

In general, we find that larger enterprises prefer flat-rate and smaller digital natives prefer on-demand pricing.

Flex slots allow you to get efficient pricing even if you are paying flat-rate. Image by PublicDomainPictures.


Assume that you are on flat-rate pricing because you want predictable costs, but you have a spiky workload. Moreover, you know exactly when you need that extra capacity.

Let’s say that you train a recommendations model once a day, and that is when you need 1500 slots. The rest of the time, you need only 500 slots. Wouldn’t it be nice to buy a 500-slot flat-rate plan, but buy an extra 1000 slots for just an hour every day when the recommendations model training happens? That way, your regular workload is unaffected, and the recommendations model has its extra 1000 slots, so it does not run slower.

This is what “Flex Slots” provides. You can buy flex slots for durations as short as 60 seconds, adding flexibility on top of flat-rate pricing.
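As a back-of-envelope sketch of what this buys you: the daily cost of the burst capacity is just rate × slots × hours. The rate below is a made-up placeholder, not a price quote; check current Flex Slots pricing.

```shell
# Back-of-envelope burst-capacity cost. RATE is a hypothetical
# placeholder ($ per slot per hour), NOT an actual price quote.
RATE=0.04
EXTRA_SLOTS=1000
HOURS_PER_DAY=1
DAILY_COST=$(awk -v r="$RATE" -v s="$EXTRA_SLOTS" -v h="$HOURS_PER_DAY" \
    'BEGIN { printf "%.2f", r * s * h }')
echo "Extra daily cost: \$${DAILY_COST}"
```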

When to use Flex Slots even with on-demand pricing

With on-demand pricing, you pay for the amount of data processed by your query. If you are just doing some basic SELECT, WHERE, JOIN, and GROUP BY, then both on-demand and flat-rate work out to about the same. On-demand pricing is a bargain for computationally expensive queries: if your query does GIS, regular expression parsing, JSON extraction, ORDER BY, etc., you still pay just for the bytes processed, not for the extra compute. So, in general, on-demand pricing is very cost-effective.
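If you want to know what an on-demand query would cost before running it, bq’s dry-run mode reports the bytes the query would process without executing it (the public table below is just an example):

```shell
# Validate a query and report bytes processed, without actually running it.
bq query --nouse_legacy_sql --dry_run \
  'SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare`'
```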

Machine learning queries are the exception to this. Model training requires iterating over the data multiple times, so with on-demand pricing you get charged for the data processed in every iteration the model might have to do. This means that, if you know your ML model won’t take the maximum amount of compute, you are better off using either a flat-rate reservation or flex slots: ML in BigQuery is cheaper if you pay for the compute that you actually use.

In particular, recommendation models are way too expensive when charged by data and iterations — common matrix factorization models have millions of users and tens of thousands of products. Matrix factorization is an O(N³) algorithm. In practice, though, the convergence is faster because your matrix is probably quite sparse. For this reason, BigQuery refuses to do matrix factorization if you are on-demand and tells you to buy flex slots so that you can pay for the compute that you actually use.
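For concreteness, a matrix factorization training query of the kind train.sql might contain could look like the sketch below; the dataset, table, and column names are hypothetical placeholders, while model_type='matrix_factorization' is the BigQuery ML option that triggers the flex-slots requirement on on-demand projects:

```shell
# Hypothetical sketch of a train.sql-style training query, submitted via bq.
# All dataset/table/column names below are placeholders.
bq query --nouse_legacy_sql <<'EOF'
CREATE OR REPLACE MODEL mydataset.recommendations
OPTIONS(model_type='matrix_factorization',
        user_col='userId', item_col='itemId', rating_col='rating')
AS
SELECT userId, itemId, rating
FROM mydataset.ratings
EOF
```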

Running a query in Flex Slots

You can, of course, use the BigQuery console to create flex slots.

But you can automate the flow — buy some flex slots, set up a reservation, run the ML model training query, and then delete the whole shebang. Here’s a script to automate the whole process.

Let’s say that we want to run a query on 500 flex slots in the US:

run_query() {
  cat ../bqml_recommendations/train.sql | bq query --sync --nouse_legacy_sql
}
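The script also references several variables that are not defined in the snippets; a plausible setup (all values here are illustrative placeholders) would be:

```shell
# Illustrative configuration; substitute your own values.
PROJECT=my-project-id        # placeholder GCP project id
LOCATION=US                  # must match the location of your datasets
SLOTS=500                    # flex slots are sold in increments of 100
RESERVATION=ml_reservation   # any name you like for the reservation
```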

First, purchase a capacity commitment with flex slots:

bq mk --project_id=${PROJECT}  --location=${LOCATION} \
--capacity_commitment --slots=${SLOTS} --plan=FLEX
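The deletion step at the end needs the ID of this capacity commitment, which the command above doesn’t capture. One way to recover it is to list the commitments and parse the output; the parsing below is a sketch and assumes the ID appears in the first column of bq ls’s tabular output:

```shell
# Sketch: recover the flex commitment's ID for later deletion.
# Assumes bq ls prints the commitment ID in the first column.
CAPACITY=$(bq ls --capacity_commitment --project_id=${PROJECT} \
    --location=${LOCATION} | grep FLEX | head -1 | awk '{print $1}')
```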

Then, create a reservation that uses these slots:

bq mk --reservation --project_id=${PROJECT} --slots=${SLOTS} \
--location=${LOCATION} ${RESERVATION} || cleanup

If creating the reservation fails, make sure to cleanup the slots (more on this later).
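The cleanup function itself isn’t shown in the snippets; a minimal sketch (an assumption about what it should do: tear down whatever was created so far, then abort) is:

```shell
cleanup() {
  # Best-effort teardown; "|| true" ignores resources that were never created.
  bq rm --reservation --project_id=${PROJECT} \
      --location=${LOCATION} ${RESERVATION} || true
  bq rm --location=${LOCATION} --capacity_commitment ${CAPACITY} || true
  exit 1
}
```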

Then, allow our project to use this reservation:

bq mk --reservation_assignment \
--reservation_id=${PROJECT}:${LOCATION}.${RESERVATION} \
--job_type=QUERY --assignee_type=PROJECT \
--assignee_id=${PROJECT}

The three steps above are the equivalent of the three steps you’d do on the web console to create the slot commitment, the reservation, and the assignment.

Now, simply run the query:
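The invocation itself is not shown above; given the run_query function defined earlier, it is presumably just:

```shell
# Wrapping it in `time` shows how long training took on the extra slots.
time run_query
```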


Once the query is done, clean up the assignment, reservation, and slot commitment (in reverse order of creation):

bq rm --reservation_assignment --project_id=${PROJECT} \
--location=${LOCATION} ${ASSIGNMENT} || true
bq rm --reservation --project_id=${PROJECT} \
--location=${LOCATION} ${RESERVATION} || true
until bq rm --location=${LOCATION} --capacity_commitment ${CAPACITY}; do
  echo "will try after 30 seconds to delete slots ${CAPACITY}"
  sleep 30
done

Note that the script does not exit if it is unable to delete the reservation, assignment, or slots; it will keep retrying. You may want to change this behavior to alert you instead.
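One way to do that (a sketch; pick whatever alerting mechanism suits you) is a bounded retry loop that gives up and alerts after a few attempts:

```shell
# Bounded-retry variant of the deletion loop: alert instead of spinning forever.
for attempt in 1 2 3 4 5; do
  if bq rm --location=${LOCATION} --capacity_commitment ${CAPACITY}; then
    break
  fi
  if [ "${attempt}" -eq 5 ]; then
    echo "ALERT: could not delete slots ${CAPACITY}; delete them manually!" >&2
    exit 1
  fi
  sleep 30
done
```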

This is not official Google work. As the copyright on the script says, the software is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

Next steps

  1. Read more about Flex Slots.
  2. Check out my script on GitHub that wraps a query around creating and deleting a flex slots reservation. If you have improvements to suggest to the script, please do a pull-request to GitHub.
  3. Read this article by my colleague Patrick Dunn, where he shows how Flex Slots can reduce the cost of large queries.
  4. Examples of situations where the ML model will run much faster than the size of the data indicates: linear regression with < 1000 features, k-means++ initialization, early stopping, and many more optimizations that keep getting added.


