Using BigQuery Flex Slots to run machine learning workloads more efficiently

Create flex slots, run the query, then delete the flex slots

Lak Lakshmanan
May 17, 2020 · 5 min read

There are, broadly, two pricing plans for BigQuery, depending on whether you value predictable cost or efficiency:

  1. Flat-rate pricing is predictable. You buy a certain number of slots and pay the same amount regardless of how many queries you submit. This is great for enterprises that want predictable costs. The drawback is that if you buy 1000 slots, and have a spiky workload that usually uses 500 slots and sometimes needs 1500 slots, slots go unused when you are using only 500 and queries run slower when it would have been nicer to have 1500 slots.
  2. On-demand pricing is efficient both in terms of price and in terms of performance. You pay for the amount of data processed by your queries. This is the original cloud promise: pay for exactly what you use. It is perfect for spiky workloads since you never pay for unused slots, and your queries use all the compute power available at the time they run. However, even though you can set up cost controls, per-day limits on price, etc. to limit your exposure, many enterprises don’t like this because (a) it is hard to budget for something that varies month-to-month, and (b) cost controls and daily limits add friction.

In general, we find that larger enterprises prefer flat-rate and smaller digital natives prefer on-demand pricing.

Flex slots allow you to get efficient pricing even if you are paying flat-rate. Image by PublicDomainPictures.

Assume that you are on flat-rate pricing because you want predictable costs, but you have a spiky workload. Moreover, you know exactly when you need that extra capacity.

Let’s say that you train a recommendations model once a day, and that is when you need 1500 slots. The rest of the time, you need only 500 slots. Wouldn’t it be nice to buy a 500-slot flat-rate plan, but add an extra 1000 slots for just the hour each day when the recommendations model training happens? That way, your regular workload is unaffected, and the recommendations model has the extra 1000 slots so it does not run slower.

This is called “Flex Slots”. You can buy flex slots for as short as 60 seconds. It allows you to add flexibility on top of flat-rate pricing.

With on-demand pricing, you pay for the amount of data processed by your query. If you are just doing basic SELECT, WHERE, JOIN, and GROUP BY operations, then on-demand and flat-rate work out to about the same. On-demand pricing is a bargain for computationally expensive queries: if your query does GIS, regular expression parsing, JSON extraction, ORDER BY, etc., paying just for the bytes processed costs less than the compute would suggest. So, in general, on-demand pricing is very cost-effective.

Machine learning queries are the exception. Model training requires iterating over the data multiple times, so with on-demand pricing you are charged based on the number of iterations the model might have to do. This means that, if you know your model will converge in fewer than the maximum number of iterations, you are better off using either a flat-rate reservation or flex slots: ML in BigQuery is cheaper when you pay for only the compute that you actually use.

In particular, recommendation models are far too expensive when charged by data and iterations: common matrix factorization problems involve millions of users and tens of thousands of products, and matrix factorization is an O(N³) algorithm. In practice, though, convergence is faster because your matrix is probably quite sparse. For this reason, BigQuery refuses to run matrix factorization on-demand and tells you to buy flex slots, so that you pay for the compute that you actually use.
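For concreteness, here is a sketch of what a matrix factorization training query (the sort of thing that might live in a file like train.sql) could look like; the dataset, table, and column names below are hypothetical:

```shell
# Write a hypothetical matrix factorization training query to train.sql.
# The dataset (recsys), table (ratings), and column names are placeholders.
cat > train.sql <<'EOF'
CREATE OR REPLACE MODEL recsys.recommendations
OPTIONS(model_type='matrix_factorization',
        user_col='user_id',
        item_col='item_id',
        rating_col='rating',
        num_factors=16)
AS
SELECT user_id, item_id, rating
FROM recsys.ratings
EOF
```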

You can, of course, use the BigQuery console to create flex slots.

But you can automate the flow — buy some flex slots, set up a reservation, run the ML model training query, and then delete the whole shebang. Here’s a script to automate the whole process.

Let’s say that we want to run a query on 500 flex slots in the US:
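The snippets below assume a few shell variables. The names match the snippets; the values are illustrative, and PROJECT and RESERVATION in particular are placeholders you would replace with your own:

```shell
# Illustrative values; replace PROJECT and RESERVATION with your own.
PROJECT=my-project-id   # your GCP project id
LOCATION=US             # location of your BigQuery datasets
SLOTS=500               # number of flex slots to purchase
RESERVATION=recsys      # a name for the reservation
```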

run_query() {
  cat ../bqml_recommendations/train.sql | bq query --sync --nouse_legacy_sql
}

First, purchase a capacity commitment with flex slots:

bq mk --project_id=${PROJECT}  --location=${LOCATION} \
--capacity_commitment --slots=${SLOTS} --plan=FLEX
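The command above does not leave the commitment’s ID in a shell variable, but you will need that ID later to delete the commitment. One way to look it up is to list the commitments and parse the output (a sketch; the column layout of bq ls may differ across bq versions, so verify the parsing on yours):

```shell
# Sketch: find the id of the most recently listed FLEX commitment,
# so it can be deleted later. The output parsing is an assumption;
# check the columns that `bq ls --capacity_commitment` prints.
capacity_id() {
  bq ls --capacity_commitment --project_id=${PROJECT} --location=${LOCATION} \
    | grep FLEX | head -1 | awk '{print $1}'
}
# CAPACITY=$(capacity_id)
```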

Then, create a reservation that uses these slots:

bq mk --reservation --project_id=${PROJECT} --slots=${SLOTS} \
--location=${LOCATION} ${RESERVATION} || cleanup

If creating the reservation fails, make sure to cleanup the slots (more on this later).
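The cleanup function invoked above could be a sketch like the following: best-effort deletion of whatever has been created so far, followed by an error exit (the variable names follow the snippets in this post):

```shell
# Sketch of a cleanup function: best-effort deletion of the assignment,
# the reservation, and the capacity commitment, then exit with an error.
cleanup() {
  bq rm --reservation_assignment --project_id=${PROJECT} \
    --location=${LOCATION} ${ASSIGNMENT} || true
  bq rm --reservation --project_id=${PROJECT} \
    --location=${LOCATION} ${RESERVATION} || true
  bq rm --location=${LOCATION} --capacity_commitment ${CAPACITY} || true
  exit 1
}
```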

Then, allow our project to use this reservation:

bq mk --reservation_assignment \
--reservation_id=${PROJECT}:${LOCATION}.${RESERVATION} \
--job_type=QUERY --assignee_type=PROJECT \
--assignee_id=${PROJECT} || cleanup

The three steps above are the equivalent of the three steps you’d do on the web console to create the slots, the reservation, and the assignment.

Now, simply run the query:

run_query
Once the query is done, clean up the assignment, reservation, and slot commitment (in reverse order of creation):

bq rm --reservation_assignment --project_id=${PROJECT} \
--location=${LOCATION} ${ASSIGNMENT} || true
bq rm --reservation --project_id=${PROJECT} \
--location=${LOCATION} ${RESERVATION} || true
until bq rm --location=${LOCATION} --capacity_commitment ${CAPACITY}; do
  echo "will try again after 30 seconds to delete slots ${CAPACITY}"
  sleep 30
done

Note that the script doesn’t exit if it is unable to delete the assignment, reservation, or slots; it keeps retrying, because leftover flex slots continue to accrue charges. Change this behavior to alert you instead if you prefer.
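One way to make the teardown more robust is a shell trap, so that the deletion commands run even if the query step fails unexpectedly. A sketch (teardown is a hypothetical name; its body would be the bq rm commands above):

```shell
# Sketch: run teardown on any exit path, so flex slots are never
# left running (and billing). The function name is a placeholder;
# its body would contain the bq rm commands from the script above.
teardown() {
  echo "deleting assignment, reservation, and capacity commitment"
}
trap teardown EXIT
```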

This is not official Google work. As the copyright on the script says, the software is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  1. Read more about Flex Slots.
  2. Check out my script on GitHub that wraps a query with the creation and deletion of a flex slots reservation. If you have improvements to suggest, please submit a pull request on GitHub.
  3. Read the article by my colleague Patrick Dunn, where he shows how Flex Slots can reduce the cost of large queries.
  4. Examples of situations where the ML model will run much faster than the size of data indicates: linear regression with < 1000 features, kmeans++, early stopping, and many more optimizations that keep getting added.

Google Cloud - Community

Google Cloud community articles and blogs

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Written by Lak Lakshmanan, Data Analytics & AI @ Google Cloud
