What to do about BigQuery error “Resources exceeded during query execution”
Here at MoneySupermarket we use Google Cloud BigQuery to power our data analytics and I think its fair to say we push it to its limits. Sometimes we exceed those limits, and that’s what this blog post is about.
Recently we started encountering this error:
Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.
when running queries and as a consequence I’ve learnt a lot about what it means and what you can do about it.
Allow me to explain
The error message can be taken literally, a query has been issued that is simply too complex for BigQuery to cope with. Some people have assumed that this can be caused by the data volumes that the query is going to process but actually that probably isn’t the case, the complexity of the query is calculated before data volumes are considered, its absolutely possible for this error to occur even if the query would process 0 bytes. (I used the word “probably” in that previous sentence and deliberately so. It may be possible for this error to occur in other circumstances, I do not know with 100% certainty if that is true or not. This blog post describes the situation that I am familiar with.)
When a query is submitted to BigQuery the service first calculates a complexity score for that query which is a function of number of columns being processed, number of subqueries/CTEs, inline column expressions, UNIONs, and various other things (our queries at MoneySupermarket exhibit many of these things, particularly UNIONs). The chance of the error occurring increases if you have nested views (i.e. views that SELECT from other views which in turn SELECT from other views etc…). If that complexity score exceeds an internal quota then the error occurs.
Through my interactions with Google Support I have discovered that that internal quota is 50 million and the quota is set per-project. That number, 50 million, is fairly meaningless to the end user but it can be very useful to compare to the actual query complexity score calculated for your query. Unfortunately, and here’s the rub, the only way to discover the complexity score for a query is to get in touch with Google Support and give them the JobID of your query. Moreover, at the time of writing Google Support only know the query complexity score for a query if the quota is exceeded.
This lack of transparency is a problem. I have opined to Google that both the internal quota and the complexity score of every query should be made available to us, the end users, but I don’t harbour high hopes that this will ever happen.
What can you do about it?
It is possible to have the query complexity quota increased, the way to do that is to get in touch with Google Support and make a compelling case for why it should be increased. It will likely take days rather than hours for the quota increase to be actioned.
You can also rewrite your queries to mitigate the problem, literally make them less complex. This is why its useful to know the quota and the complexity score of your query, if you are only slightly over the quota then you stand a good chance of doing something about it with minimal changes.
If your complexity score massively exceeds the quota and you can’t get the quota increased then you’re going to have to take more drastic action and Google do make recommendations about what to do; those recommendations basically entail dropping data into temporary tables, guidance for which is at https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting.
One trick we have found is that replacing a view object with the definition of that view object can shave off a small fraction from the complexity score. In other words replace this:
SELECT <columns>
FROM <view>
with this:
SELECT <columns>
FROM (
<definition-of-view>
)
We use terraform to define all our BigQuery views by way of google_bigquery_table and templatefile(), doing so makes it quite easy to make such a change by referencing the query attribute of the google_bigquery_table resource deploying the view whose definition you need to use.