Hidden gems of BigQuery — P7 — Safety first

Artem Nikulchenko
Google Cloud - Community
8 min readJan 5, 2024

This is a series of posts about amazing BigQuery features. Since there are so many of them, some of them can get lost. So, I’ve decided to cover my favorite:

And this post is about:

  • BigQuery version of TRY…CATCH
  • RAISE, ERROR
  • SAFE_ operations
  • SAFE. functions
  • SAFE_CAST

What is that, and how is it related to BigQuery? It is Vision model results for the request “Exception handling in BigQuery”. So, Vision thinks it is related :) And it looks cool.

Intro

BigQuery is a VERY complicated system that relies on multiple distributed resources for both storage and computing (which also means — a lot of network). As a result, there are many things that can go wrong at any moment.

Likely, thanks to the brilliant BigQuery Team, none of us needs to worry about that :) (unless you are actually part of BigQuery SRE Team…). As a users of BigQuery, none of those issues would be visible to us.

So, the only exceptions that we need to worry about — are exceptions generated by our code. And that is what this post is about…

Exception handling

As a Developer, you are probably familiar with standard Exception processing construction like TRY…CATCH.

BigQuery likely has something very similar to that!

If you want to catch (and hopefully process) an exception generated by some of your query statements, you can use BEGIN…EXCEPTION block.

BEGIN
sql_statement_list
EXCEPTION WHEN ERROR THEN
sql_statement_list
END;

It actually works very similar to TRY…CATCH blocks that you may be familiar with. If any exception happens between BEGIN and EXCEPTION statements — execution is interrupted, and code after EXCEPTION block is executed.

Here is a simple example:

BEGIN
SELECT 1/0;
EXCEPTION WHEN ERROR THEN
SELECT 1;
END

Do we know anything about the exception at this point? Yes!

Error variables

We have four variables that allow us to learn more about what has happened:

@@error.message — Specifies a human-readable error message.

Note: BigQuery documentation emphasizes that you should NEVER rely on this variable in your code for anything except been read by human. There is no documented Errors list, this messages can change at any time, including the format of those messages (so, you should not try to parse out some data from it).

@@error.stack_trace — Stack trace of the error

@@error.formatted_stack_trace — The content of @@error.stack_trace expressed as a human readable string.

Note: As with @@error.message, should only be used for information purposes.

@@error.statement_text — text of the statement where error happened.

Let’s give it a try!

BEGIN
SELECT 1/0;
EXCEPTION WHEN ERROR THEN
SELECT
@@error.message,
@@error.stack_trace,
@@error.statement_text,
@@error.formatted_stack_trace;
END

Result:

@@error.message — Query error: division by zero: 1 / 0 at [2:3]

@@error.statement_text — SELECT 1/0

This is reasonably expected. However, the stack trace is a bit more interesting (and complicated).

@@error.stack_trace returns an ARRAY where each element contains 4 data points:

Line — the line where the problem happened. In our case, it is 1

Column — the column where the problem happened.

Note: Both Line and Column point to the start of the STATEMENT where the error occurred and not to the particular place where it happened.

For example, in this case:

    SELECT * FROM 
(SELECT 12, 1/0) as t;

Line and Column would point to the start of top SELECT statement.

Filename —Reserved for future use. Always NULL currently.

Location — that would contain the name of a function or SP in which the error happened. To demonstrate it, we would need a more complicated example.

CREATE OR REPLACE PROCEDURE schema1.proc1() BEGIN
SELECT 1/0;
END;

CREATE OR REPLACE PROCEDURE schema1.proc2() BEGIN
CALL schema1.proc1();
END;

BEGIN
CALL schema1.proc2();
EXCEPTION WHEN ERROR THEN
SELECT
@@error.message,
@@error.stack_trace,
@@error.statement_text,
@@error.formatted_stack_trace;
END;

In this case, our @@error.stack_trace would look like this:

[{
"line": "2",
"column": "3",
"filename": null,
"location": "<project-name>.schema1.proc1"
}, {
"line": "2",
"column": "3",
"filename": null,
"location": "<project-name>.schema1.proc2"
}, {
"line": "14",
"column": "3",
"filename": null,
"location": null
}]

Interesting questions

Is there an Error Code?

No. There is no Error Code, and there is no way to organize your code to process different errors differently. According to the documentation, you can’t use an error message, and there is nothing else that indicates what actually happened.

Can I catch specific errors?

While BEGIN…EXCEPTION WHEN ERROR THEN block actually contains WHEN…THEN parts (which can make you feel like you can actually have some control there) that are basically a “static” syntax. There is nothing else that can be between WHEN and THEN except ERROR.

That is probably related to the lack of Error Codes…

But, of course, inside the EXCEPTION block, you can add IF (or CASE) statement to check for… and here is the tricky part. The only variable that you can probably use to distinguish one error from another is @@error.message, but at the same time, the documentation states that “As BigQuery reserves the right to revise error messages at any time, consumers of @@error.message should not rely on error messages remaining the same or following any particular pattern.”.

Bottom line, If you want to have some logic executed in case of some specific error:

  • Add IF or CASE inside your EXCEPTION block,
  • Check for @@error.message value,
  • Do it at your own risk since this code may stop working correctly at any point. (Which means that you can do that for debug purposes, but probably not for production workflows…)

Ok, enough with processing errors! Can I create some instead? You actually can!

RAISE

You can use RAISE statement to generate an exception.

This statement has two syntax options (with actually rather different usage).

The more generic one is the syntax with an error message provided like this:

RAISE USING MESSAGE = 'Ops, my bad';

Note: Again, no Error Codes… I would REALLY love to see BigQuery doing Error Codes as everybody else…

RAISE with the provided error message can be used anywhere in the code where you want to RAISE an error.

But there is also RAISE statement without an message. This one can ONLY by used inside EXCEPTION part of BEGIN…EXCEPTION block to re-raise the same error (that was supposedly not handled in EXCEPTION block).

Example:

BEGIN
RAISE USING MESSAGE = 'Ops, my bad';

EXCEPTION WHEN ERROR THEN
SELECT
@@error.message,
@@error.stack_trace,
@@error.statement_text,
@@error.formatted_stack_trace;

RAISE;
END

Note: Yes, I know, I’ve already said that… But I want to say it again! I’m used to the code where TRY…CATCH catches exceptions, processes some that it can, and “throws” it up-stack those it can’t handle. Such a scenario makes total sense to me. But it requires errors to have Error Codes! Otherwise, how would I know how to process which errors (and which not to process at all). But BigQuery do not have Error Codes…

But it is actually not the only option…

ERROR

BigQuery SQL also supports a special debugging function: ERROR.

Since ERROR is a function (opposite to RAISE, which is a statement), you need to use it as part of some statement, for example:

SELECT ERROR('Ops, my bad 2');

Or, more complicated versions like this:

SELECT *
FROM (SELECT -1 AS x)
WHERE IF(x > 0, true, ERROR(FORMAT('Error: x must be positive but is %t', x)))

So, it is up to you to decide when you want to use RAISE USING MESSAGE or ERROR.

TRANSACTION

And what about transactions? Can I combine exception handling and transactions? Of course!

BEGIN 

BEGIN TRANSACTION;
RAISE USING MESSAGE = 'Ops, my bad';
COMMIT TRANSACTION;

EXCEPTION WHEN ERROR THEN
SELECT
@@error.message,
@@error.stack_trace,
@@error.statement_text,
@@error.formatted_stack_trace;

ROLLBACK TRANSACTION;

END

Ok, so now we are clear on TRY…CATCH version of BigQuery. Are there any other options? YES!

Safe operations and functions

Let’s look into a simple example. Imagine we have a table that has two columns:

  • Product Code
  • Sales for this month
  • Sales for the previous month

Each row represents data about one specific product.

We may want to learn how much sales have increased (represented in percentages). Our formula would be:

(<Sales of this month> — <Sales of previous month>) / <Sales of previous month> * 100

While this code is simple to write, we immediately can see a problem: what if the product is new and sales during the previous month are zero? We clearly would get an exception, and that is not good.

Can BEGIN…EXCEPTION block that we have discussed before help us? Well, not really. It is designed to catch errors in multi-statement operations and not for exceptions within one statement, especially those that are expected by the data.

This code can help solve the following:

SELECT CASE WHEN B != 0 THEN A/B ELSE 0 END FROM ....

Note: Instead of A/B you can use DIV(A,B) function.

But it immediately makes code much less readable. Are there any other options? YES!

SAFE_DIVIDE (and other…)

There is a special function for that case — SAFE_DIVIDE. It works as a normal DIV (divide function), but in case of division by zero — it returns NULL instead of the exception.

SELECT SAFE_DIVIDE(A, B)  FROM ...

Great! Are there other SAFE_ functions?

Yes! There are SAFE_ADD, SAFE_MULTIPLY, SAFE_NEGATE and SAFE_SUBSTRUCT.

Note: You may be thinking now “What can go wrong during ADD or MULTIPLY?”. Overflow…

SAFE. functions

And what if I’m concerned about exceptions that happen in non-numerical operations? For example, this code also would return an error:

SELECT SUBSTR('foo', 0, -1) 

However, there is SAFE version of this function:

SELECT SAFE.SUBSTR('foo', 0, -1) 

This statement would return NULL instead of Error.

What else can be “safe”? BigQuery supports the use of the SAFE. prefix with most scalar functions that can raise errors, including STRING functions, math functions, DATE functions, DATETIME functions, TIMESTAMP functions, and JSON functions. BigQuery does not support the use of the SAFE. prefix with aggregate, window, or user-defined functions.

SAFE_CAST

And what another common type of error — errors during type casting.

For example, that would definitely return an error:

SELECT CAST("apple" AS INT64) AS not_a_number;

However, this would just return NULL:

SELECT SAFE_CAST("apple" AS INT64) AS not_a_number;

That can be very handy when working with “dirty” data!

Other SAFE functions for specific use-cases:

Why do some functions have SAFE_ and some SAFE. ?

It does not say so in the documentation, but I was told that:

SAFE. — represents functions where INPUT is checked (and NULL returned in case of Error)

SAFE_ — represents cases when the output is also checked (like division by zero).

IEEE_DIVIDE

SAFE_DIVIDE is actually not the only function that won’t return an exception when diving by zero. There is also IEEE_DIVIDE.

SELECT IEEE_DIVIDE(1, 0)

What is the difference? IEEE_DIVIDE would actually return Infinity :) And it can even divide 0 / 0 (and return NaN…).

So, it is not really related to our topic, but I thought it was worth mentioning.

Summary

We have looked into multiple options to handle (or prevent) exceptions during the execution of BigQuery statements, including:

  • BEGIN…EXCEPTION
  • RAISE
  • SAFE_
  • SAFE.
  • IEEE_DIVIDE

Hope you found something new and useful!

--

--

Artem Nikulchenko
Google Cloud - Community

Chief Software Architect with 10+ year of experience, PhD, Associate Professor, IT Univer co-organizer, GDG Organizer