Google BigQuery Errors Catalogue: it’s a detour, not a dead-end street — Series #1

Nayana
Mindvalley Technology
5 min readJul 23, 2021
Photo by Brett Jordan on Unsplash

In my previous article, I talked about the origins of my journey as a data engineer as well as some of the limiting beliefs I had to overcome in order to kick start my career as a data engineer.

The article serves as a great introduction into the the topic I am about to tackle today, so I suggest you check it out:
How to Start on the Road to Mastery in SQL and BigQuery

What to expect in this series

We’ll do a deep dive into some of the basics of errors and error handling to familiarize ourselves with the BigQuery platform and its concepts.

All the errors we face are usually pretty self-exploratory. We need to pay attention to them rather than being afraid of them 😄.

Now let’s begin by taking a look at the significant error categories.

Accessing Tables/Views in BigQuery

Today, we’ll cover how to access tables/views in BigQuery and some fundamental errors users face.

To begin with, let’s first understand the hierarchy of the project. To access a table in BigQuery, we have to go two levels deep.
The structure looks like this: [project_id].[dataset].[table_id]

BigQuery Project Hierarchy

After we have understood the structure, let’s check which are the errors we can face that are related to it:

“Project name needs to be separated by dot from dataset name, not by colon in table name.”

The project name needs to be separated by a dot from the dataset name, not by a colon in the table name.

Sometimes, we can copy-paste the table id as it from the Table Details section. This error is relatively self-explanatory. If we observe the above syntax, we understand that table id gets generated separated by a dot (.), not by a colon(:).

To avoid such minor errors, we can either use the query template provided on the BigQuery UI or be conscious while writing and make sure to cross-check the table id before hitting the query execution button called “Run.”

cross-check Table ID

“Not found: Dataset XXX was not found in location YY.”

Not found: Dataset XXX was not found in location YY.

We could face this error for either of these reasons:
1. The dataset name is missing from the Table ID.
2. The dataset name is incorrect.

To make our work easy, BigQuery has provided us with different features like searching through table names, intelligent IntelliSense, getting table information tab, etc.

Query without project ID

In BigQuery, we can find all the tables only under the dataset. Hence, it’s mandatory to include the dataset name. It doesn’t matter if we exclude the project ID unless we execute the queries under the same project.

Also, the quotes around the Table ID are not mandatory as per the current version. See the sample query below. For better readability, you can surround Table ID with quotes.

SELECT Clause Structure

These could be some potential errors that newbies to the platform could face.

“Syntax error: SELECT list must not be empty.”

Syntax error: SELECT list must not be empty

If we are new to SQL, we will see this error. The SELECT clause is our Genie 🧞; we make a wish, and it fulfills🌟. To have our desire fulfilled, we need to give the correct command. Hence, we cross-check our SELECT syntax once. BigQuery’s instant guidance will always be there. As soon as we start typing, it begins analyzing and shows us messages on the top right corner of the UI just like this:

“Syntax error: Expected end of input but got keyword XXXX.”

The order of clauses in the SELECT statement also matters. It’s a straightforward syntax. If we miss the order, we might end up getting confused. The error displayed below is one such example that I intentionally generated to demonstrate this.

Syntax error: Expected end of input but got keyword XXXX

“ORDER BY clause expression references column XXXX which is not visible after SELECT DISTINCT”

Before we go to the solution of this problem, we must understand how the ORDER BY clause works in general.

Now, let’s observe the below two variations in queries.

Variation 1: ORDER BY clause with plain SELECT statement
Variation 2: ORDER BY clause with SELECT DISTINCT statement

As per the Query Execution Sequence, the ORDER BY clause executes after the SELECT query finishes execution. Whenever any clause runs in its sequence, a virtual table generates in the background, and the result of that table gets passed over to the next step for further execution. In computer terminology, everything has an input and respective output. The same concept applies here.

Let’s understand how it will work for SELECT + ORDER BY clause — without DISTINCT expression:

While executing it, the SELECT phase will provide two virtual tables:
1. an original copy as a virtual table
2. the result of the query after the filtration, if provided any.

And now, the ORDER BY clause will have access to both of these tables. Hence, even if the SELECT expression doesn’t contain a field, but the original table has so that we can perform ORDER BY operation (Refer to variation one image above)

The exact opposite thing happens when we include DISTINCT in the expression. The DISTINCT clause will affect the data — we will get distinct records among the specified columns. Hence, the ORDER BY phase will receive the virtual table that gets generated in the previous process. And this virtual table won’t be having that column.

I hope you’ll find this helpful. Keep learning, keep growing and stay tuned until the following article. 🙂 🎉

--

--