Get Started With BigQuery For Free In 2022

A hands-on introduction

Vlad Gheorghe
Learning SQL
16 min read · May 18, 2022


Photo by dlxmedia.hu on Unsplash.

This article was last updated in May 2022. I cannot guarantee that it will be current at the time of reading. I invite you to do your own research, especially when it comes to pricing.

Introduction

In this post I explain how to set up BigQuery on your personal account and start using it without spending anything.

I also explain the fundamental concepts that you cannot do without if you want to work with BigQuery. These include:

  • What is a Google Cloud Platform project and how to create one
  • How data is organized in BigQuery
  • How BigQuery’s pricing works
  • How to connect to a public dataset
  • How to exploit BigQuery’s UI
  • Writing, executing and exporting the results of your first query
  • A primer for writing fast, cost-efficient queries

There are no prerequisites for this post, although basic experience with SQL is helpful.

What is BigQuery?

BigQuery is Google’s solution for the modern data warehouse. A data warehouse is a system designed to store and analyze large quantities of data (think terabytes and petabytes).

BigQuery works with SQL, so you don’t need to learn a special language for it. Data is organized in tables, which are collections of rows and columns.
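
For instance, a standard aggregate query in BigQuery looks exactly like the SQL you may already know. This sketch assumes a hypothetical orders table with customer_id and amount columns:

```sql
-- Total revenue per customer, highest spenders first
SELECT
  customer_id,
  SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC;
```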

Whether you are a data analyst, data scientist or data engineer, you can leverage BigQuery’s impressive capabilities to achieve your objectives. For example, you can analyse 100 billion rows or 7 terabytes of data in 30 seconds. Not something you can do with a traditional database.

BigQuery is serverless. You don’t have to worry about providing servers, clusters or virtual machines. And you don’t need to perform database administration tasks like defining indexes, sort keys or tuning performance. All this is automatically managed for you. Fire up the service, load some data and start analysing.

However, BigQuery does have a learning curve, especially if you’ve never worked with a data warehouse before.

Disclaimer

BigQuery is essentially a paid tool (more details on pricing in a bit). However, it has both a free tier and a sandbox mode that let you test it for free.

If you ignore all my advice and try really hard, you could end up wasting money on it. I will explain in detail how the pricing works and how to avoid spending a single cent. I suggest you follow my instructions carefully. But ultimately, you are responsible for how you use the service. I don’t take responsibility for any billing incidents!

With that out of the way, we are ready to get started.

Projects and billing

BigQuery is part of Google Cloud Platform (from now on GCP), a suite of cloud services provided by Google for computing, databases, data workflows and AI. For a high level overview, see this cheat sheet describing every Google Cloud service in four words or less.

The first step in using a GCP service is to create a project. Everything that happens in GCP happens inside a project. Each project is connected to a billing account, which carries the owner’s payment information. Projects allow GCP to manage pricing and ensure that users pay for services.

Since we want to get started for free, there are multiple ways that we can avoid paying:

  • BigQuery has a sandbox mode, which gives you a free monthly allowance.
  • BigQuery also has a free tier which is essentially equivalent to sandbox mode.
  • When you sign up to GCP for the first time, you should get $300 of free credits, which you can use to try out paid services.
  • At the end of the article, I show how to set up billing controls and alerts to make sure you avoid bad surprises.

Let’s go ahead and create our first project.

Creating a project

You will need a Google account for this. If you don’t have one yet, you can create it here.

Go to the Google Cloud homepage. If you don’t have a GCP account yet, select “Get started for free” in the upper right corner. Go through the steps for creating your account.

The last step requires you to create a payments profile. Make sure you select “Individual” under “Account type”. You will need to select a payment method so GCP can verify your identity. At the time of writing this can be either a debit card, a credit card or a PayPal account. No worries: you are not yet liable to get billed at this point.

You may encounter two issues here:

  1. If you don’t have a debit card, credit card, or PayPal account, you won’t be able to get started. I know this is a serious problem for some users, but I don’t know of any alternative solutions. As long as you have a bank account, you should be able to get a free card or a PayPal account.
  2. GCP may simply refuse your payment method. This has happened to me a few times. At the moment, GCP is not letting me use PayPal. Unfortunately there is no way of knowing why this happens. The only thing that worked for me was to use another payment method. I suggest you try different methods until you find one that works.

Once you’re signed up, you will be able to access the GCP dashboard at this link: https://console.cloud.google.com/home/dashboard (make sure you are logged in with the right Google account).

Now it’s time to create a project. The dashboard should prompt you to create one, otherwise you can do it at this link. Choose a name for your project and click on “create”.

Accessing BigQuery

You should now see your project’s dashboard. Your project’s name is displayed in the blue navigation bar at the top. You can access BigQuery by writing in the search bar at the top or opening the navigation menu in the upper left corner and scrolling down until you find it:

I recommend clicking on the pin icon on the right to access it more easily next time.

You should now see the BigQuery UI. See the “Sandbox” banner on top? This assures us that we are using BigQuery for free.

How BigQuery organizes data

Data is organized as follows:

  • Every GCP project has a number of datasets.
  • A dataset is a collection of tables.
  • A table is a collection of rows and columns, where the data lives.

Concretely, a BigQuery table looks like an SQL table, a Pandas DataFrame, or a spreadsheet.

In BigQuery, every table has a unique address in the form <project>.<dataset>.<table>. For example my-project.orders.table1.

You may occasionally see the old notation my-project:orders.table1 (colon after the project) but know that this is deprecated.

Once you open BigQuery, you can see any data available to you in the Explorer tab on the left. This is currently empty because we haven’t loaded any data yet.

To see how data loading works, click on the three dots next to your project and select “Create dataset”.

Insert a Dataset ID. This should be something informative like order_data. Be sure to click on “Data location” and select a location that is close to you. This is where all the data within the dataset will be stored.

Once you create your dataset, you will see it displayed in the Explorer tab. You can now add your first table.

In the dialog that follows you can choose a name for your table, among other options. In the Source section, you will see there are several options to load your data into this table. However, we will not explore those options here. Instead, we will leverage Google’s public datasets so we can start querying immediately.

Connect to BigQuery public datasets

BigQuery’s public datasets are a product of Google’s initiative to store public data and make it generally available. Let’s see how we can access them.

In the search bar on the Explorer tab, write “public” and then click on “Broaden search to all projects”. Note: you must literally write just “public” and nothing else. Other queries don’t work for me!

You should now see the bigquery-public-data project. Click on the pin icon to save it in Explorer.

Refresh the UI webpage. Now Explorer shows two projects: your personal project and bigquery-public-data. If you expand the latter, you will see that it contains a number of different datasets. Click on a dataset and you will see that it contains one or more tables.

This interaction between different projects may be confusing, so let’s clarify what’s happening. First of all, keep in mind that we are still working from our personal project. Everything we do happens in there.

But Google has their own project on GCP, which is called bigquery-public-data. Since the data in this project is public, we can access it from our personal project.

It’s good to understand this because if you work with BigQuery, it will be fairly common to access data from different projects at the same time. Of course, you can only access data from public projects or from private projects where you’ve been explicitly authorized.

Here’s how the division of labor works between these two projects:

  • Google’s project stores and manages the data. Since the data resides there, they also pay for the storage.
  • Your personal project queries the data. Since you are reading their data, you pay for the query costs.
  • If you wanted to copy their data, you would need to store the copy in your personal project, and you would then pay storage costs for that.

In short: the project where the data resides pays for storage, the project that queries the data pays for querying.

Before we start querying, let’s get some insight into how BigQuery pricing works.

BigQuery’s pricing model

BigQuery’s pricing is simple. You pay for two things:

  • The data that is scanned for your queries. This is also known as analysis pricing.
  • The data that is stored in your project. This is known as storage pricing.

Analysis pricing is usually the main cost driver. Let’s explore it in detail.

Whenever you send a query, BigQuery scans a certain quantity of data to give you the result. The data scanned is measured in bytes and billed accordingly. At the time of writing, you pay approximately $5 for every terabyte (TB) scanned. In addition, the first terabyte each month is free, thanks to the free tier. There is a minimum charge of 10 MB per query. You are only charged if your query succeeds: if it fails, there is no charge.

Let’s take a simple example. You have a table, orders, which contains exactly 1 TB of data. You write the following query:

SELECT * FROM orders

This query asks BigQuery to return all the data in orders. To do this, BigQuery has to scan the whole table. How much data will be scanned by this query? 1 TB. And how much will this query cost you? Currently, about $5. But thanks to free tier pricing, you get to do this once a month for free.

Now I must address a common pitfall. Analysis pricing targets the total bytes scanned, not the bytes returned. This is an important distinction. Imagine you write this query:

SELECT * FROM orders LIMIT 1

This query returns a single row. You might think that it would cost much less. Actually, it costs as much as the previous one. Why? Because BigQuery still scans the whole table, even though it only returns one row. Since the pricing depends on bytes scanned, not on bytes returned, the price is the same.

It helps to have an understanding of how queries run in the backend to avoid falling for such pitfalls. Fortunately, BigQuery always gives you an estimate of a query’s cost before you run it, so if you’re careful you won’t have any bad surprises.

If you think the price is too high, don’t worry. There are many ways to keep costs under control:

  • Before you run a query, BigQuery tells you how many bytes it expects to scan. You can thus get an estimate of the price and decide if it’s worth it.
  • You can set spending limits on queries so that a query that exceeds your budget will simply not run.
  • The documentation provides several BigQuery best practices that allow you to query efficiently and control costs.

We explore all these tricks in our example below.

The second cost factor is storage. At the time of writing, the monthly expense for storing 1 TB of data is about $20. You can store 10 GB for free.

To reiterate: BigQuery’s free tier gives you a free monthly allowance for storing 10 GB and scanning 1 TB. The sandbox is basically the same as the free tier, except that you can use it without even creating a billing account.

Most other operations in BigQuery, such as creating tables and importing/exporting data, are free.

You can find more details on pricing in the official documentation.

Exploiting the BigQuery UI

There is much we can learn about our data before we even start querying (and spending).

In the Explorer tab, click on the bigquery-public-data project. Scroll down until you find the google_trends dataset (or use the search bar). Once you’ve opened the dataset, click on the table international_top_rising_terms. You should see the table panel in the Editor.

This panel is your best friend when working on BigQuery. It has three tabs: schema, details and preview.

Schema. Shows the columns in your table. Each column has a name and a data type. This is equivalent to an SQL schema.

Details. Here you find useful information such as the table address, its size, location, number of rows and when it was last updated.

Preview. Shows you a few rows from the table.

The preview is crucial because it allows you to get a sense of the data without running any queries.

If you’ve worked with SQL databases before, you might be tempted to start your analysis by running a SELECT * LIMIT 10 type of statement to return the first few rows. There’s no point in doing that in BigQuery, as you would incur unnecessary costs. Just use the preview tab.

Our first query

We are ready to run our first query. But first, a warning:

Never run a query before you’ve made sure that you’re okay with the costs. Once you hit RUN, you will incur the query’s costs.

The BigQuery UI helps us here. Whenever we write a query, in the upper right corner we can see how many bytes it’s expected to scan.

In this case my query will process 47 MB. Since 1 TB = 1,000,000 MB, this represents less than 0.01% of the free monthly allowance. I can run it without any worries.

Here’s another trick. Once you write a query, click on More in the upper panel and then Query Settings. Scroll to the bottom, select Advanced options and you will see an option for maximum bytes billed. Here you can define the maximum amount of bytes that you are willing to be charged for. If your query scans more than that, it will stop running and you won’t be charged.

Time to explore our data. The Google Trends dataset shows each day’s top 25 Google search terms, both in the U.S. and abroad, broken down by small geographical areas (regions or cities).

We are interested in finding the top 25 search terms in the United Kingdom.

The table we want is bigquery-public-data.google_trends.international_top_terms.

One of BigQuery’s quirks is that the table address needs to be surrounded by backticks when writing a query.

We can now write our query:
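
The original query appeared as a screenshot; here is a sketch reconstructed from the breakdown that follows. The column names (term, rank, score, refresh_date, country_name) are assumptions based on the table’s schema:

```sql
SELECT
  term,
  rank,
  AVG(score) AS avg_score
FROM
  `bigquery-public-data.google_trends.international_top_terms`
WHERE
  refresh_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  AND country_name = 'United Kingdom'
GROUP BY
  term, rank
ORDER BY
  rank ASC;
```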

Here’s a breakdown:

  • FROM. We access the table by defining project, dataset and table name, and enclosing the address in backticks.
  • WHERE. We select records where refresh_date is yesterday. We also filter for records in the United Kingdom.
  • GROUP BY. We aggregate by term and rank.
  • AVG(score). We extract the term’s average score in each U.K. region. The score indicates the term’s popularity relative to other terms in that region. Averaged across all regions, the score indicates how widespread that term is (is it popular in just one region, or in all of them?).
  • ORDER BY. We order by rank ascending, so the most important terms come on top.

As I am writing this, the query above is estimated to scan 277 MB of data. Yet if I select the table in the UI and go to the Details panel, I see that the table contains approximately 15 GB of data. My query scans only about 2% of the table’s data, which makes it quite cheap. How is it so efficient?

I am actually using some BigQuery magic to minimize costs. More on this in a minute.

We can now run the query. Here are my results:

You can click on “Save results” to export this data in a variety of formats. In fact, you should do it. There’s no reason to run the same query more than once!

Before we wrap up, let’s examine how we can keep our queries fast and cheap.

Writing efficient queries

Keep this definition in mind:

An efficient query is one that scans the least amount of data, in the least amount of time, while still achieving its goals.

Remember, the number of bytes scanned by a query determines its cost. Therefore we are always incentivized to keep our queries efficient. Another nice effect of reducing the bytes scanned is that our query will run faster.

In the query above, I managed to scan only 2% of the total table data. I used two tricks to do this:

  • I selected only the columns that I needed.
  • I used a partition filter.

Let’s see how you can apply these tricks to your own queries.

Select only the columns you need. BigQuery uses columnar storage. It divides a table by columns, rather than rows, for purposes of storage and querying. Therefore a great way to save costs is to select only the columns that you need. Avoid SELECT *, which returns all the columns. You can test this in the editor by seeing how the query cost changes when adding or removing columns in the SELECT statement.
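
As a sketch, the two queries below differ only in the columns selected; if you paste each one into the editor (no need to run them), the estimated bytes scanned will shrink accordingly:

```sql
-- Scans every column in the table
SELECT *
FROM `bigquery-public-data.google_trends.international_top_terms`;

-- Scans only the two columns it needs, so it costs far less
SELECT term, rank
FROM `bigquery-public-data.google_trends.international_top_terms`;
```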

Apply partition filters. Partitioning is a way to divide a table into distinct segments. When done right, this allows for more efficient querying. If you click the google_trends.international_top_terms table and select Details in the table panel, you will see that the table is partitioned on the field refresh_date. Furthermore, it is partitioned by DAY.

What does this mean? Through partitioning, the data for each distinct day, as indicated by refresh_date, is stored in a separate segment. As a consequence, when you filter by refresh_date in your queries, you will only scan the data that fits in that timespan (as opposed to scanning the entire table).

To understand this, compare the following queries, but DON’T RUN THEM. You only need to paste them in the editor and check the expected cost!
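
The original queries were shown as a screenshot; here is a sketch of the comparison. The dates are illustrative, so substitute recent ones when you try this:

```sql
-- Query 1: partition filter limits the scan to a single day's partition
SELECT term
FROM `bigquery-public-data.google_trends.international_top_terms`
WHERE refresh_date = '2022-05-01';

-- Query 2: two days, roughly double the bytes scanned
SELECT term
FROM `bigquery-public-data.google_trends.international_top_terms`
WHERE refresh_date BETWEEN '2022-05-01' AND '2022-05-02';

-- Query 3: no filter on refresh_date, so the entire table is scanned
SELECT term
FROM `bigquery-public-data.google_trends.international_top_terms`;
```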

You can see that if you filter by refresh_date, each additional day adds 400-500 MB to your query costs. If you don’t use the partitioning filter, you are scanning the entire table.

When working on a table, it’s always a good idea to check if it’s partitioned so you can exploit this optimization. You can see if and how a table is partitioned in the table panel under the Details tab.

Trick: if you click on the query icon in the table panel, BigQuery will open a tab with a pre-written query that leverages the partitioned field.

For more details on partitioning check the documentation.

Google has more extensive documentation on how to improve query performance.

Conclusion

Well done! You’ve set up BigQuery in your personal project, leveraged Google’s public datasets, written your first query and learned the fundamentals of BigQuery in the process.

I hope that you feel empowered to start working and experimenting with this awesome tool.

If you have any questions, feel free to drop a comment. You can also find me on LinkedIn.

Bonus: Activate Billing

At some point, you might want to get out of sandbox mode, either because you need to overcome its limitations, or because you have exhausted the free allowance. Here’s how you can do it.

From the GCP search bar or Navigation menu, select the Billing service. Then select “Manage billing accounts”. Click the button to create a billing account. Afterwards, you should see an overview of your billing account. On the left panel, click on “Budgets and alerts” and select “Create budget”. Give a name to your budget and click next.

Now you can specify a maximum spending target for each month. In my case I chose 5 euros.

Note: this will not establish a hard limit on your spending. It is purely an alerting device, so you will know if you are exceeding your spending target and can take appropriate measures.

In the final step, you can decide which thresholds will trigger the alert. By default, you’ll get an email when you hit 50%, 90% and 100% of your total budget.

Now that your billing account is ready, you need to link it to your personal project. Go back to BigQuery and click on Upgrade in the upper right corner. Select your billing account and you’re done.

You are now out of sandbox mode and your account is billable for BigQuery. But you still have the free tier privileges from the sandbox: you can store 10 GB of data and query 1 TB monthly for free (as of the time of writing).

If you end up spending money on top of the free tier, you can still use your $300 of free signup credits.

Thank you for reading. Want to receive my next stories? Follow me on LinkedIn.

