Efficient and Optimized ways to copy tables in BigQuery

Narendrababuoggu
3 min readOct 16, 2023

--

One of the challenges that data engineers face, is copying tables within the datawarehouse they are working on. Especially incase of cloud based datawarehouses where the pricing involves around the amount data stored/processed, this can be a complex and time-consuming task.

Today I will discuss about the different features that BigQuery offers for copying tables and how to use them to efficiently and accurately copy tables within BigQuery.

Challenges

Copying tables within BigQuery can be a challenging task, especially if you need to copy large tables or tables with complex schemas. Some of the challenges that developers face when copying tables within BigQuery include

  • Performance: Copying large tables can be a performance bottleneck.
  • Accuracy: It is important to make sure that the copied table is an accurate copy of the source table.
  • Data Integrity: You need to ensure that the data integrity of the source table is maintained during the copy process.

to overcome these challenges, BigQuery offers a variety of features to help developers copy tables efficiently, including COPY, CLONE, and SNAPSHOT.

COPY

The COPY feature is the simplest way to copy a table. It creates a new table with the same schema and data as the source table. The COPY statement can be used to copy a table within the same dataset or to a different dataset.

As it copies the entire data of the table, the user has to pay for the storage used bythis table also.

To copy a table within the same dataset, use the following syntax:

CREATE OR REPLACE TABLE `my_project.my_dataset.my_new_table`
AS
COPY `my_project.my_dataset.my_source_table`;

CLONE

The CLONE feature creates a lightweight, writable copy of a table. The table clone is initially empty, but it can be written to and updated independently of the base table. Table clones are useful for testing and development purposes.

As clone doesnt exactly copy the data for the newly created table, The storage pricing will be applied only to the changes done on the cloned table after creation.

from Storage Costs

To clone a table, use the following syntax:

CREATE OR REPLACE TABLE `my_project.my_dataset.my_new_table`
CLONE `my_project.my_dataset.my_source_table`;

SNAPSHOT

The SNAPSHOT feature creates a read-only clone of a table at a specific point in time. Table snapshots are useful for data analysis and auditing purposes. We can create snapshot of the current data or even last seven days of the data of a table. This will be best suited if we want to keep a version of the table for more than seven days.

Snapshot tables also follow the same storage pricing as the Clone tables and as these are read only they wont allow any DML operations.

To create a table snapshot, use the following syntax:

CREATE SNAPSHOT TABLE `my_project.my_dataset.my_new_table_snapshot`
FROM `my_project.my_dataset.my_source_table`;

Summary

BigQuery offers a variety of features to help developers copy tables efficiently and accurately. By using these features, developers can overcome the challenges of copying tables in BigQuery and improve the quality and reliability of the data pipelines.

References

--

--