Big Query : Disaster Recovery

Dhruv Ahuja
Google Cloud - Community
3 min readApr 10, 2023
Photo by Martin Sanchez on Unsplash

BigQuery is a regional service, it is the responsibility of BigQuery to automatically handle the loss of a machine or even an entire zone. But in the case of regional failure, for example, if a disaster destroys the entire region, could result in loss of data stored in that region. BigQuery does not automatically provide a backup or replica of your data in another geographic region.

I will be sharing details around how we can create cross-region dataset copies to enhance our disaster recovery strategy.

Types of Failures

  1. Soft failure : Hardwares or underlying infrastructure is not destroyed.
  2. Hard failure : Hardwares gets destroyed. Examples include damage from floods, terrorist attacks, earthquakes, and hurricanes.

A Soft Regional failure, such as a region-wide loss of network connectivity, results in loss of availability until the region is brought back online, but it doesn’t result in lost data. It is a well-tested path and is automatically mitigated within a few minutes.

For Hard Regional failures, BigQuery does not automatically provide a backup or replica of your data in another geographic region. We need to replicate the data to another region for the sake of data security.

Copy Datasets

Dataset copying uses features of the BigQuery Data Transfer Service.

Required permissions

To create a dataset copy, you need the following IAM permissions:

To create the copy transfer, you need the following on the project:

  • bigquery.transfers.update
  • bigquery.jobs.create

On the source dataset, you need the following:

  • bigquery.datasets.get
  • bigquery.tables.list

On the destination dataset, you need the following:

  • bigquery.datasets.get
  • bigquery.datasets.update
  • bigquery.tables.create
  • bigquery.tables.list

Setting up a dataset copy

Enter the bq mk command and supply the transfer creation flag --transfer_config. The following flags are also required:

bq mk --transfer_config --project_id=PROJECT_ID --data_source=cross_region_copy --target_dataset=DATASET --display_name=NAME --params='PARAMETERS'

The --data_source must be set to cross_region_copy for dataset copy commands.

  • PROJECT_ID: your Google Cloud project ID. If --project_id isn't specified, the default project is used.
  • DATASET: the BigQuery target dataset for the transfer configuration.
  • NAME: the display name for the copy job or transfer configuration. The transfer name can be any value that lets you easily identify the transfer if you need to modify it later.
  • PARAMETERS: contains the parameters for the created transfer configuration in JSON format.

Parameters for a dataset copy configuration are:

  • source_dataset_id: the ID of the source dataset you want to copy.
  • source_project_id: the ID of the project your source dataset is in.
  • (Optional) overwrite_destination_table: include this flag if you want to truncate the tables of a previous copy and refresh all the data.
'{"source_dataset_id":"mydataset","source_project_id":"mysourceproject","overwrite_destination_table":"true"}'

Limitations

  1. Copying views is not supported.
  2. Copying external tables is not supported.
  3. You must create the destination dataset before you can create a transfer configuration for the dataset copy.
  4. You cannot increase the frequency of the copy job to be less than every 12 hours.
  5. Copying data in the write-optimized storage is not supported.
  6. Copying partitioned tables is currently supported. However, appending data to a partitioned table is not supported.
  7. Datasets with customer-managed keys (CMEK) cannot be used as a destination when copying across regions. However, a table with CMEK is allowed as a destination when copying within regions.
  8. If a table exists in the source dataset and the destination dataset, and the source table has not changed since the last successful copy, it is skipped. This is true even if the Overwrite destination tables box is checked.
  9. While data transfer is in progress, the changes to the source tables may not be reflected in the destination tables if the table copy jobs have already started.

Quotas and Limits

References

--

--