Importing Existing BigQuery Resources to Terraform Automatically

Barış Hasdemir
Published in Trendyol Tech
Sep 23, 2021 · 4 min read

How can we automate the resource importing process?

Hi there,

After a long break, I am back with a new post. Nowadays the cloud keeps growing in popularity: many companies have already completed their move to the cloud, and many others are planning their transition. To manage cloud workflows, people tend to use IaC (Infrastructure as Code) solutions such as Terraform. Today we are going to focus on how we can start using Terraform with our existing BigQuery resources. As of this writing, importing existing datasets and tables into Terraform is a cumbersome, manual process. The official documentation says that:

“The current implementation of Terraform import can only import resources into the state. It does not generate configuration. A future version of Terraform will also generate configuration.

Because of this, prior to running terraform import it is necessary to write manually a resource configuration block for the resource, to which the imported object will be mapped.”
from the Terraform Documentation, Import section

We can have multiple projects, hundreds of datasets, and thousands of tables. Therefore, we will write a Python script to automate the process.

You can find the source code of this project here.

Our agenda will be the following:

  1. What is Terraform?
  2. Environment Setup
  3. Import existing datasets and tables into Terraform

What is Terraform?

Terraform is an open-source IaC tool for managing cloud infrastructure. It lets you describe infrastructure operations in human-readable code and keep your infrastructure under version control. It is built around the concept of “state”, which keeps your configuration in sync with the real resources. We will not go deep into Terraform’s architecture here.

Environment Setup

Now, let’s install Terraform and start using it with our BigQuery resources. The requirements are the following:

  • A GCP Project
  • GCP Service Account and Key
  • A bucket in Google Cloud Storage
  • Conda
  • Python 3.x
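If the service account or the state bucket does not exist yet, one way to create them is with the gcloud and gsutil CLIs. This is only an illustrative sketch: the account name, key file path, bucket name, and role below are placeholders, and your project may need broader permissions than bigquery.dataViewer.

gcloud iam service-accounts create terraform-importer
gcloud projects add-iam-policy-binding <your-project> --member="serviceAccount:terraform-importer@<your-project>.iam.gserviceaccount.com" --role="roles/bigquery.dataViewer"
gcloud iam service-accounts keys create key.json --iam-account=terraform-importer@<your-project>.iam.gserviceaccount.com
gsutil mb -p <your-project> gs://<your-terraform-state-bucket>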

First, we need to install Terraform. If you do not use macOS, you can find installation guidelines for other operating systems here.

brew tap hashicorp/tap
brew install hashicorp/tap/terraform

Make sure that the GOOGLE_APPLICATION_CREDENTIALS environment variable points to your service account key JSON file. If not, you can export it with the following command.

export GOOGLE_APPLICATION_CREDENTIALS=<path-to-your-key-json-file>

Note: This is not the best way to use service accounts. You can look at the key rotation options here.

After that, we will create a folder named terraform-bq-projects and, inside it, a folder named after our GCP project. For example, I have a project named gcp-terraform-test. Inside that project folder, we will create a file named providers.tf.

My project structure will look like this:

.
└── terraform-bq-projects
    └── gcp-terraform-test
        └── providers.tf

We will use our GCP bucket and project name in this file.
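A minimal providers.tf could look something like the following. This is only a sketch, assuming a GCS backend for the Terraform state; the bucket name and state prefix are placeholders.

terraform {
  backend "gcs" {
    bucket = "<your-terraform-state-bucket>"
    prefix = "terraform/state"
  }
}

provider "google" {
  project = "gcp-terraform-test"
}

With the backend configured, running terraform init inside the project folder initializes the state in the bucket before any import.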

Now, we need to list our datasets and tables in order to create the .tf files. To do this, we wrote a Python script together with Erkan Ekser, one of my colleagues at Trendyol.

Let’s create a folder named importer and add importer.py and environment.yaml files inside it. Our project structure will look like this:

.
└── terraform-bq-projects
    ├── gcp-terraform-test
    │   └── providers.tf
    └── importer
        ├── importer.py
        └── environment.yaml

Create a conda environment from our environment.yaml file with the following command.

conda env create -f environment.yaml
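For reference, a minimal environment.yaml could look something like this. It is an assumed sketch (the environment name and versions are placeholders; the exact file is in the repository linked above) — the script at least needs Python 3.x and the google-cloud-bigquery client library.

name: bq-terraform-importer
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - pip:
      - google-cloud-bigquery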

What about our importer.py file?
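The full script lives in the repository linked above. The following is a simplified sketch of its core flow, not the exact implementation: it assumes the google-cloud-bigquery client and reuses the writeDatasetTf and generatesTerraformImportComand function names from the step-by-step walkthrough below, while the output paths and generated resource blocks are illustrative only.

# importer.py — simplified sketch of the importer's core flow
import argparse
import os

from google.cloud import bigquery


def writeDatasetTf(project, dataset_id):
    # Render a minimal google_bigquery_dataset resource block for the dataset.
    # Output path is simplified; the real layout is described below.
    out_dir = os.path.join("..", project, "datasets")
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, f"{dataset_id}.tf"), "w") as f:
        f.write(
            f'resource "google_bigquery_dataset" "{dataset_id}" {{\n'
            f'  dataset_id = "{dataset_id}"\n'
            f'  project    = "{project}"\n'
            f'}}\n'
        )


def generatesTerraformImportComand(project, dataset_id, table_id=None):
    # Build the terraform import command for a dataset or one of its tables.
    if table_id is None:
        return (f"terraform import google_bigquery_dataset.{dataset_id} "
                f"projects/{project}/datasets/{dataset_id}")
    return (f"terraform import google_bigquery_table.{dataset_id}_{table_id} "
            f"projects/{project}/datasets/{dataset_id}/tables/{table_id}")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--project", required=True, help="GCP project name")
    args = parser.parse_args()

    client = bigquery.Client(project=args.project)
    for dataset in client.list_datasets():
        writeDatasetTf(args.project, dataset.dataset_id)
        print(generatesTerraformImportComand(args.project, dataset.dataset_id))
        for table in client.list_tables(dataset.dataset_id):
            # The real script also dumps table schemas as JSON and view
            # queries as SQL here (see steps 3 and 4 below).
            print(generatesTerraformImportComand(args.project,
                                                 dataset.dataset_id,
                                                 table.table_id))


if __name__ == "__main__":
    main()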

Run the script with your project.

python importer.py -p <your-project-name>

Let’s examine the code step by step.

  1. The script takes one command-line argument (the -p flag): the GCP project name that contains the datasets and tables.
  2. It traverses every dataset and table and generates the .tf files by calling the writeDatasetTf function.
  3. If the resource is a table, it writes its schema as JSON under the tables folder; if it is a view, it writes its SQL query under the views folder.
  4. It uses the generatesTerraformImportComand function to generate a terraform import command to be applied for each resource (example commands are shown after the directory tree below).
  5. Finally, we should see a structure something like the following.
.
├── gcp-project
│   ├── datasets
│   │   ├── dataset_1.tf
│   │   ├── dataset_2.tf
│   │   └── dataset_3.tf
│   ├── providers.tf
│   └── schemas
│       ├── dataset_1
│       │   ├── tables
│       │   │   └── table_example_1.json
│       │   └── views
│       ├── dataset_2
│       │   ├── tables
│       │   │   └── table_example_2.json
│       │   └── views
│       └── dataset_3
│           ├── tables
│           └── views
│               └── view_example_1.sql
└── importer
    ├── environment.yaml
    └── importer.py

I have a project named gcp-project with three datasets (dataset_1, dataset_2, dataset_3), which contain the tables table_example_1 and table_example_2 and the view view_example_1.
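The import commands generated in step 4 would look something like the following (the resource addresses are illustrative and depend on how the script names the blocks; the Google provider accepts dataset IDs as projects/PROJECT/datasets/DATASET and table IDs as projects/PROJECT/datasets/DATASET/tables/TABLE):

terraform import google_bigquery_dataset.dataset_1 projects/gcp-project/datasets/dataset_1
terraform import google_bigquery_table.table_example_1 projects/gcp-project/datasets/dataset_1/tables/table_example_1

Run them from inside the project folder so they operate on the state configured in providers.tf.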

The code imported all of these resources for us. Now our Terraform state is up to date with our BigQuery resources.

Special thanks to Erkan Ekser for all his efforts.

Happy Coding!
