A code snippet for setting up a scheduled data connection between Google Sheets and Teradata using Airbyte in Terraform, with a cron schedule to run every hour at the 15-minute mark. The background includes the logos of HashiCorp Terraform, Teradata, and Airbyte.

Manage ELT pipelines with code using Airbyte’s Terraform provider

Janeth Graziani
Teradata

--

Are you a data engineer looking to manage your data pipelines in a sustainable way, keep track of all changes, and eager to automate manual UI interactions? If so, I have a solution for you.

This tutorial explains how to use Terraform to manage Airbyte data pipelines through code. Instead of relying on manual configurations through the WebUI, we’ll use code to create and manage Airbyte resources. The provided example illustrates a basic ELT pipeline from Google Sheets to Teradata Vantage ™ using Airbyte’s Terraform provider, but it can be easily replicated with any of the 300+ data source connectors.

Airbyte interface with a successful “Google Sheets — Teradata” connection listed, set to a cron frequency, and currently enabled.
Airbyte Cloud Connections tab

What is Terraform and Airbyte Terraform Provider?

Promotional graphic for Airbyte Cloud Terraform Provider featuring the airbyte purple octopus mascot next to the Terraform logo, set against a blue geometric background. An example code snippet is partially visible, indicating the setup of an Airbyte provider with version ‘0.1.0’ in a Terraform configuration.
https://airbyte.com/blog/terraform-provider-launched-for-airbyte-cloud

Terraform is an open-source tool for managing Infrastructure as Code (IaC). It enables the automated provisioning and management of infrastructure, cloud platforms, and services via configuration files through code instead of manual setup. Terraform uses plugins, known as Terraform providers, to communicate with infrastructure hosts, cloud providers, APIs, and SaaS platforms.

Airbyte, the data integration platform, has a Terraform provider that communicates directly with Airbyte’s API.

Watch the video tutorial

Prefer videos? Check out our YouTube tutorial instead!

Manage ELT pipelines with code using Airbyte’s Terraform provider

Prerequisites

Install Terraform

First, we need to install Terraform on our local machines. For Mac OS X users, Homebrew is a convenient package manager, for complete download instructions visit the Terraform site.

brew tap hashicorp/tap
brew install hashicorp/tap/terraform

Next, set up your environment by creating a directory for the Terraform configuration. Then, create two files: main.tf and variables.tf inside the directory.

mkdir terraform_airbyte
cd terraform_airbyte
touch main.tf variables.tf

Define a data pipeline

With the setup complete, we can start coding our data pipeline. This involves defining the provider, data source, destination, and connection within the main.tf file. Open this file in Visual Studio Code or your preferred code editor.

Begin your Terraform configuration by declaring the Terraform provider as airbytehq/airbyte and the version you plan on using. We’re going to authenticate via an API key which you can generate and save once logging into the Airbyte Developer Portal.

terraform {
required_providers {
airbyte = {
source = "airbytehq/airbyte"
version = "0.4.1" // Latest Version https://registry.terraform.io/providers/airbytehq/airbyte/latest
}
}
}
provider "airbyte" {
// If running on Airbyte Cloud, generate & save the API key from https://portal.airbyte.com
bearer_auth = var.api_key
}

1. Define a Data Source

Next, select the data source you want to create. In this example, we use the Google Sheets source.

resource "airbyte_source_google_sheets" "my_source_gsheets" {
configuration = {
source_type = "google-sheets"
credentials = {
service_account_key_authentication = {
service_account_info = var.google_private_key
}
}
names_conversion = true,
spreadsheet_id = var.spreadsheet_id
}
name = "Google Sheets"
workspace_id = var.workspace_id
}

Note: The identifier name we assign to the resource as “my_source_gsheets” in the example, is for Terraform’s internal use. The name that goes inside the object, “Google Sheets” is what will appear on Airbyte UI.

You’ll need to authenticate your Google account via OAuth or Service Account Key Authenticator. In this example, we use Service Account Key Authenticator.

2. Define a Teradata Vantage Destination

A destination is also considered a resource and includes a configuration object. The documentation details how to configure a Teradata destination.

# Teradata Vantage Destination Configuration
# For optional parameters visit https://registry.terraform.io/providers/airbytehq/airbyte/latest/docs/resources/destination_teradata
resource "airbyte_destination_teradata" "my_destination_teradata" {
configuration = {
host = var.host
password = var.password
schema = "airbyte_gs"
ssl = false
ssl_mode = {
allow = {}
}
username = var.username
}
name = "Teradata"
workspace_id = var.workspace_id
}

3. Define a Connection

Now that our source and destination are defined, we can proceed to define the connection between Google Sheets and our Teradata database.

# Connection Configuration
resource "airbyte_connection" "googlesheets_teradata" {
name = "Google Sheets - Teradata"
source_id = airbyte_source_google_sheets.my_source_gsheets.source_id
destination_id = airbyte_destination_teradata.my_destination_teradata.destination_id
schedule = {
schedule_type = "cron" // "manual"
cron_expression = "0 15 * * * ?" # This sets the data sync to run every 15 minutes of the hour
}
}

Note that this example uses a cron expression to schedule the data transfer to run every 15 minutes past the hour.

In our main.tf file, we reference variables which are held in the variables.tf file. These include the API key, workspace ID, Google Sheet id, Google private key, and Teradata Vantage credentials. Copy the following template into the variables.tf file and populate with the appropriate configuration values in the default attribute.

4. Configure the variables.tf file

#log in to https://portal.airbyte.com generate, save and populate the variable with an API key
variable "api_key" {
type = string
default = ""
}
#workspace_id is found in the url to the Airbyte Cloud account https://cloud.airbyte.com/workspaces/<workspace_id>/settings/dbt-cloud
variable "workspace_id" {
type = string
default = ""
}
#Google spreadsheet id and Google private key
variable "spreadsheet_id" {
type = string
default = ""
}
variable "google_private_key" {
type = string
default = ""
}
# Teradata Vantage connection credentials
variable "host" {
type = string
default = ""
}
variable "username" {
type = string
default = "demo_user"
}
variable "password" {
type = string
default = ""

Execution Commands

Run terraform init to pull down the provider plugin from the Terraform provider page and initialize a working Terraform directory.

This command should only be run after writing a new Terraform configuration or cloning an existing one from version control.

Visual Studio Code editor with Terraform configuration files, where a successful ‘terraform init’ command has been executed in the terminal, initializing the Terraform backend and provider plugins.
Terraform init command installs provider plugins

Run terraform plan to display the execution plan Terraform will use to create the resource and make modifications to the infrastructure. For this example, a plan for 3 new resources is created:

Connection: # airbyte_connection.googlesheets_teradata will be created
Destination: # airbyte_connection.googlesheets_teradata will be created
Source: # airbyte_source_google_sheets.my_source_gsheets will be created

Visual Studio Code interface with the terminal output of a ‘terraform plan’ command showing a planned creation of an Airbyte connection between Google Sheets and Teradata.
Terraform plan command creates an execution plan

Run terraform apply and yes to generate a plan and carry out the plan.

Visual Studio Code running terraform apply and confirmation to generate a plan and carry out the plan.
terraform apply and yes generate a plan and execute the plan

The terraform.tfstate file is created after running terraform apply for the first time. This file tracks the status of all sources, destinations, and connections managed by Terraform.

Visual Studio code with terraform.tfstate file open. This file is created after running terraform apply for the first time. This file tracks the status of all sources, destinations, and connections managed by Terraform.
terraform.tfstate file tracks the status of all resources

For subsequent executions of terraform apply, Terraform compares the code in the main.tf file with the code stored in the .tfstate file. If resources are added or removed in main.tf, Terraform automatically updates both deployment and the .tfstate file accordingly upon deployment. Do not modify this file by hand.

Congratulations! You now have a Source, Destination, and Connection on Airbyte Cloud created and managed via Terraform.

Airbyte interface with a successful “Google Sheets — Teradata” connection listed, set to a cron frequency, and currently enabled.
Airbyte Cloud Connections tab

Additional Resources

Use Airbyte to load data from external sources to Teradata Vantage
Transform data Loaded with Airbyte using dbt
Airbyte API reference documentation.
Terraform Airbyte Provider Docs

Questions and comments

We’d love to hear from you! Please leave any comments in the comment section below. We encourage you to leverage the extensive resources offered on the Teradata Developer Portal and the Teradata Developer Community.

About the author

Janeth Graziani is a Developer Advocate at Teradata who enjoys leveraging her expertise in technical writing to help developers navigate and incorporate new tools into their tech stacks. Connect with Janeth on Linkedin!

--

--