Streaming Data from Multiple Sources Using AirByte (Part 2)

Keren Finkelstein
Israeli Tech Radar
Published in
5 min readFeb 13, 2023

This is the second part in a two-part series. To set up a self-hosted AirByte and a BigQuery destination you should read this part first.

A connection in Airbyte refers to connecting to a specific data source, such as a database or a cloud application, to extract and integrate data into your target data warehouse.

In this part, I will describe how to ingest data from 3 different sources into BigQuery on a self-hosted AirByte.

Adding Connection: Google Sheets → BigQuery

Set up Google Sheets Access

  • Enable Google Sheets and Google Drive API from the API console for the project created before.
1. Go to the API Console.
2. From the projects list, select a project or create a new one.
3. If the APIs & services page isn't already open, open the console left side menu and select APIs & services, and then select Library.
4. Click the API you want to enable. If you need help finding the API, use the search field.
5. Click ENABLE.
  • Create a service account that will generate credentials to access Google Sheets.
1. In the Google Cloud console, go to the Create service account page.
2. Select a Cloud project.
3. Enter a service account name to display in the Google Cloud console.
4. Click Done.
  • Create a key for the created service account and download the JSON file.
1. Click the email address of the service account that you want to create a key for.
2. Click the Keys tab.
3. Click the Add key drop-down menu, then select Create new key.
4. Select JSON as the Key type and click Create.
  • Open the specific google sheet you want to use as a source, click on share, and add the email for the service account created in the previous step.

Set up an Airbyte Google Sheets Source

  • Create a Google Sheet source
  • Enter a source name (unique), select ‘Service Account Key Authentication’, enter the downloaded JSON file content and finally add the Spreadsheet Link

Set up an Airbyte Google Sheets — BigQuery Connection

  • Select the source and destination you created previously.

Adding Connection: AWS S3 → BigQuery

Set up AWS S3 Access

  • On your AWS, go to IAM → Users → Create User. Give the user a name and create. Enter the created user and go to ‘Security credentials’ → Create access key → other → Create access key and save the key and secret in a secure place.
  • Go to IAM → Policies → Create Policy. Give the group a name and attach a policy with written access to S3.
  • Go to User group → create a group, attach the S3 Policy you just created, and choose: Add a user → select the user you created before.

Set up an Airbyte S3 Source

  • Create S3 source
  • Enter a source name (unique), enter a target output name, select the pattern of the files to replicate, and enter the name of the bucket where the files exist, the access key, and the secret of the user created before.

Set up an Airbyte S3 — BigQuery Connection

  • Select the source and destination you created previously.

Adding Connection: AWS MySQL → BigQuery

Set up AWS MySQL Access

  • On Your AWS console, enter EC2 → Security Groups → Create a security group. Enter a valid name and select the VPC where your RDS is located. Add an inbound rule with type → MySQL/Aurora, source → custom → <AirByte VM External IP (on GCP)>/32

Set up an Airbyte MySQL Source

  • Create MySQL source
  • Enter a source name (unique), and enter the database endpoint as host, database username, and password. Select SSL modes → required.

Set up an Airbyte MySQL — BigQuery Connection

  • Select the source and destination you created previously.
  • For CDC select: ‘sync mode’ == ‘Incremental | Append’

Summary

There are several reasons why organizations need to choose Airbyte as their data integration platform:

  • Streamlined data integration: Airbyte simplifies the process of collecting and integrating data from multiple sources, saving time and resources compared to manual methods.
  • Easy connectivity: The platform makes it easy to connect to multiple data sources, extract and transform data, and load it into a target warehouse.
  • Flexible and scalable architecture: Airbyte’s architecture is flexible and scalable, allowing it to accommodate a variety of data sources and use cases.
  • Improved administration and security: By using a service account as the owner of the project, Airbyte simplifies administration and improves security.
  • Integration with other tools: Airbyte integrates with other tools and systems easily, making it a convenient solution for organizations with multiple data sources.

By providing these features and capabilities, Airbyte helps organizations to streamline their data integration processes, improve data quality, and support their business goals.

--

--