Ro.Codes
Published in

Ro.Codes

Integrating Snowflake and Airtable

Airtable is a database-spreadsheet hybrid that also has a number of features to support collaboration and automation, plus integrations with third-party services, making it a powerful tool for business users. In some cases, our team would like to integrate Snowflake and Airtable: extract data from Snowflake (our data warehouse) and load it into Airtable for users to interact with and act on, pull data from Airtable into Snowflake, or both. There’s no native integration between the two services, and we didn’t find an existing tool that worked well for our use case, so we built an in-house tool to transfer data between the two.

We’ve released a version of that tool on GitHub here, and below we’ll discuss how to use it for Snowflake and Airtable automation, as well as a few notes about what it’s up to under the hood.

Usage

There are two commands, one to load data from Airtable into Snowflake, and one to send data from Snowflake to an Airtable table.

To load data into Snowflake, invoke airtable_connector.load with the Airtable base ID and table name, and the qualified name of the destination table in Snowflake:

python -m airtable_connector.load \
--base-id="YOUR_BASE_ID" \
--table-name="YOUR_AIRTABLE_TABLE_NAME" \
--destination="YOUR_QUALIFIED_DESTINATION_TABLE"

To send data to Airtable, invoke airtable_connector.send with the path to a YAML configuration file (discussed below):

python -m airtable_connector.send \
--config-file=path/to/your/config.yml

The configuration file should look like:

airtable_base_id: "YOUR_BASE_ID"
airtable_table_name: "YOUR_AIRTABLE_TABLE_NAME"
tables:
- db.schema.table_1
- db.schema.table_2

Each of the tables (which may also be views) in Snowflake will be queried and any rows returned will be appended to the Airtable table specified.

A slightly more complicated configuration is also supported, where some tables/views provide rows to be inserted, and others may instead specify updates to existing rows:

airtable_base_id: "YOUR_BASE_ID"
airtable_table_name: "YOUR_AIRTABLE_TABLE_NAME"
tables:
- table: db.schema.table_1
update: false
- table: db.schema.table_1_update
update: true

Update-tables (i.e. those configured withupdate: true) must contain a column id, which is the Airtable-assigned ID of the record to be updated. You would most likely obtain this by having loaded the Airtable data into Snowflake and referencing it in your table/view definition.

Implementation notes

When reading a table in chunks, a client may also receive an HTTP 422 Unprocessable Entity error indicating that the server-side iterator is no longer available for some reason and the client needs to restart reading from the beginning, which we handle with similar logic.

We didn’t want this tool to have to explicitly deal with types or generating SQL, so ultimately we call out to pandas.DataFrame.to_sqlto insert the rows.

For our use case, we want the table to be created in Snowflake even if it would be empty, which to_sql does not do, so we use a slightly ugly hack: We generate a DataFrame with the right rows but all null values, create the table containing that one row, and then delete the row.

The downside of doing this, as the comment above notes, is that since Pandas doesn’t have any real data to work with, every column will end up with type string. This is fine for our usage, and the alternative (generating SQL DDL to create the table with the correct types) wasn’t worth the additional complexity.

Conclusion