Integrating Snowflake and Airtable
Airtable is a database-spreadsheet hybrid that also has a number of features to support collaboration and automation, plus integrations with third-party services, making it a powerful tool for business users. In some cases, our team would like to integrate Snowflake and Airtable: extract data from Snowflake (our data warehouse) and load it into Airtable for users to interact with and act on, pull data from Airtable into Snowflake, or both. There’s no native integration between the two services, and we didn’t find an existing tool that worked well for our use case, so we built an in-house tool to transfer data between the two.
We’ve released a version of that tool on GitHub here, and below we’ll discuss how to use it for Snowflake and Airtable automation, as well as a few notes about what it’s up to under the hood.
To start, you will need accounts with both Snowflake and Airtable. Then, generate an Airtable API key and export it and your Snowflake credentials as environment variables, as explained in the readme.
There are two commands, one to load data from Airtable into Snowflake, and one to send data from Snowflake to an Airtable table.
To load data into Snowflake, invoke
airtable_connector.load with the Airtable base ID and table name, and the qualified name of the destination table in Snowflake:
python -m airtable_connector.load \
To send data to Airtable, invoke
airtable_connector.send with the path to a YAML configuration file (discussed below):
python -m airtable_connector.send \
The configuration file should look like:
Each of the tables (which may also be views) in Snowflake will be queried and any rows returned will be appended to the Airtable table specified.
A slightly more complicated configuration is also supported, where some tables/views provide rows to be inserted, and others may instead specify updates to existing rows:
- table: db.schema.table_1
- table: db.schema.table_1_update
Update-tables (i.e. those configured with
update: true) must contain a column
id, which is the Airtable-assigned ID of the record to be updated. You would most likely obtain this by having loaded the Airtable data into Snowflake and referencing it in your table/view definition.
One important thing to note about the Airtable API is that you’re limited to 5 requests per base per second. Each request can either read 100 records, or write 10 records. We automatically stay under that limit, and retry a few times with backoff if we receive a
429 Too Many Requests response, but this does limit the size of tables you can effectively transfer using this method, and you could easily break the limit if you have multiple clients/processes using a base via its API at the same time.
When reading a table in chunks, a client may also receive an
HTTP 422 Unprocessable Entity error indicating that the server-side iterator is no longer available for some reason and the client needs to restart reading from the beginning, which we handle with similar logic.
We didn’t want this tool to have to explicitly deal with types or generating SQL, so ultimately we call out to
pandas.DataFrame.to_sqlto insert the rows.
For our use case, we want the table to be created in Snowflake even if it would be empty, which
to_sql does not do, so we use a slightly ugly hack: We generate a DataFrame with the right rows but all null values, create the table containing that one row, and then delete the row.
The downside of doing this, as the comment above notes, is that since Pandas doesn’t have any real data to work with, every column will end up with type
string. This is fine for our usage, and the alternative (generating SQL DDL to create the table with the correct types) wasn’t worth the additional complexity.
If your organization is using Airtable internally, and you’d benefit from the ability to integrate Snowflake and Airtable, give it a try and let us know what you think!