Importing Semi-Structured Data from Google Sheets to Snowflake

John Mastro
Apr 8, 2019 · 3 min read
Image for post
Image for post


We published a small tool you can use to import semi-structured data from Google Sheets to Snowflake, taking advantage of Snowflake’s variant type.


There are a variety of cases where you might want to import data provided by business users into your data warehouse — for example, inventory cost information or logs of marketing spend. These probably won’t come directly from your application, or from a third-party integration, but rather from the operations team, the finance team, the growth team, etc.

If your organization uses Google Sheets a lot, that’s probably a fairly convenient way for business users and input and manage the data.

So, why not just pull the information directly from a sheet? One possible challenge is that these sheets are likely to change over time, with columns being added and removed. We can take advantage of Snowflake’s support for semi-structured data, in particular the variant type, to keep things flexible.

Usage example

Suppose you have a Google Sheet, accessible via a service account, that

looks something like this:

Image for post
Image for post

Using this module, the result of importing the sheet above would look like this:

Image for post
Image for post

By default, all items are imported as strings, but a “ — coercions” argument is supported to specify fields that should be interpreted as numbers, dates, and timestamps.

You can then use Snowflake’s support for querying semi-structured data to work with the “data” object conveniently.

The repository has additional information on initial setup and supported options.


The most important limitation is that this tool replaces the entire table every time, so if historical information is removed from the sheet, it will also be removed from the database.


While this is a convenient technique, it won’t be appropriate for every situation. We also import data into traditional tables, import from CSVs, and import similarly but in an append-only fashion, some or all of which we’ll likely expand on in future commits and articles.

In terms of implementation strategy, you could export CSVs to, say, S3, and then import them from there. Or use an ETL/sync tool, some of which support reading from Google Sheets or files stored in Google Drive or an SFTP server. Yet another option would be to use something like Airtable.

Whichever approach you take, we hope this is an interesting example of dealing with semi-structured data from heterogeneous sources while taking advantage of non-relational features in databases.

Ro Data Team Blog

Ro Data Team Blog: data analytics, data engineering, data…

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store