Published in Snowflake

From Kaggle to Snowflake

In a previous blogpost I showed how to load .csv files into Snowflake. I downloaded these files manually from Kaggle. In this post I show how to use the Kaggle API to remove the manual download step.

Install Kaggle

I use Kaggle in an Anaconda environment, so I have a separate environment in which I installed the Kaggle package.

Kaggle API key

First we have to create a Kaggle API key, which is needed to connect to Kaggle. If you have a Kaggle account, you can create a new API token from your account settings (https://www.kaggle.com/<user_name>/account).

Standard implementation

Creating the API token generates a kaggle.json file. Store this file in a folder called .kaggle in your home directory. The kaggle.json file contains two fields: username and key.
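As a rough illustration of that structure, the sketch below shows a kaggle.json with placeholder values and a small helper that checks whether both fields are present. The helper name read_kaggle_json is my own and not part of the Kaggle API.

```python
import json
from pathlib import Path

# Placeholder values; a real kaggle.json holds your own username and API key.
EXAMPLE_KAGGLE_JSON = '{"username": "your_kaggle_username", "key": "your_api_key"}'


def read_kaggle_json(path: Path) -> dict:
    """Load a kaggle.json file and check that both expected fields exist.

    Hypothetical helper for illustration only.
    """
    config = json.loads(path.read_text(encoding="utf-8"))
    missing = {"username", "key"} - set(config)
    if missing:
        raise KeyError(f"kaggle.json is missing: {sorted(missing)}")
    return config
```

On Linux or macOS you would call it with Path.home() / ".kaggle" / "kaggle.json".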

In the Python script you can also authenticate directly through OS environment variables (KAGGLE_USERNAME and KAGGLE_KEY), which the Kaggle API reads as an alternative to the kaggle.json file.
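A minimal sketch of the environment variable approach could look like this. The two function names are my own. I also assume that the kaggle package tries to authenticate as soon as it is imported, which is why the import only happens after the variables are set.

```python
import os


def set_kaggle_credentials(username: str, key: str) -> None:
    """Expose the credentials through the variables the Kaggle client reads."""
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


def get_authenticated_api():
    """Create an authenticated Kaggle API client.

    The import is deferred on purpose: as far as I know, importing the
    kaggle package already triggers an authentication attempt, so the
    environment variables have to be in place first.
    """
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    return api
```

Call set_kaggle_credentials() with your own values first and get_authenticated_api() afterwards.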

Customised example

For this example, I was curious whether I could include the kaggle.json content in the credentials file I used in my previous example.

Authentication in this customised example goes hand in hand with the authentication to Snowflake. The same credentials file is referenced for both Snowflake and Kaggle.
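To give an idea of how one shared credentials file could work, here is a sketch. The file layout with a "snowflake" and a "kaggle" section is an assumption for illustration and not necessarily the layout of my own credentials file. The function names are hypothetical as well.

```python
import json
import os
from pathlib import Path

# Assumed layout of the shared credentials file (illustrative only):
# {
#   "snowflake": {"user": "...", "password": "...", "account": "..."},
#   "kaggle":    {"username": "...", "key": "..."}
# }


def load_credentials(path: Path) -> dict:
    """Read the shared credentials file into a dictionary."""
    return json.loads(path.read_text(encoding="utf-8"))


def apply_kaggle_credentials(credentials: dict) -> None:
    """Copy the Kaggle section into the environment variables Kaggle reads."""
    kaggle_section = credentials["kaggle"]
    os.environ["KAGGLE_USERNAME"] = kaggle_section["username"]
    os.environ["KAGGLE_KEY"] = kaggle_section["key"]
```

The "snowflake" section of the same dictionary can then be used for the Snowflake connection, so both services are configured from a single file.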

Download from Kaggle

The next step is downloading the files from Kaggle. For this we use the Kaggle API, specifically the dataset_download_files() method.
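The download step can be sketched as a thin wrapper around dataset_download_files(). The wrapper name download_dataset and the dataset slug in the usage note are placeholders of mine. The api argument is expected to be an already authenticated KaggleApi object.

```python
from pathlib import Path


def download_dataset(api, dataset: str, target_dir: str, unzip: bool = False) -> None:
    """Download all files of a Kaggle dataset into target_dir.

    `api` is an authenticated KaggleApi instance; `dataset` is a slug
    in the form "owner/dataset-name".
    """
    # Make sure the download folder exists before handing it to the API.
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    api.dataset_download_files(dataset, path=target_dir, unzip=unzip)
```

A call would look like download_dataset(api, "owner/dataset-name", "data"), where the slug is replaced by the dataset you want.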

Unzip records

Data from Kaggle is downloaded in .zip format. You can have the Kaggle API unzip the files for you by passing unzip=True to dataset_download_files().

An alternative is to unzip the files in a separate step after the download.
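One way to do the separate unzip step is Python's standard zipfile module, as in the sketch below. This is not necessarily the statement I used in the original code, and the function name unzip_archive is hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_archive(zip_path: Path, target_dir: Path) -> list:
    """Extract every file in zip_path into target_dir and return their names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

The returned list of file names is handy for picking up the extracted .csv files in the next step.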

Continuing

The remainder is similar to the previous post, From .csv to Snowflake:

  • Reading .csv Data
  • Creating Snowflake objects
  • Loading Data into Snowflake

Find the code for this blogpost on GitHub.

Thanks for reading and till next time.

Daan Bakboord — DaAnalytics
