Google Colaboratory and Kaggle datasets

“An over-the-shoulder shot of a person writing code on a laptop” by Tirza van Dijk on Unsplash

I am a big fan of using Google Colaboratory for machine learning projects, especially with the free GPU. If you don’t know what Google Colab is, I recommend reading this amazing article here.

I recently started using Kaggle datasets in Google Colab, and thought I would share how I do it. So here is my first post!

Step 1: Get Kaggle API Token

Log in to your Kaggle account, go to “My Account”, and find the “Create New API Token” button. Click it to download your API token as a JSON file (kaggle.json).

Navigate to your account using the profile picture in the top right.
Click the “Create New API Token” button to download the API token as a JSON file.
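The downloaded token is a small JSON object of the form {"username": "...", "key": "..."}. Here is a minimal sketch for sanity-checking the file before you upload it. The helper `load_kaggle_token` is hypothetical and is not part of the Kaggle library.

```python
import json


def load_kaggle_token(path):
    """Read a downloaded Kaggle API token and check it has the expected fields.

    kaggle.json holds {"username": "...", "key": "..."}.
    Hypothetical helper for illustration; not part of the kaggle package.
    """
    with open(path) as f:
        token = json.load(f)
    # Both fields are needed for the Kaggle CLI to authenticate.
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    return token
```

If the function returns without raising, the file has the two fields the Kaggle tools look for.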

Step 2: Install Kaggle Library and Import Google Colab File Library

Use the following code in your Google Colab notebook.
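In a Colab cell, this step typically uses a shell command such as !pip install kaggle, followed by from google.colab import files. The plain-Python sketch below does roughly the same thing and can also run outside a notebook. `ensure_package` is a hypothetical helper, and the google.colab import only succeeds inside Colab.

```python
import importlib.util
import subprocess
import sys

try:
    # Colab's helper module for uploading/downloading files; only exists inside Colab.
    from google.colab import files
except ImportError:
    files = None  # running outside Colab


def ensure_package(module_name, pip_name=None):
    """pip-install `pip_name` (default: `module_name`) if `module_name` can't be found.

    Hypothetical helper. Returns True if pip was invoked, False if the module
    was already available. find_spec only locates the module; it does not import it.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "-q", pip_name or module_name]
    )
    return True


if files is not None:
    # Inside Colab: make sure the kaggle package (and its CLI) is installed.
    ensure_package("kaggle")
```

The check uses find_spec instead of import kaggle because importing the kaggle package tries to authenticate immediately. That would fail before the token is in place.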

Step 3: Upload Kaggle API JSON file to Google Colab

The code below displays a button for uploading files to Google Colab. Use it to navigate to the downloaded Kaggle API token JSON file and upload it.

PS: You can also use this to upload any file from your local machine to the notebook!
Execute the code above, select the JSON file, and upload it. Use “!ls” to see all the files in the current directory.
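By default, the Kaggle command-line tool looks for the token at ~/.kaggle/kaggle.json. It also warns if other users can read that file. Inside Colab, files.upload() (after from google.colab import files) saves the chosen files to the current working directory. It also returns a dict that maps each file name to its contents.

Below is a minimal sketch of moving the uploaded token into place. It assumes the file was uploaded as kaggle.json. The helpers `install_kaggle_token` and `list_files` are hypothetical.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(src="kaggle.json", config_dir=None):
    """Copy an uploaded kaggle.json to where the Kaggle CLI expects it.

    Hypothetical helper. Defaults to ~/.kaggle/kaggle.json and sets the
    permissions to 0o600 (owner read/write only) so the CLI does not warn
    that the key is readable by other users.
    """
    target_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copy(src, target)
    os.chmod(target, 0o600)
    return target


def list_files(directory="."):
    """Plain-Python stand-in for `!ls`: sorted entry names in `directory`."""
    return sorted(os.listdir(directory))
```

In Colab you would call files.upload(), pick kaggle.json in the dialog, and then call install_kaggle_token(). After that, print(list_files()) should show kaggle.json in the working directory.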

Step 4: Download dataset from Kaggle

Now go to the Kaggle competition you are interested in, open the Data tab, copy the API command, and paste it into a Colab cell (prefixed with “!” so Colab runs it as a shell command) to download the dataset.

NYC Taxi Trip Duration Competition on Kaggle. Copy the command in the API box and execute it in Colab.
Here is the code for the NYC Taxi Trip Duration challenge.

Note: As you can tell, you just need to replace the competition name in the command (after “download -c”) with the name of the competition you are interested in.
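Below is a minimal sketch of running the same download from Python with subprocess. It assumes the Kaggle CLI is installed and the token is already in place. The helper names are hypothetical, and nyc-taxi-trip-duration is assumed to be this competition's name as it appears in the API box. The -p option tells the CLI which directory to download into.

```python
import subprocess


def kaggle_download_command(competition, dest="."):
    """Build the Kaggle CLI call that downloads a competition's files into `dest`."""
    return ["kaggle", "competitions", "download", "-c", competition, "-p", dest]


def download_competition(competition, dest="."):
    """Run the download; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(kaggle_download_command(competition, dest), check=True)
```

Calling download_competition("nyc-taxi-trip-duration") corresponds to running !kaggle competitions download -c nyc-taxi-trip-duration in a cell. Kaggle generally refuses the download until you have accepted the competition's rules on its website.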

You should see something like this on your Colab notebook:

NYC Taxi Trip Duration dataset downloaded from Kaggle. Notice how I use “!ls” to list all the files in my notebook.

Step 5: Unzip datasets and load into a Pandas DataFrame

Finally, let’s load the datasets into pandas. Since the train and test datasets are in .zip format, you would normally have to unzip them before reading the .csv files inside. Luckily for us, pandas can read the zipped CSV files directly!
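pd.read_csv infers zip compression from the .zip file extension. That means it can read each archive directly, as long as the archive holds a single CSV. Below is a minimal sketch of that behaviour. The archive names (train.zip, test.zip) are assumed from this competition's download, and `load_zipped_csvs` is a hypothetical helper.

```python
from pathlib import Path

import pandas as pd


def load_zipped_csvs(names, directory="."):
    """Load `<name>.zip` for each name in `names` into a dict of DataFrames.

    Hypothetical helper. read_csv's default compression="infer" detects the
    .zip extension, so no manual unzipping is needed, provided each archive
    contains exactly one CSV file.
    """
    return {name: pd.read_csv(Path(directory) / f"{name}.zip") for name in names}
```

For this competition, you might call frames = load_zipped_csvs(["train", "test"]). Then train = frames["train"], and train.head() gives a first look at the data.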

This is how it should look in your Google Colab notebook:

Data successfully imported into a pandas DataFrame.

Now, on you go with your exploratory analysis :)

Final Thoughts

I hope you found this helpful. Please let me know in the comments below if I have missed anything or have recommendations for my writing style :)

I am looking forward to sharing more articles and growing with you all.