Tutorial: Kaggle API + Google Colaboratory

Yvette

Don’t want to download large Kaggle datasets to your local machine and then upload them to your Google Drive? Here is a tutorial on how to connect the Kaggle API to Google Colaboratory and download datasets directly from Kaggle into Colab, skipping that time-consuming round trip. ✧୧(๑=̴̀⌄=̴́๑)૭✧


Step 1. Get Your Kaggle Token:

Click the top right corner of the Kaggle site and go to your My Account page.

My Account

Scroll down to the API section and click the Create New API Token button:

Create New API Token

This downloads a file called kaggle.json. Store it somewhere safe (ง •̀_•́)ง; we will use it later.

Open the file; it should look like this:

{"username":"YOUR-USER-NAME","key":"SOMETHING-VERY-LONG"}
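If you want to double-check the file before using it, a minimal sketch like the one below loads a kaggle.json and confirms that both keys shown above are present and non-empty. The function name check_token and the example path are hypothetical, not part of the Kaggle tooling.

```python
import json


def check_token(path):
    """Load a kaggle.json file and confirm it has the expected keys.

    Returns the parsed dict; raises ValueError if a key is missing or empty.
    """
    with open(path) as f:
        token = json.load(f)
    for key in ("username", "key"):
        if not token.get(key):
            raise ValueError(f"kaggle.json is missing a value for {key!r}")
    return token


# Example usage (hypothetical path; point it at wherever you saved the file):
# token = check_token("kaggle.json")
# print(token["username"])
```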

Step 2. Colaboratory Runtime:

Go to Google Colaboratory and open a new Python 3 notebook. To verify your Python version and (optionally) use the fancy GPU, click Runtime in the top left toolbar, then click Change runtime type at the bottom of the menu.

Change runtime type

Select Python 3 and GPU, then click Save.

Change hardware accelerator to GPU
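To confirm the runtime settings from inside the notebook, a minimal sketch like the following reports the Python version and checks whether the nvidia-smi tool is on the PATH. Treating the presence of nvidia-smi as a sign of a GPU runtime is an assumption (a heuristic, not a guarantee), and runtime_summary is a hypothetical name.

```python
import shutil
import subprocess
import sys


def runtime_summary():
    """Report the Python version and whether an NVIDIA GPU tool is visible.

    The GPU check only looks for the nvidia-smi binary on PATH; it is a
    heuristic, not proof that a GPU is usable from your framework.
    """
    has_nvidia_smi = shutil.which("nvidia-smi") is not None
    gpu_info = None
    if has_nvidia_smi:
        # nvidia-smi -L lists the attached GPUs, one per line
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
        gpu_info = result.stdout.strip()
    return {
        "python": sys.version.split()[0],
        "major": sys.version_info.major,
        "nvidia_smi": has_nvidia_smi,
        "gpus": gpu_info,
    }


print(runtime_summary())
```

With the GPU runtime selected, "gpus" should list at least one device; on a CPU-only runtime it stays None.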

Step 3. Install Kaggle API:

Installation is the same as in a Jupyter Notebook. In a cell, type and run:

!pip install kaggle

After the installation completes, run !ls -a to check whether there is a directory called .kaggle under /content (Colab's default working directory). If not, create one:

!mkdir .kaggle

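Go back to the kaggle.json file you downloaded in Step 1 and copy its contents. In the next cell, type and paste the following (no exclamation mark, since this is Python rather than a shell command):

import json
# Paste the contents of your kaggle.json here
token = {"username":"YOUR-USER-NAME","key":"SOMETHING-VERY-LONG"}
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(token, file)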

In the next cell, run:

!chmod 600 /content/.kaggle/kaggle.json
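The three steps above (create the directory, write the token, restrict permissions) can also be done in a single Python cell. This is a minimal sketch; write_kaggle_token is a hypothetical helper, and the default directory /content/.kaggle simply follows the paths used in this tutorial.

```python
import json
import os


def write_kaggle_token(username, key, config_dir="/content/.kaggle"):
    """Create config_dir if needed, write kaggle.json, and restrict permissions.

    Mirrors the !mkdir, json.dump and !chmod 600 steps in plain Python.
    """
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "kaggle.json")
    with open(path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    # 0o600: read/write for the owner only, same as chmod 600
    os.chmod(path, 0o600)
    return path


# Example (replace the placeholders with the values from your kaggle.json):
# write_kaggle_token("YOUR-USER-NAME", "SOMETHING-VERY-LONG")
```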

Update (Sep 2018):

If you get a missing username error, add this line of code before the configuration step (here is where I found the solution: https://stackoverflow.com/questions/51958553/error-while-importing-kaggle-dataset-on-colab):

# may run into problems, see updated version below
!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json

Update (Jan 2019):

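I recently found out that the ~/.kaggle copy above may no longer work on its own. In Colab, ~ is /root, and the Kaggle CLI reads its token from /root/.kaggle/kaggle.json, so create that directory first and copy the file there:

!mkdir -p /root/.kaggle
!cp /content/.kaggle/kaggle.json /root/.kaggle/kaggle.json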
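The Kaggle API documentation says the token belongs at ~/.kaggle/kaggle.json. A minimal Python sketch of the copy step, which creates the target directory if it is missing and resolves ~ explicitly, could look like this; install_token and its parameters are hypothetical names.

```python
import os
import shutil


def install_token(src="/content/.kaggle/kaggle.json", home=None):
    """Copy kaggle.json into <home>/.kaggle, where the Kaggle CLI looks for it.

    home defaults to the current user's home directory (/root in Colab), so
    the default target there is /root/.kaggle/kaggle.json. The .kaggle
    directory is created first in case it does not exist yet.
    """
    home = home or os.path.expanduser("~")
    target_dir = os.path.join(home, ".kaggle")
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "kaggle.json")
    shutil.copyfile(src, target)
    os.chmod(target, 0o600)  # keep the key private, as with chmod 600
    return target


# In Colab: install_token() copies the file to /root/.kaggle/kaggle.json
```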

Then run:

!kaggle config set -n path -v /content

Step 4. Download Data:

Go to the Kaggle competition page you would like to download data from and open the Data tab. I’m using the Home Credit Default Risk competition as an example:

Scroll down to the data section and click the API button; it copies the command automatically.

Paste the command into a Colab cell (don’t forget the exclamation mark) and add -p to specify the download path:

!kaggle competitions download -c home-credit-default-risk -p /content
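If you prefer to drive the download from Python (for example, to loop over several competitions), the same CLI call can be wrapped with subprocess. This is a minimal sketch: download_competition is a hypothetical helper, and it assumes the kaggle command is on the PATH and already authenticated through kaggle.json.

```python
import subprocess


def download_competition(slug, path="/content", run=True):
    """Build (and optionally run) the Kaggle CLI download command.

    slug is the competition's URL name, e.g. "home-credit-default-risk".
    Returns the argument list so it can be inspected before running.
    """
    cmd = ["kaggle", "competitions", "download", "-c", slug, "-p", path]
    if run:
        # check=True raises CalledProcessError if the CLI exits with a
        # non-zero status
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: print the command without executing it
print(download_competition("home-credit-default-risk", run=False))
```

Passing the arguments as a list avoids shell quoting issues with the slug or path.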

Your output should look something like this:

Download output

To unzip the files, run the following command:

!unzip \*.zip
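The same extraction can be done in plain Python with the standard zipfile module, which is handy if you also want to delete the archives afterwards to save disk space. A minimal sketch; unzip_all and its remove flag are hypothetical names. It extracts one level only, so any archives that appear inside the extracted files are left as they are.

```python
import glob
import os
import zipfile


def unzip_all(directory="/content", remove=False):
    """Extract every .zip archive in directory into that same directory.

    Set remove=True to delete each archive after it has been extracted.
    Returns the list of archives that were processed.
    """
    # The list is built once, so zips produced by extraction are not reprocessed
    archives = sorted(glob.glob(os.path.join(directory, "*.zip")))
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
        if remove:
            os.remove(archive)
    return archives


# In Colab: unzip_all("/content")
```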

Now your data is available to use. Try:

import pandas as pd
d = pd.read_csv('application_train.csv')
d.head()

Have Fun! ᕕ( ᐛ )ᕗ


Some miscellaneous things:

  1. To download Kaggle competition data, you need to join the competition and accept its rules on Kaggle first. Otherwise, the download step may throw errors at you. <-biubiu-⊂(`ω´∩)
  2. I’m a Mac user, so if you copy and paste the code on Windows, you may need to replace curly quotation marks with straight ones to avoid formatting issues.
  3. If for any reason you can no longer access your .kaggle directory, run !rm -r .kaggle to remove the directory, then restart from the !mkdir .kaggle step; that should resolve the problem.
  4. Suggestions and advice are welcome.
