How to download Kaggle Datasets into Google Colab via Google Drive

Mohammed Ismail P
Published in Analytics Vidhya
5 min read · Jan 17, 2021
Kaggle to Google Colab

Every Data Science and Machine Learning enthusiast has heard of two popular names: Kaggle and Google Colab. Let me introduce them for newcomers.

  1. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.
  2. Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis and education.

Many newcomers find it difficult to download datasets from Kaggle into Google Colab. The easiest way I have found is to go via Google Drive, which stores the datasets so that Colab can use them later. Let's get right into it.

Follow the steps below carefully:

Step 1: Create your Kaggle API Token

  • Go to Your Profile and click on Edit Profile.
  • Scroll down to the API section and click the Create New API Token button
Screenshot by Author — Kaggle API section
  • A file named kaggle.json, containing your username and API key, will be downloaded
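kaggle.json is a small JSON file with two fields, username and key. Before uploading it, you can sanity-check it with a few lines of Python. This is a minimal sketch; load_kaggle_credentials is a hypothetical helper, not part of the Kaggle CLI:

```python
import json
from pathlib import Path


def load_kaggle_credentials(path):
    """Read kaggle.json and return (username, key), failing loudly if a field is missing or empty."""
    data = json.loads(Path(path).read_text())
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return data["username"], data["key"]


# Usage (hypothetical local path to the downloaded file):
# username, key = load_kaggle_credentials("kaggle.json")
# print(username)  # avoid printing the key itself
```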

Step 2: Upload kaggle.json to Google Drive

  • Create a folder in Google Drive (in my case, Kaggle) where you will store your Kaggle datasets
  • Upload the downloaded kaggle.json file to this folder
Screenshot by Author — kaggle.json uploaded to Google Drive

Step 3: Open Colab notebook

Open an existing Google Colab notebook, or create a new one, in which you want to use Kaggle datasets.

Step 4: Mount Google Drive to Google Colab notebook

  • Run the code below to mount your Google Drive:
from google.colab import drive
drive.mount('/content/drive')
Screenshot by Author — Authenticating Google account
  • Click the link to authenticate your Google account
  • Select the Google account whose Drive you want to mount and click Sign in
  • Copy the authorization code and paste it into the input box
  • Congrats! Your Google Drive is mounted.
Screenshot by Author — Google Drive mounted
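Before configuring Kaggle, it can help to confirm that the mounted Drive actually exposes kaggle.json. A minimal sketch, assuming the /content/drive/MyDrive/Kaggle folder used in this article; find_kaggle_json is a hypothetical helper, not part of Colab:

```python
from pathlib import Path


def find_kaggle_json(folder="/content/drive/MyDrive/Kaggle"):
    """Return the path to kaggle.json inside folder, or raise if it is not there."""
    path = Path(folder) / "kaggle.json"
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found: check that Drive is mounted and kaggle.json was uploaded"
        )
    return path


# In Colab, after drive.mount('/content/drive'):
# print(find_kaggle_json())
```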

Step 5: Configure Kaggle

The code below points the Kaggle CLI at the folder containing kaggle.json by setting the KAGGLE_CONFIG_DIR environment variable. Note: if you used a different folder name or path for kaggle.json, use it instead of /Kaggle in the code below.

import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/drive/MyDrive/Kaggle"
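An alternative to KAGGLE_CONFIG_DIR, not the approach this article uses, is to copy kaggle.json into ~/.kaggle, the folder the Kaggle CLI checks by default, and make it readable only by you (the CLI warns when the key file is readable by other users). A minimal sketch; install_kaggle_json is a hypothetical helper:

```python
import os
import shutil
from pathlib import Path


def install_kaggle_json(src, dest_dir=None):
    """Copy kaggle.json into the config folder (default ~/.kaggle) and restrict it to the owner."""
    dest_dir = Path(dest_dir) if dest_dir else Path.home() / ".kaggle"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "kaggle.json"
    shutil.copyfile(src, dest)
    os.chmod(dest, 0o600)  # same effect as `chmod 600`: owner read/write only
    return dest


# In Colab (instead of setting KAGGLE_CONFIG_DIR):
# install_kaggle_json("/content/drive/MyDrive/Kaggle/kaggle.json")
```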

Step 6: Change present working directory

  • The IPython magic command below changes the present working directory to /content/drive/MyDrive/Kaggle:
%cd /content/drive/MyDrive/Kaggle/

Note: Your Google Drive’s home directory is /content/drive/MyDrive/
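%cd changes the working directory of the notebook process itself; a plain !cd would not work here, because each ! command runs in its own subshell. Outside a notebook, a minimal Python equivalent uses os.chdir (enter_folder is a hypothetical helper):

```python
import os


def enter_folder(path):
    """Change the process's working directory to path and return the new absolute directory."""
    os.chdir(path)  # same effect as %cd in a notebook cell
    return os.getcwd()


# enter_folder("/content/drive/MyDrive/Kaggle")
```

Relative paths used by later commands, including the kaggle downloads, then resolve against this folder.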

Step 7: Download the Kaggle datasets

You can now download either a normal dataset or a competition dataset. Follow the steps below that match your needs:

Step 7.A: Normal Datasets

  • Go to the Kaggle dataset's dashboard and click Copy API Command, as shown:
Screenshot by Author — Titanic dataset Dashboard
  • Your API command will look like kaggle datasets download -d <username>/<dataset> or kaggle datasets download -d <dataset>
  • Run the command in a Colab cell, prefixed with the ! symbol:
!kaggle datasets download -d heptapod/titanic
  • You can check the downloaded file using the ls command:
Screenshot by Author — Titanic dataset downloaded from Kaggle as zip file
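If you prefer Python to the ls command, here is a minimal sketch that lists the zip files in the current folder along with their sizes (list_zip_files is a hypothetical helper):

```python
from pathlib import Path


def list_zip_files(folder="."):
    """Return (name, size in bytes) for every .zip file directly inside folder, sorted by name."""
    return sorted(
        (p.name, p.stat().st_size)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix == ".zip"
    )


# for name, size in list_zip_files():
#     print(f"{name}: {size / 1e6:.1f} MB")
```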

Note: Datasets are downloaded as zip files, which you would otherwise have to unzip manually. Adding the --unzip flag extracts the files immediately after the download and deletes the zip file:

!kaggle datasets download -d heptapod/titanic --unzip
Screenshot by Author — Titanic dataset is downloaded and instantly unzipped

Learn more about Kaggle datasets API commands

Step 7.B: Competition Datasets

  • Go to the Kaggle competition's dashboard and click the Data tab, as shown:
  • Scroll down the same page to find the Kaggle API command. On the right side, there is an icon to copy the command.
  • Your API command will look like kaggle competitions download -c <competition-name>
  • Run the command in a Colab cell, prefixed with the ! symbol:
!kaggle competitions download -c santander-customer-transaction-prediction
  • Depending on the competition, one or more zip files may be downloaded. Use the ls command to view them. In my case, three zip files were downloaded:
Screenshot by Author — Competition dataset downloaded from Kaggle as zip file
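The two download commands differ only in the subcommand and the flag: -d for datasets, -c for competitions. A minimal sketch that builds the argument list for either, for use with subprocess from a plain Python script; kaggle_download_args is a hypothetical helper, and it only allows --unzip for datasets, matching how the flag is used in this article:

```python
def kaggle_download_args(kind, ref, unzip=False):
    """Build the argument list for `kaggle datasets download` or `kaggle competitions download`."""
    flags = {"datasets": "-d", "competitions": "-c"}
    if kind not in flags:
        raise ValueError("kind must be 'datasets' or 'competitions'")
    args = ["kaggle", kind, "download", flags[kind], ref]
    if unzip:
        if kind == "competitions":
            raise ValueError("--unzip is only used with dataset downloads here")
        args.append("--unzip")
    return args


# Runs the same command as the ! line above (requires the kaggle CLI on PATH):
# import subprocess
# subprocess.run(kaggle_download_args("competitions", "santander-customer-transaction-prediction"), check=True)
```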

Note: Unlike with normal datasets, you cannot use the --unzip flag to unzip competition downloads, even if the competition is no longer live. Instead, you can use Python’s built-in zipfile module. Below is the code to do so:

import os
import zipfile

# Extract every .zip file in the current directory, then delete the archive
for file in os.listdir():
    if file.endswith(".zip"):
        with zipfile.ZipFile(file, "r") as zip_file:
            zip_file.extractall()
        os.remove(file)
Screenshot by Author — Python code to unzip the zipped files
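The loop above deletes each archive right after extracting it. A slightly more defensive variant (a sketch, not the author's code; unzip_all and its dest parameter are hypothetical) first checks each archive with ZipFile.testzip(), which returns the name of the first member whose CRC does not match, or None, and only deletes archives that pass:

```python
import os
import zipfile
from pathlib import Path


def unzip_all(folder=".", dest=None):
    """Extract every .zip in folder into dest (default: folder); delete an archive only if its CRC check passes."""
    folder = Path(folder)
    dest = Path(dest) if dest else folder
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zip_file:
            bad_member = zip_file.testzip()  # None means every member's CRC matched
            if bad_member is not None:
                print(f"Skipping {archive.name}: corrupt member {bad_member}")
                continue  # keep the archive so it can be re-downloaded or inspected
            zip_file.extractall(dest)
            extracted.extend(zip_file.namelist())
        os.remove(archive)  # the archive is closed once the with block ends
    return extracted


# unzip_all()  # run inside /content/drive/MyDrive/Kaggle after the download
```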

Learn more about Kaggle competition related API commands

Congratulations! You have successfully downloaded datasets from Kaggle into Google Colab. Happy learning!


Mohammed Ismail P

Software Engineer at Zoho | Bachelor in Computer Science | Machine Learning | Deep Learning