How to download datasets from Kaggle to Google Colab

Use Google Colab for Data Analysis

Fangyi Yu
Geek Culture
3 min read · Jul 22, 2021


Photo by Myriam Jessier from Unsplash

Introduction

In this tutorial, you will download datasets to Google Colab through the Kaggle API.

Kaggle is an online community of data scientists and machine learning practitioners where users can find and publish datasets, and explore and build models in a web-based data-science environment.

Google Colab allows anybody to write and execute Python code through the browser. It also provides free GPU and TPU services.

At the end of this tutorial, you will have a Fake News Detection dataset downloaded to your Google Colab, and you will be able to download any other dataset available through the Kaggle API using the same method.

Prerequisites

To complete this tutorial, you will need:

  • A local programming environment for Python 3, installed and set up.
  • A Kaggle account and a Google Colab account.

Step 1 — Fetch the Kaggle API token.

Log in to your Kaggle account, click on your profile, go to your account, scroll to the API section, and click Expire API Token to remove previous tokens. Don’t skip this step; otherwise, you may encounter a 401 Unauthorized error when downloading datasets.

Kaggle API token

Step 2 — Download Kaggle.json to Google Colab.

  1. First, create a folder named Kaggle in your Google Drive. Feel free to use any folder name. I use Kaggle here so the paths in the following steps are easy to follow.
  2. Second, click Create New API Token in your Kaggle account to download the kaggle.json file, then upload it to the Kaggle folder you created in Google Drive.

Step 3 — Mount Drive to Colab notebook.

  1. First, create a new Colab notebook and install Kaggle using the following code.
! pip install kaggle

2. Second, use the code below to mount your Google Drive:

from google.colab import drive
drive.mount('/content/gdrive')

3. Then open the URL shown in the prompt, copy the authorization code, and paste it into the empty box, as shown in the figure:

Colab authorization

4. Run the following code to provide the config path to kaggle.json.

import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/My Drive/Kaggle"
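
A quick sanity check can catch a misplaced or incomplete token before any download fails. The sketch below is not part of the Kaggle tooling: check_kaggle_config is a hypothetical helper written for illustration, and it assumes kaggle.json holds "username" and "key" fields, which is the layout of the token file Kaggle generates.

```python
import json
import os


def check_kaggle_config(config_dir):
    """Return the Kaggle username if config_dir holds a usable kaggle.json.

    Hypothetical helper for illustration only. Raises FileNotFoundError if
    the file is absent and ValueError if "username" or "key" is missing.
    """
    token_path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(token_path):
        raise FileNotFoundError(f"kaggle.json not found in {config_dir}")

    with open(token_path) as f:
        token = json.load(f)

    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")

    # Point the Kaggle CLI at the folder only once the token looks valid.
    os.environ["KAGGLE_CONFIG_DIR"] = config_dir
    return token["username"]
```

In the notebook, this could be called as check_kaggle_config("/content/gdrive/My Drive/Kaggle") once Drive is mounted. It only checks that the file is present and well-formed; it cannot tell whether the token has been expired on the Kaggle side.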

Step 4 — Change directory and download dataset.

  1. Change to the Kaggle directory using:

%cd /content/gdrive/My Drive/Kaggle

  2. Find the dataset you want to download in Kaggle. For example, suppose you are taking part in the Fake News Detection Challenge on Kaggle and want to download its dataset to Colab. In that case, navigate to the competition page, click Data, and copy the API command as follows.
Dataset API

In this case, the API command is kaggle competitions download -c fakenewskdd2020.

3. Run the following command to download the dataset in Colab:

!kaggle competitions download -c fakenewskdd2020

The dataset is now downloaded to your Kaggle directory.

4. Unzip the file and delete the zip file using:

!unzip \*.zip && rm *.zip
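
The download and unzip steps can also be scripted in Python, which is handy when the same notebook is rerun often. This is a minimal sketch, not the only way to do it: download_competition and extract_and_remove_zips are hypothetical helper names, the first simply runs the same kaggle command shown above from the target folder, and the second assumes every .zip file in that folder should be extracted in place.

```python
import subprocess
import zipfile
from pathlib import Path


def download_competition(competition, folder):
    """Run the same CLI command as step 3, from inside `folder`.

    Assumes the kaggle package is installed and the API token is configured.
    """
    subprocess.run(
        ["kaggle", "competitions", "download", "-c", competition],
        cwd=folder,
        check=True,  # raise if the CLI reports an error such as 401
    )


def extract_and_remove_zips(folder):
    """Extract every .zip archive in `folder`, then delete the archive.

    Returns the sorted names of the archives that were processed.
    """
    processed = []
    for archive in sorted(Path(folder).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
        archive.unlink()  # mirrors `rm *.zip`, only after extraction succeeded
        processed.append(archive.name)
    return processed
```

For the example in this tutorial, the calls would be download_competition("fakenewskdd2020", "/content/gdrive/My Drive/Kaggle") followed by extract_and_remove_zips("/content/gdrive/My Drive/Kaggle").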

Sweet! The dataset is ready for you to play with!

Conclusion

In this article, you downloaded a Fake News Detection dataset to Google Colab through the Kaggle API. Now you can download any dataset you want the same way and play around with your data!
