How to access datasets directly from Kaggle

Hiren Rupchandani
Published in Accredian
4 min read · Apr 29, 2022

Preface

Kaggle is one of the largest data science community platforms that provides access to various datasets, competitions, resources, and powerful tools to practice data science and machine learning.

  • Kaggle lets us use its datasets either by downloading them manually or by using its API.
  • In this article, we will look at the latter approach: using the API key that Kaggle.com provides, which can be stored anywhere on your Google Drive.

Prerequisites

To follow along with this article, you need a Kaggle account (to generate the API key) and a Google account (to use Google Colab).

Generating the API Key

To generate the Kaggle API Key, follow the given steps:

  1. Log in to your kaggle.com account.
  2. In the top right corner, you can see your profile. On clicking it, you will see options for Your Profile, Account Settings, and Logout. Click on Account Settings (indicated by the gear icon).
Going to your Kaggle Account
  3. On your account page, scroll down until you see the API section. In this section, there is a Create New API Token button. Click on it.
Generating API Key
  4. You will be given a JSON file named kaggle.json that contains the API key. The key is private to your account and must not be shared.
  5. This file needs to be stored in a folder named .kaggle in your home directory, because the Kaggle API client looks for kaggle.json there by default.
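Before uploading the file anywhere, it can help to confirm that it has the expected shape. The following is a minimal sketch, assuming kaggle.json is a JSON object with "username" and "key" fields; the helper names load_kaggle_credentials and mask are hypothetical and not part of any Kaggle library.

```python
import json
from pathlib import Path


def load_kaggle_credentials(path):
    """Read a kaggle.json file and return (username, key).

    Hypothetical helper: assumes the token file is a JSON object
    with non-empty "username" and "key" fields.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("kaggle.json should contain a JSON object")
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {', '.join(missing)}")
    return data["username"], data["key"]


def mask(secret, visible=4):
    """Hide all but the last few characters so the key is safer to print."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]
```

Calling load_kaggle_credentials("kaggle.json") and printing the username together with mask(key) checks the file without exposing the full key on screen.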

Setting things up

  • In this article, I will show how to access the token through Google Drive.
  • Before running the required scripts, you first need to upload your kaggle.json file to Google Drive.
  • Meanwhile, you can create a new Colab notebook to follow along with this article.
  • After you have uploaded the file, mount your Drive storage in the new Colab notebook using the following commands:
from google.colab import drive
drive.mount('/content/drive')
  • You will be prompted to give access to your drive storage by selecting your account and authenticating using a key.
Authorizing your Connection to Google Drive
  • Now that you have mounted your Drive, we can install all the necessary libraries on this Colab instance.
  • We only need the kaggle library, which also provides the kaggle command-line tool (the older, unofficial kaggle-cli package is deprecated and not required). Install it using the following command:
!pip install -q kaggle
  • Now, run the script below. It creates a folder named .kaggle in the Colab home directory, copies the kaggle.json file from your Drive into it, and restricts the file's permissions so that only you can read and write it:
!mkdir -p ~/.kaggle
!cp "/content/drive/MyDrive/kaggle.json" ~/.kaggle/
!cat ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json
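  • The !cat command prints the file's contents, so the output should show your Kaggle username and your API key. Since the key is private, clear this cell's output before sharing the notebook. We are now set to download the datasets.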
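The same setup can also be done from Python rather than shell commands. The sketch below mirrors the mkdir, cp, and chmod steps; the function name install_kaggle_token and its home parameter are hypothetical, and the source path is assumed to be wherever you uploaded kaggle.json on your mounted Drive.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(source, home=None):
    """Copy kaggle.json into <home>/.kaggle and make it owner-only.

    Hypothetical helper mirroring the mkdir / cp / chmod 600 steps.
    `source` is the assumed location of the token on the mounted Drive.
    """
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # like: mkdir -p ~/.kaggle
    target = target_dir / "kaggle.json"
    shutil.copyfile(source, target)                # like: cp ... ~/.kaggle/
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)  # like: chmod 600
    return target
```

For example, install_kaggle_token("/content/drive/MyDrive/kaggle.json") places the token in the default location. This version deliberately does not print the file, so the key never appears in the notebook output.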

Accessing a publicly available dataset

  • To download a dataset, copy the part of its URL that identifies it, i.e. the username of the uploader and the name of the dataset they uploaded (if the URL contains a datasets/ segment after kaggle.com, leave that segment out).
  • The required command has the form:
!kaggle datasets download -d username/dataset_name
  • For example:
!kaggle datasets download -d nicholasjhana/energy-consumption-generation-prices-and-weather
  • You can watch the download progress and then check that the files appear in the file browser on the left side of your Colab interface.
Example of downloading a dataset
  • The data arrives as a zip file. You can extract its contents using the following command:
!unzip /content/energy-consumption-generation-prices-and-weather.zip
  • You can now use the pandas library to check the data.
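As a rough illustration of the extract-and-inspect step in Python, the sketch below unpacks the archive with the standard zipfile module and previews a CSV with pandas. The function names extract_csvs and preview are hypothetical, and the sketch assumes the archive contains CSV files, which may not hold for every dataset.

```python
import zipfile
from pathlib import Path


def extract_csvs(zip_path, out_dir="."):
    """Extract a downloaded archive and return the CSV paths it contained."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        names = archive.namelist()
    return sorted(out_dir / name for name in names if name.lower().endswith(".csv"))


def preview(csv_path, rows=5):
    """Load one CSV with pandas and return its first rows."""
    import pandas as pd  # imported here so extract_csvs works without pandas

    return pd.read_csv(csv_path).head(rows)
```

In the notebook, you could loop over extract_csvs("/content/energy-consumption-generation-prices-and-weather.zip", "/content") and call preview(path) on each result to get a first look at the columns.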

Accessing a Competition dataset

  • The procedure is the same, except that you first need to accept the terms and conditions of the competition on its Kaggle page.
  • To download the dataset, copy the competition name from the part of the URL that follows kaggle.com.
  • The required command has the form:
!kaggle competitions download -c competition_name
  • For example:
!kaggle competitions download -c tabular-playground-series-feb-2022
  • Again, the file is zipped, but you can extract it using the !unzip command.
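Both download commands take an identifier copied from a Kaggle URL, so the copying step can be sketched as a small helper. This is only an illustration: the function name kaggle_download_command is hypothetical, and the URL shapes listed in its docstring are assumptions about how Kaggle links are laid out.

```python
from urllib.parse import urlparse


def kaggle_download_command(url):
    """Build the kaggle CLI arguments for a dataset or competition URL.

    Assumed URL shapes (not guaranteed to cover every Kaggle link):
      https://www.kaggle.com/datasets/<username>/<dataset_name>
      https://www.kaggle.com/<username>/<dataset_name>
      https://www.kaggle.com/competitions/<competition_name>
      https://www.kaggle.com/c/<competition_name>
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and parts[0] in ("competitions", "c"):
        if len(parts) < 2:
            raise ValueError(f"cannot find a competition name in {url!r}")
        return ["kaggle", "competitions", "download", "-c", parts[1]]
    if parts and parts[0] == "datasets":
        parts = parts[1:]  # drop the leading "datasets" segment
    if len(parts) < 2:
        raise ValueError(f"cannot find username/dataset_name in {url!r}")
    return ["kaggle", "datasets", "download", "-d", f"{parts[0]}/{parts[1]}"]
```

The returned list can be passed to subprocess.run(cmd, check=True), or joined with spaces and run after a ! in a notebook cell. Returning a list of arguments avoids shell quoting issues.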

Conclusion

  • And that’s it…
  • You can access the notebook that I have created for your reference here.
  • All you need to do is generate your API key and upload it to your Google Drive before running the above notebook.

