Download any Dataset from Kaggle with Kaggle API and python

Mine Gazioğlu
4 min read · Apr 10, 2023


Kaggle is an amazing platform for data scientists and machine learning enthusiasts, offering a wide variety of datasets for different projects. Kaggle also provides an easy way to download those datasets through the Kaggle API. Downloading a dataset from the Datasets section and a competition dataset from the Competitions section require slightly different methods. In this article, I will demonstrate both, using the Kaggle API and Python.

First, create an API token by opening the Account section under your profile and clicking Create New API Token.

After clicking Create New API Token, a JSON file named kaggle.json will be downloaded. It contains your username and Kaggle API key in the following format:

{"username":"user-name","key":"kaggle-key"}
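Instead of copying the values by hand, you can read them from the downloaded file programmatically. A minimal sketch (the helper name and default path are my own, not part of the Kaggle library):

```python
import json
import os

def load_kaggle_credentials(path='kaggle.json'):
    """Read the downloaded kaggle.json and export its credentials
    as the environment variables the Kaggle API looks for."""
    with open(path) as f:
        creds = json.load(f)
    os.environ['KAGGLE_USERNAME'] = creds['username']
    os.environ['KAGGLE_KEY'] = creds['key']
```

Call load_kaggle_credentials() with the path where you saved kaggle.json before importing the kaggle library.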

Make sure the kaggle library, which is used to interact with the Kaggle platform and its API, is installed:

!pip install kaggle

Next, you need to authenticate the Kaggle API with your credentials. You can set your Kaggle username and API key as environment variables using the os library.

import os

# Set the credentials before importing kaggle; the library reads them at import time
os.environ['KAGGLE_USERNAME'] = 'user-name'
os.environ['KAGGLE_KEY'] = 'kaggle-key'

import kaggle as kg
import pandas as pd

kg.api.authenticate()

We have come to a crossroads in our journey through Kaggle. Two paths lie ahead of us: one leads to the Datasets section, the other to the Competitions section. Which path will you take?

Downloading a Dataset from Datasets Section

For this example, I will use the Online Retail Data Set. Click the three-dot menu in the upper right and select Copy API command.

The copied API command contains the dataset slug. Pass that slug to the kg.api.dataset_download_files() method, along with path, the directory the files will be downloaded into (not a zip file name), and unzip=True so the archive is extracted automatically.

#kaggle datasets download -d vijayuv/onlineretail # Copied API command; take the dataset slug from this command

kg.api.dataset_download_files(dataset='vijayuv/onlineretail', path='online_retail', unzip=True)

After downloading, you can read the CSV file using the pd.read_csv() function.

df = pd.read_csv('online_retail/OnlineRetail.csv', encoding='ISO-8859-1')
df.head()
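The encoding='ISO-8859-1' argument is needed because this particular file is not UTF-8 encoded. If you are unsure of a file's encoding, one simple pattern is to try UTF-8 first and fall back (a sketch using only the standard library; the helper name is my own):

```python
def read_text(path):
    """Try UTF-8 first; fall back to ISO-8859-1, which accepts any byte."""
    try:
        with open(path, encoding='utf-8') as f:
            return f.read()
    except UnicodeDecodeError:
        with open(path, encoding='ISO-8859-1') as f:
            return f.read()
```

The same idea applies to pd.read_csv(): retry with a different encoding when the default raises a decoding error.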

Downloading a Dataset from Competitions Section

The Titanic dataset, from one of the most famous competitions in Kaggle history, will be our example for downloading a dataset from the Competitions section. Open the competition page and go to the Data section:

Under the Data section there is information about the dataset, such as the train and test files and feature explanations. Scroll to the bottom of the page, where you will find the API command for downloading the dataset:

In the command “kaggle competitions download -c titanic”, the ‘-c’ flag tells the ‘download’ command that the value following it is the name of the competition, so ‘titanic’ is the competition we want to download. We will use the kg.api.competition_download_files() method:

kg.api.competition_download_files(competition='titanic', quiet=False)  # replace the competition parameter with the name of your competition

Now you should have a titanic.zip archive in your current working directory. You can read the files inside it like this:

import zipfile

zf = zipfile.ZipFile('YOUR PATH/titanic.zip')
submission = pd.read_csv(zf.open('gender_submission.csv'))
test = pd.read_csv(zf.open('test.csv'))
train = pd.read_csv(zf.open('train.csv'))
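If you are not sure which files an archive contains, zipfile can list them before you open any members. A small self-contained sketch, using an in-memory archive standing in for titanic.zip:

```python
import io
import zipfile

# Build a tiny in-memory archive standing in for a downloaded zip
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('train.csv', 'PassengerId,Survived\n1,0\n')
    zf.writestr('test.csv', 'PassengerId\n2\n')

# namelist() shows the member file names before you open them
with zipfile.ZipFile(buf) as zf:
    print(zf.namelist())  # ['train.csv', 'test.csv']
```

With a real download you would pass the path to titanic.zip instead of the in-memory buffer.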

In this article, we demonstrated how to download a dataset from Kaggle using the Kaggle API and Python. We have seen that there are two paths for downloading datasets, the Datasets section and the Competitions section, each with its own API method. Kaggle's user-friendly interface makes it easy for data scientists and machine learning enthusiasts to find and download datasets for their projects.
