How to download datasets, folders, and single files from Kaggle to Google Drive

Nicholas Hale
5 min read · Jun 16, 2022


Some Kaggle datasets are huge; I’ve recently been playing around with one over 100GB. It makes no sense for me, or anyone really, to download that much data locally just to upload it to Google Drive. Luckily, there are ways to do this quickly and easily on Google Colab.

The approach differs depending on whether you want to store the whole dataset, a folder from the dataset, or just a single file. If you are downloading the complete dataset or a folder, you’ll have to unzip the files once downloaded. I’ll show you how to do this at the bottom of the page.

Two important things to note:

  • Drive does not sync with Colab straight away. You may need to leave Colab open for 5 or so minutes after downloading the data for it to show in your Drive.
  • Colab only allows a limited amount of disk storage (~60GB for free users and ~180GB for Pro members). Even though the data is being stored on Drive, Colab stores temporary files while running. So if you are downloading more than the limit allows, you may need to pay for more disk space, or download folders or files in separate runtimes.
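Before starting a large download, it can help to check how much disk space the runtime actually has left. The snippet below is a minimal sketch using only the Python standard library; the helper name free_disk_gb is made up for this example, and the figure it prints will depend on your runtime.

```python
import shutil


def free_disk_gb(path="/"):
    """Return the free space, in GB (10^9 bytes), on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1e9


# Print the free space on the runtime's root filesystem.
print(f"Free disk space: {free_disk_gb('/'):.1f} GB")
```

If the number is smaller than the dataset you are after, downloading folders or files in separate runtimes is the safer route.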

The very first thing to do is sign up to Kaggle (if you haven’t already), and then jump on Colab and mount your Drive.

Mounting Drive in Colab

This can be done in two ways:

  • Through the user interface, simply click the “Mount Drive” button in the “Files” sidebar, and follow the prompts to allow access.
  • Or, by using the following snippet of code in a Colab cell, and again follow the prompts to grant access.
from google.colab import drive
drive.mount('/content/drive')
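If you re-run a notebook often, a small guard can skip the mount prompt when Drive is already attached. This is only a sketch: the helper drive_is_mounted is a name invented here, and it assumes a mounted Drive shows up as a MyDrive folder under the mount point, as in the paths used throughout this article.

```python
import os


def drive_is_mounted(mount_point="/content/drive"):
    """Return True if a MyDrive folder exists under the mount point."""
    return os.path.isdir(os.path.join(mount_point, "MyDrive"))


if not drive_is_mounted():
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
        drive.mount("/content/drive")
    except ImportError:
        print("Not running in Colab; skipping Drive mount.")
```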

Once your Drive is mounted in Colab it’s time to download your data!

Downloading Complete Dataset or Single File

On Kaggle, head to your account settings and find the API section. Here you need to click the “Create New API Token” button to download the JSON file containing your Kaggle key.

Then, open the JSON file and copy your 32-character key.

In a new Colab cell, copy and paste the below code, replacing the X’s with your Kaggle username and key, and run the cell.

import os

os.environ['KAGGLE_USERNAME'] = "XXXXXXXX"  # username from the json file
os.environ['KAGGLE_KEY'] = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # key from the json file
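If you would rather not paste the key into a notebook cell, one alternative is to upload the JSON file to the Colab runtime and read the credentials from it. The sketch below assumes the file contains "username" and "key" fields; the function name load_kaggle_credentials and the path in the usage comment are hypothetical.

```python
import json
import os


def load_kaggle_credentials(json_path):
    """Read username and key from the token file and export them as environment variables."""
    with open(json_path) as f:
        creds = json.load(f)
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
    return creds["username"]


# Example usage (hypothetical upload location):
# load_kaggle_credentials("/content/kaggle.json")
```

Either way, avoid sharing a notebook that still has your key in it.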

Next, head to the “data” page of the Kaggle data you wish to download and copy the API command (circled in red):

Full dataset download

Now, in a new cell, paste the API command, prefixed by an “!”, and run the cell to download the complete dataset. By default the files land in the current working directory (/content on Colab, which is not in Drive), so specify a Drive path with “-p” as shown below.

!kaggle competitions download -c siim-isic-melanoma-classification -p /content/drive/MyDrive/kaggledata

Single file download

Downloading a single file is essentially the same as downloading the whole dataset; we just need to specify which file to download with “-f”. Here we’ll be downloading the “test.csv” file.

!kaggle competitions download -f test.csv -c siim-isic-melanoma-classification -p /content/drive/MyDrive/kaggledata

Once the cells have run and the data has been downloaded, you will see the files appear in the “Files” sidebar (this can take 10–20 seconds).
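If you download several files, it can be handy to build the two commands above from Python rather than editing them by hand. This is a minimal sketch: kaggle_download_command is a helper invented for this example, and it only assembles the same “-c”, “-p” and “-f” flags used above without calling Kaggle itself.

```python
import shlex


def kaggle_download_command(competition, dest, file_name=None):
    """Build the argument list for a competition download, optionally for one file."""
    args = ["kaggle", "competitions", "download", "-c", competition, "-p", dest]
    if file_name is not None:
        args += ["-f", file_name]
    return args


cmd = kaggle_download_command(
    "siim-isic-melanoma-classification",
    "/content/drive/MyDrive/kaggledata",
    file_name="test.csv",
)
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```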

Downloading a folder from Kaggle

Kaggle doesn’t actually allow you to download a single folder using their API, like the “jpeg” folder shown above, so here I’ll show you a workaround.

We first need to install the wget Python package (“web get”) on Colab. In a new cell:

!pip install wget

Once installed, you can run the “!pip show wget” command in a new cell to confirm the installation. If all is well, it’s time to move on to the (slightly) tricky part.

On the Kaggle data page, click on the folder you want to download and then on the download icon to the right:

Head to the downloads page of your browser and pause the download. This may sound counterintuitive, but remember we don’t want to download the folder locally. All we need here is the link address.

Back in Colab, import wget and paste in the following lines of code, replacing the URL with the copied link address and setting the path to your desired destination:

import wget

# Copied link address from your paused download
URL = "https://storage.googleapis.com/kaggle-competitions-data/kaggle-v2/20270/1222630/compressed/jpeg.zip?GoogleAccess...."
wget.download(URL, '/content/drive/MyDrive/kaggledata/jpeg.zip')

Run the cell, and wait for the download to finish.
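If you prefer not to install an extra package, the same download can be sketched with the Python standard library. Note this swaps in urllib in place of the wget package used above; download_to is a made-up helper name, and I am assuming the copied link behaves like any ordinary direct download URL (links of this kind are typically short-lived, so copy a fresh one if the request is rejected).

```python
import shutil
import urllib.request


def download_to(url, dest_path, chunk_size=1024 * 1024):
    """Stream the response body to dest_path in chunks, so the file is never held fully in memory."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path


# Example usage, with URL being the link address copied from your paused download:
# download_to(URL, "/content/drive/MyDrive/kaggledata/jpeg.zip")
```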

Unzipping files in Drive

Now that you’ve successfully downloaded the files/folders to Drive, chances are you’ll have to unzip them before you can use them. This is easy enough and can be done in a single line of code. The unzipped files will be saved onto your Drive.

!unzip "/content/drive/MyDrive/kaggledata/jpeg.zip" -d "/content/drive/MyDrive/unzippedkaggledata/"

The unzipping process can take a few minutes, depending on the size of your data. And again, it may take 5 or so minutes for the files to appear in your Drive after running.
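The same step can also be done from Python with the standard zipfile module, which is useful if you want to check the archive before extracting it. This is a sketch rather than a drop-in replacement for the command above; unzip_to is a name chosen for this example.

```python
import os
import zipfile


def unzip_to(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid zip archive")
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Example usage with the paths from this article:
# unzip_to("/content/drive/MyDrive/kaggledata/jpeg.zip",
#          "/content/drive/MyDrive/unzippedkaggledata/")
```

The validity check catches an interrupted or incomplete download early, instead of part-way through extraction.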

And that’s it. Now that you’ve successfully downloaded, unzipped and stored your data on Drive, you can get on with analysing and modelling! Just don’t forget to remount your Drive whenever you want to access the files.

Accessing the files is easy, by the way, once your Drive is mounted. See below:

import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/kaggledata/test.csv")

The Kaggle API docs can be found here.

Happy modelling!
