The Easy Approach to Access a Kaggle Dataset in Google Colab — Machine Learning Mindset

amirsina torfi
Machine Learning Mindset
4 min read · Mar 15, 2020

This tutorial shows a simple, straightforward way to import a Kaggle dataset into the Google Colaboratory environment. Here, you will learn:

  • What is Kaggle?
  • What is Google Colaboratory?
  • Basics of working with Google Colaboratory
  • How to access, download, and import a Kaggle dataset in Google Colaboratory

About Kaggle

Kaggle is one of the best-known platforms for competitions in Machine Learning and Data Science. It is a place where Machine Learning experts gather to shine! On Kaggle, by reading and doing, you will learn:

  • Data processing
  • Model deployment
  • Evaluation
  • Interesting ideas about the subject
  • Implementation tweaks

The first three elements are the basics of any Machine Learning project. However, you should NOT expect to learn everything about real-world projects on Kaggle! In real-world projects, (1) the data is usually not clean and ready, as it typically is in Kaggle competitions, (2) you frequently have to define the problem yourself, and it must be one whose solution others care about, (3) you may have to create a customized framework, and (4) you may have to evaluate in creative ways rather than with the regular, everyday metrics! Most probably, you will NOT learn those with Kaggle!

About Google Colaboratory

Google Colaboratory is a free platform (environment might be a better word, though!) for programmers to write code! Beyond that, it was essentially developed to facilitate Machine Learning and Deep Learning research by providing free GPU resources! In Google Colab:

  • You have access to a free GPU with limited runtime!
  • You write your code in a nice ready-to-use notebook.
  • Installing new packages is very easy!

Google Colab basically provides decent computational resources to anyone in the world who (1) wants to do Machine Learning and (2) has a Gmail account!

Procedure to Access the Kaggle Dataset

First, go to your Kaggle account and create a new API token. Do the following in order:

  1. Go to your Kaggle account
  2. Find the API section
  3. Push the Expire API Token button (Kaggle notification: Expired all API tokens for Your Name)
  4. Push the Create New API Token button (Kaggle notification: Ensure kaggle.json is in the location ~/.kaggle/kaggle.json to use the API.), which downloads the “kaggle.json” file.
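
For reference, the downloaded kaggle.json is a small JSON file that holds a username field and a key field. The sketch below is one possible way to sanity-check the file before using it. The helper name load_kaggle_token is hypothetical and is not part of the Kaggle package.

```python
import json
from pathlib import Path


def load_kaggle_token(path):
    """Read a kaggle.json file and check that it has the expected fields.

    Hypothetical helper for illustration; not part of the Kaggle package.
    """
    token = json.loads(Path(path).read_text())
    if not isinstance(token, dict):
        raise ValueError("kaggle.json should contain a JSON object")
    # A Kaggle API token file is expected to hold these two fields.
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return token
```

Since the key works like a password, it is better not to print the token or share a notebook that displays it.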

Now you can go to Google Colaboratory. Open a new notebook in any directory you like and do the following:

1. Install the Kaggle API

# Install Kaggle API
!pip install --quiet kaggle

The --quiet argument keeps pip from printing the installation details that it usually writes to the output.
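
Because --quiet hides the output, a quick check from Python can confirm that the package is really there. This is a minimal sketch using only the standard library; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# In Colab, after the pip command above, this should show a version string.
print("kaggle version:", installed_version("kaggle"))
```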

2. Load the token JSON file

# Choose the kaggle.json file that was created for the new API token in your account
from google.colab import files
files.upload()
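
In Colab, files.upload() returns a dictionary that maps each uploaded file name to its content as bytes, and the files are also saved in the current working directory. The sketch below checks that the upload really contains kaggle.json, which can catch a wrongly named file early. The helper name check_upload is hypothetical.

```python
def check_upload(uploaded, expected="kaggle.json"):
    """Return the size in bytes of the expected file in an upload dictionary.

    `uploaded` is assumed to be the dictionary returned by files.upload(),
    mapping file names to bytes.
    """
    if expected not in uploaded:
        names = ", ".join(sorted(uploaded)) or "nothing"
        raise FileNotFoundError(f"Expected {expected}, but got: {names}")
    return len(uploaded[expected])


# Usage in a Colab cell (commented out, since google.colab exists only in Colab):
# from google.colab import files
# uploaded = files.upload()
# print(check_upload(uploaded), "bytes uploaded")
```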

3. Create a directory and copy the kaggle.json file there

When we created the new API token, Kaggle said “Ensure kaggle.json is in the location ~/.kaggle/kaggle.json to use the API.”

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# Check that the file is in its new directory
!ls ~/.kaggle/
# Check the current file permission
!ls -l ~/.kaggle/kaggle.json
# Change the file permission so that only the owner can access the token
# chmod 600 file – owner can read and write
# chmod 700 file – owner can read, write and execute
!chmod 600 ~/.kaggle/kaggle.json

Above, we used the “!mkdir -p folder-name” command. Why did we use the “-p” option? Creating a folder nested inside directories that may not exist yet requires “-p”, which makes “mkdir” create any missing parent directories. For example, creating the structure “folder1/folder2/folder3” in one step with “!mkdir -p folder1/folder2/folder3” needs the “-p” option. It also keeps “mkdir” from failing when the folder already exists. In our example, “-p” was NOT strictly required, but it is better practice to always use it.
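
The same steps can also be written in plain Python, which may be handy outside a notebook. The following is a minimal sketch rather than the method used in this tutorial: install_token is a hypothetical helper, and its home parameter exists only so that the function can be tried on a scratch directory.

```python
import os
import shutil
from pathlib import Path


def install_token(src="kaggle.json", home=None):
    """Copy a token file to <home>/.kaggle/kaggle.json with mode 600."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    # parents=True mirrors "mkdir -p": missing parent folders are created,
    # and exist_ok=True avoids an error if the folder is already there.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copy(src, target)
    # 0o600: the owner can read and write; nobody else has access.
    os.chmod(target, 0o600)
    return target
```

Calling install_token() with no arguments assumes that kaggle.json sits in the current working directory, as it does after the upload step.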

4. Download the Dataset

Now, we go ahead and download the dataset. But how? First, go to the Data panel of the competition page.

Then, scroll down the page and find the API download command.

Click it to copy the command. Then run it in Google Colab as a shell command, prefixed with “!”:

# Get the dataset we want with: !kaggle competitions download -c 'name-of-competition'
!kaggle competitions download -c nlp-getting-started
# For unzip you can use the following
#!mkdir folder_name
#!unzip anyfile.zip -d folder_name

As above, if the data is in zip format, you can simply unzip it and place it in a folder!
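
Unzipping can also be done from Python with the standard zipfile module. This is a minimal sketch: extract_archive is a hypothetical helper, and the archive name in the usage comment is an assumption about what the download command saves.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest):
    """Extract a zip archive into dest and return the archived file names."""
    dest = Path(dest)
    # Create the destination folder (and any parents) if it does not exist.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())


# Usage (the archive name is an assumption; check the actual file with !ls):
# print(extract_archive("nlp-getting-started.zip", "data"))
```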

Conclusion

In this tutorial, you learned how to download and import a Kaggle dataset into Google Colaboratory. Doing so makes your life much easier, as many Machine Learning projects on Kaggle benefit from GPUs and you get free GPU access in Google Colab! Combining Kaggle and Google Colab in this way gives you an edge in Machine Learning competitions and projects.

Originally published at https://www.machinelearningmindset.com on March 15, 2020.
