Google Colaboratory and Kaggle datasets
I recently started using Kaggle datasets in Google Colab, and thought I would share how I do it. So here is my first post!
Step 1: Get Kaggle API Token
Log in to your Kaggle account and, under “My Account”, navigate to “Create New API Token”. Click the button to download your API token as a json file (it is named kaggle.json).
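If you want to double-check the download before going further, here is a minimal sketch that opens the token file and confirms it has the two fields the Kaggle client reads, username and key. The helper name read_kaggle_token is my own, not part of any library, and the path in the usage comment assumes the file sits in your working directory.

```python
import json
from pathlib import Path


def read_kaggle_token(path):
    """Load a Kaggle API token file and check that it has the expected fields.

    Hypothetical helper for illustration; not part of the kaggle package.
    """
    token = json.loads(Path(path).read_text())
    # The Kaggle client expects both a username and a key in this file.
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"token file is missing fields: {sorted(missing)}")
    return token


# Usage (assumes kaggle.json is in the current directory):
# token = read_kaggle_token("kaggle.json")
```

Treat the key like a password: avoid printing it or leaving it in a shared notebook.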
Step 2: Install Kaggle library and Import Google Colab File Library
Use the following code in your Google Colab Notebook.
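Here is a rough sketch of what that cell can look like in plain Python. In a notebook you would more commonly type the shell form, !pip install kaggle, but the version below also runs outside Colab without crashing. The names install_kaggle and IN_COLAB are my own, for illustration only.

```python
import subprocess
import sys


def install_kaggle():
    """Install the Kaggle client into the current Python environment."""
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--quiet", "kaggle"]
    )


try:
    # google.colab only exists inside a Colab runtime.
    from google.colab import files
    IN_COLAB = True
except ImportError:
    files = None
    IN_COLAB = False

# Only install automatically when we are actually running in Colab.
if IN_COLAB:
    install_kaggle()
```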
Step 3: Upload Kaggle API json file to Google Colab
The code below will prompt you with a button to upload files to Google Colab. Use this to navigate to the location of the downloaded Kaggle API Token json file and upload it.
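A minimal sketch of the upload step, assuming the file you pick is named exactly kaggle.json. files.upload() returns a dictionary that maps each uploaded file name to its bytes; the helper save_kaggle_token (my own name, not a library function) then writes the token to ~/.kaggle/kaggle.json, which is where the Kaggle client looks for it, and restricts its permissions so the client does not warn about a readable key.

```python
import os
from pathlib import Path


def save_kaggle_token(uploaded, kaggle_dir=None):
    """Write an uploaded kaggle.json into the Kaggle config directory.

    `uploaded` maps file names to bytes, as returned by files.upload().
    Hypothetical helper for illustration.
    """
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    target.write_bytes(uploaded["kaggle.json"])
    # Make the token readable and writable by the owner only.
    os.chmod(target, 0o600)
    return target


try:
    from google.colab import files
    uploaded = files.upload()   # shows the upload button in the cell output
    save_kaggle_token(uploaded)
except ImportError:
    # Not running in Colab, so there is no upload widget to show.
    pass
```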
PS: You could use this to upload files directly from your local machine to the notebook!
Step 4: Download dataset from Kaggle
Now, go to the Kaggle competition you are interested in, navigate to the Data tab, copy the API command, and paste it into a Colab cell (prefixed with ! so it runs as a shell command) to download the dataset.
Note: To download a different dataset, just replace the competition name in the command (after download -c) with the competition you are interested in.
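As a sketch, the same download can be wrapped in a small Python function that builds the kaggle competitions download -c command and runs it. The function names and the dest argument are my own; "titanic" in the usage comment is only an example competition name, so swap in your own. The call needs the kaggle client installed, a valid token, and the competition rules accepted on the Kaggle website.

```python
import subprocess


def build_download_command(competition, dest="."):
    """Return the kaggle CLI command that downloads a competition's data."""
    # -c selects the competition, -p sets the folder the files are saved to.
    return ["kaggle", "competitions", "download", "-c", competition, "-p", dest]


def download_competition(competition, dest="."):
    """Run the download and raise an error if the kaggle CLI fails."""
    subprocess.run(build_download_command(competition, dest), check=True)


# Usage (example competition name, replace with yours):
# download_competition("titanic")
```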
You should see something like this on your Colab notebook:
Step 5: Unzip datasets and load to Pandas dataframe
Finally, let’s load the datasets into pandas. Since the train and test datasets are in .zip format, you would normally need to unzip them before reading the .csv files. Luckily for us, pandas does it all: read_csv can read a zipped .csv directly, as long as the archive contains a single file.
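To show the idea without needing the real download, this sketch builds a tiny zip archive and reads it back with pandas. The file name train.csv.zip and the columns are made up for the demo; point load_zipped_csv (my own helper name) at whatever files your competition gave you.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd


def load_zipped_csv(path):
    """Read a CSV stored in a single-file .zip archive into a dataframe."""
    return pd.read_csv(path, compression="zip")


# Self-contained demo: create a small zipped CSV, then load it.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "train.csv.zip"   # made-up name for the demo
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("train.csv", "id,label\n1,0\n2,1\n")
    train = load_zipped_csv(archive)

print(train.shape)  # (2, 2)
```

With real data it would be something like train = load_zipped_csv("train.csv.zip"), followed by train.head() to peek at the first rows.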
This is what it should look like in your Google Colab notebook:
Now, on you go with your exploratory analysis :)
I hope you found this helpful. Please let me know in the comments below if I have missed anything or if you have recommendations for my writing style :)
I am looking forward to sharing more articles and growing with you all.