Importing Datasets in Google Colab

Amit Kumar
3 min read · Feb 7, 2019

Have you ever had trouble setting up Jupyter Notebook on your computer, or managing different versions of Python and Anaconda? While this is usually manageable, at times it causes trouble.

Recently I stumbled upon Google Colab. Google Colab does everything that your Jupyter Notebook does and a little more. The little more is that you can use a GPU or TPU for free.

In this article I’ll show you how to use your own dataset in a Colab notebook.

I’ll cover two methods here and list some other methods that can also be used.

Uploading dataset from local filesystem

This option creates a “Choose Files” button in your notebook, which you can use to upload your dataset to the notebook’s runtime. It is useful for small datasets.

Step 1: Uploading the files

from google.colab import files
uploaded = files.upload()

This will add a “Choose Files” button that you can use to upload your dataset. files.upload returns a dictionary of the files which were uploaded. The dictionary is keyed by the file name, and each value is the data which was uploaded, as raw bytes. Because the content arrives as bytes, we have to decode it before using it as text.
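To make the shape of that dictionary concrete, here is a minimal sketch. The helper name describe_uploads is my own and not part of Colab, and a hand-built dictionary stands in for the result of files.upload() so the snippet also runs outside Colab. It assumes the uploaded files are UTF-8 text.

```python
def describe_uploads(uploaded):
    """Return (file name, size in bytes, decoded text) for each uploaded file.

    `uploaded` is assumed to map file names to raw bytes, which is the
    shape described above for the result of files.upload().
    """
    summary = []
    for name, data in uploaded.items():
        text = data.decode("utf-8")  # assumption: the file is UTF-8 text
        summary.append((name, len(data), text))
    return summary


# Stand-in for files.upload(); the file name and contents are made up.
fake_upload = {"train.csv": b"id,price\n1,100\n2,250\n"}

for name, size, text in describe_uploads(fake_upload):
    print(f"{name}: {size} bytes, first line: {text.splitlines()[0]}")
```

In Colab you would pass the real return value instead, that is describe_uploads(files.upload()).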

Here I’ll show how to use pandas to load the uploaded CSV file into a DataFrame.

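import io
import pandas as pd

# uploaded['train.csv'] holds raw bytes: decode them to text, then wrap the
# text in a file-like StringIO object that read_csv can parse.
train_data = pd.read_csv(io.StringIO(uploaded['train.csv'].decode('utf-8')))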
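If you upload several files at once, the same idea can be applied in a loop. The sketch below is one possible variant, not the only way to do it: the helper name uploads_to_frames is hypothetical, and it wraps the bytes in io.BytesIO instead of decoding them by hand, which works because read_csv accepts binary file-like objects. A hand-built dictionary again stands in for files.upload().

```python
import io

import pandas as pd


def uploads_to_frames(uploaded):
    """Read every .csv entry of an upload dictionary into a DataFrame."""
    frames = {}
    for name, data in uploaded.items():
        if name.lower().endswith(".csv"):
            # read_csv accepts a binary file-like object, so the bytes can
            # be wrapped in BytesIO without decoding them first.
            frames[name] = pd.read_csv(io.BytesIO(data))
    return frames


# Stand-in for files.upload(); the file name and columns are made up.
fake_upload = {"train.csv": b"id,price\n1,100\n2,250\n"}
frames = uploads_to_frames(fake_upload)
print(frames["train.csv"].head())
```

Entries that do not end in .csv are skipped, so a stray text file in the upload will not break the loop.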

[Screenshot of my Colab notebook: uploading files from the local filesystem]

Uploading dataset from Google Drive

We can import datasets uploaded to Google Drive in a number of ways.

I’ll focus on one of them here: mounting the drive in the Colab runtime. The following code will mount the drive. It will ask for an authorisation code, which you can get by following the link it shows and signing in to your Google account.

from google.colab import drive
drive.mount('/content/gdrive')

To go to your drive’s main directory (the one that is visible when you open Google Drive), cd into “/content/gdrive/My Drive/”.

Note: You can use commands like cd, ls and pwd in Colab notebooks.
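If you would rather stay in Python than use those commands, the standard library offers rough counterparts. This is only a sketch: list_directory is a name I made up, and a temporary directory stands in for a Drive folder so the code also runs outside Colab.

```python
import os
import tempfile


def list_directory(path):
    """Return the sorted entry names in `path`, similar to running ls."""
    return sorted(os.listdir(path))


# A temporary directory stands in for a folder on the mounted drive.
with tempfile.TemporaryDirectory() as folder:
    for name in ("train.csv", "test.csv"):
        # Create empty placeholder files so there is something to list.
        open(os.path.join(folder, name), "w").close()

    print(os.getcwd())             # current working directory, like pwd
    print(list_directory(folder))  # directory contents, like ls
```

In Colab, after mounting, you could call list_directory('/content/gdrive/My Drive') to see what is in your drive’s main directory.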

I’ll show how to load a dataset that is stored on Drive into the Colab notebook.

I’ve uploaded my files to Google Drive, under Housing-Price-Prediction/data in the drive’s main directory.

I’ll import test.csv using pandas.

test_data = pd.read_csv('/content/gdrive/My Drive/Housing-Price-Prediction/data/test.csv')

Note: Look at how the path is put together: the mount point (/content/gdrive), then My Drive, then the folder path inside your Drive.
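The way that path is assembled can be sketched with pathlib. The helper drive_path and the two constants are names of my own, not part of Colab; they only assume the mount point used above and the “My Drive” folder under it.

```python
from pathlib import PurePosixPath

# Mount point passed to drive.mount() above, and the folder under which
# the drive's main directory appears after mounting.
MOUNT_POINT = PurePosixPath("/content/gdrive")
DRIVE_ROOT = MOUNT_POINT / "My Drive"


def drive_path(*parts):
    """Join folder and file names onto the drive's main directory."""
    return str(DRIVE_ROOT.joinpath(*parts))


test_csv = drive_path("Housing-Price-Prediction", "data", "test.csv")
print(test_csv)
# prints /content/gdrive/My Drive/Housing-Price-Prediction/data/test.csv
```

The resulting string can be passed straight to pd.read_csv. Building it from parts makes it harder to mistype the space in “My Drive” or to forget a slash.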

Other Methods

Here are some other methods that can be used:

  • Uploading the dataset to GitHub and then cloning the repository into the Colab notebook
  • Using the “wget” command to download the dataset directly
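As a rough Python counterpart to the wget approach, a file can be downloaded with the standard library’s urllib.request. This swaps wget for urlretrieve, so treat it as an illustration of the idea and not as the wget method itself. The function name download_dataset and the example URL are placeholders of my own.

```python
import urllib.request
from pathlib import Path


def download_dataset(url, destination):
    """Download `url` to `destination` and return the destination path."""
    destination = Path(destination)
    # Create the target folder if it does not exist yet.
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(destination))
    return destination


# Hypothetical usage; replace the placeholder URL with a real dataset link.
# path = download_dataset("https://example.com/data/train.csv", "data/train.csv")
# train_data = pd.read_csv(path)
```

The file lands in the notebook’s runtime, just like an uploaded file, so it is gone once the runtime is recycled.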
