Accessing S3 Bucket From Google Colab

Lily Su
2 min read · Jun 4, 2020



We’re using Google Colab, a hosted Jupyter notebook that runs code in the cloud. We chose this platform because it lets you interact with the code quickly, with minimal setup, and share it with others. The first thing we like to do is mount our Google Drive, in case we need to save any visualization files to download later:

from google.colab import drive
drive.mount('/content/drive')
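
Colab will prompt you to authorize access to your Drive. Once the mount succeeds, a quick way to confirm it is to list the top level of the Drive:

!ls "/content/drive/My Drive"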

Get Data from AWS S3 Bucket with Colab:

We’re going to set up the AWS CLI, the command-line interface for working with AWS services, including S3.

!pip install awscli
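
To confirm the install worked, check the version:

!aws --version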

For the AWS CLI to work in Google Colab, we create a folder called “config” under “/content/drive/My Drive/” and store our credentials there in a .ini file:

import os

# Credentials in the AWS shared-credentials (.ini) format
text = '''
[default]
aws_access_key_id = <your access key id>
aws_secret_access_key = <your secret access key>
region = <your region>
'''
path = "/content/drive/My Drive/config/awscli.ini"
os.makedirs(os.path.dirname(path), exist_ok=True)  # create the config folder if it doesn't exist
with open(path, 'w') as f:
    f.write(text)
!cat "/content/drive/My Drive/config/awscli.ini"

The above script only needs to be run once, since it is the equivalent of saving a username and password to a file to be accessed later.

Next, we pass our credentials in:

import os

# Point the AWS CLI at the credentials file on Drive. A shell `!export`
# only affects its own subshell in Colab, so we set the variable from Python.
path = "/content/drive/My Drive/config/awscli.ini"
os.environ['AWS_SHARED_CREDENTIALS_FILE'] = path
print(os.environ['AWS_SHARED_CREDENTIALS_FILE'])
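
As a quick sanity check (assuming your credentials are valid), you can ask AWS which identity the CLI is using:

!aws sts get-caller-identity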

Now we should be connected, and we can view all the files in our S3 bucket by running the following command:

!aws s3 ls s3://<S3 bucket name> --recursive --human-readable --summarize
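
For example, with a hypothetical bucket named my-project-data, the call would look like this:

!aws s3 ls s3://my-project-data --recursive --human-readable --summarize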

If you want to copy down a single file by name:

!aws s3 cp s3://<S3 bucket name> ./<or path to save file> --recursive --exclude "*" --include <file name with path in quotes>
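
As a concrete sketch, assuming the same hypothetical bucket and a file stored under data/train.csv, this copies just that file into the current directory, preserving its key path:

!aws s3 cp s3://my-project-data ./ --recursive --exclude "*" --include "data/train.csv"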

There you go! Enjoy connecting to S3!
