Streamlining Data Transfers: Python’s Guide to Amazon S3 Cloud Object Storage

Ty Rawls · Published in The Deep Hub
7 min read · Apr 16, 2024
Data being stored in Amazon S3 Cloud Object Storage bucket | Source: Author using Microsoft Designer and Adobe Photoshop

Table of Contents

· Introduction
· What is Amazon S3?
· What Are the Benefits of Utilizing Amazon S3?
· How to Set Up an S3 Bucket
∘ — Creating an S3 Bucket
∘ — Create Access Keys
· Configuring Python to Interface with S3
∘ — Installing Dependencies
∘ — Configure Access Keys
· Interfacing with S3
∘ — Create a Python File
∘ — Create a CSV File
∘ — Execute Python File
· Conclusion

Introduction

Though the idea of storing and processing data from afar has been kicking around for ages, modern cloud storage, as we know it today, has been going strong for about two decades. It really started picking up steam in the mid-2000s. Amazon S3 (Simple Storage Service) officially dropped on March 14, 2006. It was one of the first offerings from Amazon Web Services (AWS), setting the stage for Amazon’s big move into the cloud computing scene. S3 totally changed the game for data storage, hooking developers up with scalable, tough-as-nails object storage up in the cloud. Since then, Amazon S3 has blown up big time, becoming one of the go-to cloud storage services for millions of customers, from small startups to major corporations. In this article, I’ll explain what Amazon S3 is, why you might want to use it, and how you can start using it today.

Disclaimer: I am NOT sponsored by Amazon… but would love to be :)

What is Amazon S3?

Amazon S3 (Simple Storage Service) is a cloud-based object storage service designed for storing and retrieving any amount of data from anywhere on the web. Data is stored as objects inside containers called buckets, and an object can be any file type, including documents, images, and videos. S3 offers virtually limitless storage capacity and supports an unlimited number of objects, making it suitable for a wide range of use cases, from simple backups to complex data analytics. But why would someone want to use Amazon S3?

What Are the Benefits of Utilizing Amazon S3?

There are numerous advantages to using S3 as a storage solution, but I will focus on five key benefits:

  • Scalability: As mentioned in the previous section, S3 provides virtually limitless storage capacity, allowing you to scale your storage needs seamlessly as your data grows.
  • Durability and Availability: S3 ensures high durability and availability by storing data across multiple locations and automatically replicating it within a chosen region, providing robust data protection and reliability.
  • Security: S3 implements robust security features, including encryption at rest and in transit, access controls, and audit logging, ensuring data confidentiality and integrity.
  • Cost-effectiveness: With its pay-as-you-go pricing model and various storage classes, S3 offers cost-effective storage solutions for a wide range of use cases.
  • Versioning: S3 supports object versioning, allowing you to keep multiple versions of an object and restore previous versions as needed, providing data protection against accidental deletions or modifications (see the short CLI sketch after this list).
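
To get a taste of versioning in practice, once your bucket exists and versioning is enabled, you can list the stored versions of an object with the AWS CLI (installed later in this article). A minimal sketch using the example bucket and file names from this article:

aws s3api list-object-versions --bucket medium-example-bucket --prefix demo.csv   # List all stored versions of demo.csv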

These factors influenced my decision to choose S3 as the storage solution for one of my recent projects, which involved extracting stock data from two financial APIs and securely storing it in S3 using Python for later processing.

How to Set Up an S3 Bucket

To get started, you will first need to create an AWS account. Once that is completed, you will need to create an S3 bucket.

Creating an S3 Bucket

To upload your data to Amazon S3, you must first create an S3 bucket in one of the AWS Regions. From the AWS console homepage, click View all services > S3 > Create bucket. The region can be changed from the top-right corner of the console, and you can then select from several options based on the type of bucket you want to create.

TIP: Choose a region that is closest to your geographical location.

For this example, I chose the following S3 bucket options (a CLI alternative is sketched after the list):

  • AWS Region: US East (N. Virginia) us-east-1
  • Bucket Type: General Purpose
  • Bucket Name: medium-example-bucket (choose your own name)
  • Object Ownership: ACLs disabled (recommended)
  • Public Access Settings: Block all public access
  • Bucket Versioning: Enabled
  • Encryption Type: Server-side encryption with Amazon S3 managed keys (SSE-S3)
  • Object Lock: Disable

Upon successful creation, you should see something like the image below, showing the bucket name you chose.

Amazon S3 bucket: medium-example-bucket folder | Source: Author

Create Access Keys

S3 access keys are used for authenticating and authorizing programmatic access to your Amazon S3 resources. They consist of an Access Key ID and a Secret Access Key, which are used together to sign requests to AWS services, including S3, allowing you to perform operations such as uploading, downloading, and deleting objects within your S3 buckets. To create your access keys, click on your profile in the top-right corner of the AWS console and select Security credentials > Create access key. Once created, you may write down your access keys or download them as a CSV file.

WARNING: DO NOT share your access keys with anyone unless you want to provide them access to your AWS account. For more details about managing access keys, see the best practices for managing AWS access keys.

Configuring Python to Interface with S3

Ensure you have the latest version of Python installed for your operating system (OS).

Once the installation is complete, open Terminal (macOS) or Command Prompt (Windows) to execute the commands below.

Installing Dependencies

Enter the command below to make a directory called computer-to-s3 and navigate to it.

mkdir computer-to-s3 && cd computer-to-s3   # Make directory and navigate to it

Create a Python virtual environment, activate it, and update pip:

python -m venv .venv        # Create virtual environment
source .venv/bin/activate   # Activate virtual environment (macOS/Linux)
.venv\Scripts\activate      # Activate virtual environment (Windows)
pip install --upgrade pip   # Update pip

Install Python packages:

pip install awscli   # Install the AWS Command Line Interface (CLI)
pip install boto3 # Install Boto3
pip install pandas # Install Pandas

The AWS Command Line Interface (CLI) allows you to securely store your access keys in a hidden folder located in your home directory (~/.aws/credentials). Boto3 enables Python-based interaction with S3. In this example, I will be using CSV files, and Pandas allows me to read that file type.

Configure Access Keys

Run the command below to configure your AWS Access Key ID and AWS Secret Access Key:

aws configure   # Configure access keys

You will be prompted to enter the following (the resulting configuration files are shown after this list):

  • AWS Access Key ID
  • AWS Secret Access Key
  • Default Region Name (optional)
  • Default Output Format (optional)
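
After you answer the prompts, the CLI writes the keys and defaults to two plain-text files in your home directory. They look roughly like this (the values below are placeholders):

# ~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

# ~/.aws/config
[default]
region = us-east-1
output = json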

Interfacing with S3

Create a Python File

In the computer-to-s3 directory, create a Python file with the code below and name it s3_interface.py:

import boto3
import pandas as pd
from io import StringIO


def push_to_s3(content, bucket, key):
    # Initialize the S3 client
    s3 = boto3.client('s3')

    # Upload the file to the AWS S3 bucket
    try:
        s3.put_object(Body=content, Bucket=bucket, Key=key)
        print(f'\nUpload successful: Successfully pushed data to {bucket}.\n'
              f'File saved as: {key}.\n')
    except Exception as e:
        print(f'Upload failed: {e}')


def pull_from_s3(bucket, key):
    # Initialize the S3 client
    s3 = boto3.client('s3')

    try:
        print(f'\nRetrieving {key} from {bucket}.\n')
        # Retrieve the object containing the CSV data
        response = s3.get_object(Bucket=bucket, Key=key)

        # Read the CSV content from the response
        csv_content = response['Body'].read().decode('utf-8')

        # Use StringIO to convert the string to a file-like object for pandas
        csv_file = StringIO(csv_content)

        # Read CSV into a pandas DataFrame
        df = pd.read_csv(csv_file)
        print(df)
    except Exception as e:
        print(f'Download failed: {e}')
        df = pd.DataFrame()  # Return an empty DataFrame if the download fails

    return df


if __name__ == '__main__':

    bucket = input('Bucket Name: ')     # S3 bucket name
    filename = input('CSV Filename: ')  # CSV filename

    # Type 'r' to read from S3 or 'w' to write to S3
    method = input('Read (r) or Write (w): ').lower()

    if method == 'w':
        df = pd.read_csv(filename)  # Read CSV file

        # Convert DataFrame to CSV string
        csv_buffer = StringIO()
        df.to_csv(csv_buffer, index=False)
        csv_content = csv_buffer.getvalue()

        push_to_s3(csv_content, bucket, filename)  # Save data to S3

    elif method == 'r':
        pull_from_s3(bucket, filename)  # Read from S3
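
As an optional sanity check, you can confirm what is in the bucket straight from Python rather than the AWS console. A minimal sketch, assuming the same bucket name used throughout this article:

import boto3

# List the objects currently stored in the bucket
s3 = boto3.client('s3')
response = s3.list_objects_v2(Bucket='medium-example-bucket')

for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])   # Object key and size in bytes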

Create a CSV File

If you already have a CSV file, save it to the computer-to-s3 directory. If not, you may download the demo file from my Google Drive and save it as demo.csv.

Execute Python File

Open Terminal (macOS) or Command Prompt (Windows) and run the command below from the computer-to-s3 directory:

python s3_interface.py   # Run Python file for reading and writing to S3

You will be prompted for the following (an example session is shown after this list):

  • Bucket Name
  • CSV Filename
  • Read (r) or Write (w)
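
For reference, a successful write looks roughly like this (illustrative values; the messages come from the print statements in s3_interface.py):

Bucket Name: medium-example-bucket
CSV Filename: demo.csv
Read (r) or Write (w): w

Upload successful: Successfully pushed data to medium-example-bucket.
File saved as: demo.csv.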

Here’s an example of me writing data to S3 successfully.

Writing data to Amazon S3 using Terminal (macOS) commands | Source: Author
Amazon S3 bucket: medium-example-bucket contents | Source: Author

Here’s an example of me reading data from S3 successfully.

Reading data from Amazon S3 using Terminal (macOS) commands | Source: Author

Conclusion

And there you have it! You should now have a clear understanding of what Amazon S3 is, why you might want to use it, and how to start using it on your own. If you have any questions, don't hesitate to reach out; I'm more than happy to help.

Until next time, have an AWSome day!

If you ever want to connect, you can check me out on LinkedIn — I’m excited to engage with fellow enthusiasts in the dynamic field of Data Engineering. Thanks for your time, and I look forward to interacting with you in the comments section.
