Control AWS S3 using Boto3

Introduction :

Boto3 is the name of the Python SDK for AWS. It lets you do everything you can do in the AWS Console, and more, in a faster, repeatable, and automated way: you can create, update, and delete AWS resources directly from your Python scripts. I have written a detailed article about the Boto3 module and other ways to use it, so be sure to check it out before reading this.

One of the core components of AWS is Amazon Simple Storage Service (Amazon S3), the object storage service offered by AWS. With its impressive availability and durability, it has become the standard way to store videos, images, and data, and it is commonly used for data analytics applications, machine learning, websites, and much more. You can combine S3 with other services to build infinitely scalable applications.

Using the Boto3 library with Amazon Simple Storage Service (S3) allows you to create, update, and delete S3 Buckets, Objects, S3 Bucket Policies, and many more from Python programs or scripts with ease.

Prerequisites :

  • AWS Account Credentials (Access key, Secret key)
  • IAM User with full access to S3
  • AWS CLI installed and configured
  • Python 3
  • Boto3 installed

Installation :

To install AWS CLI, run the following command in your terminal:

pip install awscli

Similarly, to install Boto3, run the following command in your terminal:

pip install boto3

Configuration :

To configure the AWS environment, type the following command in your terminal:

aws configure

This command will prompt you to enter the information needed to form a connection with your AWS account. For the Access key and Secret key, enter the AWS Access Key ID and AWS Secret Access Key of the IAM User with the required permissions. For the Default region name, enter the region in which the bucket you want to access lives; if you haven't created a bucket yet, or it is in the global region, use "us-east-1". For the Default output format, enter "json".
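
The interactive session looks roughly like this (the values shown are placeholders, not real credentials):

AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
Default region name [None]: us-east-1
Default output format [None]: json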

Alternatively, you can pass this information directly as parameters to boto3.client():

import boto3

client = boto3.client(
    "s3",
    aws_access_key_id="XXXXXXX",
    aws_secret_access_key="YYYYYYY",
    region_name="us-east-1",
)

**NOTE: Storing your AWS credentials in your scripts is not secure, and you should never do this. You can set them as environment variables or put them in a `.env` file and load them into the Python script, but even storing the AWS Access and Secret Keys in a plain-text file is not very secure. The better and more secure way is to store the AWS Access and Secret Keys in an encrypted store, for example, aws-vault.
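
As a minimal sketch of the environment-variable approach (assuming AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are exported in your shell), Boto3 can pick the credentials up without them ever appearing in the script:

#!/usr/bin/env python3
import os
import boto3

# Boto3 automatically reads AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and
# AWS_DEFAULT_REGION from the environment, so no keys are hard-coded here.
client = boto3.client("s3")

# Or read the variables yourself and pass them explicitly:
client = boto3.client(
    "s3",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name=os.environ.get("AWS_DEFAULT_REGION", "us-east-1"),
)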

Create an S3 bucket using Boto3 :

To create an Amazon S3 Bucket using the Boto3 library, you can use either the create_bucket() method of the S3 client or the create_bucket() method of the S3 resource.

**NOTE: Every Amazon S3 Bucket must have a unique name. Moreover, this name must be unique across all AWS accounts and customers.

Creating S3 Bucket using Boto3 client:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
client = boto3.client("s3", region_name=AWS_REGION)
bucket_name = "demo-bucket"

# "us-east-1" is the default Region and must not be sent as a LocationConstraint;
# for any other Region, pass it in CreateBucketConfiguration.
if AWS_REGION == "us-east-1":
    response = client.create_bucket(Bucket=bucket_name)
else:
    location = {"LocationConstraint": AWS_REGION}
    response = client.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
print("Amazon S3 bucket has been created")

**NOTE: To avoid various exceptions while working with the Amazon S3 service, we strongly recommend you define a specific AWS Region for the Boto3 client and S3 Bucket Configuration

Similarly, you can use the Boto3 resource to create an Amazon S3 bucket:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
resource = boto3.resource("s3", region_name=AWS_REGION)
bucket_name = "demo-bucket"

# As with the client, omit CreateBucketConfiguration for "us-east-1".
if AWS_REGION == "us-east-1":
    bucket = resource.create_bucket(Bucket=bucket_name)
else:
    location = {"LocationConstraint": AWS_REGION}
    bucket = resource.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
print("Amazon S3 bucket has been created")

Listing Amazon S3 Buckets using Boto3 :

There are two ways of listing Amazon S3 Buckets: using the S3 client or using the S3 resource.

Here’s an example of listing existing S3 Buckets using the S3 client:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
client = boto3.client("s3", region_name=AWS_REGION)

response = client.list_buckets()
print("Listing Amazon S3 Buckets:")
for bucket in response["Buckets"]:
    print(f" - {bucket['Name']}")

Here’s an example of listing existing S3 Buckets using the S3 resource:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
resource = boto3.resource("s3", region_name=AWS_REGION)

iterator = resource.buckets.all()
print("Listing Amazon S3 Buckets:")
for bucket in iterator:
    print(f" - {bucket.name}")

Deleting Amazon S3 Bucket using Boto3 :

There are two possible ways of deleting an Amazon S3 Bucket using the Boto3 library: using the S3 client or using the S3 resource.

Here’s an example of deleting the Amazon S3 bucket using the Boto3 client:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
client = boto3.client("s3", region_name=AWS_REGION)
bucket_name = "demo-bucket"

client.delete_bucket(Bucket=bucket_name)
print("Amazon S3 Bucket has been deleted")

Here’s an example of deleting the Amazon S3 bucket using the Boto3 resource:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
resource = boto3.resource("s3", region_name=AWS_REGION)
bucket_name = "demo-bucket"

s3_bucket = resource.Bucket(bucket_name)
s3_bucket.delete()
print("Amazon S3 Bucket has been deleted")

Deleting non-empty S3 Bucket using Boto3 :

To delete a non-empty S3 Bucket using the Boto3 library, you first have to clean up the S3 Bucket. Otherwise, the Boto3 library will raise the BucketNotEmpty exception. The cleanup operation requires deleting all S3 Bucket objects and their versions:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_resource = boto3.resource("s3", region_name=AWS_REGION)
s3_bucket = s3_resource.Bucket(S3_BUCKET_NAME)

def cleanup_s3_bucket():
    # Deleting objects
    for s3_object in s3_bucket.objects.all():
        s3_object.delete()
    # Deleting object versions if S3 versioning is enabled
    for s3_object_ver in s3_bucket.object_versions.all():
        s3_object_ver.delete()
    print("S3 Bucket cleaned up")

cleanup_s3_bucket()
s3_bucket.delete()
print("S3 Bucket deleted")

Uploading a file to S3 Bucket using Boto3 :

The Boto3 library has two methods for uploading files and objects into an S3 Bucket: upload_file() and upload_fileobj().

The upload_file() method requires the following arguments:

  • file_name — filename on the local filesystem
  • bucket_name — the name of the S3 bucket
  • object_name — the name of the uploaded object (usually equal to file_name)

Here’s an example of uploading a file to an S3 Bucket:

#!/usr/bin/env python3
import pathlib
import boto3

BASE_DIR = pathlib.Path(__file__).parent.resolve()
AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_client = boto3.client("s3", region_name=AWS_REGION)

def upload_files(file_name, bucket, object_name=None, args=None):
    if object_name is None:
        object_name = file_name
    s3_client.upload_file(file_name, bucket, object_name, ExtraArgs=args)
    print(f"'{file_name}' has been uploaded to '{S3_BUCKET_NAME}'")

upload_files(f"{BASE_DIR}/files/demo.txt", S3_BUCKET_NAME)

We’re using the pathlib module to get the script location path and save it to the BASE_DIR variable. Then, we’re creating the upload_files() method that is responsible for calling the S3 client and uploading the file.

Uploading multiple files to the S3 bucket :

To upload multiple files to the Amazon S3 bucket, you can use the glob() method from the glob module. This method returns all file paths that match a given pattern as a Python list. You can select specific files with a search pattern that uses a wildcard character:

#!/usr/bin/env python3
import pathlib
from glob import glob
import boto3

BASE_DIR = pathlib.Path(__file__).parent.resolve()
AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
S3_CLIENT = boto3.client("s3", region_name=AWS_REGION)

def upload_file(file_name, bucket, object_name=None, args=None):
    if object_name is None:
        object_name = file_name
    S3_CLIENT.upload_file(file_name, bucket, object_name, ExtraArgs=args)
    print(f"'{file_name}' has been uploaded to '{S3_BUCKET_NAME}'")

files = glob(f"{BASE_DIR}/files/*.txt")
for file in files:
    upload_file(file, S3_BUCKET_NAME)

Uploading generated file object data to S3 Bucket using Boto3 :

If you need to upload file object data to the Amazon S3 Bucket, you can use the upload_fileobj() method. This method is useful when you need to generate file content in memory and then upload it to S3 without saving it to the file system.

**NOTE: the upload_fileobj() method requires opening a file in binary mode.

Here’s an example of uploading a generated file to the S3 Bucket:

#!/usr/bin/env python3
import io
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_client = boto3.client("s3", region_name=AWS_REGION)

def upload_generated_file_object(bucket, object_name):
    with io.BytesIO() as f:
        f.write(b'First line.\n')
        f.write(b'Second line.\n')
        f.seek(0)
        # Upload while the in-memory file object is still open
        s3_client.upload_fileobj(f, bucket, object_name)
    print(f"Generated file object has been uploaded to '{bucket}'")

upload_generated_file_object(S3_BUCKET_NAME, 'generated_file.txt')

Enabling S3 Server-Side Encryption (SSE-S3) for uploaded objects :

You can use S3 Server-Side Encryption with Amazon S3 managed keys (SSE-S3) to protect your data in Amazon S3. This type of server-side encryption uses the AES-256 algorithm:

#!/usr/bin/env python3
import pathlib
import boto3

BASE_DIR = pathlib.Path(__file__).parent.resolve()
AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_client = boto3.client("s3", region_name=AWS_REGION)

def upload_files(file_name, bucket, object_name=None, args=None):
    if object_name is None:
        object_name = file_name
    s3_client.upload_file(file_name, bucket, object_name, ExtraArgs=args)
    print(f"'{file_name}' has been uploaded to '{S3_BUCKET_NAME}'")

upload_files(f"{BASE_DIR}/files/demo.txt",
             S3_BUCKET_NAME,
             'demo.txt',
             args={'ServerSideEncryption': 'AES256'})

Getting a list of files from S3 Bucket:

The most convenient way to get a list of files from an S3 Bucket using Boto3 is to use the objects.all() method of the Bucket resource:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_resource = boto3.resource("s3", region_name=AWS_REGION)
s3_bucket = s3_resource.Bucket(S3_BUCKET_NAME)

print('Listing Amazon S3 Bucket objects/files:')
for obj in s3_bucket.objects.all():
    print(f' - {obj.key}')

Alternatively, we can use the list_objects_v2() method of the S3 client:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_client = boto3.client("s3", region_name=AWS_REGION)

response = s3_client.list_objects_v2(Bucket=S3_BUCKET_NAME)
print('Listing Amazon S3 Bucket objects/files:')
for obj in response.get('Contents', []):
    print(f" - {obj['Key']}")

Filtering results of S3 list operation using Boto3 :

If you need to get a list of S3 objects whose keys start with a specific prefix, you can use the .filter() method to do this:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_resource = boto3.resource("s3", region_name=AWS_REGION)
s3_bucket = s3_resource.Bucket(S3_BUCKET_NAME)

print('Listing Amazon S3 Bucket objects/files:')
for obj in s3_bucket.objects.filter(Prefix='demo'):
    print(f' - {obj.key}')

Downloading file object from S3 Bucket :

You can use the download_file() method to download the S3 object to your local file system:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_resource = boto3.resource("s3", region_name=AWS_REGION)

s3_object = s3_resource.Object(S3_BUCKET_NAME, 'demo.txt')
s3_object.download_file('/tmp/demo.txt')
print('S3 object download complete')

Reading files from the S3 bucket into memory :

You can also use the download_fileobj() method to read an S3 object straight into an in-memory buffer instead of saving it to disk:

#!/usr/bin/env python3
import io
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_resource = boto3.resource("s3", region_name=AWS_REGION)
s3_object = s3_resource.Object(S3_BUCKET_NAME, 'demo.txt')

with io.BytesIO() as f:
    s3_object.download_fileobj(f)
    f.seek(0)
    print(f'Downloaded content:\n{f.read()}')

Deleting S3 objects using Boto3 :

To delete an object from Amazon S3 Bucket, you need to call the delete() method of the object instance representing that object:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_resource = boto3.resource("s3", region_name=AWS_REGION)

s3_object = s3_resource.Object(S3_BUCKET_NAME, 'new_demo.txt')
s3_object.delete()
print('S3 object deleted')

Renaming S3 file object using Boto3:

There's no single API call to rename an S3 object. So, to rename an S3 object, you need to copy it to a new object with a new name and then delete the old object:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_resource = boto3.resource("s3", region_name=AWS_REGION)

def rename_s3_object(bucket_name, old_name, new_name):
    old_s3_object = s3_resource.Object(bucket_name, old_name)
    new_s3_object = s3_resource.Object(bucket_name, new_name)
    new_s3_object.copy_from(CopySource=f'{bucket_name}/{old_name}')
    old_s3_object.delete()
    print(f'{bucket_name}/{old_name} -> {bucket_name}/{new_name}')

rename_s3_object(S3_BUCKET_NAME, 'demo.txt', 'new_demo.txt')

Copying file objects between S3 buckets using Boto3:

To copy file objects between S3 buckets using Boto3, you can use the copy_from() method.

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
BUCKET_NAME = "demo-bucket"

def copy_object(bucket, src_object, dst_object):
    s3_resource = boto3.resource('s3', region_name=AWS_REGION)
    s3_resource.Object(bucket, dst_object).copy_from(
        CopySource=f'{bucket}/{src_object}')

copy_object(bucket=BUCKET_NAME, src_object='demo1.txt', dst_object='demo2.txt')

Creating S3 Bucket Policy using Boto3 :

To specify requirements, conditions, or restrictions for accessing the Amazon S3 Bucket, you have to use Amazon S3 Bucket Policies.

The example policy below adds a Deny statement that restricts insecure access to the bucket, based on the aws:SecureTransport and s3:TlsVersion condition keys. Let's use the Boto3 library to set up this policy on the S3 bucket:

#!/usr/bin/env python3
import json
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_client = boto3.client("s3", region_name=AWS_REGION)

BUCKET_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Principal": {"AWS": "*"},
            "Action": ["s3:*"],
            "Resource": [
                f"arn:aws:s3:::{S3_BUCKET_NAME}/*",
                f"arn:aws:s3:::{S3_BUCKET_NAME}"
            ],
            "Effect": "Deny",
            "Condition": {
                "Bool": {"aws:SecureTransport": "false"},
                "NumericLessThan": {"s3:TlsVersion": 1.2}
            }
        }
    ]
}

policy_document = json.dumps(BUCKET_POLICY)
s3_client.put_bucket_policy(Bucket=S3_BUCKET_NAME, Policy=policy_document)
print('Bucket Policy has been set up')

Deleting S3 Bucket Policy using Boto3 :

To delete the S3 Bucket Policy, you can use the delete_bucket_policy() method of the S3 client:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_client = boto3.client("s3", region_name=AWS_REGION)

s3_client.delete_bucket_policy(Bucket=S3_BUCKET_NAME)
print('Bucket Policy has been deleted')

Generating S3 presigned URL using Boto3:

If you need to share files from a non-public Amazon S3 Bucket without granting the final user access to the AWS APIs, you can create a pre-signed URL to the Bucket Object:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_client = boto3.client("s3", region_name=AWS_REGION)

def gen_signed_url(bucket_name, object_name):
    url = s3_client.generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': bucket_name, 'Key': object_name},
        ExpiresIn=3600)
    print(url)

gen_signed_url(S3_BUCKET_NAME, 'demo.txt')

The S3 client's generate_presigned_url() method accepts the following parameters (a small sketch follows this list):

  • ClientMethod (string) — The Boto3 S3 client method to presign for
  • Params (dict) — The parameters to pass to the ClientMethod
  • ExpiresIn (int) — The number of seconds the presigned URL is valid for. By default, the presigned URL expires in an hour (3600 seconds)
  • HttpMethod (string) — The HTTP method to use for the generated URL. By default, the HTTP method is whatever is used in the method's model
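
For instance, here is a minimal sketch of switching ClientMethod to 'put_object' to presign an upload URL instead of a download URL (the bucket and key names are placeholders):

#!/usr/bin/env python3
import boto3

s3_client = boto3.client("s3", region_name="us-east-1")

# Presigned URL that allows an HTTP PUT upload of 'report.csv' for 15 minutes
upload_url = s3_client.generate_presigned_url(
    ClientMethod='put_object',
    Params={'Bucket': 'demo-bucket', 'Key': 'report.csv'},
    ExpiresIn=900,        # 15 minutes instead of the default hour
    HttpMethod='PUT')     # match the HTTP verb of put_object
print(upload_url)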

Enabling S3 Bucket versioning using Boto3 :

S3 Bucket versioning allows you to keep track of an S3 object's versions over time, and it also safeguards against accidental object deletion. Boto3 will retrieve the most recent version of a versioned object on request. Keep in mind that every version is stored in full, so an object's versions add up in storage; i.e., a 2MB file with 5 versions will take up 10MB of space. To enable versioning for an S3 Bucket, you can use the enable() method of the BucketVersioning resource:

#!/usr/bin/env python3
import boto3

AWS_REGION = "us-east-1"
S3_BUCKET_NAME = "demo-bucket"
s3_resource = boto3.resource("s3", region_name=AWS_REGION)

def enable_version(bucket_name):
    versioning = s3_resource.BucketVersioning(bucket_name)
    versioning.enable()
    print(f'S3 Bucket versioning: {versioning.status}')

enable_version(S3_BUCKET_NAME)
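
Once versioning is enabled, each upload of the same key stores a new version. As a small follow-up sketch (assuming versioning is already enabled on demo-bucket), you can inspect the stored versions of a key through the bucket's object_versions collection:

#!/usr/bin/env python3
import boto3

s3_resource = boto3.resource("s3", region_name="us-east-1")
bucket = s3_resource.Bucket("demo-bucket")

# Every upload of 'demo.txt' adds a new entry here while versioning is enabled
for version in bucket.object_versions.filter(Prefix='demo.txt'):
    print(f'{version.object_key} -> version {version.id}, '
          f'last modified {version.last_modified}')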

Conclusion :

One of the things I always wished I knew before working on S3 using Boto3 is that S3 is object storage: it doesn't have a real directory structure. The "/" in an object key is rather cosmetic and is merely used to simulate a simple file system, so what looks like a folder is just a shared key prefix. If you wish to explore more functionalities of Boto3 for S3, check the official Boto3 S3 documentation. And I guess that's all for now. !HAPPY-CODING!
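
For instance (a small sketch, assuming demo-bucket exists), a key such as "reports/2022/demo.txt" is a single flat object whose name only looks like a nested path:

#!/usr/bin/env python3
import boto3

s3_client = boto3.client("s3", region_name="us-east-1")

# There is no real 'reports/2022/' folder; the slashes are just part of the key
s3_client.put_object(Bucket="demo-bucket",
                     Key="reports/2022/demo.txt",
                     Body=b"hello")

# Listing with a Prefix simulates browsing that "folder"
response = s3_client.list_objects_v2(Bucket="demo-bucket",
                                     Prefix="reports/2022/")
for obj in response.get("Contents", []):
    print(obj["Key"])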
