Useful methods for Boto3 in S3
Introduction:
Boto3 is the Python SDK for AWS. One of the core AWS services is Amazon Simple Storage Service (Amazon S3), the object storage service offered by AWS. Using the Boto3 library with Amazon S3, you can create, update, and delete S3 buckets, objects, bucket policies, and more from Python programs or scripts.
Here we will see how we can use the Boto3 module to perform various actions in S3.
For more details on the Boto3 module, check out my earlier articles.
Installation and Configuration:
To use Boto3 with S3, we need the following:
- AWS Account Credentials (Access key, Secret key)
- IAM User with full access to S3
- AWS CLI
- python3
- boto3
To install AWS CLI, run the following command in your terminal:
pip install awscli
Similarly, to install Boto3, run the following command in your terminal:
pip install boto3
To configure the AWS environment, type the following command in your terminal:
aws configure
This command will prompt you for the information needed to connect to your AWS account. For the Access key and Secret key, enter the AWS Access Key and AWS Secret Access Key of the IAM user with the required permissions. For the Default region name, enter the region of the bucket you want to access; if you haven’t created a bucket yet or it is in a global region, use “us-east-1”. For the Default output format, enter “json”.
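After you answer the prompts, the AWS CLI writes your answers to two files in `~/.aws/`. Assuming the example values above, they look roughly like this (the key values are placeholders):

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = XXXXXXX
aws_secret_access_key = YYYYYYY

# ~/.aws/config
[default]
region = us-east-1
output = json
```

Boto3 reads these files automatically, so scripts on the same machine need no credentials in code.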
Alternatively, you can pass this information as parameters to `client()`:
```python
import boto3

client = boto3.client(
    's3',
    aws_access_key_id='XXXXXXX',
    aws_secret_access_key='YYYYYYY',
    region_name='us-east-1',
)
```
*NOTE: Storing AWS credentials directly in your scripts is insecure, and you should never do it. You can set them as environment variables, or keep them in a `.env` file and load it into the Python script, but even a plain-text file is not very secure. The better, more secure approach is to keep the AWS Access and Secret Keys in an encrypted store such as aws-vault.
*** Important Fact ***:
S3 is object storage; it doesn’t have an actual directory structure, and the “/” in object keys is purely cosmetic. The names of retrieved files therefore include the full path, which we can use to identify a file’s current location.
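Because keys are flat strings, ordinary string operations are enough to split a key into its “folder” part and its file name. For example, with a hypothetical object key:

```python
# An S3 key is a flat string; the "folders" are just part of the name.
key = 'photos/2023/vacation/beach.jpg'  # hypothetical object key

# rpartition splits on the last '/', separating the "folder" from the file name
folder, _, file_name = key.rpartition('/')
print(folder)     # photos/2023/vacation
print(file_name)  # beach.jpg
```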
To list all files inside the current folder only :
```python
import boto3

s3 = boto3.client('s3')
prefix = 'folder/sub-folder/'  # do not use the Prefix parameter if you are in the root directory

object_list = s3.list_objects_v2(Bucket='bucket_name', Delimiter='/', Prefix=prefix)

files_list = []
for item in object_list.get('Contents', []):
    key = item['Key']
    file_name = key.replace(prefix, '')
    # remove this check if you also want files in sub-folders
    if '.' in file_name and '/' not in file_name:
        files_list.append(key)
print(files_list)
```
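The filtering logic above is plain string work, so it can be exercised without an AWS connection by mocking the `Contents` part of a `list_objects_v2` response (the keys below are hypothetical):

```python
def files_in_folder(contents, prefix):
    """Return keys that sit directly under `prefix` (no sub-folder entries)."""
    result = []
    for item in contents:
        name = item['Key'].replace(prefix, '', 1)
        # skip the folder placeholder itself and anything inside a sub-folder
        if name and '/' not in name:
            result.append(item['Key'])
    return result

# a mocked 'Contents' payload with hypothetical keys
contents = [
    {'Key': 'folder/sub-folder/a.txt'},
    {'Key': 'folder/sub-folder/deep/b.txt'},
    {'Key': 'folder/sub-folder/'},
]
print(files_in_folder(contents, 'folder/sub-folder/'))
# ['folder/sub-folder/a.txt']
```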
To list only the folders in the current directory:
```python
import boto3

s3 = boto3.client('s3')
response = s3.list_objects_v2(Bucket='bucket_name', Delimiter='/', Prefix='folder/sub-folder/')
# do not use the Prefix parameter if you are in the root directory

folder_list = []
for prefix in response.get('CommonPrefixes', []):
    folder_name = prefix['Prefix'][:-1]  # strip the trailing '/'
    folder_list.append(folder_name)
print(folder_list)
```
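When a `Delimiter` is supplied, S3 groups the keys under each sub-folder into `CommonPrefixes` entries. The folder-name extraction can be tried without AWS on a mocked response (the prefixes here are hypothetical):

```python
def folder_names(common_prefixes):
    """Strip the trailing '/' from each CommonPrefixes entry."""
    return [p['Prefix'][:-1] for p in common_prefixes]

# a mocked 'CommonPrefixes' payload with hypothetical prefixes
common_prefixes = [
    {'Prefix': 'folder/sub-folder/images/'},
    {'Prefix': 'folder/sub-folder/logs/'},
]
print(folder_names(common_prefixes))
# ['folder/sub-folder/images', 'folder/sub-folder/logs']
```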
To download a specific file:
**NOTE: Make sure the folder specified in output exists (create it beforehand if necessary) to avoid errors.
"""output = location where to download the filefile = full file name with path prefix — the name of the file in the S3 bucket``file_name = filename only — the name of the downloaded file to be savedbucket = name of the S3 bucket"""s3 = boto3.resource(‘s3’)file_name = str(file.rsplit(‘/’, 1)[-1])output = f”downloads/{file_name}”s3.Bucket(bucket).download_file(file, output)
To list all files of a specific type in the current folder and its sub-folders:
```python
import boto3

# currentdir and file_extn are assumed to be set earlier,
# e.g. currentdir = 'folder/sub-folder/' and file_extn = '.csv'
s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket('bucket_name')

if currentdir == '/':  # current directory is the bucket root
    objects = bucket.objects.all()
else:
    objects = bucket.objects.filter(Prefix=currentdir)

files_list = [obj.key for obj in objects if obj.key.endswith(file_extn)]
print(files_list)
```
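The extension check here is again plain string logic, so it can be verified on a list of hypothetical keys without touching AWS:

```python
def keys_with_extension(keys, file_extn):
    """Filter object keys by file extension."""
    return [k for k in keys if k.endswith(file_extn)]

keys = ['data/a.csv', 'data/b.txt', 'data/nested/c.csv']  # hypothetical keys
print(keys_with_extension(keys, '.csv'))
# ['data/a.csv', 'data/nested/c.csv']
```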
Conclusion:
We can also upload and delete files in a similar way, and do much more. Most of the required methods either already exist in the boto3 module (pagination, for example) or are easy to define with the help of the existing ones. So be sure to give it a try. Code happy!