TODAY’s AGENDA: “AWS S3 Batch Operations”

Kübra Kuş · TurkNet Technology · Jun 3, 2024

Originally published at http://kubikolog.com on June 3, 2024.
You can visit the original for the Turkish version.

Hello Everyone,

In this blog post, we will be discussing another valuable member of the AWS family: AWS S3 Batch Operations. After examining it theoretically, I plan to touch on how it facilitates our work in real life with a demo.

What is Amazon S3 Batch Operations?

Amazon S3 Batch Operations is a feature of Amazon S3 used to perform batch operations on S3 objects. With it, you can run an operation on a large number of objects at once. Supported operations include object copying, object tagging, ACL updates, invoking Lambda functions, and more.

Steps to Use S3 Batch Operations

1. Create a Job

To start using S3 Batch Operations, the first step is to create a job. This job performs a specific operation on a set of specified objects. Creating a job involves the following steps:

  • Create Manifest File: You need to create a manifest file that lists the objects you want to operate on. This file is usually in CSV format and contains the keys of the objects to be processed (a short sketch of building one follows this list).
  • Job Definition: You need to define the job using the AWS Management Console, AWS CLI, or an SDK. This definition includes the job’s manifest file, the type of operation, and other necessary parameters.
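
As an illustration of the manifest step, here is a minimal Python (boto3) sketch that builds such a CSV manifest from an existing bucket and uploads it to S3. The bucket names, prefix, and manifest key are hypothetical placeholders, not values from this post.

    from urllib.parse import quote
    import csv
    import io

    import boto3

    s3 = boto3.client("s3")

    SOURCE_BUCKET = "my-source-bucket"       # hypothetical source bucket
    MANIFEST_BUCKET = "my-manifest-bucket"   # hypothetical bucket to hold the manifest
    MANIFEST_KEY = "manifests/backup-manifest.csv"

    # Build a CSV manifest in the "bucket,key" format; object keys are URL-encoded.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix="data/"):
        for obj in page.get("Contents", []):
            writer.writerow([SOURCE_BUCKET, quote(obj["Key"])])

    # Upload the manifest so it can be referenced when the job is created.
    s3.put_object(
        Bucket=MANIFEST_BUCKET,
        Key=MANIFEST_KEY,
        Body=buffer.getvalue().encode("utf-8"),
    )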

2. Job Configuration

When creating a job, you need to select the type of operation you want to perform. The types of operations supported by S3 Batch Operations are listed below; a short sketch after the list shows how an operation type appears in a job definition:

  • Copy: This operation copies each object to the specified destination. It is an effective way to copy your data to a different location or prefix for backup.
  • Invoke AWS Lambda function: AWS Lambda is a serverless computing service, and with this operation, you can run a specific Lambda function on each object. For example, it can be used for data transformation or analysis.
  • Replace all object tags: This operation replaces the existing tags on each object with new ones. Tags are useful for data management and organization.
  • Delete all object tags: This operation deletes all tags on objects. It is used when tags need to be cleared.
  • Replace access control list (ACL): This operation changes the access control lists of objects. It is used to determine who can access the objects.
  • Restore: This initiates restore requests for archived objects. It is used to restore objects stored in archival solutions like Amazon S3 Glacier.
  • Object Lock retention: This prevents objects from being deleted or overwritten for a specified period. It is used to ensure data integrity and security.
  • Object Lock legal hold: This prevents objects from being deleted or overwritten for legal reasons. It is critical for legal compliance and regulatory requirements.
  • Replicate: This operation performs replication of objects. It copies objects to the targets specified in the replication configuration. The manifest file must include version IDs for replication operations.
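
To make this concrete, here is a small sketch, based on the boto3 s3control API, of how two of these operation types map onto the Operation parameter of create_job. The tag values and the Lambda function ARN are hypothetical placeholders.

    # Hypothetical Operation payloads for boto3's s3control.create_job() call;
    # exactly one such dictionary is passed per job.

    # "Replace all object tags": every object in the manifest receives this tag set.
    replace_tags_operation = {
        "S3PutObjectTagging": {
            "TagSet": [
                {"Key": "backup", "Value": "true"},
                {"Key": "team", "Value": "platform"},
            ]
        }
    }

    # "Invoke AWS Lambda function": the function is invoked once per object.
    invoke_lambda_operation = {
        "LambdaInvoke": {
            "FunctionArn": "arn:aws:lambda:eu-west-1:111122223333:function:transform-object"
        }
    }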

3. Start and Monitor the Job

Once you start the job, S3 Batch Operations runs the specified operation on every object in the manifest. You can monitor the job’s progress via the AWS Management Console, AWS CLI, or SDK.
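
Here is a minimal boto3 sketch of starting and checking a job; it assumes the job was created with ConfirmationRequired=True, and the account ID and job ID are placeholders.

    import boto3

    s3control = boto3.client("s3control")

    ACCOUNT_ID = "111122223333"   # placeholder AWS account ID
    JOB_ID = "example-job-id"     # placeholder: returned by create_job()

    # Move the job from "Suspended" (awaiting confirmation) to "Ready" so it starts running.
    s3control.update_job_status(
        AccountId=ACCOUNT_ID,
        JobId=JOB_ID,
        RequestedJobStatus="Ready",
    )

    # Check the status and task counters.
    job = s3control.describe_job(AccountId=ACCOUNT_ID, JobId=JOB_ID)["Job"]
    progress = job.get("ProgressSummary", {})
    print(job["Status"],
          "succeeded:", progress.get("NumberOfTasksSucceeded"),
          "failed:", progress.get("NumberOfTasksFailed"))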

4. Completion and Reporting

Once the job is completed, S3 Batch Operations generates a report detailing the results and any errors. This report shows which parts of the job succeeded and which failed.
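
As a small illustration, the sketch below downloads one result file from such a report and prints the rows marked as failed. The bucket, key, and the assumption that a status column contains "failed" are placeholders; check them against your own report.

    import csv
    import io

    import boto3

    s3 = boto3.client("s3")

    # Placeholder location of one result file written under the report prefix you chose.
    REPORT_BUCKET = "my-report-bucket"
    REPORT_KEY = "batch-reports/example-results.csv"

    body = s3.get_object(Bucket=REPORT_BUCKET, Key=REPORT_KEY)["Body"].read()

    # Each row describes one task; print the ones whose status field reads "failed".
    # Verify the column order against your own report before relying on indexes.
    for row in csv.reader(io.StringIO(body.decode("utf-8"))):
        if any(field.strip().lower() == "failed" for field in row):
            print(row)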

Use Cases for Amazon S3 Batch Operations

  • Data Transfer and Backup: Using the “Copy” operation, you can transfer large datasets to a different location or another folder for backup purposes.
  • Data Transformation and Processing: Using the “Invoke AWS Lambda function” operation, you can perform custom data transformations or analysis on objects.
  • Data Management: Using the “Replace all object tags” and “Delete all object tags” operations, you can update or clear the tags on your objects.
  • Security and Access Control: Using the “Replace access control list (ACL)” operation, you can regulate and control access to objects.
  • Archiving and Restoration: Using the “Restore” operation, you can restore objects stored in archival solutions like Amazon S3 Glacier.
  • Data Protection: Using the “Object Lock retention” and “Object Lock legal hold” operations, you can prevent objects from being deleted or overwritten, ensuring data protection.

Summarizing the advantages provided by S3 Batch Operations:

It offers easy management by allowing operations on millions of objects with a single job definition. It also provides scalability, performing fast and efficient operations on large data sets. By supporting various operations such as object copying, tagging, and ACL updates, it offers a wide range of actions to users. Additionally, it gives users valuable feedback through detailed reporting of job results and potential errors.

Demo: File Backup with AWS S3 Batch Operations

In this demo, we will showcase the usage of AWS S3 Batch Operations with a simple file backup scenario. Before starting the demo, you need an AWS account with the necessary IAM permissions (the AWSBatchFullAccess and AmazonS3FullAccess policies should be attached).

Steps:

1. Create Batch Operation Role in AWS Console:

  • Log in to the AWS Management Console.
  • Go to the IAM service and click on “Roles.”
  • Click the “Create Role” button.
  • Select “AWS service” and choose the S3 service by typing “S3.”
  • Add the appropriate policy to allow S3 Batch Operations and create the role (an equivalent SDK sketch follows these steps).
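
If you prefer the SDK over the console, here is a minimal boto3 sketch of the same role creation. The role name is a hypothetical placeholder, the trust policy lets the S3 Batch Operations service assume the role, and the broad AmazonS3FullAccess policy is attached only to keep the demo simple; scope it down for real use.

    import json

    import boto3

    iam = boto3.client("iam")

    # Trust policy: allow S3 Batch Operations to assume this role on your behalf.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

    role = iam.create_role(
        RoleName="s3-batch-operations-demo-role",   # placeholder role name
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )

    # Broad S3 permissions for the demo; restrict to the source, destination,
    # manifest, and report buckets in production.
    iam.attach_role_policy(
        RoleName="s3-batch-operations-demo-role",
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
    )

    print(role["Role"]["Arn"])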

2. Define Batch Operation:

  • In the AWS Console, go to the S3 service and click on “Batch Operations.”
  • Click the “Create Job” button.
  • Define a name and description.
  • Select the type of operation (e.g., “Copy”).
  • Specify the source and destination locations (e.g., source S3 bucket and target S3 bucket); a sketch of the same job definition via the SDK follows these steps.
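
The same “Copy” job can also be defined programmatically. Below is a minimal boto3 sketch matching this backup scenario; every bucket name, ARN, and the account ID are hypothetical placeholders to replace with your own values.

    import uuid

    import boto3

    s3 = boto3.client("s3")
    s3control = boto3.client("s3control")

    ACCOUNT_ID = "111122223333"  # placeholder AWS account ID

    # The manifest object's ETag is required when creating the job.
    manifest_etag = s3.head_object(
        Bucket="my-manifest-bucket", Key="manifests/backup-manifest.csv"
    )["ETag"].strip('"')

    response = s3control.create_job(
        AccountId=ACCOUNT_ID,
        ClientRequestToken=str(uuid.uuid4()),
        ConfirmationRequired=True,  # job waits as "Suspended" until you confirm/submit it
        Description="Backup copy job (demo)",
        Priority=10,
        RoleArn=f"arn:aws:iam::{ACCOUNT_ID}:role/s3-batch-operations-demo-role",
        # "Copy" operation: each object in the manifest is copied to the target bucket.
        Operation={
            "S3PutObjectCopy": {
                "TargetResource": "arn:aws:s3:::my-backup-bucket",
            }
        },
        # CSV manifest created earlier ("bucket,key" per line).
        Manifest={
            "Spec": {
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {
                "ObjectArn": "arn:aws:s3:::my-manifest-bucket/manifests/backup-manifest.csv",
                "ETag": manifest_etag,
            },
        },
        # Completion report written to the chosen bucket/prefix.
        Report={
            "Bucket": "arn:aws:s3:::my-report-bucket",
            "Prefix": "batch-reports",
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "ReportScope": "AllTasks",
        },
    )

    print(response["JobId"])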

3. Review:

  • On the Batch Operations page, select the created operation, specify the bucket address where the completion report will be saved, and click the “Submit” button.

4. Monitoring:

  • On the Batch Operations page, find the submitted job and check its status and progress. When the job completes, you can find the completion report file in the path you selected while creating the job.

Now that we have completed this small demo, it is time to conclude. Today, KübikFM presents its favorite song from Eurovision 2024. Of course, a special thanks to the cameraman who worked hard on this clip.

Stay well, folks 🐸

