S3 Migration from AWS to Huawei with Batch Operations

Kübra Kuş
Published in TurkNet Technology
8 min read · Jun 3, 2024

Originally published at http://kubikolog.com on June 3, 2024.
You can visit the original for the Turkish version.

Hello Everyone,

Today, I want to talk about the migration process we performed at TurkNet from AWS S3 to Huawei OBS.

Businesses sometimes need such transformations to take advantage of different features, optimize costs, or meet regulatory requirements. We also wanted to benefit from both platforms, so we moved some of our data from AWS S3 to Huawei Object Storage Service (OBS). If you also need to move large amounts of data from AWS S3 to another environment, this post shows how you can automate the process and speed it up.

First, I want to briefly talk about AWS S3 and Huawei OBS.

AWS S3

Amazon Simple Storage Service (S3) is a widely used object storage service that offers industry-leading scalability, data availability, security, and performance. AWS S3 is designed to store and retrieve any amount of data from anywhere on the web. For archived data, AWS offers low-cost storage solutions such as S3 Glacier and S3 Glacier Deep Archive.

Huawei OBS

Huawei Object Storage Service (OBS) is a cloud storage service that provides secure, reliable, and cost-effective storage solutions. OBS supports large-scale data storage, integrates seamlessly with other Huawei Cloud services, and offers robust data protection features. Huawei’s opening of its Turkey region this year has further strengthened its position in the local market.

S3 Migration from AWS to Huawei

Now, let’s start talking about the migration process. We need to start by checking the storage class of the objects to be moved from S3. If you have objects in the S3 Glacier or S3 Glacier Deep Archive storage class, you cannot access the data directly. You first need to perform a restore operation to make the objects accessible. For more detailed information about Amazon S3 and its storage classes, you can read my post TODAY’s AGENDA: “Amazon S3”.
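To quickly see which objects are still sitting in an archive class, you can query the bucket from the AWS CLI. A minimal sketch, using the demo bucket name from later in this post as a placeholder:

# List keys that are in the Glacier Deep Archive storage class (replace the bucket name with your own)
aws s3api list-objects-v2 --bucket kubikolog-demo --query "Contents[?StorageClass=='DEEP_ARCHIVE'].[Key,StorageClass]" --output text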

Restore with Batch Operations

👉🏻 Before moving an object in the S3 Glacier Deep Archive storage class, you need to restore the data; this is what makes the object accessible again. The restore operation can take anywhere from hours to days. For S3 Glacier Deep Archive, it usually ranges from 12 hours up to 48 hours.

👉🏻 The restore operation creates a temporary copy of the object in a more accessible storage class, such as S3 Standard or S3 Standard-IA. This temporary copy is accessible for a certain period (usually between 1–15 days). You can determine this period according to your needs, of course at an additional cost 🙂

👉🏻 Once the temporary copy is accessible, you can change the storage class of this object using the S3 management console or AWS CLI. This process will permanently move the object to another S3 storage class of your choice, thus overcoming the temporary accessibility issue mentioned above.

❗️ As an additional note, a data retrieval fee applies to each restore operation. Furthermore, once the data is accessible, copying it to a new storage class also incurs an additional cost, and all of these API requests are billed as request costs on top of that.

Now, let’s get to the good stuff 🙂 After performing these steps, our object is ready to be moved to any cloud provider we want. But, of course, your bucket will not have just 1–2 objects. For buckets with millions of objects, you would need to perform these restore & copy steps for each object. This is where the S3 Batch Operations feature, which I have mentioned in a previous post, comes in handy. For more detailed information, you can also read TODAY’s AGENDA: “AWS S3 Batch Operations”. With this feature, we can create a job to perform the restore operation, or even the copy operation if desired.

Now, let’s perform these operations step by step.

Restore Operation:

If you want to perform this operation manually, object by object, you can go to the Bucket section in the Amazon S3 service and specify the duration of the restore as shown below👇🏻

Here, we encounter two types of retrieval: Bulk retrieval and Standard retrieval. Bulk retrieval is the lowest cost option but has a longer retrieval time compared to standard. When we want to restore a large archive or backup and immediate access to this data is not required, we can proceed by selecting Bulk Retrieval. This option seems suitable for us as well. Once this process is complete, we will need to copy the data before the restored copy expires.
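By the way, the same single-object restore can also be triggered from the AWS CLI. A minimal sketch, where the bucket and key are placeholders and the duration and tier mirror the choices above:

# Restore one Deep Archive object for 10 days using the low-cost Bulk tier
aws s3api restore-object --bucket kubikolog-demo --key demo1/archive-file.zip --restore-request '{"Days":10,"GlacierJobParameters":{"Tier":"Bulk"}}'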

💥💥 Now let’s perform this operation for many objects at once using S3 Batch Operations 💥💥

First, go to the S3 Batch Operations section in the console. Here, we need to proceed by creating a job. In the window that opens, we need to provide a manifest: a CSV file that lists all the objects to be processed, uploaded to an S3 bucket.

If our list of objects is extensive, preparing this file by hand can get complicated. We had a lot of archive data, separated by folders, to be moved, so we used a manifest generator to handle this step. You can easily create the manifest.csv file by downloading this small tool from the GitHub link below. You can use this generator for all your Batch Operations jobs.

git clone https://github.com/kubrakus/amazon-s3-manifest-generator-for-batch-operations

cd amazon-s3-manifest-generator-for-batch-operations
npm install

After the Manifest Generator is installed, you can create a manifest.csv file for your bucket as shown in the example. It is also possible to generate this manifest content based on folders.
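In case you want to sanity-check the generator’s output: a Batch Operations manifest without version IDs is simply a headerless two-column CSV of bucket name and object key (keys with special characters must be URL-encoded). The keys below are made-up examples:

# Example of what manifest.csv should contain — one "bucket,key" line per object
cat > manifest.csv <<'EOF'
kubikolog-demo,demo1/archive-file-001.zip
kubikolog-demo,demo1/archive-file-002.zip
kubikolog-demo,demo2/backup-2023.tar.gz
EOF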

Note: To access AWS and view your buckets, you need to install and configure the AWS CLI.
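Configuring credentials is the usual one-liner; it prompts for your access key, secret key, default region, and output format:

# Enter your AWS credentials, default region, and output format when prompted
aws configure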

Now, let’s upload our manifest file to the bucket.

aws s3 cp manifest.csv s3://kubikolog-demo/

We have now completed the requirements to create a Batch Operation job. Click the browse S3 button and select our manifest.csv file as shown in the image below.

In the next step, we choose which operation we want to perform and continue by selecting restore. We proceed by specifying the retention period and restore type of the restored copy, as we defined in the manual restore process.

In the next step, we define the address where the report will be generated when the process is completed and select the relevant IAM role to create the job.

Note: Here, as you can see, the “BatchOperationsRestore” role is selected. If this role has not been created for this operation yet, you need to add it from the IAM panel first. You can proceed by granting permission to all buckets or only to selected buckets.
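If you prefer to set this role up from the CLI instead of the IAM panel, a minimal sketch could look like the following. The role name matches the one above, while the bucket ARNs, manifest location, and report prefix are assumptions you should adapt to your own account:

# Trust policy that lets S3 Batch Operations assume the role
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role --role-name BatchOperationsRestore --assume-role-policy-document file://trust-policy.json

# Permissions: restore the objects, read the manifest, write the completion report
cat > restore-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:RestoreObject", "Resource": "arn:aws:s3:::kubikolog-demo/*"},
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:GetObjectVersion"], "Resource": "arn:aws:s3:::kubikolog-demo/manifest.csv"},
    {"Effect": "Allow", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::kubikolog-demo/reports/*"}
  ]
}
EOF

aws iam put-role-policy --role-name BatchOperationsRestore --policy-name batch-restore-permissions --policy-document file://restore-policy.json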

Yes, our job is now ready. ✅
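For the record, the same job can also be created without the console using aws s3control create-job. This is only a sketch: the account ID is a placeholder, and the manifest ETag has to be read first with aws s3api head-object:

# Get the ETag of the uploaded manifest (create-job requires it)
aws s3api head-object --bucket kubikolog-demo --key manifest.csv

# Create the Batch Operations restore job (placeholders: account ID, ETag)
aws s3control create-job \
  --account-id 111122223333 \
  --operation '{"S3InitiateRestoreObject":{"ExpirationInDays":10,"GlacierJobTier":"BULK"}}' \
  --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::kubikolog-demo/manifest.csv","ETag":"<manifest-etag>"}}' \
  --report '{"Bucket":"arn:aws:s3:::kubikolog-demo","Prefix":"reports","Format":"Report_CSV_20180820","Enabled":true,"ReportScope":"AllTasks"}' \
  --priority 10 \
  --role-arn arn:aws:iam::111122223333:role/BatchOperationsRestore \
  --confirmation-required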

One last step remains to run the job. Click the job and then click run, and our restore operation will start.
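The console “Run” button has a CLI counterpart too, in case you scripted the previous steps; the job ID and account ID below are placeholders:

# Confirm and start the job once it is waiting for your confirmation
aws s3control update-job-status --account-id 111122223333 --job-id <job-id> --requested-job-status Ready

# Follow its progress
aws s3control describe-job --account-id 111122223333 --job-id <job-id>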

Now, when you look at the objects, you will see a “restore in progress” note in the description section. Once the restore is completed, the expiration date of the temporary copy will be displayed instead.

This process will take some time depending on the size of your data. Once completed, you can check the completion from both the job screen and the object.
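You can also check this per object from the CLI: while the restore is running, head-object returns a Restore header with ongoing-request="true", and once it finishes, this switches to "false" together with the expiry date. The key below is a placeholder:

# Check the restore status of a single object
aws s3api head-object --bucket kubikolog-demo --key demo1/archive-file.zip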

Then we can proceed to the copy operation.

Copy Operation:

With the following command, we will copy all files and subfolders in the demo1 folder of the kubikolog-demo bucket to the demo1 folder in the kubikolog-demo-std bucket. All copied files will be stored in the STANDARD storage class in the target bucket.

aws s3 cp s3://kubikolog-demo/demo1/ s3://kubikolog-demo-std/demo1 --storage-class STANDARD --recursive --force-glacier-transfer

After this entire process, our data is now stored in the Standard storage class and ready for the migration.
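Before moving on, it is worth verifying that everything landed in the target bucket, for example by comparing object counts and total sizes; the bucket name is the demo one from the command above:

# Quick sanity check: list the copied objects and print totals
aws s3 ls s3://kubikolog-demo-std/demo1/ --recursive --summarize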

Now, we can perform the migration operation. 🧬

Migration Operation:

After logging into the Huawei Cloud console, open the Object Storage Service. The console offers a built-in migration feature designed for exactly this scenario.

❗️ For the migration process, access key and secret information are required for both Amazon S3 and Huawei OBS. By entering this information and selecting the source and destination buckets, we can proceed with the migration process.

In the next screen, there is a tab where various parameters can be customized. In the source configuration, you can select the transfer method (file/folder, object list, object name prefix, or URL list), the source bucket, and the objects to be transferred. Additionally, there are options to transfer object metadata and perform selective transfer. In the target configuration, you can set data encryption, a specific prefix, and storage class settings.

Once the migration task is started, you can check its status from the tasks tab. When completed, you will see that the task is marked as completed below.
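If you also want to verify the result on the Huawei side from the command line, Huawei’s obsutil tool can list the destination bucket. This is only a sketch: it assumes obsutil is installed, the endpoint placeholder must be replaced with the OBS endpoint of your region, and the destination bucket name is made up:

# One-time obsutil configuration with your OBS credentials and regional endpoint
obsutil config -i=<obs-access-key> -k=<obs-secret-key> -e=<obs-endpoint-for-your-region>

# List the migrated objects in the destination bucket
obsutil ls obs://kubikolog-obs-demo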

By the way, there are two options: Migration Tasks and Migration Task Groups. “Migration Tasks” represent individual data transfer operations, while “Migration Task Groups” are groups of these tasks organized together for management purposes. Individual migration tasks facilitate the transfer of specific data sets or objects from the source to the destination, while task groups bring similar tasks together to make the migration process more systematic and efficient. This structure makes large-scale data transfer operations more manageable and organized.

We have successfully completed the data migration process between two cloud providers, friends ✅

Now I leave you with a song from the KubikFM playlist to nap on the couch while listening, sweet dreams 🐸
