Minimize Downtime of Cloud Storage when Migrating between Buckets via Transfer Service

Sunny Arora
Google Cloud - Community
4 min read · Dec 26, 2022
Photo by Taylor Vick on Unsplash

Transfer Service (formerly known as Storage Transfer Service or STS) is a GCP offering that transfers data to Cloud Storage from on-premises sources, other cloud providers, or POSIX file systems.

Transferring data from an active Cloud Storage bucket that is terabytes or petabytes in size can take days, resulting in long downtime for the application(s) reading from or writing to the bucket. If writes to the bucket are not blocked during the transfer, Transfer Service will not pick up any data added after the job starts.

In this post, we will look at an approach for transferring data between Cloud Storage buckets via Transfer Service with reduced bucket downtime, without complicating the transfer process. Let’s get started!

Prerequisites

Set up Transfer Service
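Before creating any jobs, the Storage Transfer API must be enabled and its service agent granted access to both buckets. A minimal sketch with the gcloud CLI (the project ID is a placeholder, and your IAM setup may differ):

```shell
# Enable the Storage Transfer API (project ID is a placeholder)
gcloud services enable storagetransfer.googleapis.com --project=my-project

# Grant the Transfer Service service agent any roles it is missing
# on the project's buckets (review what this changes before relying on it)
gcloud transfer authorize --add-missing
```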

Step 1: Seed Transfer

In this step, we will create a Transfer Service job to copy the data from the source bucket to the destination bucket. For the duration of this transfer, we will not block any reads or writes to the source bucket.

Any data added to the source bucket after the Transfer job is started will not be considered for transfer.
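A seed job like this can be created from the gcloud CLI — a sketch, with placeholder bucket and job names:

```shell
# Start the seed transfer while the source bucket stays fully live
gcloud transfer jobs create gs://source-bucket gs://destination-bucket \
    --name=seed-transfer \
    --description="Seed transfer: bulk copy, reads/writes not blocked"

# Monitor the job's operations as the bulk copy runs
gcloud transfer operations list --job-names=seed-transfer
```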

[Screenshot: Seed Transfer Service job, with new files added to the bucket mid-transfer]

As can be seen above, the two new files added to the source bucket are not considered for transfer, as they were created after the Transfer Service job had already started.

Once the job completes, all data created in the source bucket prior to the job start time has been transferred to the destination bucket.

Step 2: Sync Transfer

In this step, we will perform an incremental load of the data that was added or changed after the seed transfer job started. Storage Transfer Service transfers are incremental by default. Since most of the data was already transferred in Step 1 (Seed Transfer), this run takes much less time, depending on how much data was added.

For the duration of this transfer, we will block all writes to the source bucket. Read operations can also be blocked, depending on the requirement.

Listing objects can be a bottleneck here, adding some extra time if the bucket contains many objects.
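One way to sketch this step with the gcloud CLI: revoke the writers’ IAM binding on the source bucket, then re-run the existing job, which copies only the objects added or changed since the seed run. The principal, role, and names below are placeholders, and how you block writes depends on your setup:

```shell
# Block writes: revoke the application's writer role on the source bucket
gcloud storage buckets remove-iam-policy-binding gs://source-bucket \
    --member="serviceAccount:app@my-project.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"

# Re-run the same job: Transfer Service runs are incremental, so only the
# delta since the seed transfer is copied
gcloud transfer jobs run seed-transfer
```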

[Screenshot: Sync Transfer Service job]

Step 3: Update upstream and downstream applications

Update all applications that read from or write to the source bucket to point to the new bucket (in this case, the destination bucket used in the transfer jobs).

Factors Impacting Transfer Speed

  • If there are many small objects (KiB in size), the transfer job is QPS-bound: a maximum of 1,000 tasks (objects) are transferred per second per transfer job, which can increase the transfer time.
  • If objects are very large, bandwidth can be the bottleneck.
    Bandwidth limits are set at the regional level and are fairly allocated across projects.
  • For GCS-to-GCS transfers, if the location, storage class, and encryption keys are the same for source and destination, only the metadata is rewritten, not the object data. Such transfers are very fast and only QPS-bound.
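As a back-of-the-envelope illustration of the QPS bound, a small Python helper (assuming the 1,000 objects-per-second ceiling mentioned above and ignoring bandwidth entirely):

```python
def qps_bound_hours(num_objects: int, qps: int = 1000) -> float:
    """Lower bound on transfer time, in hours, when a job is QPS-bound
    (at most `qps` objects per second per transfer job)."""
    return num_objects / qps / 3600

# 100 million small objects take at least ~27.8 hours, regardless of size
print(round(qps_bound_hours(100_000_000), 1))
```

This is only a floor: bandwidth, object listing, and retries can push the real time well beyond it.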

Conclusion

In this article, we discussed how to minimize the downtime of Cloud Storage buckets by keeping data intact and in sync during a transfer via Transfer Service. We saw that writes to the bucket need to be blocked only during the Sync Transfer (Step 2). We also covered the factors that impact transfer speed and should be considered beforehand.

Hope you enjoyed this article and found it useful. You can reach me at LinkedIn.
