Schedule Google Cloud STS Transfer Job with Cloud Scheduler

Bambang Satrijotomo
Jan 15, 2022


In this article we discuss moving data from on-premises storage to Google Cloud Storage using Storage Transfer Service, with the objective of minimizing latency.

For many customers, the first step in adopting Google Cloud is getting their data into it. Customers can choose between offline transfer (such as an appliance) and online transfer, depending on their transfer factors and requirements. There are at least two options for online transfer: gsutil and Storage Transfer Service (STS). Google's documentation compares the available transfer methods for different scenarios. STS provides options that make data transfer and synchronization easier, for example one-time or recurring transfers, deleting source objects after a successful transfer, and more.

One limitation of STS is that its built-in schedule allows a transfer job to run at most once per hour. If the requirement dictates that latency, defined here as the lag between the time files become available at the source and the time the transfer to a Cloud Storage bucket starts, be as low as possible, a once-per-hour run may not be sufficient.
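To make the latency argument concrete, here is a minimal sketch of the arithmetic. It assumes that files become available at uniformly random moments between runs and that every run starts exactly on schedule; the function name is hypothetical and is used only for illustration.

```python
def start_lag_minutes(interval_minutes: float) -> tuple:
    """Return (worst_case, average) lag in minutes before the next run starts.

    Assumption: files arrive uniformly at random between two scheduled runs,
    so a file waits at most one full interval and half an interval on average.
    """
    worst_case = float(interval_minutes)
    average = interval_minutes / 2
    return worst_case, average


# Compare the built-in hourly schedule with a 10-minute schedule.
for interval in (60, 10):
    worst, avg = start_lag_minutes(interval)
    print(f"every {interval} min: worst-case lag {worst} min, average lag {avg} min")
```

Under these assumptions, moving from an hourly schedule to a 10-minute schedule reduces both figures by a factor of six.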

Fortunately, there is a solution, which the remainder of this article describes: Cloud Scheduler, a fully managed enterprise-grade cron/scheduling platform. We will use Cloud Scheduler as an alternative to the built-in STS job scheduling. In the following diagram we see Cloud Scheduler as an additional component. At the scheduled time, Cloud Scheduler sends a POST API call to STS to trigger an execution of the transfer job. If no run of that job is already in progress, STS responds with a 200 response code and starts transferring files from the on-premises data source to the destination bucket in Cloud Storage.
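The call that Cloud Scheduler makes can be sketched roughly as follows. This only builds the request and does not send it. The job ID, project ID, and token are placeholders, the function name is hypothetical, and the explicit Content-Type header is an assumption for a hand-built request (Cloud Scheduler attaches the OAuth token on its own). The URL follows the format given in the Configuration section.

```python
import json
import urllib.request

STS_ENDPOINT = "https://storagetransfer.googleapis.com/v1"


def build_run_request(job_id: str, project_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers an STS job run."""
    url = f"{STS_ENDPOINT}/transferJobs/{job_id}:run?alt=json"
    # The request body carries the project ID as JSON.
    body = json.dumps({"projectId": project_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Assumption: a bearer token obtained for the service account.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values, for illustration only.
req = build_run_request("1234567890", "my-project", "<access token>")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` with a valid token; in the setup described here, Cloud Scheduler performs that step on each scheduled tick.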

Prerequisites

1. You must have an existing, working STS job that will be run periodically. We need the job’s name, which can be retrieved from the Configuration page of the STS job details. The example in this article is based on a Transfer service for on-premises data (TSOP) job.

2. Create a service account in Identity and Access Management (IAM) and assign it the “Storage Transfer User” role. We will use this service account in the Cloud Scheduler configuration.

Configuration

1. Go to Cloud Scheduler and create a new job. Specify the schedule (frequency). To run the job every 10 minutes starting on the hour, set “0,10,20,30,40,50 * * * *”. Select a time zone, then click Continue.

2. Configure the Execution part.

a. Target type: HTTP. We use the HTTP target type because, as described in the STS API documentation, the STS run API requires an HTTP request with the POST method.

b. URL: Use the following format:

https://storagetransfer.googleapis.com/v1/transferJobs/<STS job name>:run?alt=json

You may notice that this URL differs from the one in the API documentation, which is written in gRPC Transcoding syntax. Using the gRPC Transcoding syntax here results in a 404 error.

c. HTTP method: POST. We specify the project ID in the request body of the API call, hence the POST method.

d. There’s no need to add HTTP Headers.

e. Add a body containing the project ID in JSON format:

{ "projectId": "<project_ID>" }

f. Select “Add OAuth token” under Auth header.

g. Select the service account created in the prerequisites.

h. Scope will be populated automatically.

i. Click Continue.

3. Configure the optional settings or leave them as they are.

4. Click Create.

5. When the list of Scheduled jobs appears, click Run Now at the right side of the job to test if the job can run as expected.

6. Verify in the STS UI that the execution completed successfully. If it did not, check the Cloud Scheduler logs for the error message. For example, error 404 indicates an incorrect URL.
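For readers who prefer to see the console settings above as data, the following is a rough sketch of the equivalent job definition, shaped like the Job resource of the Cloud Scheduler REST API as I understand it (the body is base64-encoded there). The function name, the argument values, the "Etc/UTC" time zone, and the explicit cloud-platform scope are assumptions for illustration; check the field names against the API reference before relying on them.

```python
import base64
import json


def build_scheduler_job(project_id, location, scheduler_job_id, sts_job_id, service_account_email):
    """Return a dict mirroring the console settings for the Cloud Scheduler job."""
    body = json.dumps({"projectId": project_id}).encode("utf-8")
    return {
        "name": f"projects/{project_id}/locations/{location}/jobs/{scheduler_job_id}",
        # Every 10 minutes, starting on the hour.
        "schedule": "0,10,20,30,40,50 * * * *",
        "timeZone": "Etc/UTC",  # assumption: pick the time zone you need
        "httpTarget": {
            "uri": f"https://storagetransfer.googleapis.com/v1/transferJobs/{sts_job_id}:run?alt=json",
            "httpMethod": "POST",
            # Bytes fields are base64-encoded in the JSON representation.
            "body": base64.b64encode(body).decode("ascii"),
            "oauthToken": {
                "serviceAccountEmail": service_account_email,
                # Assumption: the scope the console fills in automatically.
                "scope": "https://www.googleapis.com/auth/cloud-platform",
            },
        },
    }


# Hypothetical values, for illustration only.
job = build_scheduler_job(
    "my-project", "us-central1", "run-sts-every-10-min",
    "1234567890", "sts-runner@my-project.iam.gserviceaccount.com",
)
print(json.dumps(job, indent=2))
```

Keeping the definition as data makes it easier to review the URL, body, and service account in one place before creating the job by whatever means you prefer.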

Notes:

1. If the Storage Transfer Service (STS) job is still running when the Cloud Scheduler job triggers a new execution, STS responds with error 400 and the existing STS run continues until completion. Hence, treat the STS job’s Run History, not the Cloud Scheduler job’s log, as the source of truth when checking whether a run was successful.
2. At the time of writing, there is no charge for STS except when data is transferred from on-premises, which is charged at $0.0125 per GB successfully transferred to the destination.
3. Cloud Scheduler adds negligible cost. At the time of writing, the first 3 jobs per Google billing account are free. Subsequent jobs are charged at $0.10 per job per month.
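The pricing notes above can be turned into a rough monthly estimate. This is a minimal sketch that uses only the rates quoted in this article at the time of writing; the function name is hypothetical, and the estimate deliberately ignores Cloud Storage, network, and any other charges.

```python
# Rates quoted in this article at the time of writing; verify current pricing.
STS_ONPREM_USD_PER_GB = 0.0125
SCHEDULER_FREE_JOBS = 3
SCHEDULER_USD_PER_JOB_MONTH = 0.10


def estimate_monthly_cost(gb_transferred: float, scheduler_jobs: int) -> float:
    """Estimate monthly USD cost of on-premises transfers plus Scheduler jobs."""
    transfer_cost = gb_transferred * STS_ONPREM_USD_PER_GB
    # Only jobs beyond the free allowance are billed.
    billable_jobs = max(0, scheduler_jobs - SCHEDULER_FREE_JOBS)
    return transfer_cost + billable_jobs * SCHEDULER_USD_PER_JOB_MONTH


# Hypothetical workload: 1,000 GB per month and 4 Cloud Scheduler jobs.
print(f"${estimate_monthly_cost(1000, 4):.2f} per month")
```

Note that the Cloud Scheduler charge is per job, not per execution, so running every 10 minutes instead of hourly does not change that part of the estimate.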



I am an ex-Google Cloud Customer Engineer who loves to help our users get the most from the platform.