Google Cloud Scheduler: serverless CRON jobs with Python
I have never experienced it myself, but I have heard about the trouble of keeping a script running at a given frequency. Typically, you need to deploy a CRON job on a machine every time you want to automate a task like this.
Since it’s a known issue and I have experience working with cloud services, I decided to check what Google Cloud Platform has to offer. The answer I found is Google Cloud Scheduler, a fully managed CRON job service.
To play around with it, I decided to make a Twitter bot. Not that I am a Twitter enthusiast, but its Python API is very simple to work with. I programmed the bot to post ‘Have a nice day’ every day, at 8 a.m. BRT.
To do so, I set up Google Cloud Scheduler to trigger a Cloud Function through Cloud Pub/Sub.
Google Cloud Pub/Sub
It’s a fully managed messaging service, in the same family as Apache Kafka, RabbitMQ, etc. It can handle large loads of data and guarantees that each message will be delivered at least once. Here, I simply created a topic called have-a-nice-day, which is a pretty straightforward procedure.
I’m pretty sure it is overkill to use such a potent tool for a few bytes once a day. But it is uncomplicated to use, and you pay for the data transferred, so why not?
Google Cloud Scheduler
The Google Cloud Scheduler can be found in the Tools section, on the GCP menu.
Once there, you will see the jobs you have already set up (probably none yet). Click on create a job to configure a new one. There, you have to give it a name, description, frequency, timezone and target.
The frequency is specified using the unix-cron format. If, like me, you are not familiar with it, I suggest you check the crontab guru for an accessible way to edit this kind of expression.
For the target, you may choose between HTTP, Pub/Sub or App Engine HTTP. Since I want to trigger a Cloud Function, I chose to post to a Pub/Sub topic. An HTTP request could also do the job.
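To make the five fields of a unix-cron expression concrete, here is a minimal sketch in plain Python. It assumes that "every day at 8 a.m." is written as 0 8 * * * (the exact expression is my assumption, not taken from the job itself), and the helper names parse_cron and matches are made up for illustration. It only understands * and plain integers, so it is a reading aid rather than a real cron parser.

```python
from datetime import datetime

# Order of the five fields in a unix-cron expression.
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")

# Assumed expression for "every day at 8 a.m." (minute 0, hour 8, any day).
DAILY_AT_8 = "0 8 * * *"


def parse_cron(expression):
    """Split a five-field unix-cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


def matches(expression, moment):
    """Return True if `moment` satisfies the expression.

    Simplification: only '*' and plain integers are handled.
    Ranges (1-5), lists (1,3) and steps (*/15) are not supported,
    and all five fields must match at once.
    """
    fields = parse_cron(expression)
    actual = {
        "minute": moment.minute,
        "hour": moment.hour,
        "day_of_month": moment.day,
        "month": moment.month,
        # cron counts Sunday as 0; isoweekday() counts Sunday as 7.
        "day_of_week": moment.isoweekday() % 7,
    }
    return all(
        value == "*" or int(value) == actual[name]
        for name, value in fields.items()
    )


if __name__ == "__main__":
    print(parse_cron(DAILY_AT_8))
    print(matches(DAILY_AT_8, datetime(2024, 1, 1, 8, 0)))
```

The time zone is not part of the expression; it is set separately in the job, which is how the same 0 8 * * * can mean 8 a.m. BRT.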
Once chosen, it asks for the topic and the message you want to send. My job looked like this in the end:
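For readers who prefer code to console forms, the same settings can be written down as a job description shaped like the Job resource of the Cloud Scheduler REST API (name, schedule, timeZone, pubsubTarget). This is only a sketch: the project ID, location, job name, description, cron expression and the America/Sao_Paulo time zone identifier are placeholders I am assuming, and build_pubsub_job is a hypothetical helper that only builds a dictionary without calling any Google API.

```python
import base64
import json


def build_pubsub_job(project_id, location, job_id, topic_id,
                     schedule, time_zone, message, description=""):
    """Build a dict shaped like a Cloud Scheduler job with a Pub/Sub target.

    Nothing is sent anywhere; this only assembles the fields.
    """
    return {
        "name": f"projects/{project_id}/locations/{location}/jobs/{job_id}",
        "description": description,
        "schedule": schedule,      # unix-cron expression
        "timeZone": time_zone,     # tz database name
        "pubsubTarget": {
            "topicName": f"projects/{project_id}/topics/{topic_id}",
            # In the JSON form of the API, the payload is base64-encoded.
            "data": base64.b64encode(message.encode("utf-8")).decode("ascii"),
        },
    }


if __name__ == "__main__":
    job = build_pubsub_job(
        project_id="my-project",        # placeholder
        location="us-central1",         # placeholder
        job_id="have-a-nice-day-job",   # placeholder
        topic_id="have-a-nice-day",     # topic created earlier
        schedule="0 8 * * *",           # assumed: every day at 08:00
        time_zone="America/Sao_Paulo",  # assumed identifier for BRT
        message="Have a nice day",
        description="Daily greeting tweet",
    )
    print(json.dumps(job, indent=2))
```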
Google Cloud Functions
A Cloud Function is a small piece of code that may be triggered by an HTTP request, a Cloud Pub/Sub message or some action on Cloud Storage. In this step, I simply selected the Cloud Pub/Sub trigger and defined the topic it listens to. For the environment variables, I put the credentials to access the Twitter API.
I decided to go with Python, since I found the code in the Python Twitter repo easy to use. On this page, you have to select which function is going to run when the Cloud Function is triggered, in this case, hello_cloudscheduler. You can find the code I used in this GitHub gist. Don’t forget to put python-twitter in requirements.txt!
And it is done! With the work we did, we will run a task every day without the trouble of having to deploy and maintain a virtual machine only for that. You may use the same architecture to run a Cloud Dataflow job every day. Since Cloud Pub/Sub is robust, you may increase this frequency as you like. Totally serverless.
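To give an idea of the shape of such a function, here is a minimal sketch of a Pub/Sub-triggered entry point. It is not the code from the gist. The environment variable names and the helpers decode_message and post_tweet are my own assumptions, as is the fallback to a default text when the message is empty. The twitter.Api constructor and its PostUpdate method come from python-twitter. The extra post argument exists only so the function can be tried locally without credentials.

```python
import base64
import os

DEFAULT_STATUS = "Have a nice day"


def decode_message(event):
    """Return the Pub/Sub payload as text, or None if there is no payload.

    Pub/Sub-triggered functions receive the message body base64-encoded
    under the "data" key of the event dictionary.
    """
    data = event.get("data")
    if not data:
        return None
    return base64.b64decode(data).decode("utf-8")


def post_tweet(status):
    """Post `status` using python-twitter and credentials from the environment."""
    # Imported lazily so this module also loads where python-twitter
    # is not installed (for example, in a local dry run).
    import twitter

    # The variable names below are assumptions; use whatever names
    # you configured for the function.
    api = twitter.Api(
        consumer_key=os.environ["TWITTER_CONSUMER_KEY"],
        consumer_secret=os.environ["TWITTER_CONSUMER_SECRET"],
        access_token_key=os.environ["TWITTER_ACCESS_TOKEN_KEY"],
        access_token_secret=os.environ["TWITTER_ACCESS_TOKEN_SECRET"],
    )
    api.PostUpdate(status)


def hello_cloudscheduler(event, context, post=post_tweet):
    """Entry point: tweet the message sent by the scheduler job."""
    status = decode_message(event) or DEFAULT_STATUS
    post(status)
    return status


if __name__ == "__main__":
    # Local dry run: print instead of tweeting.
    fake_event = {"data": base64.b64encode(b"Have a nice day").decode("ascii")}
    hello_cloudscheduler(fake_event, None, post=print)
```

Keeping the credentials in environment variables, as described above, means they never appear in the source code itself.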
And, of course, I enjoyed seeing some friends manually retweeting my automatic posts.