How to programmatically monitor your Cloud Dataflow jobs

Ever wanted to define alerts and monitor the status of your Cloud Dataflow jobs programmatically instead of checking some UI every 30 minutes?

You can do exactly that with Stackdriver Monitoring and the metrics that Cloud Dataflow exports to Stackdriver under the dataflow.googleapis.com/job/ prefix. The Stackdriver documentation keeps a handy list of the metrics available for Dataflow.
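If you'd rather discover that list programmatically, here is a minimal sketch using the Cloud Monitoring API's Python client. It assumes the google-cloud-monitoring library is installed and credentials are configured; "my-project" is a placeholder project ID.

```python
# A sketch of listing Dataflow metric descriptors with the Cloud Monitoring
# API. Assumes the google-cloud-monitoring client library is installed and
# application-default credentials are configured.

DATAFLOW_PREFIX = "dataflow.googleapis.com/"

def dataflow_descriptor_filter(prefix: str = DATAFLOW_PREFIX) -> str:
    """Monitoring filter matching every metric type Dataflow exports."""
    return f'metric.type = starts_with("{prefix}")'

def list_dataflow_metrics(project_id: str) -> list:
    """Return the metric types (e.g. .../job/is_failed) visible to the project."""
    from google.cloud import monitoring_v3  # assumed installed

    client = monitoring_v3.MetricServiceClient()
    descriptors = client.list_metric_descriptors(
        request={
            "name": f"projects/{project_id}",
            "filter": dataflow_descriptor_filter(),
        }
    )
    return [d.type for d in descriptors]

# Example usage (needs credentials; "my-project" is a placeholder):
#   for metric_type in list_dataflow_metrics("my-project"):
#       print(metric_type)
```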

My favorites:

job/is_failed — This one is self-explanatory, and it makes setting up alerts on failed jobs very easy.
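As a sketch of what such an alert could look like through the Monitoring API's Python client: the display names, the 60-second duration, and the project ID below are all placeholder choices, and you would still attach a notification channel to actually get paged.

```python
# A hedged sketch of creating an alert policy on job/is_failed.
# Assumes the google-cloud-monitoring client library; names and the
# 60-second duration are placeholders.

IS_FAILED_METRIC = "dataflow.googleapis.com/job/is_failed"

def is_failed_filter(metric_type: str = IS_FAILED_METRIC) -> str:
    """Monitoring filter for job/is_failed on Dataflow job resources."""
    return f'metric.type = "{metric_type}" AND resource.type = "dataflow_job"'

def create_failed_job_alert(project_id: str):
    from google.cloud import monitoring_v3  # assumed installed

    client = monitoring_v3.AlertPolicyServiceClient()
    condition = monitoring_v3.AlertPolicy.Condition(
        display_name="Dataflow job/is_failed > 0",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            filter=is_failed_filter(),
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=0,
            duration={"seconds": 60},  # how long the condition must hold
        ),
    )
    policy = monitoring_v3.AlertPolicy(
        display_name="Dataflow job failed",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        conditions=[condition],
    )
    return client.create_alert_policy(
        name=f"projects/{project_id}", alert_policy=policy
    )

# Example usage (needs credentials; "my-project" is a placeholder):
#   create_failed_job_alert("my-project")
```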

job/system_lag, job/data_watermark_age — If you have a streaming pipeline, these two metrics will indicate if your pipeline is beginning to lag.

job/elapsed_time — Another easy-to-understand metric: how long your pipeline has been running so far. If you want to alert on pipelines exceeding predefined duration thresholds, here is your chance.
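The metrics above can also be polled directly and compared against your own thresholds. Here is a minimal sketch, again assuming the google-cloud-monitoring Python client; the 300-second threshold, the 10-minute lookback window, and "my-project" are placeholder choices.

```python
# A sketch of polling a Dataflow metric (e.g. job/system_lag or
# job/elapsed_time) over the last few minutes and flagging values
# above a threshold. Assumes the google-cloud-monitoring client library.
import time

def max_point_value(series_list) -> float:
    """Largest point value across the returned time series
    (handles both int64 and double point values)."""
    values = [
        p.value.int64_value or p.value.double_value
        for ts in series_list
        for p in ts.points
    ]
    return max(values, default=0.0)

def exceeds_threshold(value: float, threshold: float) -> bool:
    return value > threshold

def fetch_metric_max(project_id: str, metric_type: str, minutes: int = 10) -> float:
    from google.cloud import monitoring_v3  # assumed installed

    client = monitoring_v3.MetricServiceClient()
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {
            "end_time": {"seconds": now},
            "start_time": {"seconds": now - minutes * 60},
        }
    )
    series = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": f'metric.type = "{metric_type}"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    return max_point_value(series)

# Example usage (needs credentials; project ID and threshold are placeholders):
#   lag = fetch_metric_max("my-project", "dataflow.googleapis.com/job/system_lag")
#   if exceeds_threshold(lag, 300):
#       print(f"pipeline lagging: system_lag = {lag:.0f}s")
```

Wiring a script like this into a scheduler or your own alerting path gives you the same signal as the Stackdriver UI, on your own terms.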

And if you still prefer a graphical UI for monitoring your metrics, check out this blog post.