Houston, we have a problem

Vishakha Kulkarni
Aug 2, 2019 · 3 min read

So, there we were, having deployed one of our solutions to Google Cloud, one that involved spinning up VMs on demand for our users for a fixed amount of time and then shutting them down. The project was a short and smooth one. A few months down the line, everything was running smooth as honey. We congratulated ourselves on a job well done and moved on to other things. And then we got a frantic call from the client complaining that the bill for the last month was very high, almost double.

We rolled up our sleeves and got into investigative mode, which turned up a rogue script that had started too many instances and a corner case in which the instances had not shut down. Needless to say, we fixed the issue, though this blog is not about the defective script.

This whole incident got us thinking about how we could detect such problems early on. Today it was VMs; tomorrow something else could go out of control, causing a huge spike in costs, and we would know only at the end of the month. Yes, we could set up budget notifications, but by the time we got those, most of the budget would have been eaten up.

Around the same time, Google Cloud announced Cloud Scheduler, a fully managed enterprise-grade cron job scheduler. This triggered the idea of putting Cloud Scheduler and Google Cloud Functions together to do a daily check of cost and notify us in case of any spikes. So the idea is simple, really.

Cloud Scheduler can be configured to trigger a cloud function: it can either call an HTTP URL or publish a message to a Pub/Sub topic, which in turn triggers the function. Google Cloud Functions was a logical choice for this since the check needs to run only once a day. The cloud function runs a query on the billing data in Google BigQuery to determine whether to send a notification and then uses SendGrid to send an email. Of course, this assumes that billing data is being exported to BigQuery. To know how to set this up, click here.
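To make the "determine whether to send a notification" step concrete, here is a minimal sketch of one possible rule: compare the latest day's cost against the average of the preceding days, scaled by a multiplier. The function name `isCostSpike`, the multiplier form of the threshold, and doing the comparison in JavaScript are all assumptions for illustration; in our setup most of this logic sits in the BigQuery query itself.

```javascript
// Hypothetical helper (not the original code): decide whether the most
// recent day's cost counts as a spike.
// dailyCosts: array of numbers, oldest first, latest day last.
// threshold: assumed to be a multiplier, e.g. 1.5 = "50% above average".
function isCostSpike(dailyCosts, threshold) {
  if (dailyCosts.length < 2) {
    return false; // not enough history to compare against
  }
  const history = dailyCosts.slice(0, -1);
  const latest = dailyCosts[dailyCosts.length - 1];
  // Average of every day except the latest one.
  const average = history.reduce((sum, cost) => sum + cost, 0) / history.length;
  return latest > average * threshold;
}
```

With a threshold of 1.5, three days at 10 followed by a day at 25 would be flagged, while a day at 12 would not.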

The Pub/Sub topic and the schedule were set up using the commands below (the cron expression "0 0 * * *" runs the job once a day at midnight):

gcloud pubsub topics create DailyUsageCheckTopic

gcloud scheduler jobs create pubsub DailyUsageCheckJob --schedule "0 0 * * *" --topic=DailyUsageCheckTopic --message-body="Perform Daily Check"

The next step was to create the cloud function. It was written in Node.js, and most of the logic was contained in the query to be run in Google BigQuery itself.
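As an illustration of the kind of query involved, the sketch below builds a per-day cost total over the last few days. It is not our actual query: the function name `buildDailyCostQuery` is hypothetical, and it assumes the standard billing export columns `cost` and `usage_start_time`. The resulting string would then be handed to a BigQuery client.

```javascript
// Hypothetical helper: build a query that totals cost per day over the
// last `days` days of the billing export table.
function buildDailyCostQuery(dataset, table, days) {
  // The names are interpolated into the SQL, so accept plain identifiers only.
  const identifier = /^[A-Za-z0-9_]+$/;
  if (!identifier.test(dataset) || !identifier.test(table)) {
    throw new Error("dataset and table must be plain identifiers");
  }
  if (!Number.isInteger(days) || days < 1) {
    throw new Error("days must be a positive integer");
  }
  return [
    "SELECT DATE(usage_start_time) AS usage_day, SUM(cost) AS total_cost",
    `FROM \`${dataset}.${table}\``,
    `WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL ${days} DAY)`,
    "GROUP BY usage_day",
    "ORDER BY usage_day",
  ].join("\n");
}
```

Validating the identifiers before interpolating them keeps a mistyped environment variable from producing a malformed query.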

We created a SendGrid account and used that to send email notifications. To know more about how to set up SendGrid, click here.
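For a sense of what the notification could look like, here is a sketch that builds a plain-text alert message. The function name `buildAlertEmail` and the wording are made up for illustration; the object uses the `to`, `from`, `subject` and `text` fields that SendGrid's Node.js mail client accepts.

```javascript
// Hypothetical helper: build the alert email as a plain object.
function buildAlertEmail(to, from, latestCost, averageCost) {
  return {
    to: to,
    from: from,
    subject: "Daily cloud cost check: possible spike",
    text:
      `Latest daily cost: ${latestCost.toFixed(2)}\n` +
      `Recent daily average: ${averageCost.toFixed(2)}`,
  };
}

// Sending it would look roughly like this (needs the @sendgrid/mail package):
//   const sgMail = require("@sendgrid/mail");
//   sgMail.setApiKey(process.env.API_KEY);
//   sgMail.send(buildAlertEmail(to, from, latestCost, averageCost));
```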

The API_KEY to be used for SendGrid and the email addresses to be used are stored as environment variables. We also decided to parameterize the dataset and table name, the threshold value, and the number of days to be considered for the daily average.
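Reading these settings could be sketched as follows. Only `API_KEY` is a name taken from this post; the other variable names (`TO_EMAIL`, `FROM_EMAIL`, `DATASET`, `TABLE`, `THRESHOLD`, `DAYS`) and the default values are assumptions chosen for the example.

```javascript
// Hypothetical config loader: reads settings from an environment object
// (process.env in the deployed function) and fails fast on missing values.
function loadConfig(env) {
  const required = ["API_KEY", "TO_EMAIL", "FROM_EMAIL", "DATASET", "TABLE"];
  for (const name of required) {
    if (!env[name]) {
      throw new Error(`Missing environment variable: ${name}`);
    }
  }
  return {
    apiKey: env.API_KEY,
    toEmail: env.TO_EMAIL,
    fromEmail: env.FROM_EMAIL,
    dataset: env.DATASET,
    table: env.TABLE,
    // Assumed defaults: flag costs 50% above a 7-day average.
    threshold: Number(env.THRESHOLD || "1.5"),
    days: parseInt(env.DAYS || "7", 10),
  };
}
```

In the function itself this would be called as `loadConfig(process.env)`, so that changing the threshold or the averaging window only needs a redeploy with a different env file.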

Once the function was ready, the next step was to deploy it. This is how it was done:

gcloud functions deploy DailyUsageCheckFunction --runtime nodejs8 --source . --env-vars-file env.yaml --trigger-topic=DailyUsageCheckTopic

This simple tool, which we now use across accounts, helps us get early notifications of any abnormal usage behavior that results in cost spikes.

You can download the code from here and set this up in your own projects. Enjoy!

CloudWithMORE

Get MORE with your Cloud
