Budget Alerts & ‘Caps’ in Google Cloud

Alistair Grew
Appsbroker CTS Google Cloud Tech Blog
8 min read · Nov 3, 2023

Preventing unexpected cost overruns…

Source: Our Customer’s Billing Account

Introduction

Today I am addressing the topic of budget ‘caps’ in Google Cloud, having been inspired to write this post by a recent experience one of our customers had.

One of the great advantages of the Cloud is its elasticity. This fact has enabled some of the biggest names in tech to flourish by enabling growth trajectories that would otherwise be impossible. To pick a prominent Google example, the craze that was Pokemon Go simply wouldn’t have been able to scale as fast as it did while ensuring stability.

Source: https://www.linkedin.com/pulse/great-power-comes-responsibility-why-tech-industry-junction-koolen/

The elasticity Clouds offer provides huge capability and power and, to half-quote Spider-Man, with this power comes great responsibility.

Responsibility

So what is this great responsibility that comes with Cloud usage? Well, many things, but today I want to focus on the shift from capex to opex and the implications that can have. I won’t, however, get too deep into the FinOps space.

So, in the context of Google Cloud, who is responsible for your usage and therefore cost? Well for IaaS and PaaS, you as the customer are...

Source: https://cloud.google.com/architecture/framework/security/shared-responsibility-shared-fate

Common Misconceptions

Talking about responsibilities leads nicely into some common misconceptions I have heard in regard to budgets in GCP.

Surely Google (or another provider) will stop me from spending too much?

It simply isn’t in Google’s interest to do this for many reasons but here are some of the main ones I could think of:

  • It doesn’t make business sense for Google. Why would you, as a business, want to prevent customers from spending more money with you?
  • Google doesn’t know why your spending may have increased.
    — You could be running a workload you only run sporadically such as training an ML model.
    — You could be seeing increased usage for legitimate reasons. I previously worked in e-commerce, and on Black Friday our revenue was multiple times that of a typical day, as was our usage. Provided these things stay in proportion, that isn’t an issue. If our cloud provider had pulled the plug on Black Friday we would probably have sued them for damages and found a new provider.

If my system gets attacked, the Cloud vendor will pay, right?

In the Cloud, security is a shared responsibility as well. All the main cloud providers have tooling to help ensure a good security posture. Whilst this is a broad topic, I want to narrow in on two types of consumption-causing attacks I have witnessed in the wild: DDoS and account compromise.

Source: https://cloud.google.com/blog/products/identity-security/google-cloud-mitigated-largest-ddos-attack-peaking-above-398-million-rps

With DDoS what you typically see is increased frontend load causing some services to scale to try and cope with the increased (but false) demand. To combat this I recommend two main options:

With account compromise, I have seen accidentally leaked credentials used to spin up cryptocurrency mining operations. These typically use large numbers of CPUs or GPUs to rack up large bills. This is such a problem in fact that Google has published an entire page of best practices on mitigating any risk around this.

If I go over my budget alert Google will keep notifying me if it keeps increasing.

Source: https://www.forbes.com/sites/niallmccarthy/2018/09/28/major-construction-projects-that-went-catastrophically-over-budget-infographic/

This one simply isn’t true. I will talk about budget alerting in more depth shortly, but if you, like our customer, set a billing alert at say $3000, Google will not alert you any further once you have exceeded it, even if spend goes 10,000% over this threshold. To combat this I would recommend:

  • Regularly reviewing your budget alerts to ensure they are still fit for purpose.
  • Setting additional alerts above 100% of your expected spend to let you know if costs do go ‘out of control’.
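As a rough sketch of the second recommendation, the snippet below builds a budget definition with thresholds both below and above 100%. The dictionary is shaped like the Budget resource of the Cloud Billing Budget API as I understand it (`thresholdRules`, `thresholdPercent`, `spendBasis`), so check the field names against the current reference before relying on them. The function name, the display name, and the specific threshold values are my own illustrative assumptions.

```python
import json


def build_budget_body(display_name, amount_units, currency_code, threshold_percents):
    """Return a dict shaped like a Cloud Billing Budget resource (assumed shape)."""
    return {
        "displayName": display_name,
        "amount": {
            "specifiedAmount": {
                "currencyCode": currency_code,
                # 64-bit integers are conventionally sent as strings in JSON.
                "units": str(amount_units),
            }
        },
        # thresholdPercent is a fraction of the budget: 1.0 means 100%.
        "thresholdRules": [
            {"thresholdPercent": p, "spendBasis": "CURRENT_SPEND"}
            for p in sorted(threshold_percents)
        ],
    }


# Hypothetical values: a $3000 monthly budget with alerts up to 500%.
body = build_budget_body("monthly-budget", 3000, "USD", [0.5, 0.9, 1.0, 1.5, 2.0, 5.0])
print(json.dumps(body, indent=2))
```

The thresholds above 1.0 are the important part: they are what lets you hear about spend that has gone well past the figure you originally expected.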

Budget Alerting

So now that I have hopefully dispelled some common misconceptions how does budgeting work in Google Cloud?

Google Cloud has the concept of billing accounts which can be linked to one or many projects that contain resources that cost money.

Source: https://cloud.google.com/billing/docs/concepts

Some topologies are more complicated than this, with parent and child billing accounts, but we will keep things simple here.

Within each billing account, there is a ‘Budgets & alerts’ section where you can define your budget. As you can see below, in my org I have set an alert to go off when my spending goes over £10/month, which I shouldn’t ever hit.

The other thing I want to highlight on this page is the grey box which reiterates that setting a budget does not cap resource or API consumption.

Source: A screenshot of my billing account.

Further to this, if you review the Google documentation around budgets, you will find this bright red call-out box:

Source: https://cloud.google.com/billing/docs/how-to/budgets

And another one further down:

Source: https://cloud.google.com/billing/docs/how-to/budgets

I’m not going to go into huge detail on how to set budgets as I think Google’s documentation does a good job of this and I would simply be repeating it verbatim. Suffice to say, I think doing this should be one of the first things you do when setting up a new project.

Once you have decided on your budget you should configure alerting. The key here, in my opinion, is to use a combination of different alert thresholds to strike the right balance between observability and alert fatigue.

Source: https://sensu.io/resources/whitepaper/alert-fatigue-guide

Unfortunately at the time of writing Google doesn’t let you scope your budget to a rolling window and instead has presets for monthly, quarterly, and yearly. There is the ability to do custom ranges but realistically the only way you are going to do tighter ranges is programmatically.

To me, the smallest preset scoping of a month presents a bit of a risk. I am writing this on the 1st of November. If, to reduce alert fatigue, I only had notifications set up at, say, the 75%, 100%, and 125% thresholds of my budget, but a bug in my code caused me to spend 74% of my budget today, I am not likely to know about it until tomorrow at the earliest. This can be mitigated to an extent with forecasting but, as with all ML-based prediction, the inference is only as good as the source data. I have, for example, seen large licensing expenditures for Looker or marketplace purchases create massive legitimate peaks, like the one in the screenshot below, which can sometimes throw off forecasting.
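To put numbers on that scenario, here is a minimal sketch with a hypothetical budget of 1000 (any currency). It checks which of the 75%, 100%, and 125% thresholds would have been crossed after spending 74% on day one, and then does a naive straight-line projection to month end. The projection is my own simplification for illustration and is not how Google's forecasting works; I have also assumed a threshold counts as crossed when spend is greater than or equal to it.

```python
def crossed_thresholds(spend, budget, thresholds):
    """Return the thresholds (as fractions of budget) that spend has reached."""
    return [t for t in thresholds if spend >= budget * t]


def linear_month_end_projection(spend_to_date, day_of_month, days_in_month):
    """Naively assume every remaining day costs the same as the average so far."""
    return spend_to_date / day_of_month * days_in_month


budget = 1000.0                 # hypothetical monthly budget
thresholds = [0.75, 1.0, 1.25]  # the alert thresholds from the example
spend_day_one = 740.0           # 74% of the budget spent on 1 November

fired = crossed_thresholds(spend_day_one, budget, thresholds)
projection = linear_month_end_projection(spend_day_one, 1, 30)

print(fired)       # [] : no alert has fired yet
print(projection)  # 22200.0 : over 22x the budget if the burn rate continues
```

No alert fires, yet a simple burn-rate check already shows the month heading for more than twenty times the budget, which is the kind of signal a tighter programmatic window could surface.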

Source: Billing dashboard showing a large marketplace spend.

The other elephant in the room around budgets is that Google Cloud’s billing data is only eventually consistent, and the speed at which cost from different services is added to the billing account varies considerably from almost instantaneous to sometimes over 24 hours.

Source: https://cloud.google.com/billing/docs/how-to/resolve-issues#missing-transactions
Source: https://cloud.google.com/billing/docs/how-to/budgets

Thinking about how to solve this problem, I think the answer is probably to utilise the ability of a budget alert to send a message to a Pub/Sub topic to trigger some programmatic logic, as per the ‘Cost Control Response’ above. I think this is probably an interesting enough problem to warrant a post all on its own though, so watch this space!
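As a starting point for that kind of logic, the sketch below decodes a budget notification and decides how loudly to shout based on how far over budget the spend is. It assumes the notification arrives as a base64-encoded JSON payload in the event's `data` field and carries `costAmount` and `budgetAmount`, which matches the documented notification format as far as I am aware. The function names, the `escalate_at` parameter, and the "ok" / "warn" / "page" levels are hypothetical.

```python
import base64
import json


def parse_budget_notification(event):
    """Decode the base64 'data' field of a Pub/Sub event into a dict."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def classify(notification, escalate_at=2.0):
    """Map the spend-to-budget ratio onto a hypothetical severity level."""
    ratio = notification["costAmount"] / notification["budgetAmount"]
    if ratio >= escalate_at:
        return "page"   # far over budget: wake someone up
    if ratio >= 1.0:
        return "warn"   # over budget: notify the team
    return "ok"


# Simulated event: spend of 6300 against a budget of 3000 (hypothetical values).
sample = {
    "budgetDisplayName": "monthly-budget",
    "costAmount": 6300.0,
    "budgetAmount": 3000.0,
    "currencyCode": "USD",
}
event = {"data": base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")}

level = classify(parse_budget_notification(event))
print(level)
```

Because the decision is driven by the ratio in each message rather than by a fixed list of thresholds, the same handler keeps escalating however far past 100% the spend goes.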

Update: My good friend Narish Samplay has beaten me to it and written a really good post that is well worth your time.

Budget Visualisation

Whilst talking about budgets, it would be remiss of me not to suggest dashboarding your cloud spend. I would argue that in most cloud-centric organisations it is as valuable as other operational metrics. Fundamentally, this can be achieved by hooking up your favourite dashboarding tool to a billing BigQuery export. Google has a tutorial on how to do this with Looker Studio, but various people have done this with Grafana and it should be possible with Power BI as well.

Source: https://cloud.google.com/billing/docs/how-to/visualize-data

Budget ‘Capping’

As we have already stated, setting a budget does not cause Google to stop your services, but what if this is behaviour you wish to have? Well, Google identifies two ways to do this:

  1. Create a cost control response that detaches the project from the billing account with a Cloud Function. This does come with a clear warning though that unexpected consequences including data loss could occur.
  2. Capping API Usage, which could restrict the ability to spend with certain services (your mileage may vary!).

With regard to the former, Google even has example Cloud Function code in both Python and Node.js.
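To give a feel for what the first option involves, here is a minimal sketch of the decision at the heart of such a function. It is not Google's sample. The `update_billing_info` argument is a hypothetical stand-in for whatever client call you use; my understanding is that the Cloud Billing API's `projects.updateBillingInfo` method with an empty `billingAccountName` is what detaches a project, but treat that as an assumption to verify. Here the call is only recorded, so nothing is actually disabled.

```python
def cap_project(notification, project_id, update_billing_info):
    """Detach billing only if spend has exceeded the budget.

    update_billing_info is an injected callable (hypothetical stand-in for a
    real Cloud Billing client call) so the logic can be exercised safely.
    Returns True if a detach was requested, False otherwise.
    """
    if notification["costAmount"] <= notification["budgetAmount"]:
        return False
    update_billing_info(
        name=f"projects/{project_id}",
        # Assumption: an empty billing account name removes the billing link.
        body={"billingAccountName": ""},
    )
    return True


# Dry-run stand-in that records requests instead of calling any API.
recorded_calls = []


def fake_update_billing_info(name, body):
    recorded_calls.append((name, body))


under = cap_project({"costAmount": 5.0, "budgetAmount": 10.0}, "my-project", fake_update_billing_info)
over = cap_project({"costAmount": 12.0, "budgetAmount": 10.0}, "my-project", fake_update_billing_info)
print(under, over, recorded_calls)
```

Keeping the API call behind an injected function makes it easy to rehearse the behaviour before wiring in anything destructive, which matters given Google's warning about unexpected consequences including data loss.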

Help! I have unexpectedly blown my budget!

Source: https://tenor.com/view/broke-fairlyoddparents-burning-money-gif-4486655

But what do I recommend doing if you are reading this after unexpectedly blowing your budget?

  1. Stop the consumption. This might mean rolling back system changes or enabling some security protection.
  2. Understand what caused the consumption and collect evidence. Google may well want to see this and understand what steps you have taken to stop the consumption and prevent a recurrence.
  3. Contact your billing provider, be it a reselling partner or Google.

Any bill adjustment is entirely at the discretion of Google. Anecdotally they are often sympathetic but this shouldn’t be taken for granted.

Conclusion

As I stated at the beginning of this post, I was inspired to write this following an experience a customer of ours has had. If I can save someone else from having a ‘brown trouser’ moment and the several awkward conversations that follow, then I consider this post worthwhile. Anyway, until next time, as ever, keep it Googley :)

Alistair Grew
Appsbroker CTS Google Cloud Tech Blog

GCP Architect based in the Manchester (UK) area. Thoughts here are my own and don’t necessarily represent my employer.