Google Cloud Storage on a shoestring budget

A common question I get at Meetups and conferences relates to billing optimization: things like “what’s the cheapest way to do X”, “how do I cut my costs”, or “how do I not go over my free quota”.

IMHO Google Cloud Platform has some of the most transparent, cost-effective pricing structures available from cloud providers. But even with that, it’s still a challenge to figure out how to optimize your architecture for price.

So, let’s take a look at how to run Google Cloud Storage on a shoestring budget.

Disclaimer: These prices are accurate as of 10/5/2017. Since time continues to move forward, these prices may not be accurate in the future. Also, these are my best attempts at “math”. All standard disclaimers about me being bad at math apply here.

The basics.

A quick read of the GCS pricing page shows that your GCS bill is a composite of 3 main things: data storage, network usage, and operations (4 if you’re using Nearline, but we’ll ignore that).

Here’s what you get for free, per month:

  • 5GB storage (per region)
  • 5k Class A ops, 50k Class B ops
  • 1GB outbound traffic from US

We will need to factor this into all of our calculations going forward, since this quota is applied to your monthly usage. Let’s break each one down really quick and see what the costs would be.

GCS Storage costs (above quota)

Once you eclipse your free storage quota, you’ll see that storage costs differ slightly per region. For example, us-central1 is $0.02 per GB per month, whereas us-east1 is $0.023 (it also changes if you’re in EMEA or APAC).

So, what’s it cost to store 1TB of data per month in the Iowa data center (us-central1)?

(1024GB-5GB) * 0.02 = $20.38
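As a quick sanity check, the storage arithmetic can be sketched in a few lines of Python. The rate table holds only the two regional prices quoted above, and the names (STORAGE_RATE_PER_GB, monthly_storage_cost) are made up for this illustration; they are not part of any Google API.

```python
# Monthly per-GB storage rates quoted in this post (as of 10/5/2017).
STORAGE_RATE_PER_GB = {"us-central1": 0.02, "us-east1": 0.023}
FREE_STORAGE_GB = 5  # free quota, per region


def monthly_storage_cost(stored_gb, region="us-central1"):
    """Cost in dollars of storing `stored_gb` for one month in `region`."""
    billable_gb = max(stored_gb - FREE_STORAGE_GB, 0)  # never bill below zero
    return billable_gb * STORAGE_RATE_PER_GB[region]


print(f"${monthly_storage_cost(1024):.2f}")  # 1TB in Iowa -> $20.38
```

Swapping the region to "us-east1" shows how much the slightly higher rate adds for the same terabyte.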

That’s pretty cheap, I suppose. That’s basically lunch for two people to store 1TB of data.

Network Egress

You’re charged for outbound traffic, depending on the origin and the outbound tier. For example, if you transfer 0-1TB a month from the Iowa location, it’s $0.12 per GB to anywhere in the world (except China and Australia, which have different costs). The structure is tiered, so if you go over 1TB you’ll get charged a different rate.

So, what’s it cost to transfer 1TB of data a month to external clients (outside of GCP) in the US or EU?

(1024GB-1GB) * 0.12 = $122.76
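The same calculation as a small Python sketch. Only the 0-1TB rate is quoted above, so this hypothetical helper (monthly_egress_cost is an illustrative name) refuses to guess at the higher tiers rather than apply the wrong price.

```python
EGRESS_RATE_PER_GB = 0.12    # 0-1TB tier out of Iowa, as quoted above
FREE_EGRESS_GB = 1           # free outbound traffic from the US
FIRST_TIER_LIMIT_GB = 1024   # above this a different rate applies


def monthly_egress_cost(egress_gb):
    """Cost in dollars of sending `egress_gb` out of GCP in one month."""
    if egress_gb > FIRST_TIER_LIMIT_GB:
        # The post only gives the first-tier price, so stop here.
        raise ValueError("over 1TB: a different tier applies, not modelled here")
    return max(egress_gb - FREE_EGRESS_GB, 0) * EGRESS_RATE_PER_GB


print(f"${monthly_egress_cost(1024):.2f}")  # -> $122.76
```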

Data Operation

Operations on a bucket or object are anything that queries or changes its data. Ops are broken into two categories, A and B, with different costs:

$0.05 per 10k ops (Class A: inserting, patching, listing buckets, listing objects, watching & triggers)

$0.004 per 10k ops (Class B: mostly everything else)

So, let’s say your 1TB of data is broken across 1k objects in GCS.

Let’s say the average user looks at 500 object listings a month (you’re a shoe site, or something), in batches of 50, so 10 calls to our API per month, per user.

What’s this cost?

If we have 10k users @ 10 API calls a month, that’s 100k Class A ops. Objects.list is a Class A operation, and we get 5k of those free a month. Since the price is per 10k ops, the billable count has to be divided by 10k first:

((100k-5k)/10k)*0.05 = $0.475
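Because operations are priced per 10k, the billable count needs to be divided by 10,000 before multiplying by the rate. A minimal sketch of that step, with ops_cost as a hypothetical helper name:

```python
def ops_cost(op_count, free_ops, price_per_10k):
    """Monthly cost in dollars for `op_count` operations of one class."""
    billable_ops = max(op_count - free_ops, 0)
    return billable_ops / 10_000 * price_per_10k  # the price is per 10k ops


list_calls = 10_000 * 10  # 10k users, 10 Objects.list calls each
print(f"${ops_cost(list_calls, 5_000, 0.05):.3f}")  # Class A -> $0.475
```

The same helper works for Class B by passing the 50k free quota and the $0.004 rate.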

What’s the cheapest way to store and distribute my data from GCS?

Prices are highly dependent on your scenarios, so let’s take a few examples.

Scenario 1 : Data in a regional bucket, transferring to the same region.

Note: moving data from a regional bucket to a service (GCE, GAE, GCF, GKE) in the same region is free. However, the cost for network egress looks to be uniform between GCS and all of those services, so there’s no upside to sending data to a frontend and serving it to the user from there.

On top of that, sending GCS->Frontend->user incurs costs of its own if the frontend is in a different region than the GCS bucket.

Scenario 2 : Data in a regional bucket, transferring to a different region.

Let’s put our data in us-west1, and have a user in us-east4. We’d be getting charged the same $122.76 cost for network egress, but our performance wouldn’t be ideal due to distance latency.

What if we copied the data to a bucket in a closer region (us-east4)?

(1024GB-5GB) * 0.02 = $20.38 per bucket, which is doubled, since us-east4 and us-west1 have the same storage cost. If you assume that transfer costs remain the same, you’d end up with $122.76 for network plus $40.76 for storage, giving $163.52. So, a little more expensive.

What if we used a multi-region bucket instead?

Storage costs are $0.026 per GB per month for multi-regional.
So, (1024GB-5GB) * 0.026 = $26.49 for storage, plus $122.76 for network, gives us $149.25. So it’s about $14 cheaper to use a multi-regional bucket than to duplicate the data across two regional buckets.
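To put the two options side by side, here is a rough Python sketch using only the rates quoted in this scenario. It assumes, as the text does, that egress costs the same in both setups, and it ignores the one-off cost of copying the data into a second bucket.

```python
FREE_STORAGE_GB = 5                 # free quota, per region
DATA_GB = 1024                      # 1TB of data
EGRESS_COST = (DATA_GB - 1) * 0.12  # $122.76, assumed identical in both setups


def storage_cost(rate_per_gb, buckets=1):
    """Monthly storage cost in dollars for `buckets` full copies of the data."""
    return buckets * (DATA_GB - FREE_STORAGE_GB) * rate_per_gb


two_regional = storage_cost(0.02, buckets=2) + EGRESS_COST
multi_regional = storage_cost(0.026) + EGRESS_COST

print(f"two regional buckets: ${two_regional:.2f}")
print(f"multi-regional:       ${multi_regional:.2f}")
print(f"difference:           ${two_regional - multi_regional:.2f}")
```

With these inputs the totals come out at roughly $163.52 and $149.25.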

Would the load balancer help?

You can configure the load balancer to fetch from a GCS bucket directly, which allows the data to be distributed and cached via the CDN without needing to set up extra buckets.

The LB costs $0.08 per GB for the first 10TB of cache egress, and $0.04 per GB of cache fill.

So, what would it cost to transfer 1TB of data between regions to a client using the LB as a front end to the bucket?

  • Cache fill would be 0.04*1024 = $40.96 to fill the entire thing in the cache
  • We pay $0.08/GB to egress the data from cache, so if we assume 1TB of cached data, that’s 1024*0.08 = $81.92.
  • So, to store the data in a US region and send it to a client in an EU region through the LB, you’d be paying about $122.88.
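The load balancer numbers above can be reproduced with a short Python sketch. The "warm cache" case is a hypothetical I have added to show what a later pass might cost if everything were still cached; real cache hit rates will vary, and cdn_cost is an illustrative name only.

```python
CACHE_FILL_PER_GB = 0.04    # cache fill
CACHE_EGRESS_PER_GB = 0.08  # cache egress, first 10TB


def cdn_cost(filled_gb, served_gb):
    """Dollars for filling `filled_gb` into the cache and serving `served_gb` from it."""
    return filled_gb * CACHE_FILL_PER_GB + served_gb * CACHE_EGRESS_PER_GB


cold = cdn_cost(filled_gb=1024, served_gb=1024)  # first pass: fill, then serve
warm = cdn_cost(filled_gb=0, served_gb=1024)     # hypothetical: already cached
print(f"cold cache: ${cold:.2f}, warm cache: ${warm:.2f}")
```

The cold pass lands on the $122.88 figure; the saving only shows up once requests start hitting the cache.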

So, summary:

For cross-region fetches (in the same multi-region):

  • If you don’t care about performance leave it in a single region.
  • If you care about performance, then it’s cheaper to just use a multi-regional bucket, rather than duplicate between buckets.
  • For cross-multi-region fetches it’s cheaper to use the load-balancer as a front-end “across the pond” and let caching handle the rest, at the cost of spikes in initial cache-fill latency.

What’s the cheapest way to list my objects in a bucket?

If we have 10k users @ 10 API calls a month, that’s 100k Class A ops.

Objects.list is a Class A operation, and we get 5k of those free a month. Class A ops are priced per 10k, so: ((100k-5k)/10k)*0.05 = $0.475, call it $0.48.

Does it make more sense to list that in Datastore?

Datastore gives you 1GB of storage free ($0.18 per GB over that) and 50k entity reads free per day; past that, reads cost $0.06 per 100k entities.

If we had 500 object listings, let’s assume 1k of metadata each; that would be 500k of metadata storage (which is within the free tier).

Each user is looking at 500 object listings a month; at 10k users, that’s 5,000k entity reads a month. The free quota covers about 1,500k of those (50k a day over a 30-day month), so at $0.06 per 100k reads we end up with ((5000k-1500k)/100k)*0.06 = $2.10.

So, summary:

At this volume a GCS bucket listing is the cheaper option, because one list call returns a whole batch of objects while Datastore bills every entity read. Datastore only comes out ahead while your entity reads stay inside its free daily quota.
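To see how the listing comparison moves with audience size, here is a rough Python sketch. It applies the per-10k and per-100k units explicitly and assumes a 30-day month to turn Datastore’s free daily reads into a monthly figure; the function names are hypothetical.

```python
DAYS_PER_MONTH = 30  # assumption: converts the daily free quota to a monthly one


def gcs_listing_cost(users, calls_per_user=10):
    """Monthly dollars for Objects.list calls (Class A)."""
    billable = max(users * calls_per_user - 5_000, 0)  # 5k Class A ops free
    return billable / 10_000 * 0.05                    # $0.05 per 10k ops


def datastore_listing_cost(users, reads_per_user=500):
    """Monthly dollars for reading one Datastore entity per listed object."""
    free_reads = 50_000 * DAYS_PER_MONTH               # 50k entity reads free a day
    billable = max(users * reads_per_user - free_reads, 0)
    return billable / 100_000 * 0.06                   # $0.06 per 100k reads


for users in (1_000, 10_000):
    gcs = gcs_listing_cost(users)
    datastore = datastore_listing_cost(users)
    print(f"{users:>6} users: GCS ${gcs:.3f}, Datastore ${datastore:.3f}")
```

Under these assumptions, 1k users fit inside Datastore’s free reads, while at 10k users the per-entity billing dominates.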

What’s the cheapest way to get metadata on my objects?

Custom metadata on an object is charged per character, as part of the object’s storage. If you assume 1k of metadata per object, and 100k objects, you’re looking at roughly an additional 98MB (100k × 1KB) in your GCS storage charges. That’s still very low for storage: each 1KB is about 9.5e-7 GB, so the whole lot comes to about $0.002 a month at $0.02 per GB.

Fetching that metadata is a Class B operation, of which you get 50k free per month. So, if you were doing 100k metadata fetches a month, that’s ((100k-50k)/10k)*0.004 = $0.02.

Likewise, Datastore gives you 1GB of free storage, so your ~98MB of metadata would sit in the free tier, storage-wise.

You get 50k entity reads / day free. If we assume 100k / month, then 100k / 30 is roughly 3.3k reads / day.

All within the free tier.

For Datastore to start costing money, you’d have to be doing more than 50k entity reads a day, at which point you’re charged $0.06 per 100k entities read, pro-rated. That means 1,500k+ entity reads a month before you leave the free tier, and even at that point it’s still cheaper to do the reads in Datastore than to grab metadata directly from GCS, which bills everything past 50k Class B ops a month.
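The metadata numbers can be checked with a short Python sketch. It assumes 1KB of metadata per object, a 30-day month, and that the bucket is already past its free storage quota, so the extra metadata is billed at the full us-central1 rate. All names are illustrative.

```python
def metadata_storage_cost(objects, kb_per_object=1, rate_per_gb=0.02):
    """Monthly dollars for custom metadata stored alongside GCS objects."""
    metadata_gb = objects * kb_per_object / (1024 * 1024)  # KB -> GB
    return metadata_gb * rate_per_gb


def gcs_metadata_fetch_cost(fetches_per_month):
    billable = max(fetches_per_month - 50_000, 0)  # 50k Class B ops free a month
    return billable / 10_000 * 0.004               # $0.004 per 10k ops


def datastore_read_cost(reads_per_month, days_in_month=30):
    billable = max(reads_per_month - 50_000 * days_in_month, 0)  # 50k free a day
    return billable / 100_000 * 0.06                             # $0.06 per 100k


print(f"metadata storage for 100k objects: ${metadata_storage_cost(100_000):.4f}")
for n in (100_000, 1_500_000, 3_000_000):
    gcs = gcs_metadata_fetch_cost(n)
    datastore = datastore_read_cost(n)
    print(f"{n:>9,} reads: GCS ${gcs:.2f}, Datastore ${datastore:.2f}")
```

Datastore’s paid rate per read is higher than the Class B rate, so it is worth rerunning this with your own volumes if you expect to go far beyond the free tier.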

So, Summary:

Datastore is the cheapest way to store and fetch metadata on your objects.

Are there other cheap ways to do things?

Absolutely! Stay tuned to my Medium page for more details, and don’t forget to check out Google Cloud Performance Atlas, where we help you trim down your cloud usage and maximize your profit-to-cost structure.