One weird trick to predict retrieval costs migrating to GCS Archive

Dom Zippilli
Google Cloud - Community
Feb 12, 2020

Disclaimer

I am a Googler, and I work in Google Cloud specifically. All opinions stated here are my own, not those of Google, LLC.

Additional Disclaimer

I’ll get to the point in a minute, but I want to really drive this home. In this article I’m going to discuss Google Cloud pricing. I am not an authority on Google Cloud pricing. I could have my facts mixed up. The facts could change after I publish and I might forget to update this. I could be confused about a subtle interplay between pricing rules. I cannot guarantee that anything I say here is accurate. So please, check your own use case and plans against the documentation on the Google Cloud Platform website, and if anything is unclear, contact support through the GCP console.

Introduction

With the launch of the new Archive storage class (currently $1.23 per TB per month!), a hot topic around the office is retrieval costs. The question most often comes from customers who are already using a “colder” storage class, especially Coldline, and want to go even colder.

When I say “cold”, let me clarify what I mean, as this is important to long-term success with archive storage of any kind. Typically, archived data is kept in a way that is space- and cost-efficient, and that efficiency comes at the expense of ease of access. Think of it like keeping something in an unfinished attic. It “costs” you very little (usable square footage) to store things there, but it’s “expensive” (in time and effort) to get things in and out by carrying them up and down a ladder.

In GCS, there are four storage classes (which can be mixed in a given bucket, and are available independent of location). Here’s how they break down for access frequency:

  • Standard: Access regularly.
  • Nearline: Access less than once a month.
  • Coldline: Access less than once a quarter.
  • Archive: Access less than once a year.
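As a rough sketch, the guidelines above could be written as a lookup on how often you expect to touch an object. The 30, 90, and 365 day thresholds are my approximation of “month”, “quarter”, and “year”, and `suggest_storage_class` is a hypothetical helper, not part of any Google Cloud library.

```python
def suggest_storage_class(days_between_accesses: float) -> str:
    """Map the expected number of days between accesses to a storage class.

    Thresholds are approximations of the month / quarter / year guidelines.
    """
    if days_between_accesses > 365:
        return "ARCHIVE"    # accessed less than once a year
    if days_between_accesses > 90:
        return "COLDLINE"   # accessed less than once a quarter
    if days_between_accesses > 30:
        return "NEARLINE"   # accessed less than once a month
    return "STANDARD"       # accessed regularly


print(suggest_storage_class(400))
```

Treat the answer as a starting point for most of your data, not a rule every object has to satisfy.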

It’s important to pay attention to these guidelines. The key is, you shouldn’t think of this as “all of the data has to be used this way.” It’s a topic for another time, but you basically want to be confident that most of the data will be used this way, or at least enough of it that you pass the break-even point for the switch.

In other words, if you have to go up to the attic less than once a year, most years, you have the right stuff stored there.

The One Weird Trick

OK, so you’re sure Archive is for you. Now the question you might have is: will I pay retrieval costs to move existing GCS data there? Here’s how I break it down:

Unless Object Lifecycle Management moves the object for you, the move will be subject to retrieval fees (and possibly early deletion fees) for the source storage class.

Specifically, Object Lifecycle Management will perform the action with a SetStorageClass operation. This will be billed at the destination class’s Class A price*. But since SetStorageClass is not a rewrite, neither retrieval costs nor early deletion costs apply (it doesn’t create a new object).
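To make that concrete, here is a minimal sketch that builds a lifecycle configuration with a single SetStorageClass rule. The field names follow the lifecycle configuration JSON format as I understand it; the 365-day age and the Coldline filter are purely illustrative choices, and `archive_rule` is a hypothetical helper name. Check the Object Lifecycle Management documentation before relying on it.

```python
import json
from typing import Dict, List


def archive_rule(age_days: int, from_classes: List[str]) -> Dict:
    """Build a lifecycle config that moves matching objects to Archive."""
    return {
        "rule": [
            {
                # The action Lifecycle Management performs on matching objects.
                "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
                # Objects must satisfy every condition listed here.
                "condition": {
                    "age": age_days,                      # days since creation
                    "matchesStorageClass": from_classes,  # current class filter
                },
            }
        ]
    }


# Illustrative values only: Coldline objects older than a year.
config = archive_rule(age_days=365, from_classes=["COLDLINE"])
print(json.dumps(config, indent=2))
```

Saved to a file, a configuration like this can be applied to a bucket with `gsutil lifecycle set`, after which Lifecycle Management makes the change for you over time.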

One thing to note is that this only works in one direction: colder. Any move to a warmer storage class, like Standard, requires a new object via copy or rewrite operations, so retrieval (and possibly early deletion) costs will apply.
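The “colder only” rule is easy to sanity-check before you write a lifecycle rule. This is a small sketch under my own assumptions: the `COLDNESS` ranking and `lifecycle_can_move` are hypothetical names I made up for illustration, not anything from the GCS API.

```python
# Rank the four classes from warmest (0) to coldest (3).
COLDNESS = {"STANDARD": 0, "NEARLINE": 1, "COLDLINE": 2, "ARCHIVE": 3}


def lifecycle_can_move(source: str, destination: str) -> bool:
    """Return True if the move goes strictly colder, which is the only
    direction a SetStorageClass lifecycle action covers."""
    return COLDNESS[destination] > COLDNESS[source]


print(lifecycle_can_move("COLDLINE", "ARCHIVE"))   # colder: fine
print(lifecycle_can_move("ARCHIVE", "STANDARD"))   # warmer: needs a rewrite
```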

Scenarios

Moving data from Standard to Archive using Lifecycle Management

No retrieval cost. In this case, you’d pay one SetStorageClass (Class A) operation per object, at the rate of the destination class (Archive).

Moving data from Standard to Archive using the rewrite API method

No retrieval cost. In this case, you’d pay one storage.objects.rewrite (Class A) operation per object, at the rate of the destination class (Archive). Though you are creating a new object, Standard has no retrieval costs, so operations are the only cost here.

Moving data from Coldline to Archive using Lifecycle Management

No retrieval cost. In this case, you’d pay one SetStorageClass (Class A) operation per object, at the rate of the destination class (Archive). The retrieval costs are waived as Lifecycle is changing the class directly, instead of creating a new object.

Moving data from Coldline to Archive using the rewrite API method

Definitely retrieval cost. Be careful in this case! You’re making a new object. You’d pay one storage.objects.rewrite (Class A) operation per object, at the rate of the destination class (Archive), as well as a retrieval fee at the Coldline rate.
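The four scenarios above boil down to a small decision table, sketched here with a back-of-the-envelope estimator. Everything in it is my own illustration: `MoveCharges`, `charges_for_move_to_archive`, and `estimate_move_cost` are hypothetical names, the prices are parameters you must fill in from the current pricing page, and early deletion fees are flagged but not estimated. I’m also assuming Standard has no minimum storage duration, so early deletion only comes up for colder source classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveCharges:
    class_a_operation: bool        # one per object, at the Archive rate
    retrieval_fee: bool            # billed at the source class's rate
    early_deletion_possible: bool  # depends on object age; not estimated here


def charges_for_move_to_archive(source: str, method: str) -> MoveCharges:
    """Which kinds of charges apply when moving an object to Archive."""
    if method == "lifecycle":
        # SetStorageClass changes the class in place: operations only.
        return MoveCharges(True, False, False)
    if method == "rewrite":
        # A rewrite creates a new object, so the source class's fees apply.
        source_is_cold = source != "STANDARD"
        return MoveCharges(True, source_is_cold, source_is_cold)
    raise ValueError(f"unknown method: {method!r}")


def estimate_move_cost(charges: MoveCharges, object_count: int, total_gb: float,
                       price_per_class_a_op: float,
                       retrieval_price_per_gb: float) -> float:
    """Rough cost: operations plus retrieval, excluding early deletion."""
    cost = object_count * price_per_class_a_op
    if charges.retrieval_fee:
        cost += total_gb * retrieval_price_per_gb
    return cost


# Placeholder prices, NOT real GCS prices.
for method in ("lifecycle", "rewrite"):
    charges = charges_for_move_to_archive("COLDLINE", method)
    print(method, estimate_move_cost(charges, 1000, 500, 0.001, 0.5))
```

With the same placeholder inputs, the only difference between the two methods is the retrieval term, which is the whole point of the trick.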

“All these archived objects just make me want to break out into song!”

*Currently, this is true only of the Archive storage class. For other storage classes, SetStorageClass operations are billed at the source storage class’s rate. Since this is incongruous, I stuck with the Archive case in this article.

Googler, often writing about Google Cloud Platform. All opinions stated here are my own, not those of Google, LLC.