Google Cloud Storage — Object Life Cycle Management — Part 1

Sundarrajan Raman
Google Cloud - Community
7 min read · Nov 28, 2022

As organisations move their data into the cloud, it is essential for them to define a strategy for managing the data life cycle. Unlike an on-premises environment, where keeping data around incurs no direct additional cost, a cloud environment charges for every byte stored.

For organisations running on-premises, managing the data life cycle may not even be part of their strategy, since there is generally no direct cost impact. But as they migrate to the cloud, this can turn into a huge cost if they do not define and implement a data life cycle strategy.

In this two-part series we will cover:

  • Introduction to Object Life Cycle Management (Part 1)
  • Setting up OLM policies through the Console (Part 1)
  • Understanding when OLM policies will be executed (Part 1)
  • The various lifecycle rules that can be applied (Part 1)
  • How to set up OLM policies programmatically (Part 2)
  • Approaches to monitoring OLM actions (Part 2)

Introduction to Object Life Cycle Management Policies:

Google Cloud Storage provides the ability to define Object Lifecycle rules on buckets, through which we can control how objects are transitioned between storage classes or deleted in the most cost-effective way.
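Conceptually, a lifecycle policy is a list of rules, each pairing one action with the conditions that trigger it. A minimal sketch of that shape in Python (the age value here is just an illustrative assumption):

```python
import json

# A lifecycle policy is a list of rules; each rule pairs one action
# with the condition(s) that trigger it.
lifecycle_policy = {
    "rule": [
        {
            "action": {"type": "Delete"},
            "condition": {"age": 365},  # days since object creation
        }
    ]
}

# This JSON shape is what the GCS lifecycle API (and `gsutil lifecycle set`) accepts.
print(json.dumps(lifecycle_policy, indent=2))
```

We will build up concrete rules of exactly this shape for the scenarios below.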

Setting up OLM Policies through Console:

For illustration, assume there is a music company, “MyMusic”, that uses Google Cloud Platform to manage all of its music files. MyMusic defines individual buckets based on the source of the music files: they source music from specific bands and create a bucket for every band. As they receive songs from a band, they store the files in that band's bucket. Once the files are received, they edit them to make them ready for publishing to end users, creating temporary files in the bucket during this process.

Initially they were a small company sourcing from only a few bands. They had no lifecycle rules set up on these buckets, and all the data was kept in Standard storage. They did not notice the cost of the temporary files, since the files were small and the volume of data was low.

Later, as MyMusic became famous and started sourcing from many bands, costs began rising drastically. They found that no lifecycle policies were applied to the music files in the buckets, so every file was staying in Standard storage.

During their analysis they found that every bucket contains three kinds of files:

  1. Music files that are accessed on a daily basis
  2. Music files that were received from the bands
  3. Music files that were created temporarily for editing

For each of these scenarios they came up with the lifecycle actions below.

Scenario — Standard:

File/Object Types: Music files that are accessed on a daily basis

Lifecycle Action to be Taken: Keep the file in Standard storage

Scenario — Coldline:

File/Object Types: Music files that were received from the bands

Lifecycle Action to be Taken: One month after receiving, move the file to Coldline storage

Scenario — Delete:

File/Object Types: Music files that were created temporarily for editing

Lifecycle Action to be Taken: Delete the file 5 days after creation

Applying Life Cycle Policy:

We will now look at each of the scenarios and how a Life Cycle Action can be set up:

Scenario — Standard:

This is the scenario where published files need to remain in Standard storage indefinitely. MyMusic's customers may want to listen to these files at any time, so MyMusic must always keep them readily available. No lifecycle action needs to be taken for this scenario.

Scenario — Coldline:

In this scenario, the original file is needed for editing until the song is published; after publication it will rarely be accessed again, but MyMusic, as a responsible company, needs to keep the original for future reference. They estimate that it takes a maximum of 20 days to publish a song, and that once a song is published, the original files will be accessed less than once a quarter. The lowest-cost storage class for this kind of access pattern is Coldline storage. So, using the OLM policy set up below, they achieve the task of moving original files to Coldline storage one month after the file was received.

Example Structure of how they have Organised a Bucket:

Applying lifecycle rules for the Scenario — Coldline

Navigate to the Lifecycle tab and select Add a rule.

Set the storage class to Coldline for this scenario.

Click on Select Object Conditions to specify when the rule should apply.

We are trying to create the below rule:

Rule:

  1. Age: 30 days
  2. Prefix: original

Action: Move to Coldline

By configuring the above conditions, any object whose name has the prefix “original” will be moved to Coldline storage 30 days after it was created. Let's go ahead and set up the OLM policy.

Once the conditions are set, click Create and you will see that the rule has been added.

So we have successfully set up an OLM policy that moves the original music files to Coldline storage 30 days after creation.
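The same rule can be expressed as a lifecycle configuration in JSON. A sketch in Python (the bucket name in the comment is hypothetical; Part 2 covers programmatic setup in more depth):

```python
import json

# Rule: 30 days after creation, move objects whose name starts with
# "original" to Coldline storage.
coldline_rule = {
    "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
    "condition": {"age": 30, "matchesPrefix": ["original"]},
}

config = {"rule": [coldline_rule]}

# Written to a file, this configuration could be applied with:
#   gsutil lifecycle set lifecycle.json gs://mymusic-band-bucket  (hypothetical bucket)
print(json.dumps(config, indent=2))
```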

Scenario — Delete:

Next we will add a rule to delete temporary objects that are more than 5 days old. Similar to the previous OLM policy, let's create another rule within the same bucket:

Rule:

Age: 5 days

Prefix: temporary-editing

Action: Delete

With that, the delete rule is also in place.
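As a sketch, this delete rule has the same configuration shape as the Coldline rule, only with a different action and condition values:

```python
import json

# Rule: delete objects whose name starts with "temporary-editing"
# 5 days after creation.
delete_rule = {
    "action": {"type": "Delete"},
    "condition": {"age": 5, "matchesPrefix": ["temporary-editing"]},
}

# Both the Coldline rule and this delete rule can live together in the
# same bucket's lifecycle configuration (the "rule" list holds them all).
config = {"rule": [delete_rule]}
print(json.dumps(config, indent=2))
```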

Finally, MyMusic has implemented lifecycle policies through which their data is stored efficiently based on its usage.

Understanding when OLM Policies will be executed:

Now that the OLM policies are set up, let's understand when they take effect once their conditions are met.

Google Cloud Storage manages all of this in the background; the actions are transparent to the user. GCS regularly inspects all the objects in any bucket for which lifecycle rules are configured. When it identifies that an object has met the conditions of a lifecycle rule, it marks that object as eligible for the action specified in the rule, such as moving the object to Coldline or deleting it.

Once an object is marked, GCS performs the action asynchronously. Even though the condition has been met, the action may not be performed right away: there can be a lag between the time the conditions are met and the time the action is performed. Your applications should not rely on lifecycle actions occurring within a certain amount of time after a lifecycle condition is met.

What are various Lifecycle Conditions that can be applied:

While the example scenario used only two conditions (age and a name prefix), there are many other conditions with which lifecycle rules can be configured. Refer to this link for all the lifecycle management conditions that can be applied:

https://cloud.google.com/storage/docs/lifecycle#conditions

Used in combination, these conditions give you powerful and flexible options for building object lifecycle strategies based on usage.
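As an illustrative (hypothetical) example of combining conditions, suppose a versioned bucket should clean up old object versions. All conditions within a single rule must be satisfied together for the action to fire:

```python
import json

# Hypothetical rule for a bucket with object versioning enabled:
# delete a noncurrent version only when ALL of these hold --
# it is not the live version, at least 3 newer versions exist,
# and it has been noncurrent for at least 7 days.
versioning_rule = {
    "action": {"type": "Delete"},
    "condition": {
        "isLive": False,
        "numNewerVersions": 3,
        "daysSinceNoncurrentTime": 7,
    },
}

print(json.dumps({"rule": [versioning_rule]}, indent=2))
```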

Google Cloud Storage also now provides an Autoclass feature that automatically transitions objects in your bucket to appropriate storage classes based on each object's access pattern. Autoclass is documented here: https://cloud.google.com/storage/docs/autoclass

Summary:

Thus MyMusic was able to set up a data lifecycle for their data and thereby save considerable cost. But as the company grew, they faced further challenges: manually setting up OLM policies across a huge number of buckets, and monitoring the actions OLM performs. We will discuss how they solved those challenges in Part 2 of this blog.
