Sundarrajan Raman
Google Cloud - Community
7 min read · Nov 28, 2022


Google Cloud Storage — Object Life Cycle Management — Part 2

This article is the second in an ongoing series. The first part of this story is available at Google Cloud Storage — Object Life Cycle Management — Part 1.

In Part 1 we understood why an organisation should come up with a Data/Object Life Cycle Management strategy and how to implement it using the Google Cloud console.

In this part we will look at how to configure Google Cloud Storage’s Object Life Cycle Management (OLM) programmatically, using Java for this blog’s discussion.

Programmatic Object Life Cycle Management:

In Part 1 we had a scenario where an organisation named MyMusic used OLM strategies to effectively reduce the cost of storing files in Google Cloud Storage.

As time went by the organisation had great success and started partnering with multiple Bands. Since they created one Bucket per Band, the number of Buckets kept increasing, and they found it difficult to create the OLM policies manually for each Bucket.

So they needed a programmatic way to create and manage the OLM policies.

Setting up the project:

To be able to use the Google Cloud Storage API we need to import the below dependency. As of writing this blog, 2.13.1 was the latest version of the google-cloud-storage artifact; when you are setting up, please refer to the latest version.

Create a Java Maven project and add the below dependency to its pom.xml:

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-storage</artifactId>
  <version>2.13.1</version>
</dependency>

Scenarios:

We will discuss the following scenarios for programmatically adding Life Cycle Management rules:

  1. Add new LifeCycle Rules with Conditions including Age, Prefix, Suffix
  2. Update an existing LifeCycle rule — for Age Condition.
  3. Update an existing LifeCycle rule — for Prefix — Update/Add a prefix.

The table below details the rules for each of the scenarios.

Programmatic management of Object Lifecycle Management Rules

Sample code for each of the above scenarios is listed below.
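The snippets assume an authenticated Storage client and a Bucket fetched from it. A minimal sketch of that setup (the bucket name band_boon_light is an assumption taken from the example later in this post):

import com.google.cloud.storage.Bucket;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

// Create the client using Application Default Credentials.
Storage storage = StorageOptions.getDefaultInstance().getService();

// Fetch the bucket whose lifecycle rules we want to manage
// (bucket name is an assumption based on the example below).
String bucketName = "band_boon_light";
Bucket bucketRetrieved = storage.get(bucketName);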

Scenario: Add a New LifecycleRule

import java.util.Arrays;
import com.google.cloud.storage.BucketInfo.LifecycleRule;
import com.google.cloud.storage.BucketInfo.LifecycleRule.LifecycleAction;
import com.google.cloud.storage.BucketInfo.LifecycleRule.LifecycleCondition;
import com.google.common.collect.ImmutableList;

// Rule 1: delete objects older than 50 days whose names start with
// "delete" or "del" and end with ".mv".
LifecycleRule lcr1 = new LifecycleRule(
    LifecycleAction.newDeleteAction(),
    LifecycleCondition.newBuilder().setAge(50)
        .setMatchesPrefix(Arrays.asList("delete", "del"))
        .setMatchesSuffix(Arrays.asList(".mv"))
        .build());

// Rule 2: delete objects older than 100 days whose names start with
// "remove" or "rem-" and end with ".mv".
LifecycleRule lcr2 = new LifecycleRule(
    LifecycleAction.newDeleteAction(),
    LifecycleCondition.newBuilder().setAge(100)
        .setMatchesPrefix(Arrays.asList("remove", "rem-"))
        .setMatchesSuffix(Arrays.asList(".mv"))
        .build());

// Replace the bucket's lifecycle configuration with the two new rules.
bucketRetrieved.toBuilder().setDeleteRules(null)
    .setLifecycleRules(ImmutableList.of(lcr1, lcr2)).build()
    .update();

// Re-fetch the bucket and print the rules to verify the update.
bucketRetrieved = storage.get(bucketName);

for (LifecycleRule lcr : bucketRetrieved.getLifecycleRules()) {
  System.out.println("Action : " + lcr.getAction());
  System.out.println("Condition Age: " + lcr.getCondition().getAge());
  System.out.println("Condition Prefix: " + lcr.getCondition().getMatchesPrefix());
  System.out.println("Condition Suffix: " + lcr.getCondition().getMatchesSuffix());
}

Scenario: Updating an Existing LifecycleRule’s Condition (Age)

import java.util.ArrayList;
import java.util.List;

// Fetch the current rules and rebuild each one with its age reduced by 5 days.
List<LifecycleRule> lcrList = (List<LifecycleRule>) bucketRetrieved.getLifecycleRules();
List<LifecycleRule> lcrListUpdated = new ArrayList<LifecycleRule>();
for (LifecycleRule lcr : lcrList) {
  LifecycleRule lcrUpdated = new LifecycleRule(
      LifecycleAction.newDeleteAction(),
      LifecycleCondition.newBuilder().setAge(lcr.getCondition().getAge() - 5)
          .setMatchesPrefix(lcr.getCondition().getMatchesPrefix())
          .setMatchesSuffix(lcr.getCondition().getMatchesSuffix())
          .build());
  lcrListUpdated.add(lcrUpdated);
}

// Write the updated rule list back to the bucket.
bucketRetrieved.toBuilder().setDeleteRules(null)
    .setLifecycleRules(lcrListUpdated)
    .build()
    .update();

Scenario: Updating an Existing LifecycleRule’s Condition (Prefix)


// Fetch the current rules and add a new prefix ("remove-") to each rule's condition.
List<LifecycleRule> lcrList = (List<LifecycleRule>) bucketRetrieved.getLifecycleRules();
List<LifecycleRule> lcrListUpdated = new ArrayList<LifecycleRule>();
for (LifecycleRule lcr : lcrList) {

  // Copy the existing prefixes into a mutable list before adding the new one.
  List<String> prefixList = new ArrayList<>(
      lcr.getCondition().getMatchesPrefix());
  prefixList.add("remove-");

  LifecycleRule lcrUpdated = new LifecycleRule(
      LifecycleAction.newDeleteAction(),
      LifecycleCondition.newBuilder().setAge(lcr.getCondition().getAge())
          .setMatchesPrefix(prefixList)
          .setMatchesSuffix(lcr.getCondition().getMatchesSuffix())
          .build());
  lcrListUpdated.add(lcrUpdated);
}

// Write the updated rule list back to the bucket.
bucketRetrieved.toBuilder().setDeleteRules(null)
    .setLifecycleRules(lcrListUpdated)
    .build()
    .update();

Note: For every update call to the Bucket, we need to explicitly call setDeleteRules(null).

Monitoring GCS Bucket OLM Lifecycle Actions:

Audit logs are essential for understanding the actions performed on the resources we are interested in. Google Cloud provides an extensive Audit Logging mechanism through which actions on any resource can be tracked.

The documentation reference here provides a detailed view of Audit Logs:

https://cloud.google.com/logging/docs/audit/understanding-audit-logs#api

Since OLM actions are performed in the background by Google Cloud Storage, monitoring them is slightly different from monitoring the regular actions performed on other resources. Let’s understand how to monitor OLM actions through various approaches.

OLM Actions Monitoring Approaches:

OLM lifecycle actions are not tracked as part of the Audit Logs, so if you search for OLM actions in the Audit Logs you will not find them. That may look strange, since you want a record of all the actions performed in the background by Google Cloud, but there is a way out. Google Cloud Storage provides the below options to track OLM actions.

  1. Cloud Storage Usage Logs
  2. Pubsub Notification for Cloud Storage

We will see detailed steps on how to set up each of these approaches.

Cloud Storage Usage Logs:

Setting up the usage logs is detailed here:

https://cloud.google.com/storage/docs/access-logs

A gist of the commands required to set up the Cloud Storage usage logs is listed below.

Once an action is performed by Object Life Cycle Management, it takes about an hour for it to be reflected in the logs, so there will be a lag between the time the action is taken and the time it shows up in the logs.

Example setup:

# Create a bucket to hold the usage logs.
gsutil mb gs://mymusic_logs
# Allow the Cloud Storage analytics group to write logs into it.
gsutil iam ch group:cloud-storage-analytics@google.com:legacyBucketWriter gs://mymusic_logs
# Enable usage logging on the target bucket, writing into the logs bucket.
gsutil logging set on -b gs://mymusic_logs gs://band_boon_light
# Verify the logging configuration.
gsutil logging get gs://band_boon_light

Once the usage logs are set up for the bucket, for every action you can start seeing logs getting created in the usage logs bucket (gs://mymusic_logs in this example).

Once the logs are generated, they can easily be queried by setting up a BigQuery table over them. Setting up the BigQuery table is not in the scope of this blog, but it can be referred from this link: https://cloud.google.com/storage/docs/access-logs#BigQuery
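For reference, a minimal sketch of that setup, following the linked documentation (the dataset name storageanalysis, the log object prefix and the schema file cloud_storage_usage_schema_v0.json are assumptions based on that page):

# Create a dataset to hold the usage log table (name is an assumption).
bq mk storageanalysis

# Load the hourly usage log CSVs, using the schema file from the documentation.
bq load --skip_leading_rows=1 \
  storageanalysis.usage \
  'gs://mymusic_logs/band_boon_light_usage*' \
  ./cloud_storage_usage_schema_v0.json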

Pubsub Notification for Cloud Storage:

The second approach, and my preferred one for auditing these actions, is using Pubsub notifications for Cloud Storage. By enabling Pubsub notifications for Cloud Storage, all actions on the bucket are published to a Topic. The steps to enable the Pubsub notifications are listed below.

The Notification can be enabled as below:

gcloud storage buckets notifications create gs://band_boon_light --topic=mymusic_topic --event-types=OBJECT_DELETE

In this scenario we are tracking only the delete events performed by OLM; for tracking other actions you need to add more event types to this command. Once this command is executed, Cloud Storage will start publishing notifications for OLM and other actions to the mentioned topic mymusic_topic. The command will also create the Topic for you in the Pubsub service in the project you have configured.

The next step is to create a Subscription for the above Topic and make it a BQ Subscription. With that you automatically get a pipeline that writes the data directly into BigQuery, and once data is written into a BigQuery table you can easily start querying it.

But before creating a BQ Subscription you need to create a BQ table with the following schema so that the data is seamlessly written into it.

The schema for the BQ table needs to be configured as below. Make sure the spelling of the columns is exactly as mentioned; even a small change in the schema will cause issues when creating the BQ Subscription.
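The original post shows this schema as a screenshot. As a sketch, these are the columns a BQ Subscription with metadata enabled expects (the dataset and table names are taken from the query later in this post):

bq mk --table mymusic_dataset.mymusic_olm_tables \
  subscription_name:STRING,message_id:STRING,publish_time:TIMESTAMP,data:STRING,attributes:STRING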

Once the BigQuery table is created as mentioned above, the BQ Subscription for the Topic needs to be created.

Make sure to select both the Topic Schema and Write metadata options. This ensures that both the message attributes and the payload are logged into the table.
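The post sets this up through the console. An equivalent gcloud sketch, with the two flags mirroring the console options above (the subscription name mymusic_olm_sub and the PROJECT_ID placeholder are assumptions):

gcloud pubsub subscriptions create mymusic_olm_sub \
  --topic=mymusic_topic \
  --bigquery-table=PROJECT_ID:mymusic_dataset.mymusic_olm_tables \
  --use-topic-schema \
  --write-metadata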

Set up Object Life Cycle Management for the target Bucket.

In this example the Bucket for which the OLM policy is set is band_boon_light.

Two prefix rules are created (a Java sketch follows the list):

  1. Delete any object with the prefix “temporary-editing” 30 days after creation.
  2. Move objects with the prefix “original” to Coldline 5 days after creation.
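A sketch of these two rules using the same Java API as the earlier scenarios:

import com.google.cloud.storage.StorageClass;

// Rule 1: delete objects with the "temporary-editing" prefix 30 days after creation.
LifecycleRule tempRule = new LifecycleRule(
    LifecycleAction.newDeleteAction(),
    LifecycleCondition.newBuilder().setAge(30)
        .setMatchesPrefix(Arrays.asList("temporary-editing"))
        .build());

// Rule 2: move objects with the "original" prefix to Coldline 5 days after creation.
LifecycleRule coldlineRule = new LifecycleRule(
    LifecycleAction.newSetStorageClassAction(StorageClass.COLDLINE),
    LifecycleCondition.newBuilder().setAge(5)
        .setMatchesPrefix(Arrays.asList("original"))
        .build());

// Replace the bucket's lifecycle configuration with these rules.
bucketRetrieved.toBuilder().setDeleteRules(null)
    .setLifecycleRules(ImmutableList.of(tempRule, coldlineRule)).build()
    .update();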

To be able to see the logs, you will need to wait until OLM has taken the actions on the objects.

As actions occur within the Bucket, you can query the BigQuery table.

A sample query with which the BigQuery table can be queried:

SELECT DISTINCT JSON_EXTRACT(attributes, "$.bucketId") AS Bucket_id,
  JSON_EXTRACT(attributes, "$.objectId") AS Object_id,
  JSON_EXTRACT(attributes, "$.eventType") AS event
FROM `mymusic_dataset.mymusic_olm_tables`
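If you want only the rows for lifecycle deletions, a variant like this filters on the event attribute (assuming the same table, the OBJECT_DELETE event type configured earlier, and that Write metadata was enabled so the publish_time column exists):

SELECT JSON_EXTRACT_SCALAR(attributes, "$.objectId") AS Object_id,
  publish_time
FROM `mymusic_dataset.mymusic_olm_tables`
WHERE JSON_EXTRACT_SCALAR(attributes, "$.eventType") = "OBJECT_DELETE"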

While the approach of using Pubsub notifications for Cloud Storage is clean and easy, its main drawback is that it does not record the user who performed the action. So if you need to track who performed an action, be it a user, a service account or OLM, you will not be able to identify that with this approach; in that case you need to use the Cloud Storage usage logs approach.

This approach also has a slightly increased cost due to the creation of the Pubsub topic and the use of BigQuery for writing the data, but you can use the below control measures to keep the cost down.

Control Measures for Cost:

Since we are setting up notifications only for the OBJECT_DELETE action, the volume of data getting published is drastically reduced.

No retention period is required in Pubsub, as the data is retained in BQ.

Based on these advantages and your requirements for monitoring OLM actions, choose the approach that fits your needs.

Finally, the MyMusic organisation was able to set up a routine process through which they can programmatically create OLM rules, effectively manage their objects in Google Cloud Storage based on usage, and optimise their cost.

Thank you for taking the time to read through this two-part blog. Hopefully it has helped you understand how OLM policies can be used to optimise the data you persist in Google Cloud Storage.

References:

https://cloud.google.com/storage/docs/lifecycle#tracking

https://cloud.google.com/storage/docs/access-logs

https://cloud.google.com/storage/docs/pubsub-notifications

https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions
