Object Versioning for Google Cloud Storage!
Suppose we have a lot of data in a Cloud Storage bucket and someone mistakenly runs
gsutil rm gs://my_bucket/*
Without Object Versioning, we lose that data and may not be able to recover it easily, or at all.
How does Object Versioning Help?
By design, every object (file) in Cloud Storage carries 2 sequence numbers:
- generation number
- meta-generation number
We will look at them in detail later.
In a nutshell, an object gets a new generation number each time it is replaced (overwritten). Similarly, its meta-generation number increases each time its metadata is updated.
By default, Object Versioning is disabled because it increases storage cost: every older version that is kept, identified by its own generation number, is stored and billed alongside the current one. If we need the ability to recover old data, though, we can turn Object Versioning on.
Enabling Object versioning
- We have just created a new bucket, ashish_vtest, containing 2 files: log.txt and Main.java
- We can check whether Object Versioning is enabled on a bucket. It is a bucket-level setting, so it cannot be switched on for a single folder. The status is either Suspended or Enabled.
gsutil versioning get gs://ashish_vtest
Now let's enable versioning:
gsutil versioning set on gs://ashish_vtest
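The check-then-enable steps above can be wrapped in a small shell function. This is only a sketch: ensure_versioning is a hypothetical helper name, and it assumes the status line printed by gsutil versioning get contains the word Enabled or Suspended, as described above.

```shell
# Hypothetical helper: turn on Object Versioning only if it is not enabled yet.
# Usage: ensure_versioning <bucket-name>
ensure_versioning() {
  bucket="$1"
  # Assumption: the status line contains the word "Enabled" or "Suspended".
  status="$(gsutil versioning get "gs://${bucket}")"
  case "$status" in
    *Enabled*) echo "Versioning already enabled on gs://${bucket}" ;;
    *)         gsutil versioning set on "gs://${bucket}" ;;
  esac
}

# Example call (needs gsutil and access to the bucket):
#   ensure_versioning ashish_vtest
```

Running it a second time should only print the "already enabled" message, so it is safe to keep in a setup script.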
Checking Object Versions
- We can use gsutil ls -a gs://<path> to list all object versions (old versions as well as current ones).
- In the listing, each file name is followed by a number prefixed with #. This number is the generation number.
- We can access any non-current (old) version by using the full name of the file: name + # + generation number.
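As a rough illustration of the "name + # + generation number" rule, the sketch below builds such a full name and lists all versions. The function names are hypothetical, and the generation number in the example is a made-up placeholder, not real output from this bucket.

```shell
# Hypothetical helper: build the full name of one specific object version.
# Usage: versioned_url <bucket> <object> <generation>
versioned_url() {
  printf 'gs://%s/%s#%s\n' "$1" "$2" "$3"
}

# List every version (current and non-current) stored in a bucket.
# Usage: list_versions <bucket>
list_versions() {
  gsutil ls -a "gs://$1"
}

# Example (1234567890123456 is a placeholder generation number):
#   list_versions ashish_vtest
#   gsutil cp "$(versioned_url ashish_vtest log.txt 1234567890123456)" ./log.txt.old
```

Quoting the versioned name matters in a shell, because an unquoted # can be treated as the start of a comment.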
Deleting and recovering a file
- Now, let's do some real work: first we will delete log.txt, and then recover it using its generation number.
- After the delete, only one current file remains, which we can check using gsutil ls gs://<path>
- But if we list all the object versions, we will still see 2 files.
- Now, let's recover log.txt and put it back in the same location.
- We are simply copying the file back into the same directory. Note that to refer to a non-current version, we have to use the file name and generation number together.
- After the copy there are 2 current files again, but 3 versions in total, because the copy creates a new version.
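The delete-and-recover steps above could look roughly like this on the command line. It is a sketch rather than the exact commands used for this article: restore_version is a hypothetical helper, and the generation number has to be read from your own gsutil ls -a output.

```shell
# Hypothetical helper: restore a non-current version by copying it back
# over the live object name. The copy becomes a new current version.
# Usage: restore_version <bucket> <object> <generation>
restore_version() {
  bucket="$1"; object="$2"; generation="$3"
  gsutil cp "gs://${bucket}/${object}#${generation}" "gs://${bucket}/${object}"
}

# Walk-through (needs gsutil and access to the bucket):
#   gsutil rm gs://ashish_vtest/log.txt      # delete the current version
#   gsutil ls gs://ashish_vtest              # shows current files only
#   gsutil ls -a gs://ashish_vtest           # shows all versions, with #generation
#   restore_version ashish_vtest log.txt <generation-from-ls-a>
```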
Generation and Meta-Generation Numbers
- Even without Object Versioning enabled, all Cloud Storage objects have generation numbers and meta-generation numbers. The generation number changes each time the object is replaced, and the meta-generation number changes each time the object’s metadata is updated.
- Buckets also maintain a meta-generation number, which lets users uniquely identify the state of the bucket's metadata.
- We can check an object's meta-generation number with the -la flags: gsutil ls -la gs://<path>
- The meta-generation number starts at 1 for each new generation and increases each time we update the object's metadata.
- Let's update the metadata for log.txt and check the meta-generation number.
- We can edit metadata directly in the UI or use the CLI; refer to https://cloud.google.com/storage/docs/viewing-editing-metadata#view
- In the GCP UI, the 3 dots on the right side of an object open a menu from which we can edit its metadata.
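- Now, if we check again, we will see that the meta-generation number of log.txt has increased to 2.
Note that a metadata update does not create a new object version, so the earlier metadata state of the same generation is not kept. If we want to access a version that is not the current one, together with the metadata it carries, we have to use its generation number.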
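For readers who prefer the CLI over the UI, the metadata update can be sketched as follows. The helper names, the metadata key "owner" and its value are made up for illustration; the sketch also assumes that gsutil stat prints lines labelled Generation and Metageneration.

```shell
# Hypothetical helper: set one custom metadata key on an object.
# Usage: set_custom_meta <bucket> <object> <key> <value>
set_custom_meta() {
  gsutil setmeta -h "x-goog-meta-$3:$4" "gs://$1/$2"
}

# Hypothetical helper: print only the generation-related lines of gsutil stat.
# Assumption: those lines are labelled "Generation" and "Metageneration".
# Usage: show_generations <bucket> <object>
show_generations() {
  gsutil stat "gs://$1/$2" | grep -i 'generation'
}

# Example (needs gsutil and access to the bucket):
#   show_generations ashish_vtest log.txt          # before the update
#   set_custom_meta ashish_vtest log.txt owner ashish
#   show_generations ashish_vtest log.txt          # meta-generation should be higher
```

Comparing the two show_generations outputs should show the generation number unchanged and only the meta-generation number increased.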
Disabling Object versioning
- We can disable Object Versioning using the gsutil command
gsutil versioning set off gs://ashish_vtest
- Disabling Object Versioning does not delete the versions that already exist; Cloud Storage simply stops creating new ones.
- Documentation: https://cloud.google.com/storage/docs/using-object-versioning
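To confirm that the old versions survive, one option is to list all versions right after switching versioning off. A minimal sketch, with suspend_and_list as a hypothetical helper name:

```shell
# Hypothetical helper: suspend Object Versioning, then list everything
# that is still stored, including non-current versions.
# Usage: suspend_and_list <bucket>
suspend_and_list() {
  gsutil versioning set off "gs://$1"
  gsutil ls -a "gs://$1"
}

# Example (needs gsutil and access to the bucket):
#   suspend_and_list ashish_vtest
```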