Python for Azure: Data Lifecycle Management Policy [Automated Access Tiering & Purge]

Pavleen Singh Bali · Published in Python for Azure · 5 min read · Dec 17, 2022

Introduction: Consider a scenario where data is accessed frequently during the early stages of its lifecycle, only occasionally after two weeks, and rarely once it is more than a month old. In this scenario, hot storage is best during the early stages, cool storage is most appropriate for the occasional access, and archive storage is the best tier option once the data ages past a month. By moving data to the appropriate storage tier based on its age with lifecycle management (LCM) policy rules, you can design the least expensive solution for your needs.

Automatic Storage Access-Tiering based on Rule Definition

Each rule definition within a policy includes a filter set and an action set. The filter set limits rule actions to a certain set of objects within a container, or to objects with certain names. The action set applies the tier or delete actions to the filtered set of objects.

Lifecycle management (LCM) supports tiering and deletion of current versions, previous versions, and blob snapshots.

Filters limit rule actions to a subset of blobs or to specific containers within the storage account. If more than one filter is defined, a logical AND is applied across all filters. Actions are applied to the filtered blobs or containers when the run condition is met.
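To make the rule anatomy concrete, here is a hypothetical rule expressed as a Python dict that mirrors the JSON schema the portal's Code View displays (the rule name, prefix, and day thresholds below are illustrative, not taken from this article's demo):

# Hypothetical lifecycle rule: the filter set selects the blobs,
# the action set says what happens to them as they age.
example_rule = {
    "enabled": True,
    "name": "sample-rule",  # illustrative name
    "type": "Lifecycle",
    "definition": {
        "filters": {  # filter set
            "blobTypes": ["blockBlob"],
            "prefixMatch": ["sample-container/logs"],
        },
        "actions": {  # action set
            "baseBlob": {
                "tierToCool": {"daysAfterModificationGreaterThan": 14},
                "tierToArchive": {"daysAfterModificationGreaterThan": 30},
                "delete": {"daysAfterModificationGreaterThan": 90},
            }
        },
    },
}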

Azure Blob Data Lifecycle Management [Source]

Points to Remember:

Lifecycle management (LCM) policy rules enable the following:

  • Transition current versions of a blob, previous versions of a blob, or blob snapshots to a cooler storage tier if these objects haven’t been accessed or modified for a period of time, to optimize for cost.
  • In this scenario, the lifecycle management policy can move objects from hot to cool, from hot to archive, or from cool to archive.
  • Delete current versions of a blob, previous versions of a blob, or blob snapshots at the end of their lifecycles.
  • Define rules to be run once per day at the storage account level.
  • Apply rules to containers or to a subset of blobs, using name prefixes or blob index tags as filters (see the filter sketch after this list).
  • Each rule can have up to 10 case-sensitive prefixes and up to 10 blob index tag conditions.
  • Only 200 ‘prefix_match’-based filter rules are available for ADLS Gen2 with Hierarchical Namespace (HNS) enabled, i.e., only 200 specific blob containers or blobs can be targeted with particular Data LCM rules.
  • Alternatively, a Data LCM rule can be set at the scope of the whole storage account, in which case all containers and blobs in the account inherit the same rule.
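For instance, a filter set that combines a name prefix with a blob index tag condition could look like the following (all values are made up for illustration); remember that a logical AND is applied across the filters:

# Hypothetical filter set mixing both supported filter types.
example_filters = {
    "blobTypes": ["blockBlob"],
    "prefixMatch": ["container-a/reports"],  # up to 10 case-sensitive prefixes
    "blobIndexMatch": [  # up to 10 tag conditions
        {"name": "project", "op": "==", "value": "archive-me"}
    ],
}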

Hands-On Implementation via Azure Portal & Python SDK for Azure

Prerequisites

Setup

pip install -r requirements.txt
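The article does not reproduce requirements.txt; for this workflow it would plausibly contain the following packages (my assumption, not from the original source):

azure-identity
azure-mgmt-resource
azure-mgmt-storage
azure-storage-blob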

Workflow

  1. In this workflow demo, I first created a Resource group named “RG_Demo_ADLS_Data_LCM” and then a Storage account named “demo00bloblcmpurge” (remember that the storage account name must be globally unique and conform to the Azure storage account naming conventions). A scripted alternative to these portal steps is sketched below the note.

Note: Remember to whitelist your IP in the “Networking” config settings of the storage account. Also, in the “Access Control (IAM)” config settings, add proper “role assignment” to yourself, especially ‘Storage Blob Data Owner/Contributor’ role for successful execution of this demo workflow.
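If you would rather script step 1 than click through the portal, a minimal sketch using the azure-mgmt-resource and azure-mgmt-storage packages could look like this (the location and SKU are my assumptions; fill in the subscription ID placeholder):

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.storage import StorageManagementClient

SUBSCRIPTION_ID = "<sub_id>"  # placeholder, see the az CLI steps below
credential = DefaultAzureCredential()

# Create the resource group used in the demo
resource_client = ResourceManagementClient(credential, SUBSCRIPTION_ID)
resource_client.resource_groups.create_or_update(
    "RG_Demo_ADLS_Data_LCM", {"location": "westeurope"}  # location assumed
)

# Create the storage account (name must be globally unique and lowercase)
storage_client = StorageManagementClient(credential, SUBSCRIPTION_ID)
poller = storage_client.storage_accounts.begin_create(
    "RG_Demo_ADLS_Data_LCM",
    "demo00bloblcmpurge",
    {"location": "westeurope", "kind": "StorageV2", "sku": {"name": "Standard_LRS"}},
)
account = poller.result()  # wait for provisioning to finish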

2. The script below demonstrates how to use the Python SDK for Azure to implement the workflow described above, i.e., enabling blob lifecycle management (LCM), which offers a rule-based policy that you can use to transition blob data to the appropriate access tier (hot → cool → archive) or to expire/purge data at the end of its lifecycle.
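The full script is not reproduced in this excerpt, so here is a minimal sketch of the core policy-setting call, built with the azure-mgmt-storage model classes; the rule name matches the demo, while the filter prefix and day thresholds are my assumptions:

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import (
    DateAfterModification,
    ManagementPolicy,
    ManagementPolicyAction,
    ManagementPolicyBaseBlob,
    ManagementPolicyDefinition,
    ManagementPolicyFilter,
    ManagementPolicyRule,
    ManagementPolicySchema,
)

SUBSCRIPTION_ID = "<sub_id>"  # placeholder
client = StorageManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# One rule: tier blobs under the demo container to cool, then archive,
# then delete them as they age (day thresholds are assumptions).
rule = ManagementPolicyRule(
    enabled=True,
    name="test-python-lcm-rules",
    type="Lifecycle",
    definition=ManagementPolicyDefinition(
        filters=ManagementPolicyFilter(
            blob_types=["blockBlob"],
            prefix_match=["container-lcm-purge"],  # assumed filter prefix
        ),
        actions=ManagementPolicyAction(
            base_blob=ManagementPolicyBaseBlob(
                tier_to_cool=DateAfterModification(days_after_modification_greater_than=14),
                tier_to_archive=DateAfterModification(days_after_modification_greater_than=30),
                delete=DateAfterModification(days_after_modification_greater_than=90),
            )
        ),
    ),
)

# A storage account's management policy is always named "default".
client.management_policies.create_or_update(
    "RG_Demo_ADLS_Data_LCM",
    "demo00bloblcmpurge",
    "default",
    ManagementPolicy(policy=ManagementPolicySchema(rules=[rule])),
)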

3. Before running the script, perform the following steps in the IDE terminal:

  • Log in to your Azure account
az login --tenant <tenant_id>
  • Select the correct subscription
az account set --subscription <sub_id/sub_name>

[Info]: Now the “_get_credential” method, which uses the “DefaultAzureCredential” class from the azure-identity library, can authenticate properly.
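The “_get_credential” helper itself isn't shown in this excerpt; at its core it presumably boils down to something like this:

from azure.identity import DefaultAzureCredential

# Tries environment variables, managed identity, the Azure CLI login, etc.,
# in order; after `az login`, the CLI credential is the one that applies.
credential = DefaultAzureCredential()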

  • After selecting the correct ‘Python Interpreter’ and the correct run ‘Configuration’ for the scope of your project (“Working Directory” etc.), run the script “blob_lifecycle_management.py”.
  • Following is the Python run-console with the workflow logs; please observe the highlighted text below.
Python console with workflow logs

4. After the script has executed successfully, we can observe on the Azure portal that a container named ‘container-lcm-purge’ with a base blob ‘blob-01’ has been created.
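For reference, the data-plane part of the script that creates this container and blob would plausibly look like the following with azure-storage-blob (the blob payload is a placeholder):

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    "https://demo00bloblcmpurge.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

# Create the demo container and upload a base blob into it
container = service.create_container("container-lcm-purge")
container.upload_blob("blob-01", b"demo payload")  # payload contents assumed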

Container named ‘container-lcm-purge’ with a base blob ‘blob-01’ has been created
After the Python script run, an LCM rule named “test-python-lcm-rules” has been created
This is the Code View of the LCM policy with the ruleset as written in the Python script

Here in the image above, the predefined lifecycle management (LCM) rules, including ‘purge/deletion’ of expired data, can be seen in the Azure portal, thus validating the current workflow.
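As an optional cross-check outside the portal (not part of the original walkthrough), the applied policy can also be fetched with the Azure CLI:

az storage account management-policy show --account-name demo00bloblcmpurge --resource-group RG_Demo_ADLS_Data_LCM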

Key Observation from the Workflow:

  • If a data set needs to be readable, do not set a policy to move blobs to the archive tier. Blobs in the archive tier cannot be read unless they are first rehydrated, a process which may be time-consuming and expensive. Full Details Here
  • If you define more than one action on the same blob, lifecycle management applies the least expensive action to the blob. For example, action delete is cheaper than action tierToArchive. Action tierToArchive is cheaper than action tierToCool.
  • Lifecycle management policies are free of charge. Customers are billed for standard operation costs for the Set Blob Tier API calls. Delete operations are free.
  • However, other Azure services and utilities such as Microsoft Defender for Storage may charge for operations that are managed through a lifecycle policy.
  • If you enable firewall rules for your storage account, lifecycle management requests may be blocked. You can unblock these requests by providing exceptions for trusted Microsoft services.
