Python for Azure: Enable and Create Blob Snapshot on Azure Blob Data

Published in Python for Azure, Dec 9, 2022

Introduction: Blob snapshots capture the state of a blob at a point in time, so you can track the changes made to a blob and restore it to an earlier state. Azure Storage always keeps several copies of your data, but that redundancy does not protect against unwanted edits. With snapshots, you can undo modifications made to a blob and return the base blob to its earlier contents.

Blob Snapshots: Configuration/Setting at the scope of the Blob in the Blob Container

A snapshot is a read-only version of a blob that’s taken at a point in time. The URI of a snapshot is identical to that of its base blob, except that a ‘DateTime’ value is appended to indicate the time the snapshot was taken.

For example, if a page blob URI is http://storagesample.blob.core.windows.net/example1/apple

the snapshot URI is similar to http://storagesample.blob.core.windows.net/example1/apple?snapshot=2022-12-09T09:00:00.0000000Z
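To make the URI relationship concrete, here is a minimal sketch that derives a snapshot URI from a base blob URI. The helper name “snapshot_uri” is my own for illustration and is not part of the Azure SDK; it only assumes that the timestamp travels in a “snapshot” query parameter, as in the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def snapshot_uri(base_uri: str, snapshot_time: str) -> str:
    """Return base_uri with a snapshot=<timestamp> query parameter (hypothetical helper)."""
    parts = urlsplit(base_uri)
    # Keep any existing query parameters, but drop an old snapshot value if present.
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != "snapshot"]
    query.append(("snapshot", snapshot_time))
    # safe=":" leaves the colons of the timestamp unescaped so the URI stays readable.
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


base = "https://storagesample.blob.core.windows.net/example1/apple"
print(snapshot_uri(base, "2022-12-09T09:00:00.0000000Z"))
```

The base blob and all of its snapshots share the same path; only the query parameter tells them apart.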

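Snapshots are not taken automatically; each one is created by an explicit request. In this demo, the script takes a snapshot after every write operation on the base blob, so each snapshot captures the changes made by that write.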
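A snapshot is taken by an explicit call on the blob, so pairing a write with a snapshot looks roughly like the sketch below. It is not the code from “blob_snapshot.py”: the function name “write_and_snapshot” is hypothetical, and the sketch assumes a “BlobClient” from the “azure-storage-blob” package, whose “upload_blob” and “create_snapshot” methods are used here.

```python
def write_and_snapshot(blob_client, data: bytes) -> str:
    """Overwrite the base blob, then snapshot it; return the snapshot timestamp.

    blob_client is assumed to be an azure.storage.blob.BlobClient.
    """
    # Write the new contents to the base blob.
    blob_client.upload_blob(data, overwrite=True)
    # create_snapshot() returns a dict of properties; "snapshot" holds the timestamp ID.
    properties = blob_client.create_snapshot()
    return properties["snapshot"]


def demo(account_url: str) -> None:
    """Illustrative usage only; needs azure-identity, azure-storage-blob and a real account."""
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobClient

    client = BlobClient(
        account_url,
        container_name="container-blob-snapshot",
        blob_name="test_00/snapshot_blob.txt",
        credential=DefaultAzureCredential(),
    )
    for text in (b"first write", b"second write"):
        print("snapshot taken at", write_and_snapshot(client, text))
```

Because the blob client is passed in, the write-then-snapshot pairing can be exercised with a stand-in object before pointing it at a real storage account.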

Points to Remember:

You can create a snapshot of a blob in the Hot or Cool tier. Snapshots on blobs in the Archive tier are not supported.
Your account may be charged extra for data storage if you create a snapshot, which is a read-only copy of a blob.
Blob snapshots, like blob versions, are billed at the same rate as active data.
If you have not changed a blob or snapshot’s tier, then you are billed for unique blocks of data across that blob, its snapshots, and any versions it may have.
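The last point is easier to see with a small worked example. The sketch below is a simplified model, not Azure’s billing logic: it treats a blob and its snapshot as lists of block identifiers (the IDs are made up) and counts the unique blocks across them.

```python
def billed_blocks(*block_lists) -> int:
    """Count the unique blocks across a base blob and its snapshots (simplified model)."""
    unique = set()
    for blocks in block_lists:
        unique.update(blocks)
    return len(unique)


# A snapshot taken while the blob consisted of three blocks (hypothetical IDs).
snapshot = ["b1", "b2", "b3"]

# Case 1: only the third block of the base blob is updated afterwards.
base_after_block_update = ["b1", "b2", "b3-v2"]
print(billed_blocks(base_after_block_update, snapshot))  # 4

# Case 2: the whole blob is overwritten, which replaces every block.
base_after_full_overwrite = ["n1", "n2", "n3"]
print(billed_blocks(base_after_full_overwrite, snapshot))  # 6
```

In this model, updating a single block adds one billed block, while overwriting the whole blob doubles the count even if the new data is identical to the old.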

Hands-On Implementation via Azure Portal & Python SDK for Azure

Prerequisites

Python 3.6 or later is required to use this package
You must have an Azure subscription and an Azure storage account to run the Python code in this demo.

Setup

Install the required Azure libraries for Python with pip:
Clone or download this project repository: Python-for-Azure
Open the project folder ‘azure-blob-snapshot’ in Visual Studio Code or your IDE of choice.
From the root of the project folder, run the following command:

pip install -r requirements.txt

Workflow

1. For this demo, I first created a resource group named “RG_Demo_ADLS_Data_Protection” and then a storage account named “demo00blobsnapshot”.

Note: Remember to whitelist your IP address in the “Networking” settings of the storage account. Also, in the “Access Control (IAM)” settings, assign yourself a suitable role, specifically ‘Storage Blob Data Owner’ or ‘Storage Blob Data Contributor’, so that this demo workflow can run successfully.

2. The script “blob_snapshot.py” uses the Python SDK for Azure to implement this workflow: it writes to a base blob in the ADLS storage account and creates a blob snapshot after each write operation on that blob.

3. Before running the script, run the following commands in the terminal of the IDE:

az login --tenant <tenant_id>

Then select the correct subscription:

az account set --subscription <sub_id/sub_name>

[Info]: The “_get_credential” method, which uses the “DefaultAzureCredential” class, can now authenticate through the Azure CLI login.
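The repository’s “_get_credential” method is not reproduced in this article, so the following is only a sketch of how such a helper is commonly written with “DefaultAzureCredential”. The function names “account_url” and “get_blob_service_client” are my own placeholders; the sketch assumes the “azure-identity” and “azure-storage-blob” packages and the standard blob endpoint form.

```python
def account_url(account_name: str) -> str:
    """Build the blob endpoint URL for a storage account (standard public-cloud form)."""
    return f"https://{account_name}.blob.core.windows.net"


def get_blob_service_client(account_name: str):
    """Return a BlobServiceClient authenticated via DefaultAzureCredential."""
    # Imported here so that account_url() works even without the Azure packages installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    # DefaultAzureCredential tries several sources in turn; after "az login",
    # the Azure CLI login is one of the sources it can use.
    credential = DefaultAzureCredential()
    return BlobServiceClient(account_url(account_name), credential=credential)


print(account_url("demo00blobsnapshot"))
```

With this approach no account key or connection string has to be stored in the project; access is governed by the role assignment made in “Access Control (IAM)”.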

After selecting the correct Python interpreter and run configuration for your project (working directory, etc.), run the script “blob_snapshot.py”.
The Python run console below shows the workflow logs; please observe the highlighted text.

4. After the script has run successfully, the result can be seen in the Azure portal: a container named ‘container-blob-snapshot’ has been created, holding a directory named ‘test_00’ that contains the base blob ‘snapshot_blob.txt’.

The container named ‘container-blob-snapshot’ with a directory named ‘test_00’ has been created

Also, in the ‘test_00’ directory, a blob named ‘snapshot_blob.txt’ is created

The image below shows the different snapshots of the blob ‘snapshot_blob.txt’, which validates the workflow.

Different snapshots are visible for each write operation on the base-blob
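The same check can be made from Python instead of the portal. The sketch below is illustrative: “list_snapshots” is a hypothetical helper, and it assumes a “ContainerClient” from “azure-storage-blob”, whose “list_blobs” method accepts include=["snapshots"].

```python
def list_snapshots(container_client, blob_name: str) -> list:
    """Return the snapshot timestamps of blob_name, oldest first.

    container_client is assumed to be an azure.storage.blob.ContainerClient.
    """
    stamps = [
        item.snapshot
        for item in container_client.list_blobs(
            name_starts_with=blob_name, include=["snapshots"]
        )
        # The base blob is listed as well, with no snapshot value; skip it,
        # along with any other blob that merely shares the name prefix.
        if item.name == blob_name and item.snapshot
    ]
    # Timestamps share one fixed-width format, so sorting the strings sorts by time.
    return sorted(stamps)
```

For this demo, the call would be list_snapshots(container_client, "test_00/snapshot_blob.txt"), where container_client points at ‘container-blob-snapshot’.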

Note:

At the time of writing (December 2022), this feature is in preview on storage accounts with a hierarchical namespace; a stable version is expected to reach General Availability (GA).
Also, the blob versioning feature is not supported on an ADLS Gen2 storage account with Hierarchical Namespace (HNS) enabled, so blob snapshots offer a workaround for capturing the history of changes to the base blob.
Avoid calling methods that overwrite the entire blob, and instead update individual blocks to keep costs low.

Key Observation from the Workflow:

In this workflow, every write operation on the base blob is followed by a new snapshot that captures the blob as of that write.
If needed, an older or specific state of the blob can then be retrieved from the corresponding snapshot.
To overwrite a base blob with one of its snapshots, use the start_copy_from_url method on the blob client and pass it the URL of the required blob snapshot. The URL of the blob snapshot is the same as that of the base blob, with the ‘DateTime’ stamp appended as the ‘snapshot’ query parameter.
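The restore step can be sketched as follows. This is a minimal illustration, not code from the repository: “restore_from_snapshot” is a hypothetical helper, and it assumes a “BlobClient” whose “url” property carries no query string. Depending on how your requests are authorized, the source URL may additionally need a SAS token.

```python
def restore_from_snapshot(blob_client, snapshot_time: str) -> dict:
    """Overwrite the base blob with the contents of one of its snapshots.

    blob_client is assumed to be an azure.storage.blob.BlobClient for the base blob.
    """
    # Assumption: blob_client.url has no query string (no SAS token appended),
    # so the snapshot parameter can be added with a plain "?".
    source_url = f"{blob_client.url}?snapshot={snapshot_time}"
    # start_copy_from_url returns a dict of copy properties such as the copy status.
    return blob_client.start_copy_from_url(source_url)
```

The snapshot itself is left untouched by the copy, so the same snapshot can be used for a restore again later.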