Python for Azure: Enable and Create Blob Snapshot on Azure Blob Data
Introduction: Blob snapshots let you capture the state of a blob at a point in time, so you can track the changes made to that blob over time. Being able to restore a blob to a previous state is handy for data-tracking purposes. Azure Storage always keeps several copies of your data. With snapshots, you can undo any modification made to a blob, and you can even go back to the original base blob.
Blob Snapshots: Configuration/Setting at the scope of the Blob in the Blob Container
A snapshot is a read-only version of a blob that’s taken at a point in time. The only difference between the URI of a snapshot and the URI of its base blob is the ‘DateTime’ value appended to the snapshot URI to indicate when the snapshot was taken.
For example, if a page blob URI is http://storagesample.blob.core.windows.net/example1/apple
the snapshot URI is similar to http://storagesample.blob.core.windows.net/example1/apple?snapshot=2022-12-09T09:00:00.0000000Z
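To make the URI relationship concrete, here is a minimal sketch that derives a snapshot URI from a base blob URI using only the Python standard library. The helper name `snapshot_uri` is hypothetical and not part of any Azure SDK, and the account, container, and blob names are placeholders modeled on the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def snapshot_uri(base_uri: str, snapshot_time: str) -> str:
    """Return base_uri with its 'snapshot' query parameter set to snapshot_time."""
    parts = urlsplit(base_uri)
    # Keep any existing query parameters except a previous 'snapshot' value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "snapshot"]
    query.append(("snapshot", snapshot_time))
    # safe=":" leaves the colons of the timestamp unescaped.
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


# Placeholder base blob URI (hypothetical account and container).
base = "http://storagesample.blob.core.windows.net/example1/apple"
print(snapshot_uri(base, "2022-12-09T09:00:00.0000000Z"))
```

The base blob and the snapshot share the same path; only the `snapshot` query parameter tells them apart.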
Points to Remember:
- You can create a snapshot of a blob in the Hot or Cool tier. Snapshots on blobs in the Archive tier are not supported.
- Your account may be charged extra for data storage if you create a snapshot, which is a read-only copy of a blob.
- Blob snapshots, like blob versions, are billed at the same rate as active data.
- If you have not changed a blob or snapshot’s tier, then you are billed for unique blocks of data across that blob, its snapshots, and any versions it may have.
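The "unique blocks" billing rule can be illustrated with a small, deliberately simplified model. The sketch below assumes equally sized blocks and made-up block IDs, and the function name `billable_bytes` is hypothetical; the actual charges are determined by Azure, not by this calculation.

```python
from typing import Dict, Iterable


def billable_bytes(block_ids_by_object: Dict[str, Iterable[str]], block_size: int) -> int:
    """Bytes charged when identical blocks are counted once across all objects."""
    unique_blocks = set()
    for block_ids in block_ids_by_object.values():
        unique_blocks.update(block_ids)
    return len(unique_blocks) * block_size


MIB = 1024 * 1024

# A snapshot was taken, then only the third block of the base blob was replaced.
partial_update = {
    "snapshot": ["b1", "b2", "b3"],
    "base blob": ["b1", "b2", "b3-new"],
}

# A snapshot was taken, then the whole base blob was overwritten with new blocks.
full_overwrite = {
    "snapshot": ["b1", "b2", "b3"],
    "base blob": ["n1", "n2", "n3"],
}

print(billable_bytes(partial_update, 4 * MIB) // MIB)  # 16 (four unique blocks)
print(billable_bytes(full_overwrite, 4 * MIB) // MIB)  # 24 (six unique blocks)
```

In this model, a snapshot costs nothing extra until the base blob diverges from it, and the extra cost grows with the number of blocks that differ.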
Hands-On Implementation via Azure Portal & Python SDK for Azure
Prerequisites
- Python 3.6 or later is required to use this package
- You must have an Azure subscription and an Azure storage account to run the Python code below.
Setup
- Install the required Azure libraries for Python with pip:
- Clone or download this project repository: Python-for-Azure
- Open the project folder ‘azure-blob-snapshot’ in Visual Studio Code or your IDE of choice.
- From the root location of the project folder run the following command.
pip install -r requirements.txt
Workflow
1. In this demo workflow, I first created a resource group named “RG_Demo_ADLS_Data_Protection” and then a storage account named “demo00blobsnapshot”.
Note: Remember to allow your IP address in the “Networking” config settings of the storage account. Also, in the “Access Control (IAM)” config settings, add a proper “role assignment” for yourself, in particular the ‘Storage Blob Data Owner’ or ‘Storage Blob Data Contributor’ role, so that this demo workflow executes successfully.
2. The script “blob_snapshot.py” demonstrates how to use the Python SDK for Azure to implement the workflow described above, i.e., enabling blob snapshots on the ADLS storage account and creating a blob snapshot for each write operation on the base blob.
3. Before running the script, in the terminal of the IDE do the following steps:
- Log in to your Azure account
az login --tenant <tenant_id>
- Select the correct subscription
az account set --subscription <sub_id/sub_name>
[Info]: The “_get_credential” method, which uses the “DefaultAzureCredential” class, can now authenticate properly because it can pick up the Azure CLI login.
- After selecting the correct Python interpreter and the correct run configuration for your project (such as the working directory), run the script “blob_snapshot.py”.
- The Python run console with the workflow logs follows; please observe the highlighted text.
4. After the script has executed successfully, the result can be observed in the Azure portal: a container named ‘container-blob-snapshot’ has been created, with a directory named ‘test_00’ that holds the base blob ‘snapshot_blob.txt’.
In the image below, the different snapshots of the blob ‘snapshot_blob.txt’ can be seen, which validates the workflow.
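The workflow above can be sketched roughly as follows. This is not the repository's “blob_snapshot.py” but a minimal approximation under stated assumptions: the packages `azure-identity` and `azure-storage-blob` are installed, the account, container, and blob names are the ones used in this demo, and the helper names `get_blob_client`, `write_and_snapshot`, and `run_demo` are hypothetical. Only the name `_get_credential` comes from the article; its body here is assumed.

```python
def _get_credential():
    # Imported lazily so the other helpers can be tried without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential

    # DefaultAzureCredential tries several sources, including the Azure CLI login.
    return DefaultAzureCredential()


def get_blob_client(account_name, container_name, blob_name):
    """Create the container if needed and return a client for the base blob."""
    from azure.core.exceptions import ResourceExistsError
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=_get_credential(),
    )
    container = service.get_container_client(container_name)
    try:
        container.create_container()
    except ResourceExistsError:
        pass  # The container already exists from an earlier run.
    return container.get_blob_client(blob_name)


def write_and_snapshot(blob_client, data):
    """Write data to the base blob, then snapshot it and return the snapshot ID."""
    # overwrite=True replaces the whole blob, which is acceptable for a small demo file.
    blob_client.upload_blob(data, overwrite=True)
    properties = blob_client.create_snapshot()
    return properties["snapshot"]


def run_demo():
    """Perform three writes on the base blob, taking a snapshot after each one."""
    blob_client = get_blob_client(
        "demo00blobsnapshot", "container-blob-snapshot", "test_00/snapshot_blob.txt"
    )
    for i in range(3):
        snapshot_id = write_and_snapshot(blob_client, f"content after write {i}\n")
        print(f"write {i}: created snapshot {snapshot_id}")
```

Calling `run_demo()` after `az login` should leave one base blob with three snapshots, provided the role assignment and networking settings from the note above are in place.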
Note:
- This feature is currently in preview, and its stable version will soon be in General Availability (GA).
- Also, blob versioning is not supported on an ADLS Gen2 storage account with Hierarchical Namespace (HNS) enabled, so blob snapshots offer a workaround for capturing historical data trends on the base blob.
- Avoid calling methods that overwrite the entire blob, and instead update individual blocks to keep costs low.
Key Observation from the Workflow:
- In this workflow, each write operation on the same base blob is followed by a new snapshot of the blob, which captures the state of the blob after that write.
- This makes it possible to retrieve an older or a specific version of a blob from its snapshots when needed.
- To overwrite a base blob with one of its snapshots, call the start_copy_from_url method on the blob client and pass it the URL of the required blob snapshot. The URL of the blob snapshot is the same as that of the base blob, with the ‘DateTime’ stamp appended as the ‘snapshot’ query parameter.
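The observations above can be sketched with three small helpers. The function names are hypothetical; `container_client` and `blob_client` are assumed to be the `ContainerClient` and base-blob `BlobClient` objects from the `azure-storage-blob` package, and the blob client's URL is assumed to carry no query string such as a SAS token.

```python
def list_snapshot_ids(container_client, blob_name):
    """Return the snapshot IDs of blob_name, oldest first."""
    blobs = container_client.list_blobs(name_starts_with=blob_name, include=["snapshots"])
    # The base blob is listed with snapshot=None, and other blobs may share the prefix.
    # Snapshot IDs are fixed-width timestamps, so sorting the strings sorts them by time.
    return sorted(b.snapshot for b in blobs if b.name == blob_name and b.snapshot)


def read_snapshot(container_client, blob_name, snapshot_id):
    """Download the content of one specific snapshot as bytes."""
    snapshot_client = container_client.get_blob_client(blob_name, snapshot=snapshot_id)
    return snapshot_client.download_blob().readall()


def restore_from_snapshot(blob_client, snapshot_id):
    """Overwrite the base blob with the content of one of its snapshots."""
    # The snapshot URL is the base blob URL plus the 'snapshot' query parameter.
    source_url = f"{blob_client.url}?snapshot={snapshot_id}"
    return blob_client.start_copy_from_url(source_url)
```

For example, `restore_from_snapshot(blob_client, list_snapshot_ids(container_client, "test_00/snapshot_blob.txt")[0])` would start a copy of the oldest snapshot over the base blob; whether the copy is authorized depends on how the client itself is authenticated.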