Storage Tiers in Azure

Consider all the puzzle pieces when changing Azure Storage Tiers

Christian Gert Hansen · Backstage Stories · Dec 15, 2023


Introduction

This post aims to provide insight into my day-to-day life as a Databricks Data Platform consultant. Let me take you behind the scenes at one of our customers.

This story is about the time I was tasked with changing Storage Tiers in Azure for a data backup.

Topics to be covered:

  1. How to optimize data storage tiers in Azure for cost-efficiency.
  2. Factors to consider when transitioning data between storage tiers.
  3. Cost analysis for different storage options.
  4. Insights into implementing storage tier changes using Azure.
  5. The importance of patience during data migration.

As my team and I wrapped up our routine biweekly sprint planning session, I could not help but think that my upcoming user story would be a breeze. However, as any seasoned developer knows, appearances can be deceiving, and challenges often lie beneath the surface.

My user story was titled “Change Storage Tiers For Data Backup”. To provide some context, our team was in the process of decommissioning a platform within Azure. And instead of wiping everything out, we decided to take a cautious approach and retain the production data for a brief period. After all, it is better to be safe than sorry.

The primary goal in altering our storage tiers was twofold: to optimize costs while still ensuring accessibility of our backup data. With approximately 500 terabytes of data classified as “Hot,” it became clear that a deeper dive into this task was warranted.

The Strategy

In the quest to optimize our data storage costs, I took a deep dive into Azure Storage Tiers, carefully considering several key variables:

  1. Backup Period and Early Deletion Penalty
  2. Cost of Storing Data
  3. Cost of Changing Storage Tiers

The first critical factor to consider was the Azure Storage Tier early deletion penalties. I had a 70-day backup requirement for the data, which meant comparing that period with the minimum retention period of each Storage Tier: “Cool” has a minimum retention period of 30 days, while “Cold” and “Archive” extend to 90 and 180 days, respectively.

I quickly deemed “Archive” impractical, primarily due to the substantial 110-day deletion penalty (its 180-day minimum retention minus our 70-day backup window) and the complexity of rehydrating the data if the backup was ever needed.

This left me with two viable options: “Cool” or “Cold.” The decision hinged on whether we should endure the 20-day early deletion penalty of moving data to “Cold” or remain in the “Cool” tier for the entire 70 days.
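
To make the penalty arithmetic concrete, here is a minimal sketch in Python, assuming the minimum retention periods quoted above (and none for “Hot”):

    # A minimal sketch of the early deletion arithmetic, assuming the
    # minimum retention periods quoted above ("Hot" has none).
    MIN_RETENTION_DAYS = {"Hot": 0, "Cool": 30, "Cold": 90, "Archive": 180}
    BACKUP_DAYS = 70  # our backup requirement

    for tier, minimum in MIN_RETENTION_DAYS.items():
        # Deleting before the minimum retention period means paying for
        # the remaining days anyway; that difference is the penalty.
        penalty_days = max(0, minimum - BACKUP_DAYS)
        print(f"{tier}: early deletion penalty of {penalty_days} days")
    # Hot: 0 days, Cool: 0 days, Cold: 20 days, Archive: 110 days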

The next crucial step was calculating the costs associated with storing data in the remaining storage tiers. I strongly recommend using the Azure Pricing Calculator for precise numbers.

With today’s Azure prices, here is a breakdown of the cost (the arithmetic is sketched in Python after the list):

  • 500 TB in “Hot” for 70 days: $9,666.56 / 30 * 70 ≈ $22,555.00
  • 500 TB in “Cool” for 70 days: $5,120.00 / 30 * 70 ≈ $11,947.00
  • 500 TB in “Cold” for 70 days + 20-day early deletion penalty: $2,304.00 / 30 * 90 = $6,912.00
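
If you want to sanity-check these numbers yourself, here is the same arithmetic as a small Python sketch, using the monthly prices for 500 TB quoted above (check the Azure Pricing Calculator for current prices in your region):

    # Monthly price for 500 TB in each tier, prorated per day and
    # multiplied by the days we actually pay for ("Cold" bills 90 days:
    # our 70-day window plus the 20-day early deletion penalty).
    MONTHLY_PRICE_500TB = {"Hot": 9666.56, "Cool": 5120.00, "Cold": 2304.00}
    BILLED_DAYS = {"Hot": 70, "Cool": 70, "Cold": 90}

    for tier, monthly in MONTHLY_PRICE_500TB.items():
        cost = monthly / 30 * BILLED_DAYS[tier]
        print(f"{tier}: ${cost:,.2f}")
    # Hot: $22,555.31 / Cool: $11,946.67 / Cold: $6,912.00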

Based on the analysis, it appeared logical to transition our data to the “Cold” tier, even while accepting the early deletion penalty.

Do not forget the cost of changing tiers

A crucial factor still remained to be considered — the cost of moving data from one storage tier to another.

Cost of Tiering Down:

(number of write operations / 10,000) * cost per 10,000 write operations

For “Cool” and “Cold,” the cost per 10,000 write operations is $0.13 and $0.234, respectively.

To estimate the total number of write operations, I used the total blob count as a proxy, assuming all blobs were in the “Hot” tier and each would incur one write operation when changing tier. I pulled the blob count from the storage account’s built-in metrics (Storage Account > Metrics > Blob Count), which showed approximately 600 million blobs.

The cost of moving data between storage tiers was calculated as follows:

  • Hot to Cool: (600,000,000 / 10,000) * $0.13 = $7,800
  • Hot to Cold: (600,000,000 / 10,000) * $0.234 = $14,040
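
The same calculation as a quick Python sketch, using the per-10,000-operations rates quoted above:

    # Each blob incurs one write operation when it changes tier, billed
    # per 10,000 operations at the destination tier's rate.
    WRITE_PRICE_PER_10K = {"Cool": 0.13, "Cold": 0.234}
    BLOB_COUNT = 600_000_000  # from Storage Account > Metrics > Blob Count

    for tier, price in WRITE_PRICE_PER_10K.items():
        cost = BLOB_COUNT / 10_000 * price
        print(f"Hot -> {tier}: ${cost:,.2f}")
    # Hot -> Cool: $7,800.00 / Hot -> Cold: $14,040.00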

Finally, let’s consider the total cost, factoring in both data storage and the tier change (totaled in the short sketch after the list):

  • Keeping it all in “Hot”: $22,555.00
  • Moving to “Cool”: $11,947.00 + $7,800 = $19,747
  • Moving to “Cold”: $6,912.00 + $14,040 = $20,952
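
Or, as a final sanity check in code, using the figures computed above:

    # Storage cost for the backup window plus the one-off tier change cost.
    TOTAL_COST = {
        "Hot (stay put)": 22_555.00,
        "Cool": 11_947.00 + 7_800,
        "Cold": 6_912.00 + 14_040,
    }
    cheapest = min(TOTAL_COST, key=TOTAL_COST.get)
    print(f"Cheapest option: {cheapest} at ${TOTAL_COST[cheapest]:,.2f}")
    # Cheapest option: Cool at $19,747.00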

This cost breakdown led us to the informed decision to move our data to the “Cool” tier.

The implementation

The actual implementation of this transition was relatively straightforward. I leveraged Storage Account Lifecycle Management to create a simple rule that moved all data created more than one day ago from the “Hot” Storage Tier to our chosen destination.
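
For reference, here is a minimal sketch of what such a lifecycle policy can look like, emitted as JSON from Python. The rule name is made up, and I use the modification-time condition as a stand-in for “created more than one day ago” (nothing was still being written to a platform under decommissioning); the resulting JSON can be applied through the portal or with az storage account management-policy create:

    import json

    # Hypothetical sketch of a lifecycle rule moving base blobs from
    # "Hot" to "Cool" once they are more than one day old. The rule name
    # is made up; daysAfterModificationGreaterThan stands in for age here.
    policy = {
        "rules": [
            {
                "enabled": True,
                "name": "move-backup-to-cool",  # hypothetical name
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": 1}
                        }
                    },
                },
            }
        ]
    }

    print(json.dumps(policy, indent=2))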

However, there was a moment of initial confusion. Upon enabling the rule, nothing seemed to happen: all our data stubbornly clung to the “Hot” Storage Tier, and the Storage Account Insights showed no activity at all. It was a perplexing situation, but there was a crucial detail I had overlooked.

As it turned out, it can take up to 24 hours before a Lifecycle Management Rule becomes active and the first execution of the rule commences. So, with patience as my ally, I decided to wait it out.

Returning to the task after 24 hours, I was greeted with a sight that reassured me that things were moving in the right direction. The blob transactions on the storage account had exploded, with millions of transactions taking place. It took approximately eight days for all the data to complete its migration from the “Hot” Storage Tier to the “Cool” Storage Tier.

To sum it all up

  1. We do not want to archive data for short-term backup periods, due to early deletion penalties and the need to rehydrate data before it can be read.
  2. Remember to calculate both the cost of storing data and the cost of moving data!
  3. Be patient when implementing Lifecycle Management rules; the first execution can take up to 24 hours.

I closed the user story and moved on to the next.

I'm a dedicated data professional with a passion for crafting extraordinary data solutions that are driven by real business value.