Optimizing Costs in Snowflake with Effective Storage Lifecycle Policies

In the era of big data, there has been an enormous amount of data accumulated over the period. Managing storage costs is a significant challenge for businesses given not sure which data to purge. Implementing a robust storage lifecycle policy can be a game-changer, allowing organizations to optimize storage usage and reduce costs by automatically deleting data that is no longer needed.

A storage lifecycle policy is a set of rules that manage the lifecycle of data from its creation to deletion. These policies help to automate the process of data management, ensuring that data is retained for the required period and then deleted or archived to optimize storage costs. This functionality is also referred to as information lifecycle management.

Snowflake has a new feature called the Storage lifecycle policy (private preview*) that allows you to implement policy and enable it in multiple tables.

Benefits of Storage Lifecycle Policies

  1. Cost Savings: By deleting unnecessary data, organizations can significantly reduce storage costs.
  2. Improved Performance: Removing obsolete data can enhance system performance and efficiency.
  3. Compliance and Risk Management: Automated data deletion ensures compliance with data retention policies and reduces the risk of data breaches. For example, regulatory compliances such as HIPAA and SOX in the United States are required to retain data for 6 & 7 years respectively.
  4. Simplified Management: Automation simplifies data management, reducing the need for manual intervention.

Implementing Storage Lifecycle Policies

  • Governance: Set up required privileges to use and manage storage lifecycle policies.
  • Identify Data Requirements: Understand the data requirements of your organization. Determine what data is critical and what can be deleted after a certain period.
  • Set Up Policies: Create policies that define retention periods and deletion criteria.
  • Automate: Enforce the policies on the desired tables which will execute once every 24 hours.
  • Monitor and Adjust: Regularly monitor the effectiveness of your policies.

Example of a Storage Lifecycle Policy in Action

Consider a company that manages customer orders. They might categorize orders into active and inactive categories. Active data is retained for five years, while inactive data will be deleted. By automating this process, the company can ensure compliance with data retention laws and reduce storage costs.

-- set the context

USE ROLE sysadmin;
USE demodb.storage;
USE demo_Wh;

--create sample data
ALTER warehouse demo_wh SET warehouse_size = '3x-large';

-- create sameple table
CREATE TABLE orders AS
SELECT
dateadd(YEAR, 26, o_orderdate) o_orderdate,
* exclude o_orderdate
FROM
SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.ORDERS
ORDER BY
o_orderdate DESC;

ALTER warehouse demo_wh SET warehouse_size = 'xsmall';

-- see how many rows in the table
SELECT count(*) FROM orders;


-- setup policy, it will create in the current schema
CREATE OR REPLACE storage lifecycle policy delete_by_date_policy
AS (datecol date)
RETURNS boolean ->
datecol < dateadd(DAY, -365 * 5, current_timestamp());

-- review polices
SHOW STORAGE lifecycle policies;
DESC STORAGE lifecycle policy delete_by_date_policy;

-- Automate it
ALTER TABLE demodb.storage.orders
ADD STORAGE lifecycle policy delete_by_date_policy ON (o_orderdate);

-- this will execute in 24 hours

-- Monitor
SELECT * FROM
TABLE(
DEMODB.INFORMATION_SCHEMA.POLICY_REFERENCES(
REF_ENTITY_NAME => 'demodb.storage.orders',
REF_ENTITY_DOMAIN => 'table'
));
SELECT * FROM
table(
DEMODB.INFORMATION_SCHEMA.
STORAGE_LIFECYCLE_POLICY_HISTORY
(POLICY_NAME => 'delete_by_date_policy')
);

-- test size of tables over period
SELECT count(*) FROM orders;

Conclusion

Storage lifecycle policies are essential for managing data efficiently and reducing storage costs. Snowflake understands the needs of customers as the data are growing and working to make it easier to manage storage to reduce cost. By implementing these policies, organizations can optimize their storage usage, ensure compliance, and improve overall data management.

*Snowflake Private Preview feature: It is not available in a trial account, if you are a customer please reach out to your account team, and it is subject to availability.

Disclaimer: The opinions expressed in this post are my own and not necessarily those of my employer (Snowflake).

--

--