Best Practices for Resource Optimization in Snowflake

Snowflake's usage-based pricing model is simple to understand: pricing is transparent, and customers pay only for the resources they actually consume.

Modern enterprises receive data from diverse sources in various formats, and the amount of data flowing into their data warehouses or data lakes can fluctuate unexpectedly. As companies expand and add layers to their operations, verticals, and customer-facing applications, their data pipelines become more complex and challenging to manage. The value of data lies in the insights it provides for better decision-making, so the entire data pipeline must be monitored closely to achieve the best results. Because every query is billed to the customer, any disruption in the data pipelines, or any ETL job that loads incorrect data, must be addressed and rectified immediately to maximize the return on investment in Snowflake. By using data observability tools to monitor and optimize their data systems consistently, businesses can prevent waste, regardless of whether they pay on demand or pre-purchase capacity.

Thanks to Snowflake, customers have access to a virtually unlimited set of resources that can be activated almost instantly but also automatically resized or turned off completely when no longer needed. This instant elasticity provides the flexibility needed to match resources to the exact needs of every user, team, department, and workload every second of the day. This also applies to storage, computation, and serverless activities; each resource can be scaled independently. The result is that Snowflake customers only pay for the resources they need when they need them, maximizing efficiency and minimizing waste and costs.

Here are some of the best practices identified by David A. Spezia, Principal Sales Engineer at Snowflake, which can also be found in the Snowflake Resource Optimization Quickstarts.

Enabling Auto-Resume

Make sure that all virtual warehouses have the Auto-Resume feature enabled. If you intend to implement automatic suspension and set appropriate timeout limits, enabling Auto-Resume is essential; otherwise, users will not be able to query the system.

You can use the following query to identify warehouses that do not have Auto-Resume set:

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_resume" = 'false'
;

By enabling this feature, a warehouse is automatically resumed every time a query is submitted to that specific warehouse.

By default, all warehouses have Auto-Resume enabled.
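
If the check above returns any warehouses, Auto-Resume can be enabled with a simple ALTER statement; `my_wh` below is a placeholder warehouse name:

```sql
-- Enable Auto-Resume so the warehouse restarts automatically
-- whenever a query is submitted to it (my_wh is a placeholder name)
ALTER WAREHOUSE my_wh SET AUTO_RESUME = TRUE;
```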

Enabling Auto-Suspend

Make sure that all virtual warehouses are set to Auto-Suspend mode. This way, when they have finished processing queries, auto-suspend will switch off the virtual warehouses, thus stopping credit consumption.

The following query can be used to identify all warehouses that do not have auto-suspend enabled.

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE IFNULL("auto_suspend",0) = 0
;

By default, all virtual warehouses have automatic suspension enabled.
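
For any warehouse flagged by the query above, auto-suspend can be enabled by setting a suspension timeout in seconds; `my_wh` is a placeholder name:

```sql
-- Suspend the warehouse after 60 seconds of inactivity,
-- stopping credit consumption until the next query arrives
ALTER WAREHOUSE my_wh SET AUTO_SUSPEND = 60;
```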

Setting an appropriate timeout for different workloads

All virtual warehouses should have a timeout appropriate to their particular workload.

The following query identifies warehouses whose auto-suspend timeout is set to an hour or more of inactivity:

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_suspend" >= 3600 -- 3600 seconds = 1 hour
;
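
As a sketch of how timeouts might be differentiated by workload (the warehouse names and values here are illustrative, not prescriptive):

```sql
-- Ad-hoc/BI queries: suspend quickly to avoid idle credit burn
ALTER WAREHOUSE BI_WH SET AUTO_SUSPEND = 60;

-- Batch ETL: a slightly longer timeout keeps the warehouse cache warm
-- between closely spaced jobs
ALTER WAREHOUSE ETL_WH SET AUTO_SUSPEND = 300;
```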

Using the Resource Monitor

The Resource Monitor allows you to monitor and manage resource utilization within Snowflake.

It provides detailed information on the resources used, such as the amount of storage used, the number of running warehouses, and the use of credits. In addition, alerts and limits on resource utilization can be set to maintain control over costs and performance.

In particular, the Resource Monitor is an important tool for system administrators, IT managers and end users of Snowflake who wish to monitor and optimize resource utilization.

To access Resource Monitors in Snowsight, go to the Admin » Resource Monitors tab, and from there click the + Resource Monitor button in the top right-hand corner.

Using the following query, it is possible to identify all warehouses that do not have a resource monitor assigned:

SHOW WAREHOUSES
;
SELECT "name" AS WAREHOUSE_NAME
,"size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "resource_monitor" = 'null'
;

In summary, resource monitors allow limits to be set on the credits consumed by a warehouse during a specific time interval, preventing warehouses from unintentionally consuming more credits than planned.
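
As an illustration, a resource monitor with a monthly credit quota can be created and attached to a warehouse as follows (the names and thresholds are hypothetical):

```sql
-- Create a monitor with a 100-credit monthly quota:
-- notify at 75% consumption, suspend the warehouse at 100%
CREATE RESOURCE MONITOR my_monitor
  WITH CREDIT_QUOTA = 100
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 75 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

-- Attach the monitor to a warehouse (placeholder name)
ALTER WAREHOUSE my_wh SET RESOURCE_MONITOR = my_monitor;
```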

User Segmentation

It may be useful to segment roles within Snowflake. Indeed, if the execution times or query counts between roles within a single warehouse are extremely different, it may be worthwhile to segment users into separate warehouses and configure each warehouse to meet the specific needs of each workload. The following query highlights warehouses used by more than one role over the last month:

SELECT *
FROM (
    SELECT
        WAREHOUSE_NAME,
        ROLE_NAME,
        AVG(EXECUTION_TIME) AS AVERAGE_EXECUTION_TIME,
        COUNT(QUERY_ID) AS COUNT_OF_QUERIES,
        COUNT(ROLE_NAME) OVER (PARTITION BY WAREHOUSE_NAME) AS ROLES_PER_WAREHOUSE
    FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
    WHERE TO_DATE(START_TIME) >= DATEADD(month, -1, CURRENT_TIMESTAMP())
    GROUP BY 1, 2
) A
WHERE A.ROLES_PER_WAREHOUSE > 1
ORDER BY 5 DESC, 1, 2
;

Setting Account Statement Timeouts

Another good practice is to set and customize statement timeouts at the warehouse, account, session, and user level, according to the data strategy for long-running queries.

You can use the STATEMENT_QUEUED_TIMEOUT_IN_SECONDS and STATEMENT_TIMEOUT_IN_SECONDS parameters to automatically cancel queries that take too long to execute, for example because of user error or a blocked cluster.

The following commands show the current value of one of these parameters at the account, warehouse, and user level:

SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN ACCOUNT;
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN WAREHOUSE <warehouse-name>;
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN USER <username>;
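
The parameters can then be adjusted at the appropriate level; the values below are illustrative only:

```sql
-- Cancel any statement running longer than 2 hours, account-wide
-- (requires a role with the appropriate privileges, e.g. ACCOUNTADMIN)
ALTER ACCOUNT SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;

-- Tighter limits for a specific warehouse (placeholder name):
-- cancel statements queued for more than 5 minutes
ALTER WAREHOUSE my_wh SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 300;
-- and statements running for more than 1 hour
ALTER WAREHOUSE my_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;
```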

Identify Warehouses that vary from the 7-day average

When engaged in long-term projects, it can be difficult to keep track of credit utilization, especially when consumption exceeds the desired amount. A useful method is to create a query that computes a rolling seven-day average of credit utilization and flags the days that deviate from it. This can point to workloads that consume more credits than expected or run inefficiently.

You can run this query every morning to check the warehouses that vary from the seven-day average of credit utilization:

SELECT WAREHOUSE_NAME, DATE(START_TIME) AS DATE, 
SUM(CREDITS_USED) AS CREDITS_USED,
AVG(SUM(CREDITS_USED)) OVER (PARTITION BY WAREHOUSE_NAME ORDER BY DATE ROWS 7 PRECEDING) AS CREDITS_USED_7_DAY_AVG,
(TO_NUMERIC(SUM(CREDITS_USED)/CREDITS_USED_7_DAY_AVG*100,10,2)-100)::STRING || '%' AS VARIANCE_TO_7_DAY_AVERAGE
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_METERING_HISTORY"
GROUP BY DATE, WAREHOUSE_NAME
ORDER BY DATE DESC;

Monitoring warehouses approaching the billing threshold for cloud services

The following query examines warehouses where cloud service costs represent a high percentage of the workload.

In general, Snowflake only bills an account for cloud services on days when cloud services consumption exceeds 10 percent of that day's virtual warehouse credit consumption. Cloud services handle metadata operations such as BI tool discovery queries, heartbeat queries, SHOW commands, cache usage, and many other service optimization functions.

This query helps to understand which warehouses are approaching or exceeding the 10 percent threshold:

WITH
cloudServices AS (
    SELECT
        WAREHOUSE_NAME, MONTH(START_TIME) AS MONTH,
        SUM(CREDITS_USED_CLOUD_SERVICES) AS CLOUD_SERVICES_CREDITS,
        COUNT(*) AS NO_QUERIES
    FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
    GROUP BY WAREHOUSE_NAME, MONTH
),
warehouseMetering AS (
    SELECT
        WAREHOUSE_NAME, MONTH(START_TIME) AS MONTH,
        SUM(CREDITS_USED) AS CREDITS_FOR_MONTH
    FROM "SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_METERING_HISTORY"
    GROUP BY WAREHOUSE_NAME, MONTH
)
SELECT *, TO_NUMERIC(CLOUD_SERVICES_CREDITS/NULLIF(CREDITS_FOR_MONTH,0)*100,10,2) AS PERCT_CLOUD
FROM cloudServices
JOIN warehouseMetering USING (WAREHOUSE_NAME, MONTH)
ORDER BY PERCT_CLOUD DESC;

Delete unused tables

It is not uncommon for unused tables to accumulate. It is good practice to delete these tables, after making sure that no user still runs queries against them.

A further good practice is to make it mandatory to check all tables before deleting them.

One useful signal is a stale stream: a stream becomes stale when it is not consumed within the retention period of its source table, which often indicates that the table is no longer part of an active pipeline. The following SQL script identifies stale streams, and therefore candidate tables to review:

SHOW STREAMS;

SELECT *
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "stale" = true;
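
Once a candidate table has been verified as unused, it can be dropped; thanks to Time Travel, a table dropped by mistake can still be restored within the retention period. The names below are placeholders:

```sql
-- Drop a verified-unused table (placeholder name)
DROP TABLE IF EXISTS my_db.my_schema.stale_table;

-- If it turns out to be needed after all, it can be restored
-- while still within the Time Travel retention period
UNDROP TABLE my_db.my_schema.stale_table;
```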

Delete inactive users and roles, or users who have never logged in

It is a good idea to delete users who are inactive or who have never logged into Snowflake, as well as roles that are no longer used.

  • Inactive users
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.USERS
WHERE LAST_SUCCESS_LOGIN < DATEADD(month, -1, CURRENT_TIMESTAMP())
AND DELETED_ON IS NULL;
  • Users who have never logged in
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.USERS
WHERE LAST_SUCCESS_LOGIN IS NULL
AND DELETED_ON IS NULL;
  • Inactive roles
SELECT R.*
FROM SNOWFLAKE.ACCOUNT_USAGE.ROLES R
LEFT JOIN (
SELECT DISTINCT
ROLE_NAME
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME > DATEADD(month,-1,CURRENT_TIMESTAMP())
) Q
ON Q.ROLE_NAME = R.NAME
WHERE Q.ROLE_NAME IS NULL
and DELETED_ON IS NULL;
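
Once the results have been reviewed and confirmed, the inactive users and unused roles can be removed; the names below are placeholders:

```sql
-- Remove an inactive user and an unused role (placeholder names)
DROP USER IF EXISTS inactive_user;
DROP ROLE IF EXISTS unused_role;
```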

Use a BI partner dashboard to track Snowflake usage and expenses

By utilizing the tools of Snowflake's business intelligence (BI) and analytics partners, you can access a selection of prebuilt dashboards that can be easily implemented to keep track of your Snowflake usage.

Conclusion

Given Snowflake’s compute elasticity and per-second billing model, account administrators should constantly monitor resource utilization, growth, and efficiency to ensure they match performance requirements and budget.

For companies seeking to enhance their cost governance, it is advisable to analyze their usage patterns and detect any inadequacies or inefficiencies. Snowflake’s products can offer significant benefits in this regard, such as automating usage monitoring and setting up triggers to maintain manageable costs as usage increases.

To begin examining your own company’s usage patterns, it is recommended to explore the Resource Optimization Quick Start Guide.
