Cost Control Series Part 2: Snowflake Best Practices

Tips and Best Practices for Reducing Snowflake Costs

Eylon Steiner
Infostrux Engineering Blog



In the previous post (Part 1), we explored the calculation of Snowflake costs, monitoring techniques, and the implementation of resource alerts or suspensions when exceeding specified limits.

This post (Part 2) will discuss best practices for minimizing these costs.


Best Practices

To effectively control your costs in Snowflake, it is crucial to follow best practices. Here are some key recommendations:

Knowledge Transfer

Ensure that your team has a clear understanding of Snowflake’s cost model. They need to comprehend how storage and warehouse usage impact billing. This knowledge will enable them to make informed decisions and optimize resource utilization.

Roles and Responsibilities

Designate specific team members who will be responsible for managing the Snowflake infrastructure. This includes creating warehouses, setting up resource monitors, and monitoring usage. Limit the permissions to create and modify these resources to only these designated engineers. This ensures better control and accountability over cost-related activities.
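
For example, here is a minimal sketch of limiting these permissions, assuming a hypothetical infra_admin role for the designated engineers (role and warehouse names are illustrative):

-- Hypothetical role for the engineers who manage compute resources
CREATE ROLE IF NOT EXISTS infra_admin;

-- Only this role may create warehouses
GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE infra_admin;

-- Everyone else only gets USAGE on the specific warehouses they need
GRANT USAGE ON WAREHOUSE dev_wh_data_analysts TO ROLE analyst_role;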

Warehouses

Size

Start with a small warehouse (WH), monitor the usage, and adjust accordingly.
Small doesn't always mean cheaper. Each size up doubles the credit rate but may halve the runtime, so the next size can complete the same work at the same cost. The way to determine this is to look closely at the usage pattern. Are there multiple queries running in parallel? What are the performance requirements for the queries? Do you have a dashboard for which fast results are essential?
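
As a sketch (the warehouse name transform_wh is hypothetical), starting small and moving up a size later is a single statement each way, so it is cheap to experiment:

-- Start with the smallest size and let it suspend when idle
CREATE WAREHOUSE IF NOT EXISTS transform_wh
WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- If the usage pattern justifies it, move up one size and re-measure
ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'SMALL';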

Auto_suspend

When managing warehouses in Snowflake, it’s important to consider the time it takes to bring up a warehouse and the fact that the warehouse cache is cleared. Additionally, the auto_suspend setting should be evaluated based on the usage pattern.

Setting the auto_suspend to one minute is a common practice that works well in many scenarios. However, if the usage pattern shows activity recurring every five minutes, it's advisable to reevaluate the auto_suspend setting.

Remember the trade-off between minimizing idle resource usage and the time it takes to spin up a warehouse. Adjusting the auto_suspend setting accordingly helps strike the right balance based on the specific usage pattern and requirements.
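
For example, a one-minute auto-suspend can be set as follows (the warehouse name is hypothetical):

-- Suspend after 60 seconds of inactivity; resume automatically on the next query
ALTER WAREHOUSE reporting_wh SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;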

Resource monitor

Have a resource monitor per WH and for the account.

-- Find any warehouse without a resource monitor
SHOW WAREHOUSES;
SELECT "name" AS WAREHOUSE_NAME, "size" AS WAREHOUSE_SIZE
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "resource_monitor" = 'null';

Have as many warehouses as needed

Our strong recommendation is to have a dedicated warehouse (WH) for each specific work type and environment. This approach offers several benefits:

1. Isolation and Performance: Having dedicated warehouses ensures that workloads are isolated from each other. This prevents interference between different work types and environments, leading to better performance and stability. Each workload can utilize the allocated resources optimally without being affected by others.

2. Resource Allocation: Dedicated warehouses allow for precise resource allocation based on the specific requirements of each work type and environment. This enables you to scale resources up or down accordingly, ensuring optimal performance and cost-efficiency for each workload.

3. Simplified Management: With dedicated warehouses, it becomes easier to monitor and manage individual workloads. You can track usage, monitor performance, and troubleshoot issues more effectively, as each warehouse is dedicated to a specific work type and environment.

For example:

Dev_wh_ingest

Dev_wh_transform

Dev_wh_data_analysts

Prod_wh_ingest

Prod_wh_transform

Prod_wh_data_analysts
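
As a minimal sketch (sizes, settings, and role names are illustrative), each warehouse is created with its own settings and granted only to the roles that need it:

-- One warehouse per work type and environment
CREATE WAREHOUSE IF NOT EXISTS dev_wh_ingest
WITH WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Grant usage only to the role that needs it
GRANT USAGE ON WAREHOUSE dev_wh_ingest TO ROLE ingest_role;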

By adopting the above approach, you can achieve the following advantages:

1. Access Control: Granting developers access only to the dedicated development warehouse (WH) and databases ensures a controlled and secure environment. It restricts access to sensitive production data, promoting data privacy and compliance.

2. Controlled Ingestion: Assigning the ingestion tool exclusive access to the dedicated ingestion warehouse facilitates efficient data ingestion. It ensures that ingestion processes are isolated, preventing interference with other workloads and optimizing performance.

3. Efficient WH Cache Usage: By separating workloads into dedicated warehouses, the warehouse cache can be utilized more effectively. The cache can store frequently accessed data specific to each workload, resulting in improved query performance and reduced latency.

4. Cost Control and Resource Monitoring: Managing costs becomes easier with dedicated warehouses as you can assign specific Resource Monitors to each task. This allows you to closely monitor and control resource consumption, ensuring efficient resource utilization and cost management.

5. Customized Settings: Each dedicated warehouse can have different permissions, sizes, and auto-suspend configurations based on the workload requirements. This flexibility allows for tailored resource allocation and optimized performance for each specific task or environment.

Auto scaling

To optimize the loading process over time, leveraging Snowflake's auto-scaling feature is highly beneficial. This feature automatically starts and shuts down clusters based on the workload's demand. By dynamically adjusting the number of running clusters, you can effectively control costs and resource utilization.

To manage the number of clusters, you can modify the value of MAX_CLUSTER_COUNT. This allows you to set an upper limit on the maximum number of clusters that can be spawned. By limiting the number of clusters, you maintain better cost control and prevent excessive resource allocation.

If query queuing is acceptable for your project, it is recommended to set the SCALING_POLICY to ECONOMY. This configuration prioritizes maximizing the utilization of existing clusters rather than starting additional clusters. This approach helps conserve credits by keeping running clusters fully loaded, thus reducing unnecessary resource allocation.
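
A minimal sketch of such a configuration (the warehouse name and values are illustrative; multi-cluster warehouses require Enterprise Edition or higher):

-- Allow up to three clusters, but prefer to keep running clusters fully loaded
ALTER WAREHOUSE ingest_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'ECONOMY';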

Queries

Follow SQL best practices for query optimization

Be aware of the warehouse and query result caches. For example, re-running the same query does not consume credits because the result is cached for 24 hours.

Be aware of table metadata. For example, getting the count of records in a table does not consume credits because the row count is stored in each table's metadata (see the sketch below).

Terminate long-running queries
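
The following sketch illustrates the two cache behaviors described above, assuming a hypothetical orders table:

-- Served from table metadata: no running warehouse needed, no credits consumed
SELECT COUNT(*) FROM orders;

-- The first run executes on the warehouse and consumes credits
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;

-- An identical re-run within 24 hours is served from the result cache for free
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;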

Limit query time

You can limit the queries' run time at different levels by changing the STATEMENT_TIMEOUT_IN_SECONDS parameter.

This eliminates the possibility of a runaway query taking hours to run. We suggest setting a limit at the ACCOUNT level and a different limit per WH according to usage.

SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN ACCOUNT;
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN USER <username>;
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN SESSION;
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN WAREHOUSE <warehouse_name>;
-- Use ALTER to change the limit; 7200 seconds = 2 hours
ALTER WAREHOUSE sales_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;

Limit Statement Queue Times

For instance, there are cases where queries become irrelevant after waiting for one hour. In such scenarios, it is recommended to set the STATEMENT_QUEUED_TIMEOUT_IN_SECONDS parameter so that Snowflake automatically cancels statements that have been queued for too long.
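
A sketch, assuming queries that lose their value after one hour in the queue (the warehouse name is hypothetical):

-- Cancel any statement that waits in the warehouse queue for more than one hour
ALTER WAREHOUSE reporting_wh SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 3600;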

Scheduling

Consider scheduling your tasks based on cache and usage to optimize performance. For example, instead of running tasks on orders tables and sales tables simultaneously, schedule them at different times (e.g., orders tables at 1:00 and sales tables at 2:00). This helps prevent overloading the warehouse and frequent cache changes, resulting in improved efficiency.

However, when the warehouse is underutilized, it makes sense to schedule multiple tasks to run in parallel. This takes advantage of available resources and maximizes throughput.
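
As a sketch using Snowflake tasks (the task names, warehouse, and procedures are hypothetical), staggered schedules look like this:

-- Process orders tables at 01:00 UTC and sales tables at 02:00 UTC
CREATE TASK refresh_orders
  WAREHOUSE = prod_wh_transform
  SCHEDULE = 'USING CRON 0 1 * * * UTC'
AS
  CALL refresh_orders_tables();

CREATE TASK refresh_sales
  WAREHOUSE = prod_wh_transform
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  CALL refresh_sales_tables();

-- Tasks are created suspended; resume them to start the schedule
ALTER TASK refresh_orders RESUME;
ALTER TASK refresh_sales RESUME;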

3rd party tools

There are a few third-party tools that use Snowflake's APIs and metrics to monitor your resources, send alerts, and suggest optimizations. Here are two examples:

https://keebo.ai/snowflake-optimization — automatically optimize your Snowflake account

https://www.datadoghq.com/dg/monitor/snowflake — monitor usage

One advantage of tools like these is that you can monitor all of your platforms (Snowflake, AWS, Azure, etc.) in one place.

Optimizations

Use Snowflake documentation to discover compute usage that might need fine-tuning:

Get detailed information for query optimization

Query History

Query Profile

Inspect warehouse load to optimize warehouse performance

-- The following query helps you understand the warehouses' credit consumption pattern for the last week
select * from table(information_schema.warehouse_metering_history(dateadd('day',-7,current_timestamp()),current_timestamp()));

-- The following queries help you understand a warehouse's load pattern
use warehouse <YOUR WH>;
select * from table(information_schema.warehouse_load_history(date_range_start=>dateadd('day',-7,current_timestamp()))) where avg_running != 0;
select * from table(information_schema.warehouse_load_history(date_range_start=>dateadd('day',-7,current_timestamp()))) where avg_queued_load != 0;
select * from table(information_schema.warehouse_load_history(date_range_start=>dateadd('day',-7,current_timestamp()))) where avg_queued_provisioning != 0;
select * from table(information_schema.warehouse_load_history(date_range_start=>dateadd('day',-7,current_timestamp()))) where avg_blocked != 0;

If you have the ACCOUNTADMIN role, you can also see the usage in the Admin UI:

Snowsight Admin UI — Warehouse Activity and Query History

You can inspect the average running load (AVG_RUNNING) by running the following query. Make sure you choose the Chart view to see a graphical result, and set the data axis to the average.

select date_trunc('hour', start_time), hour(start_time), avg(avg_running)
from table(information_schema.warehouse_load_history(date_range_start=>dateadd('day',-10,current_timestamp())))
group by date_trunc('hour', start_time), hour(start_time)
order by date_trunc('hour', start_time) asc;

Warehouse Load

Inspect warehouse events to identify room for optimization.

WAREHOUSE_EVENTS_HISTORY View

For a list of queries that help identify usage that might benefit from optimizations, see Snowflake’s Resource Optimization quickstart guides.

Conclusion

Now that you have a clear understanding of how Snowflake will bill your account, how to monitor usage and receive timely alerts effectively, and the best practices to optimize costs, it is crucial to disseminate this knowledge among your team.

Encourage open communication and discussions within your team to address any questions or concerns about Snowflake usage and cost management.

Consider organizing training sessions or workshops to further educate team members on Snowflake's cost management and best practices. Empower them to take ownership of their data usage and understand how it influences the overall budget, or hire professionals like Infostrux Solutions, which offers services targeting FinOps in Snowflake.

I’m Eylon Steiner, Engineering Manager for Infostrux Solutions. You can follow me on LinkedIn.

Subscribe to Infostrux Medium Blog at https://blog.infostrux.com for the most interesting Data Engineering and Snowflake news. Follow Infostrux’s open-source efforts through GitHub.
