Snowflake Cost-Saving Tips

Matt Weingarten
3 min read · Jan 11, 2024


This is actually how you pay Snowflake

Note: This is a post written in collaboration with the people at Hevo Data. Definitely check out their platform if you’re interested!

Introduction

In previous posts in this FinOps series, I’ve covered AWS and Databricks and the various ways engineers can save on costs while still satisfying all of their use cases. It’s only natural that we now look into Snowflake as well.

Cluster Keys

Snowflake doesn’t have an explicit partition concept like you might see with S3 prefixes or a partitioned SQL table. Data is instead automatically batched into micro-partitions. While this works just fine for smaller tables, query speed definitely becomes an issue as those tables grow. That’s where cluster keys come into play.

Defining cluster keys on large tables will almost certainly make queries much faster, so long as you’re actually using those cluster keys in your filters (which you should be doing with partition columns anyway as a best practice). Clustering helps Snowflake prune micro-partitions and avoid large table scans. Not every table warrants cluster keys, since the background reclustering comes with its own compute cost, but for large, frequently filtered tables the benefits are clear.

We faced this issue a while back on one of our main tables. Queries were taking hours to complete when in theory they should have been much faster. We added a date cluster key (as date is one of our main filters) and the resulting queries sped up dramatically. If you’ve run into the same issue, you should be able to achieve similar gains.
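As a minimal sketch of what that looks like in Snowflake SQL (the analytics.events table and event_date column here are hypothetical stand-ins), you define the key with ALTER TABLE and can then sanity-check the result with SYSTEM$CLUSTERING_INFORMATION:

```sql
-- Cluster the table on the column most queries filter by
-- (table and column names are illustrative)
ALTER TABLE analytics.events CLUSTER BY (event_date);

-- Check how well the table is clustered on that key
SELECT SYSTEM$CLUSTERING_INFORMATION('analytics.events', '(event_date)');
```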

Warehouses

The easiest way to think about warehouses in Snowflake is to draw a parallel to clusters in Databricks or any other compute service. Warehouses come in different sizes and can be shared by many different users, depending on permissions. The cost impact of this becomes very noticeable as more and more use cases are supported.

As a best practice, it’s worth having a separate job warehouse from the “ad-hoc” warehouse you use for everything else. That way, your interactive performance isn’t hindered by large production loads. Likewise, make sure you have a proper size in place for your warehouses. Undersized warehouses lead to longer-running queries, but you also don’t want to fall into the trap of running a super-large warehouse for no reason. Figure out what works best and only change it when necessary.
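As a rough sketch of that setup (the warehouse names and sizes here are placeholders, not recommendations), separating a production job warehouse from an ad-hoc one might look like this:

```sql
-- Dedicated warehouse for scheduled production jobs
CREATE WAREHOUSE IF NOT EXISTS prod_jobs_wh
  WITH WAREHOUSE_SIZE = 'LARGE'
       AUTO_SUSPEND = 60
       AUTO_RESUME = TRUE
       INITIALLY_SUSPENDED = TRUE;

-- Smaller warehouse for ad-hoc analyst queries
CREATE WAREHOUSE IF NOT EXISTS adhoc_wh
  WITH WAREHOUSE_SIZE = 'XSMALL'
       AUTO_SUSPEND = 60
       AUTO_RESUME = TRUE
       INITIALLY_SUSPENDED = TRUE;
```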

Furthermore, each team should have its own warehouse unless there’s a reason for that not to be the case. Yes, managing more warehouses means more overhead, but too few warehouses lead to a lot of resource contention, or you wind up with one gigantic warehouse just to make sure everyone can do what they need. Spread the love!

Miscellaneous

When it comes to data storage, S3 gives you the luxury of lifecycle rules to automate optimizing your storage costs. Snowflake doesn’t quite have an equivalent, and storage costs can quickly accumulate as a result. Especially in lower environments, it’s worth having an automated process that cleans up older data. This should be easy to do via a stored procedure that takes into account an insertion timestamp column and a duration threshold.
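One way such a cleanup could be sketched (the insert_ts column, table name, and 30-day retention below are hypothetical) is a Snowflake Scripting procedure that deletes rows older than the threshold:

```sql
-- Hypothetical cleanup procedure: delete rows older than a retention window
CREATE OR REPLACE PROCEDURE cleanup_old_rows(table_name STRING, retention_days NUMBER)
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  EXECUTE IMMEDIATE
    'DELETE FROM ' || table_name ||
    ' WHERE insert_ts < DATEADD(day, -' || retention_days || ', CURRENT_TIMESTAMP())';
  RETURN 'Cleaned up ' || table_name;
END;
$$;

-- Run it (e.g., on a schedule via a task) against a dev table with a 30-day retention
CALL cleanup_old_rows('dev_db.analytics.events', 30);
```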

Just like Databricks clusters, you don’t want your warehouses staying up when they’re not in use. You can set an auto-suspend time for your warehouse (and lower is almost always better here) so you don’t have to worry about that.
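For an existing warehouse, this is a one-line change (the adhoc_wh name and 60-second value are just illustrative):

```sql
-- Suspend the warehouse after 60 seconds of inactivity
ALTER WAREHOUSE adhoc_wh SET AUTO_SUSPEND = 60;
```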

Conclusion

We’ve covered three of the major data engineering tools and platforms in recent posts. Next, it’ll be time to look back at FinOps as a whole and suggest best practices for organizations looking to get started on their FinOps journey.


Matt Weingarten

Currently a Data Engineer at Disney Streaming Services. Previously at Meta and Nielsen. Bridge player and sports fan. Thoughts are my own.