Snowflake Cost Management Overview
Are you thinking about using Snowflake? Are you already running your first POC and worried about costs? Or are you still avoiding Snowflake because its cost model is "different" from traditional solutions and looks complicated or unclear? If any of these questions applies to you, keep reading. We will explain the basics of the Snowflake cost model and clear up the most common points of confusion. Let’s dive in and see how it works, what drives it and how to keep it under control.
Snowflake Data Cloud is Software as a Service (SaaS), and as such its cost metrics differ from those you might know from traditional or legacy data providers. There is no license to buy and no hardware to select. All you need to do is sign up for an account. Snowflake uses a pay-as-you-go model, so you only pay for what you actually use. Processing time is billed per second. Snowflake has its own billing unit called the Snowflake credit. Its price varies based on your Snowflake edition, cloud provider (AWS, Azure, GCP) and the cloud region where your account is provisioned. Snowflake is a cloud-agnostic solution available on all major cloud providers, which lets customers select the provider based on their needs. Thanks to its global presence it offers features that significantly support availability and disaster recovery, such as global replication and cross-region and cross-cloud data sharing. You can find more about regions here: https://docs.snowflake.com/en/user-guide/intro-regions.html
Snowflake offers 4 editions: Standard, Enterprise, Business Critical and Virtual Private Snowflake. They differ in the features available and in the price of a Snowflake credit. Details about each edition are available here: https://docs.snowflake.com/en/user-guide/intro-editions.html
Cost Drivers Monthly & Yearly
A few different capabilities and services drive Snowflake's consumption-based pricing. In the following sections we identify and explain the services and capabilities that add up to your monthly or yearly Snowflake bill. Hopefully this makes it easier to understand, monitor and control the cost drivers when using Snowflake.
Virtual Warehouse (Compute)
The main cost driver is virtual warehouses. These are the compute resources users run their queries on. Snowflake supports a range of predefined virtual warehouse sizes: X-Small, Small, Medium, Large, X-Large, 2X-Large, 3X-Large, 4X-Large, 5X-Large and 6X-Large. These sizes are often referred to as T-shirt sizes and determine how much compute power a virtual warehouse provides. Snowflake also supports multi-cluster warehouses, where a single warehouse can run up to 10 clusters of the same size. When a virtual warehouse or a multi-cluster warehouse is not running (because it is suspended or dropped), it does not consume any Snowflake credits. Credit consumption depends on the size of the warehouse, and usage is billed by the second with a one-minute minimum.
- Virtual warehouses are sized like T-shirts
- X-Small running for 1 hour = 1 credit
- Charged per second, with a one-minute minimum
- Instant resizing
- Suspended (for example by auto-suspend) = no charge
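The billing rules above can be sketched as a small calculation. This is a rough estimate only, not Snowflake's billing implementation: the only rate stated in this article is X-Small = 1 credit per hour, and the doubling of the rate with each size step is an assumption based on the published rates for standard warehouses. The function name and parameters are made up for illustration.

```python
# Credits per hour by warehouse size. X-Small = 1 comes from the text above;
# the doubling with each size step is an assumption (standard warehouses).
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8, "X-Large": 16,
    "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
    "5X-Large": 256, "6X-Large": 512,
}

MINIMUM_BILLED_SECONDS = 60  # one-minute minimum each time a warehouse starts


def warehouse_credits(size: str, running_seconds: int, clusters: int = 1) -> float:
    """Estimate the credits for one continuous run of a warehouse."""
    if running_seconds <= 0:
        return 0.0  # a suspended warehouse consumes no credits
    billed_seconds = max(running_seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * clusters * billed_seconds / 3600


print(warehouse_credits("X-Small", 3600))           # one full hour on X-Small
print(warehouse_credits("Medium", 20))              # 20 seconds, billed as 60
print(warehouse_credits("Small", 1800, clusters=2)) # two clusters for 30 minutes
```

Multiplying the result by the credit price for your edition, cloud provider and region gives a rough monetary estimate.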
Snowflake charges for cloud storage based on the average number of terabytes stored per month. The calculation uses the size of the data after it has been ingested and compressed, not the raw size. Compression rates vary depending on the file types ingested. All data ingested into Snowflake is always compressed, encrypted and stored in a columnar format. The cloud storage services also make it easy to take advantage of data redundancy and replication across multiple data centers.
- Compressed (x5-x10)
- Replicated across 3 data centers (availability zones)
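The storage charge described above can be illustrated with a short sketch. It assumes the monthly average is taken over daily measurements of compressed data, and the price per terabyte is a placeholder rather than a real Snowflake rate; the actual rate depends on your region and contract.

```python
from typing import Sequence


def average_storage_tb(daily_compressed_tb: Sequence[float]) -> float:
    """Average compressed storage over the month, from daily measurements."""
    if not daily_compressed_tb:
        raise ValueError("need at least one daily measurement")
    return sum(daily_compressed_tb) / len(daily_compressed_tb)


def monthly_storage_cost(daily_compressed_tb: Sequence[float],
                         price_per_tb_month: float) -> float:
    """Monthly storage cost = average TB stored * price per TB per month."""
    return average_storage_tb(daily_compressed_tb) * price_per_tb_month


# Hypothetical month: 2 TB for the first 15 days, 4 TB for the next 15 days.
daily_tb = [2.0] * 15 + [4.0] * 15
PLACEHOLDER_PRICE = 23.0  # assumed price per TB per month, for illustration only

print(average_storage_tb(daily_tb))
print(monthly_storage_cost(daily_tb, PLACEHOLDER_PRICE))
```

Because the average is what counts, data loaded late in the month contributes less to that month's bill than data stored from day one.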
The serverless compute model for tasks enables you to rely on compute resources managed by Snowflake instead of user-managed virtual warehouses. The compute resources are automatically resized and scaled up or down by Snowflake as required for each workload. Snowflake determines the ideal size of the compute resources for a given run based on a dynamic analysis of statistics for the most recent previous runs of the same task.
Cloud Services Layer
The cloud services layer works as the control unit of your Snowflake account. All requests, whether they come from the UI, the CLI or any connector, go through the cloud services layer. This layer is also monitored by the Snowflake Engineering and Support teams. Resources for the cloud services layer are assigned automatically by Snowflake based on the requirements of the workload and are not managed by customers. Cloud services usage is free up to 10% of daily compute credits, which means most customers will not see incremental charges for it.
Customers also have access to additional services through the Cloud Services and Serverless layer. The easiest way to tell whether a particular query was handled by the cloud services layer alone is to check that the executed query is not assigned to a virtual warehouse. Examples of tasks that use the cloud services layer are authentication, infrastructure management, metadata management, query parsing and optimization, access control, SHOW commands, and returning results from the query cache.
The 10% allowance is usually more than enough to cover the entire cloud services bill; only about a quarter of accounts ever need to pay extra.
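The 10% rule can be expressed as a simple daily calculation. This is a minimal sketch, assuming the allowance is applied per day against that day's compute credits, as described above; the function name is made up for illustration.

```python
ALLOWANCE_RATIO = 0.10  # cloud services are free up to 10% of daily compute credits


def billed_cloud_services(compute_credits: float,
                          cloud_services_credits: float) -> float:
    """Cloud services credits actually charged for a single day."""
    allowance = compute_credits * ALLOWANCE_RATIO
    return max(0.0, cloud_services_credits - allowance)


# Day with 100 compute credits and 8 cloud services credits:
# usage is within the allowance, so nothing is billed.
print(billed_cloud_services(100, 8))

# Day with 100 compute credits and 15 cloud services credits:
# only the part above the allowance is billed.
print(billed_cloud_services(100, 15))
```

Days dominated by metadata operations and very little warehouse usage are the ones most likely to exceed the allowance.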
The total cost of the Snowflake service is determined by resource usage, which can change over time with the organization’s needs. It is therefore not always possible to determine the total cost precisely in advance, but with tools and experience Snowflake customers can get a close estimate of their monthly costs. Many customers gain this insight by using Snowflake On Demand accounts, which allow them to develop, test and collect real-world experience with their workloads. With this insight, customers can work with the Snowflake Sales Teams or Snowflake Partners to understand their monthly cost and make an accurate Capacity purchase.
Cost monitoring is a crucial part of managing your Snowflake account. If you want to spend your credits wisely and keep a basic or detailed overview of your spending, invest some time in a solution that monitors your costs. It can also act as a safety brake by shutting down a warehouse that exceeds your desired limits.
Snowflake offers several features which can help you with controlling your costs. Resource monitors are one of them.
Resource monitors allow you to define thresholds for Snowflake credit consumption. When a limit is reached or is being approached, the resource monitor can trigger actions such as sending a notification or even suspending the warehouse (only for user-managed warehouses). Credit limits can be specified at different levels: for the whole account, for a single warehouse or for a specific set of warehouses.
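As an illustration, the helper below builds the SQL for a resource monitor and for attaching it to a warehouse. It is only a sketch: the helper functions, the monitor name and the warehouse name are hypothetical, and the generated statements follow the documented CREATE RESOURCE MONITOR and ALTER WAREHOUSE syntax as I understand it, so check them against the current Snowflake documentation before running them.

```python
ALLOWED_ACTIONS = {"NOTIFY", "SUSPEND", "SUSPEND_IMMEDIATE"}


def resource_monitor_sql(name, credit_quota, triggers, frequency="MONTHLY"):
    """Build a CREATE RESOURCE MONITOR statement.

    triggers is a list of (percent_of_quota, action) pairs.
    Identifiers are inserted as-is, so do not pass untrusted input.
    """
    clauses = []
    for percent, action in sorted(triggers):
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {action}")
        clauses.append(f"ON {percent} PERCENT DO {action}")
    return (
        f"CREATE OR REPLACE RESOURCE MONITOR {name} WITH "
        f"CREDIT_QUOTA = {credit_quota} "
        f"FREQUENCY = {frequency} "
        "START_TIMESTAMP = IMMEDIATELY "
        "TRIGGERS " + " ".join(clauses)
    )


def assign_to_warehouse_sql(warehouse, monitor):
    """Build the statement that attaches a monitor to one warehouse."""
    return f"ALTER WAREHOUSE {warehouse} SET RESOURCE_MONITOR = {monitor}"


# Hypothetical names: notify at 80% of 100 credits, suspend at 100%.
print(resource_monitor_sql("monthly_limit", 100,
                           [(80, "NOTIFY"), (100, "SUSPEND")]))
print(assign_to_warehouse_sql("reporting_wh", "monthly_limit"))
```

SUSPEND lets running queries finish before the warehouse stops, while SUSPEND_IMMEDIATE cancels them, so the choice depends on how strict the limit needs to be.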
Data governance is another crucial area that can help you control your costs. You need to understand how your data is being used in order to understand the cost behind it. ACCOUNT_USAGE is a schema in the shared SNOWFLAKE database that allows you to query metadata related to your account. There you can find metadata on warehouse performance and credit usage, as well as metrics on user logins and data storage usage. Based on the data in the ACCOUNT_USAGE schema you can create a cost monitoring dashboard that visualizes your important KPIs as easily readable charts. You can build it in your favorite visualization tool, such as Tableau or Qlik, or you can use the new Snowsight and build the charts there.
Access history is one of the Snowflake internal views available in the ACCOUNT_USAGE schema. It helps you track the SQL statements that have been executed in your account. It can help you with the following tasks:
- data discovery, and the follow-up decision whether certain data objects can be dropped or are still in use
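One possible starting point for such a dashboard is a query against the WAREHOUSE_METERING_HISTORY view in ACCOUNT_USAGE, followed by a small aggregation. The sketch below keeps the query as a string and ranks warehouses from rows shaped like its result. The sample rows and warehouse names are invented, and how you execute the query (connector, Snowsight worksheet, BI tool) is left open.

```python
from collections import defaultdict

# Daily credits per warehouse for the last 30 days.
DAILY_CREDITS_QUERY = """
SELECT warehouse_name,
       DATE_TRUNC('day', start_time) AS usage_day,
       SUM(credits_used) AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, usage_day
ORDER BY usage_day, warehouse_name
"""


def credits_by_warehouse(rows):
    """Sum credits per warehouse and rank them, highest consumer first.

    rows: iterable of (warehouse_name, usage_day, credits) tuples,
    matching the column order of DAILY_CREDITS_QUERY.
    """
    totals = defaultdict(float)
    for warehouse_name, _usage_day, credits in rows:
        totals[warehouse_name] += credits
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented sample rows standing in for a real query result.
sample_rows = [
    ("ETL_WH", "2024-01-01", 12.5),
    ("BI_WH", "2024-01-01", 3.0),
    ("ETL_WH", "2024-01-02", 10.0),
]
for name, credits in credits_by_warehouse(sample_rows):
    print(name, credits)
```

Keep in mind that ACCOUNT_USAGE views are populated with some delay, so a dashboard built this way shows recent history rather than live consumption.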
- sensitive data movement tracking
- data governance
- data lineage
access_history tracks both read and write operations, which means it is possible to track how data moves between tables and reconstruct the whole data lineage.
Speaking of data governance features, we can’t forget object tagging. You can assign tags to various Snowflake objects (tables, views, columns, etc.). This is an easy way to identify sensitive (PII) data and link particular actions to it based on the tag value. You can also use tags to gain visibility into resource usage and determine, based on the tag value, which resources consume the most Snowflake credits.
How to keep your costs under control?
If you are an existing Snowflake customer and are wondering how to optimize your Snowflake credit usage, you can check my other posts, where I offer recommendations for optimizing your Snowflake compute and storage costs.
The first part focuses on compute cost optimization:
Snowflake cost optimization: part I.
A blog post offering tips for saving Snowflake credits on compute resources and thus lowering the overall cost.
The second part focuses on storage optimization and some basic performance tuning tips: