Mythbusting Snowflake Pricing! All the cool stuff you get with 1 credit

If you’re looking for the best in cloud-based data management, there’s a reason why enterprises around the world choose Snowflake. We pioneered consumption-based pricing for data platforms, and it has been a huge hit with our customers! What’s more, many of our partners & competitors have since moved to offer a similar pricing model using their own units of measure.

So if it was such a success, why the rumors about Snowflake being an expensive solution? The first question you should ask is where these statements come from. Are they coming directly from a customer or from somewhere else? If you ask around the customer base, you will quickly find that hardly any customer thinks Snowflake is expensive once they start using it and seeing all the benefits. Which begs the question: if customers are not the ones complaining, then who is?

So the goal of this article is to separate fact from fiction, and to show the real cost of complexity that comes with the supposedly cheaper alternatives. If you are used to buying on-prem hardware & software, or even pre-paying for cloud resources, consumption-based usage can be a little challenging to understand.

Snowflake pricing is different because it is an on-demand cloud resource. The amount, scale & duration of the resources you consume are driven entirely by the day-to-day needs of your business. It grows, shrinks & stops with your need for analytics.

It is very similar to electricity: most people don’t pre-buy electricity for a year or more by trying to guess what they may use over the next 12–36 months; instead, they get a monthly bill for exactly what they used, based on a meter. Yet pre-buying a fixed amount is how most IT departments are used to buying services and resources in the cloud. That just doesn’t make sense, especially for a need as fluid as data & analytics, where there is no way to predict future requirements in terms of volume, velocity & variety.

So what is Snowflake?

The first thing you need to remember is that Snowflake is a service. It is not software, hardware, or storage, but an automated, redundant & self-healing service you use to store and access vast amounts of data effortlessly on any cloud provider, so you can focus on what really matters: your business!

It is a service that removes all of the architectural complexity & the individual resources associated with storing, managing, securing, accessing & sharing data in the cloud. You simply turn it on and you are ready to go, much like Gmail or Salesforce.

What is a Snowflake credit?

The technical description for one Snowflake credit is 1 hour of usage by a single compute node. Each node is an 8-core cloud-based server that is fully managed by Snowflake and requires no customer maintenance.

Any time Snowflake accesses (reads or writes) data, the compute work is done by MPP clusters referred to as Virtual Warehouses. These clusters come in different sizes and can have anywhere from 1 to 512 nodes (servers) based on your performance needs. What is unique about Snowflake is that customers can run any number of these clusters, in various sizes, against the same identical copy of the data, which makes workload management and resource contention issues a thing of the past. They just don’t exist in Snowflake.

Since each node in a cluster consumes 1 credit per hour, the more nodes a cluster has, the more credits it uses in a given amount of time. However, the time needed to finish the required work will be that much shorter thanks to the extra nodes. This means the cost will be mostly identical whether you use a smaller cluster & wait longer, or use a bigger cluster and finish much faster.
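
To make that concrete, here is a minimal back-of-the-envelope sketch in Python. The job durations and node counts are made-up numbers, and it assumes the work scales roughly linearly with node count:

```python
# Credits consumed = nodes in the cluster * hours the cluster runs.
# Assuming near-linear scaling, a bigger cluster finishes sooner but burns
# credits faster, so the total cost of the job stays roughly the same.

def job_cost_credits(nodes: int, runtime_hours: float) -> float:
    return nodes * runtime_hours  # 1 credit per node per hour

small_cluster = job_cost_credits(nodes=1, runtime_hours=8)  # XS for 8 hours -> 8 credits
large_cluster = job_cost_credits(nodes=8, runtime_hours=1)  # L  for 1 hour  -> 8 credits

print(small_cluster, large_cluster)  # 8 credits either way; the Large just finishes 8x sooner
```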

For the data engineers & data scientists out there, this is also a Duuuuh moment. Why would you wait longer for your jobs to finish if it costs the same to run them 2X, 4X, 8X, 16X… faster?

How is a single credit consumed?

Another interesting fact is that credits are NOT charged by the hour. They are billed per second, where 1 credit equals 3,600 seconds of usage by a single server node. Because these nodes are on-demand resources that can instantly auto-start and auto-stop based on your needs, it is the combination of seconds you use throughout the day that makes up a single credit. Basically, the total number of seconds each node runs per day, divided by 3,600, is your daily credit consumption.

For example: if you have an ELT job that runs for 3 minutes (180 seconds) a day using a Large (8 servers/nodes) cluster to churn through billions of rows of data, that will cost you 0.4 credits, or $0.80 per day at $2.00 per credit.

8 nodes * 180 secs / 3,600 = 0.4 credits
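
Here is the same math as a small Python sketch you can adapt. The node counts per warehouse size follow the doubling pattern, and the $2.00 credit price is just the figure used in this example (it also ignores any per-resume billing minimum):

```python
# Per-second billing: credits = nodes in the warehouse * seconds it runs / 3600.
NODES_PER_SIZE = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

def credits_used(size: str, seconds: float) -> float:
    return NODES_PER_SIZE[size] * seconds / 3600

daily_credits = credits_used("L", 180)   # 8 * 180 / 3600 = 0.4 credits
daily_cost = daily_credits * 2.00        # at $2.00 per credit -> $0.80 per day
print(daily_credits, daily_cost)
```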

What else does a single Snowflake credit buy you?

Snowflake is an on-demand, automated, redundant & self-healing service. This means a single credit is not just a unit of measure for using a single server for 3,600 seconds; it goes far beyond that. For this reason, you can’t use it to compare cost directly against another solution with a similar node count. To figure out what else is included in the 3,600 seconds of usage that a single credit buys you, let’s look at the unseen parts of the service you receive while your node is running.

Free Queries, anyone? You get one, you get one & you get one!

Snowflake uses what is called a global query result cache to eliminate compute costs for our customers on any redundant queries. Once a query is executed on compute, any subsequent identical query sent to Snowflake by any user no longer needs additional compute to access the results, as long as the underlying data has not changed and the query syntax is identical. The result of every query is stored as a file in the account’s global cache, where it can be used by any user & any compute cluster across the entire account; it is not tied to a single compute cluster or user like with some other solutions.

So if you have a BI dashboard with hundreds of users leveraging it every day, the first person who opens their dashboard in the morning will trigger all the queries and use a compute cluster to do so. Any subsequent users who open their dashboards after that first person will see instant performance (millisecond query times), and those queries will be free of charge as they do not need compute in order to be executed. As users work the dashboard with various slicers & filters, more & more queries are cached, benefiting other users whether they are on the same compute cluster or completely different ones, since these caches are available across the entire account.
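
As a purely conceptual sketch (this is not Snowflake’s actual implementation, just the idea), the cache key is effectively the query text plus the state of the underlying data, and a hit means no warehouse compute is spent:

```python
# Conceptual illustration of a global result cache: an identical query against
# unchanged data returns the stored result instead of spinning up compute.
result_cache: dict = {}

def run_query(sql_text: str, data_version: int, execute_on_warehouse) -> list:
    key = (sql_text, data_version)         # identical syntax + unchanged data required for a hit
    if key in result_cache:
        return result_cache[key]           # cache hit: milliseconds, no credits consumed
    rows = execute_on_warehouse(sql_text)  # cache miss: runs on a virtual warehouse (uses credits)
    result_cache[key] = rows               # result is now available to every user in the account
    return rows
```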

Redundancy & Resiliency:

Each account runs across 3 availability zones (data centers) within a cloud region. This is true for both data and compute. Snowflake will automatically try to resolve any failure without user disruption or query failures.

Snowflake will re-execute any queries that fail due to hardware-related issues, using redundant resources to avoid query failures. It will automatically swap any failed individual nodes within a cluster for new ones within the same data center. And it will resolve failures of the clusters themselves by re-provisioning them within seconds from any of the 3 paired data centers, depending on the nature of the issue.

Scalability

We are talking about the kind of usable scalability that actually makes a difference in your business. The first thing you need to know about “scalability” is that it is not simply a checkmark next to a feature. It is not a yes-or-no option. There are 4 questions you should ask when a solution claims to be scalable:

  1. How fast? Being able to scale in hours or minutes may sound cool, but that rarely translates into business benefits. Whether it is for data pipelines or business users running queries to extract insights, having to wait minutes or hours for scaling to happen between 2 jobs will not play well with the business. This is where a Snowflake credit makes a big impact: every time you are using that credit to run a node, there are hundreds or thousands of other nodes on standby, ready to join the party if your workload needs to scale. That means scaling in Snowflake happens in seconds (usually less than a second), and currently no other solution has this. Why does it matter? It matters because it allows you to scale between each SQL job: you can go from a single node to a 2XL or larger cluster with 32, 64, or more nodes within a second, between two SQL commands, with a simple ALTER command (see the sketch after this list). This means your data pipelines can adapt to each workload, and your business users can tune performance based on the complexity of their needs. Upsize for the first complex market basket analysis query, then immediately downsize for the next simple lookup. With 0 wait times.
  2. Is it adaptive scaling? Being able to scale is one thing, but if an admin has to constantly monitor the entire platform, or wait for jobs to fail or for users to complain before scaling manually, that is not much benefit to your business. Mostly it will not happen when you need it, and it will require a person to babysit it. Again, that single credit gets you instant yet adaptive scalability, where the Snowflake services layer detects changes in demand and automatically scales the compute.
  3. Is it a business-disruptive process? If you have to kill all existing jobs & queries to scale up or down, then it is not much of a benefit for the business. The whole point is to scale exactly when the business needs it, and to do it seamlessly without bringing down the system and interrupting work. Existing jobs should keep running and the new stuff should be faster. Simple as that! Again, that single credit allows those warm standby nodes to be added to your clusters and give you an instant performance boost without interrupting any of the existing workloads.
  4. Does it scale vertically, horizontally, or both? Scaling is needed to improve two things: individual query performance & concurrency. Vertical scaling makes the existing box bigger and lets you run queries faster. However, it does not address the concurrency that usually comes with ad-hoc BI. So if you have 200 of those queries at one time, that same bigger box will not be able to handle the volume, and query performance will plummet. That is where horizontal scalability comes in: being able to add or remove clusters of identical size so you can provide consistent query performance whether you have 3 queries running or 300. Again, a credit in Snowflake (Enterprise edition & up) buys you automated horizontal scaling, which detects concurrency and scales out within seconds, then scales back down in seconds when demand goes away.
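
Here is a hedged sketch of what that looks like in practice using the Snowflake Python connector; the warehouse name, credentials, and the query itself are placeholders, and the multi-cluster settings at the end require Enterprise edition or above:

```python
import snowflake.connector

# Placeholder account identifier and credentials.
conn = snowflake.connector.connect(account="<account>", user="<user>", password="<password>")
cur = conn.cursor()

# Vertical scaling between two SQL statements: the resize applies to the next query.
cur.execute("ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XXLARGE'")
cur.execute("SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id")  # hypothetical heavy query
cur.execute("ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XSMALL'")     # drop back down for simple lookups

# Horizontal scaling (Enterprise & up): let Snowflake add/remove same-size clusters with concurrency.
cur.execute("ALTER WAREHOUSE analytics_wh SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 4")

conn.close()
```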

So if a platform can’t scale up or out automatically based on demand, exactly when the business needs it, and can’t do so instantly within seconds and without disrupting the currently running processes, then saying it can scale does not mean much for the business. And that is exactly what a Snowflake credit gets you.

Self-healing:

As there are many redundant compute nodes in each data center ready to take over in case something happens to any of the existing customer nodes, Snowflake will automatically heal itself, using either a compute node within the same data center or one from the other two, and continue on with your query.

It does this in a fully automated way, using the services layer, without disrupting or failing the original queries.

This makes Snowflake literally the most boring platform ever if you are a DBA or a platform admin. Mostly, nothing fails! You don’t have to constantly monitor system health and run to reset or reboot nodes because they ran out of memory or hit a software or hardware glitch. If resources fail in the back end, most users won’t even notice the problem, because the system will heal itself, move on, and finish the job. This is one of the reasons you hear “Things just work” over and over from so many Snowflake customers.

Not the most active lifestyle compared to what most sysadmins are used to, putting out fires & fixing stuff all day long.

Fully Automated Platform Management:

This is the major differentiator between Snowflake and the rest of the competition. There are many processes that traditional solutions are forced to run on customer compute clusters (either on the same machines running the queries or on separate driver nodes). These are basic things like query planning, security, encryption, authentication, scaling, MPP logic, etc. There is a dollar and performance cost to doing this on customer clusters. If a dedicated driver node is needed, an additional server has to run, and a single driver node can easily become a bottleneck, especially in high-concurrency use cases where that one node must do many things at once before passing the query plans to the worker nodes. On the other hand, having these run on the query nodes saves you money by not running an extra node, but you pay the penalty in even worse throughput, because all the nodes are doing all the pre-query grunt work as well as executing the queries.

Snowflake is completely different in this regard: it uses a separate & isolated Services Layer, with its own compute nodes, designed to do all the pre-query/traffic grunt work like planning, security, etc. This allows 100% of the customer compute nodes to be used for actual query execution. So next time someone asks you why Snowflake clusters can easily outperform others with similar node counts, part of the reason is the services layer, whose multiple nodes take on all the resource-sapping grunt work while the full CPU power of the customer nodes goes to actual query execution. You get this added buffer of compute power in the services layer, used to boost performance, baked into your credit price.

The same services layer is also responsible for all of the automation found in Snowflake that allows you to focus on extracting insights from your data instead of managing a platform. All the redundancy, self-healing, encryption, authentication, automated clustering, vacuuming, failover, data sharing, cache & metadata management is fully automated and handled by the services layer, which consists of a number of nodes that are completely transparent to users and daily operations. This is the secret sauce that makes Snowflake so easy to use, and the very thing that makes customers say “Things just work in Snowflake”. Everything works because the services layer monitors, automates, and carries out much of the additional work while freeing up customer nodes to be used purely for query execution.

Egress Charges:

Contrary to what competitors may say, there are no egress charges with Snowflake as long as customers use one of our many drivers to access the data rather than exporting files directly from Snowflake to a bucket in another cloud or region. This is because the credit cost includes all egress charges for querying/reading data. As long as you use Snowflake’s ODBC, JDBC, Python, Spark, .NET, or any other supported driver, you can fetch 1 million, 100 million, or billions of rows of data from any cloud or region; it is all included in the credit price.
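
For example, here is a minimal sketch with the Python connector (the table name, account, and credentials are placeholders): however many rows you pull back through the driver, there is no separate egress line item.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="<account>", user="<user>", password="<password>")
cur = conn.cursor()
cur.execute("SELECT * FROM sales_facts")   # hypothetical table with millions of rows

total_rows = 0
while True:
    batch = cur.fetchmany(10_000)          # stream the result set in batches through the driver
    if not batch:
        break
    total_rows += len(batch)

print(total_rows, "rows fetched; egress is covered by the credit price")
conn.close()
```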

All of the Compute Costs:

In case you missed the top 10 paragraphs, the credit price includes all compute. Unlike with many other platforms, there are no additional VMs that you have to manage, provision, or pre-pay to cloud providers.

Storage Fees:

$23 per compressed TB per month, with an average of 4–10X compression; essentially, you can store up to 10 TB of raw data in a single TB of storage. Storage is paid directly via credits, so there is no need to pay additional storage fees to cloud providers either.
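
As a quick sketch of the storage math (the 5x compression ratio here is just an assumption within the 4–10X range above):

```python
# Monthly storage cost = compressed TB * $23, where compressed TB = raw TB / compression ratio.
RATE_PER_COMPRESSED_TB = 23.00   # USD per TB per month
compression_ratio = 5            # assumed; typically somewhere in the 4-10x range

raw_tb = 50
compressed_tb = raw_tb / compression_ratio
monthly_cost = compressed_tb * RATE_PER_COMPRESSED_TB
print(compressed_tb, monthly_cost)   # 50 TB raw -> 10 TB stored -> $230 per month
```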

Data Access Fees:

If you have used a cloud data lake to read/write data at any large scale, you have probably noticed there are additional fees just to access the data in the blob store, on top of egress fees. These are usually based on file size & the number of files accessed. They are fairly small fees, usually a few cents per 1,000 files accessed, but they can quickly end up exceeding the storage fees themselves if access from the various compute nodes is heavy.
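
To see how those pennies add up, here is an illustrative calculation; the per-1,000-file rate and the access volumes are hypothetical numbers, not any specific cloud provider’s price list:

```python
# Hypothetical blob-store request fees: heavy scanning of many small files adds up fast.
rate_per_1000_files = 0.02   # assumed "few cents" per 1,000 files accessed
files_per_query = 10_000     # assumed number of files a typical scan touches
queries_per_day = 500

monthly_files_accessed = files_per_query * queries_per_day * 30
monthly_access_fees = monthly_files_accessed / 1_000 * rate_per_1000_files
print(round(monthly_access_fees, 2))   # 150,000,000 file accesses -> about $3,000 per month
```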

This is yet another thing you don’t have to worry about with Snowflake. Any fees for your Snowflake compute nodes accessing the files (micro-partitions) stored in your account are all part of the same Snowflake credit.

Platform/Data Expert Fees:

In case you haven’t noticed, if you decide to manage the compute nodes, network, security, redundancy, backups & software manually, you are in for a long and expensive ride.

You are going to need DBA(s) to manage the data warehouse stuff, some Spark experts to manage the data lake, big data, and Spark platform stuff, and let’s not forget a cloud expert (they are usually an expert in one cloud, so you may need more than one) to handle networking & integration within your VPC, plus an InfoSec person to configure, continuously monitor & take care of security issues across all your cloud resources.

In the end, you are looking at multiple heads with multiple skill sets. Unfortunately, these folks also have to be fairly smart, and they don’t come cheap, nor are they easy to find & hire.

Since Snowflake is delivered as a service, all of these complexities, manual configurations, and maintenance tasks are removed from the picture. This simply means faster time to value and lower cost for you, as well as much higher uptime. You just turn it on & go!

Support:

24x7 follow-the-sun support is included as part of the credit as well, which gives you a single point of contact for any issue you may experience with the platform, whether the issue is related to compute, storage, access, or the software.

The alternative is the traditional way, where you pay separate support fees to each vendor/resource combo and play the blame game between the platform software vendor and the cloud EC2, storage & networking folks every time you have an issue. Remember, support ain’t free for most cloud resources; you gotta pay for it.

But when you have a problem, is it a software problem with the vendor? Is it a problem with your VM config or network access rules? Or maybe the storage access is not configured, or there is a firewall in the way?

Who cares? It is a data access issue, and you shouldn’t have to play support tag with vendors to solve it. Snowflake support is your one & only contact for all things support-related, and it is part of your credit cost.

What does the architecture look like?

Finally, let’s look at a visual representation of what is running behind the scenes when you are running the smallest XS cluster, with just a single node, costing 1 credit per hour.

So to replicate that level of redundancy, DR, automation, authentication, security & pre-query separation from the query node, you would need a minimum of 4 servers running at all times: 1 that runs the query, 2 on standby in case of a node failure, and at least 1 more server as the services/driver layer (& many more if you have any kind of high concurrency).

And this assumes everything is running in a single data center (availability zone). That is not how Snowflake operates: Snowflake has all of these redundant resources running across 3 data centers. So in reality, you would have to spend far more time, money & resources to build a bulletproof system similar to what Snowflake provides you out of the box.

Let’s see a detailed view of the same setup as above running across 3 different data centers within a cloud region.

You can see that all the compute handling services & automation, along with the customer nodes running the queries, is spread across 3 data centers, together with all the network security and the automated replication, failover, and system monitoring.

Now add in the on-demand, serverless nature of these compute clusters, which run only when needed and sit in a paused state, incurring no charges, when not in use, and a single credit buys you a whole lot more than just a simple compute node running for an hour.

You get 1 hour of compute resources with built-in DR sites in 3 different data centers; a services layer with its own compute that automatically monitors your platform and fixes things; a services layer that automates most basic DBA tasks, which allows you to simply load your data and start querying without any tuning & ongoing maintenance; and the same services layer that takes away all the compute work required prior to executing a query, so 100% of your compute cluster’s power goes to executing queries.

This is why you can’t really compare apples to apples with any other platform in terms of credit costs, because no other platform has this. The cost includes everything: compute, data, automation, services, data access fees & support. There is nothing else you have to pay to use the service, which is certainly not the case with other platforms, where you have to pay different amounts to different parties at different rates (software, cloud provider storage, VMs, network), which can easily make calculating full TCO a nightmare.

And this is why Snowflake will outperform any other platform with similar compute resources because you get much more than just a single node per credit.

Next time someone says there is a cheaper alternative, make sure you understand what the real cost of that cheaper alternative would be if performance, security, scalability, reliability & ease of use are important to you, since you may have to replicate at least some of the things you get as part of the standard Snowflake service.
