Snowflake versus Redshift: which of these giants suits your company’s data cloud needs?

Arthur Marcon
Indicium Engineering


In an increasingly digital world, processing power and data storage capacity have become competitive advantages for companies. In this context, cloud computing plays a central role in providing speed and reliability for organizations’ data. However, choosing the ideal cloud solution for your company is not easy, especially with so many options on the market. In this post, we present two of the leading solutions in the cloud computing market, Snowflake and Redshift, and their main differences in four aspects: architecture, scalability, performance and pricing strategy.

First of all, what is cloud computing and data cloud?

The concept of cloud computing refers to the delivery of information technology resources over the internet. That is, instead of buying and maintaining data centers and physical servers, your company can use the huge computational capacity of large providers such as Amazon, Snowflake, Google and Microsoft, gaining high performance, data security, scalability and flexibility according to its demands. Cloud computing also helps prevent data loss.

Similarly, a data cloud is a cloud solution that stores and processes data, eliminating “data silos” and integrating a company’s storage and processing needs so that data can be turned into monetizable resources. In other words, the data cloud is a subset of cloud computing.

Currently, there are many data cloud solutions on the market, each with differences that affect how quickly, scalably and securely your company can extract value from its existing data. Two of the most important are Redshift and Snowflake.

Redshift vs Snowflake: A battle of giants

Let’s start by understanding who’s who in the data cloud dojo. Snowflake is an advanced data platform that allows you to store, process and analyze data in a fast, flexible and scalable way. One of Snowflake’s differentiators is that it is a self-managing platform. This means that the user does not need to configure any hardware (physical or virtual), install complicated software or manage and maintain complex data infrastructure.

Because it runs completely on cloud infrastructure, Snowflake can be deployed quickly and with little complication. It reduces the need for highly specialized professionals, which tends to make it a more affordable solution. Importantly, Snowflake is built on top of other cloud services (AWS, Google Cloud Platform or Azure), making it a multicloud data warehouse solution that is not tied to a single provider.

Redshift, on the other hand, is Amazon’s cloud data warehouse service, part of the Amazon Web Services (AWS) family, offering scalability and high-speed data storage and processing. Typically, Redshift charges based on contracted cluster allocation. With Redshift Serverless, however, the customer is charged only while the service is being used. Setting up and configuring Redshift can demand considerable engineering resources and technical knowledge, which makes its implementation somewhat more complex and calls for data engineering expertise. In return, its integration with other Amazon services makes it a very complete and integrated tool. Table 1 summarizes the main characteristics of the two data clouds.

Table 1 — Comparison between Snowflake and Redshift

Characteristics

Architecture

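Snowflake uses a shared-data cloud data warehouse architecture in which storage is separated from compute: multiple organizations use the same underlying resources, while each one’s data and workloads remain isolated from the others. Because it is built on top of other cloud services, Snowflake is a multicloud data warehouse solution that acts as an intermediary between the customer and the underlying cloud provider, absorbing operational risk and optimizing storage and processing.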

Figure 1 — Snowflake Architecture

Redshift is based on a Massively Parallel Processing (MPP) architecture, where data is distributed among compute nodes for parallel processing. As a result, configuring clusters and nodes to scale processing and storage for optimized performance requires some proficiency in the more technical aspects of data warehouses.

Figure 2 — Redshift Architecture
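To make the idea of distributing data among compute nodes more concrete, here is a minimal Python sketch. It is only an illustration of the general MPP pattern, not Redshift’s actual implementation: the cluster size, the `node_for` hashing rule, the `orders` rows and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size, chosen only for the example


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick a node by hashing the distribution key (deterministic)."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, key_field, num_nodes=NUM_NODES):
    """Split rows into per-node slices based on the distribution key."""
    slices = defaultdict(list)
    for row in rows:
        slices[node_for(row[key_field], num_nodes)].append(row)
    return slices


def local_sum(rows):
    """Work done independently on each node: a partial aggregate."""
    return sum(row["amount"] for row in rows)


def parallel_total(rows, key_field="customer"):
    """Fan the work out to the nodes, then combine the partial results."""
    slices = distribute(rows, key_field)
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(local_sum, slices.values()))
    return sum(partials)


# Hypothetical sample data
orders = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 25},
    {"customer": "c", "amount": 5},
    {"customer": "a", "amount": 60},
]
total = parallel_total(orders)
```

Rows with the same key always land on the same node, so each node can aggregate its own slice without talking to the others. Choosing a good distribution key is part of the technical tuning mentioned above.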

Scalability

Snowflake scales automatically, adding or removing compute resources depending on demand, with no disruption to the data stored in your warehouse. Because compute and storage are separate, scaling data processing in Snowflake does not require buying more storage (or paying more in storage costs).

Similarly, Redshift allows you to scale vertically (increase the size of instances) and horizontally (add nodes) to handle larger workloads. However, resizing clusters in Redshift can cause some momentary downtime, impacting availability. Furthermore, on Redshift node types that bundle storage with compute, an increase in storage necessarily implies an increase in data processing costs.

Pricing

Regarding pricing, Snowflake operates on a more granular pricing model, charging separately for data storage and processing through credits purchased by users. The cost structure is as follows:

  • Processing Usage: charged in credits, based on the computational resources used while a virtual warehouse is running to execute queries (billing follows warehouse running time rather than the number of queries).
  • Storage Usage: calculated independently of processing, based on the monthly volume of data stored, in terabytes. Snowflake uses data compression and storage optimization to reduce costs.
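The two components above can be sketched as a small cost function. This is a rough illustration, not Snowflake’s official pricing formula: the function name `snowflake_monthly_cost` and every rate in the example call are hypothetical placeholders, and real prices depend on edition, region and cloud provider.

```python
def snowflake_monthly_cost(credits_used, price_per_credit,
                           tb_stored, price_per_tb_month):
    """Estimate a monthly bill with compute and storage charged separately."""
    compute = credits_used * price_per_credit   # processing usage
    storage = tb_stored * price_per_tb_month    # storage usage
    return {"compute": compute, "storage": storage, "total": compute + storage}


# Placeholder numbers, for illustration only (not real Snowflake prices)
baseline = snowflake_monthly_cost(credits_used=100, price_per_credit=3.0,
                                  tb_stored=1, price_per_tb_month=25.0)

# Doubling the stored data changes only the storage line, not compute
more_data = snowflake_monthly_cost(credits_used=100, price_per_credit=3.0,
                                   tb_stored=2, price_per_tb_month=25.0)
```

Comparing `baseline` and `more_data` shows the point of the model: storing more data leaves the compute charge untouched.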

On the other hand, Redshift has a pricing structure based on instances and usage time, in a “pay-as-you-go” model, where clients are charged only for what they consume. Redshift pricing can be broken down into the following components:

  • Processing Usage: Typically, Redshift charges per hour based on the number and types of nodes in a cluster. You can choose between on-demand (pay-as-you-go) billing or long-term Reserved Node contracts.
  • Storage Usage: Storage is bundled with processing, which simplifies the pricing model. Costs are based on node types and cluster sizes.
  • Concurrency Scaling: Redshift’s concurrency scaling feature helps manage spikes of simultaneous queries or tasks that could slow down the cluster. It provides extra processing power when many queries must be executed, and the user pays only for the use of this resource. When the extra processing power is no longer needed, Redshift removes the additional clusters and stops charging.
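As a rough sketch of how these components add up for a provisioned cluster, consider the function below. It is a simplification under stated assumptions: the name `redshift_monthly_cost` and all rates are hypothetical, and charging concurrency scaling at the same node-hour rate as the main cluster is an assumption made for the example, not a statement of AWS’s exact billing rules.

```python
def redshift_monthly_cost(node_count, hours_running, price_per_node_hour,
                          concurrency_scaling_hours=0.0):
    """Estimate a monthly bill driven by node count and running time."""
    # Storage is bundled into the node price, so there is no separate line
    cluster = node_count * hours_running * price_per_node_hour
    # Assumption: extra clusters are billed at the same node-hour rate
    burst = node_count * concurrency_scaling_hours * price_per_node_hour
    return {"cluster": cluster, "concurrency_scaling": burst,
            "total": cluster + burst}


# Placeholder numbers, for illustration only (not real AWS prices)
steady = redshift_monthly_cost(node_count=2, hours_running=720,
                               price_per_node_hour=0.5)
spiky = redshift_monthly_cost(node_count=2, hours_running=720,
                              price_per_node_hour=0.5,
                              concurrency_scaling_hours=10)
```

The difference between `steady` and `spiky` is the cost of the temporary extra capacity, which disappears once the spike is over.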

The following table presents a simulation comparing Snowflake and Redshift for 1 TB of storage per month. The simulation assumes 2 hours of ELT per day and 8 hours of analytics per day with 50 users, each running 20 queries per day over a 30-day month. Under these assumptions, Snowflake has a monthly cost of $768, while Redshift has a cost of $806.

Table 2 — Pricing comparison
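For readers who want to check the scale of this simulation, the short Python sketch below derives a few figures from the numbers quoted in the text. It does not reproduce how the $768 and $806 totals were calculated; it only works from the stated inputs, and it assumes that the ELT and analytics windows do not overlap.

```python
# Inputs taken from the simulation described in the text
USERS = 50
QUERIES_PER_USER_PER_DAY = 20
DAYS_IN_MONTH = 30
ELT_HOURS_PER_DAY = 2
ANALYTICS_HOURS_PER_DAY = 8
SNOWFLAKE_MONTHLY_USD = 768
REDSHIFT_MONTHLY_USD = 806

# Total analytics queries issued in the month
queries_per_month = USERS * QUERIES_PER_USER_PER_DAY * DAYS_IN_MONTH

# Active hours per month (assumption: ELT and analytics do not overlap)
active_hours = (ELT_HOURS_PER_DAY + ANALYTICS_HOURS_PER_DAY) * DAYS_IN_MONTH

# Absolute and relative gap between the two monthly totals
difference_usd = REDSHIFT_MONTHLY_USD - SNOWFLAKE_MONTHLY_USD
saving_pct = difference_usd / REDSHIFT_MONTHLY_USD * 100

print(f"{queries_per_month} queries over {active_hours} active hours")
print(f"Snowflake is ${difference_usd} cheaper ({saving_pct:.1f}% less)")
```

In other words, the two platforms are within about 5% of each other for this workload, so the choice is likely to hinge on the other factors discussed in this post rather than on price alone.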

In this post from Indicium, we help you calculate the cost of implementing Snowflake in your company in more detail.

Which is best for your company?

Different companies have different data cloud needs, and finding the solution that best fits your business is essential for extracting the most value from the cloud. Below, we have compiled some recommendations that can help you decide on the best data cloud for your company:

Snowflake: With its multicloud solution and more intuitive implementation, Snowflake embraces many of the benefits of AWS, Google Cloud Platform and Azure. It is preferred by startups and small and medium-sized businesses because its billing system separates storage and processing. However, Snowflake is also well suited to large enterprises and to companies with data security and privacy concerns, so it fits very well in sectors such as finance and healthcare.

Redshift: As part of the Amazon family of services, Redshift integrates tightly with the AWS ecosystem, which makes it a very attractive option for companies that already use, for example, S3 or AWS Glue. Furthermore, Redshift is very efficient at handling large volumes of data and provides many data security features for sensitive data such as e-commerce transactions.

