“Snowflake: Redefining the Cloud Data Warehousing Landscape”

Published in AnalyticsHere on Jun 13, 2023

Why should someone consider Snowflake?

Snowflake is a relatively new entrant in the cloud data warehouse space and is designed specifically for cloud platforms like AWS, Azure, and GCP.

It offers several features that set it apart from traditional data warehousing solutions.

FEATURES INCLUDE:

Time Travel Function
Data Sharing Function
Data Cloning (Zero Copy Clone) Capabilities
Reader Accounts (for non-Snowflake users)
Security out of the box: access control (RBAC/DAC), encryption, column-level masking, and row-level security
Support for all major connectors (JDBC, ODBC, Spark, among others)
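
To make two of these features more concrete, here is a minimal Python sketch of the general idea behind zero-copy cloning and time travel: a table is treated as a list of references to immutable partitions, so a clone only copies the references, and older versions stay readable because partitions are never changed in place. This is a toy model under stated assumptions, not Snowflake's implementation or API. The names ToyTable, insert, clone, rows, and versions_back are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy table (hypothetical): a list of references to immutable partitions."""
    partitions: list = field(default_factory=list)  # current version
    history: list = field(default_factory=list)     # older versions, oldest first

    def insert(self, rows):
        # Remember the previous version for "time travel", then add a new
        # immutable partition. Existing partitions are never modified.
        self.history.append(list(self.partitions))
        self.partitions = self.partitions + [tuple(rows)]

    def clone(self):
        # "Zero-copy": copy only the list of references, not the row data.
        return ToyTable(partitions=list(self.partitions))

    def rows(self, versions_back=0):
        # versions_back=0 reads current data; N reads the state N writes ago.
        if versions_back == 0:
            parts = self.partitions
        else:
            parts = self.history[-versions_back]
        return [row for part in parts for row in part]


orders = ToyTable()
orders.insert([("o1", 10), ("o2", 20)])

dev_copy = orders.clone()       # shares the same partition objects
dev_copy.insert([("o3", 30)])   # only the clone sees this write

print(orders.rows())                   # original table is unchanged
print(dev_copy.rows())                 # clone has the extra row
print(dev_copy.rows(versions_back=1))  # clone as it was before its last write
```

In this sketch the clone and the original point at the very same partition object until one of them writes, which is why creating the clone is cheap regardless of table size.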

SNOWFLAKE FOUNDERS:

Snowflake was founded by three individuals: Benoit Dageville, Thierry Cruanes, and Marcin Zukowski.

How did Snowflake come to be?

First Generation Traditional On-Site Enterprise Data Warehouse:

The first evolution of data warehousing involved traditional on-premise enterprise data warehouses that were able to provide fast answers to business queries. These data warehouses were typically SQL-based databases.

Challenges

Not built for all types of data: Traditional data warehouses handle only structured data, not semi-structured or unstructured data.
Not designed for all users: Traditional data warehouses supported only a limited number of users and could not keep up with growing data demands.
Inability to scale up/down: They could not scale capacity up or down in response to business needs.

Figure: Cost of ownership comparison between on-prem and cloud

Second Evolution and First Generation Cloud EDW

The second evolution of data warehousing involved the migration of traditional on-premise enterprise data warehouses to the cloud, known as the first generation of cloud-based enterprise data warehouses (EDW).

Data onboarding: Cloud EDWs can ingest and process both structured and unstructured data.
Fast query response and SQL-based: First-generation cloud EDWs remained SQL-based databases focused on answering queries quickly.
Access for different users: The second evolution prioritized data access for diverse users through self-service capabilities and user-friendly interfaces.

However, the inherent problems of data warehousing persisted in the second evolution. Challenges such as data integration, data quality, and data governance remained, requiring organizations to address these issues as they scaled their data warehouse operations in the cloud.

The third evolution of data warehousing involved the emergence of Hadoop and data lakes:

Hadoop, an open-source framework, enables distributed data storage and processing, while data lakes store raw, unprocessed data.
Support for any data type: Hadoop and data lakes let organizations store diverse data types and capture data from a wide range of sources.
Increased accessibility: The third evolution let multiple users and stakeholders access and query the data held in the lake.
Performance and scalability challenges: Hadoop struggles with performance and scalability, both in query processing and in cluster management.

The fourth evolution of data warehousing involved the advent of cloud-based data platforms.

This evolution brought significant advancements in terms of data accessibility, performance, and ease of management. Cloud Data Platforms offer comprehensive data warehousing solutions, unified environments for users, and data management.

Fast query response and SQL support: Cloud data platforms answer queries quickly using SQL, so users can keep working with familiar tools.
Native support for structured and semi-structured data: Cloud data platforms ingest and process structured and semi-structured data natively, so organizations can handle diverse data types efficiently.
Concurrent access for multiple users: Cloud data platforms let many users work at the same time, so analysts, business users, and data scientists can all access data efficiently.
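
As a rough illustration of what handling structured and semi-structured data together means, the sketch below aggregates a structured column while grouping by a field nested inside a raw JSON payload. It is plain Python rather than Snowflake SQL, and the sample rows and the helper names get_path and total_by are hypothetical, made up for this example only.

```python
import json

# Hypothetical rows: two structured columns plus one raw JSON payload column.
rows = [
    {"order_id": 1, "amount": 25.0,
     "payload": '{"customer": {"country": "DE"}, "tags": ["web"]}'},
    {"order_id": 2, "amount": 40.0,
     "payload": '{"customer": {"country": "US"}}'},
    {"order_id": 3, "amount": 15.0,
     "payload": '{"customer": {"country": "DE"}, "coupon": "X1"}'},
]


def get_path(document, path):
    """Follow a dotted path through nested dicts; return None if a key is missing."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def total_by(rows, path):
    """Sum the structured 'amount' column, grouped by a field inside the JSON payload."""
    totals = {}
    for row in rows:
        key = get_path(json.loads(row["payload"]), path)
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals


print(total_by(rows, "customer.country"))
```

Note that the three payloads do not share a fixed schema (only some have tags or a coupon), yet the same query works across all of them. That flexibility is what distinguishes semi-structured data from strictly tabular data.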

Conclusion:

Traditional On-Premise Enterprise Data Warehouse: Fast query response, limited data type support, and scalability challenges.
1st Generation Cloud EDW: Migration to the cloud with improved data type support and user accessibility, but challenges with data integration and governance.
Hadoop/Data Lakes: Hadoop and data lakes made it possible to store diverse data, but brought performance and scalability challenges along with operational complexity.
Cloud Data Platforms: Unified environment with fast query response, native support for structured and semi-structured data, concurrent user access, and easy management through SQL, enabling organizations to use their data in a scalable and efficient manner.