Why Snowflake DB is exciting!

S B
Topical Synopsis
Published in
6 min readAug 28, 2019

The world of big data analytics is chaotic. In the era of massive digital transformation and multiple big data platforms, the competition is fierce between the rivals Azure, AWS, Google Cloud and others. Their clients on the other hand, have their own challenges of scale-out, cost of operations and development, and the value gain out of any change in technology. In many cases, businesses which had been operating on old infrastructure have outgrown their data storage-retrieval capacity. Typical examples of this would be a transaction system of record that originally shipped with a database which is way past its lifetime and increasingly proves the need for its retirement. Data in such said database or data warehouse is still valuable, and that makes the transition to cloud more challenging.

While it is expensive for C2C (cloud to cloud) migration are high it is still relevant to look for the right candidate that is effective in alleviating the pain points of businesses while leaving little or no impact to the BI (business intelligence) layer. Snowflake Database is one such tool/ technology that is making waves recently promising high performance, concurrency and simplicity. This article is an objective review of Snowflake DB and its basics — for enthusiasts to get started.

Snowflake Basics and High-level Architecture:

Snowflake is a completely cloud-based data warehouse as a cloud service (SaaS for DB) which requires no hardware or no software installation. The maintenance and tuning of cloud infrastructure is handled by Snowflake. It is based on a new SQL database engine with its unique features and advantages over more traditional data warehousing technology approaches.

On a high-level, the architecture can be divided into 3 layers:

  • Data Storage: When data is loaded into Snowflake, it is stored in a proprietary compressed columnar format. The format is optimized for storage on Snowflake cloud and the database objects (tables, views, etc.) are only accessible to the users through SQL queries written in a notebook-like interface. This happens at the end of queries that operate on the loaded data.
  • Query Processing: SQL queries written in the front end are processed in this layer. Queries are processed at warehouse level and each warehouse consists of one computer cluster made up of several node computers. The clusters are 1:1 with warehouses and are not shared among warehouses. This isolates warehouses in case of degradation of performance and makes it easy for fixing infrastructure issues.
  • Cloud Services: This acts as the front end layer for Snowflake offering various services like user authentication and login, parsing of queries, query optimization and so on.

Snowflake — SaaS for DB

- All Cloud database and data warehouse

- Ability to work on data from separate cloud platforms

- Based on new SQL engine

- Data Storage in proprietary columnar format

- Friendly, fast and easy UI

Loading and Unloading Data:

Snowflake DB is all- cloud application with both endpoints (ingress and egress) in cloud applications. To upload or unload data a Stage needs to be created. This Stage connects Snowflake with two cloud storage options besides Snowflake hosted data itself. The two options are AWS S3 bucket and Azure Blob Storage.

For Azure Blob option, a Storage Account with relevant permissions must be provisioned prior to creating a Stage.

After connecting with the right cloud service, data in the stage can be copied into Snowflake database and schema of choice via simple COPY INTO command.

Quick Glance at Snowflake DB UI:

When logged in as a customer, Snowflake bears visual similarities to popular SQL Server Management Studio and cloud Notebooks. Here are some of the major sections:

  1. Menu Bar: This contains different menus
  2. Databases: select/ configure databases
  3. Shares: Menu available for administrators to enable data sharing within the organization
  4. Warehouses: Menu to create/ configure and implement data warehouses (compute clusters)
  5. Worksheets: Notebook-like feature to write and execute SQL code
  6. History: record of every query run in last 24 hours
  7. Partner Connect: menu for connecting with other partner services like Chartio and Sigma
  8. Help: links to documentation and link to downloads for connecting with other tools.
  9. User: menu for configuring user role and other preferences.
  10. Database object browser: This shows all database objects within the selected data warehouse.
  11. Context: This opens up a selection box for Role, Data Warehouse, Database and Schema
  12. Worksheet: This resembles SQL file. It is used for both: data related and configuration related SQL code
  13. Preview: This tab shows query results and data preview through appropriate SQL query. It also shows errors and history.

Snowflake DB in a typical Azure Analytics Architecture:

Image taken from Snowflake.com

Connecting to BI (Business Intelligence) Layer:

Snowflake connects very easily with BI using a specialized ODBC connection. Once the driver for Windows/Mac/Linux from Snowflake Repository, you will need to set up the DSN at your machine, and then you are good to go! Following is an example of NYC Citi Bike data in Snowflake visualized by Power BI dashboard.

Power BI connection with Snowflake
Power BI dashboard accessing the Citi Bike Data

Why Snowflake:

There are several advantages and attractive features with Snowflake.

  • Combine data from different sources in the same cloud data warehouse, data consolidation
  • Simplicity and ease — with SQL engine as base and ANSI-SQL scripting, job scheduler is built-in
  • Modern data warehouse — can connect to Azure, AWS or self-hosted data platform
  • Can handle structured and unstructured data natively ( includes Parquet and JSON and has a JSON native format)
  • Scale and compute are elastic
  • Data science ready with connection with Spark and Python — for Azure — Databricks and Azure Notebooks
  • Can readily connect with Power BI and any other BI tool — with special ODBC DSN
  • All cloud- born applications with focus on data and without worry of infrastructure, no need for software or hardware installs
  • Snowflake is currently offering a 30-day free trial with $400 credits!

--

--