Snowflake DB is Awesome! Here’s Why…

S B
Hitachi Solutions Braintrust
6 min readSep 18, 2019
Photo by Tomas Yates on Unsplash

Cloud technology services are everywhere! A lot of our favorite companies like Apple, Netflix, Spotify, and Samsung are benefiting from cloud services on a global scale. They consume and generate vast amounts of data and sophisticated algorithms that output into a better customer experience and smarter business and technological innovation. For cloud technology providers, competition is raging between the likes of Azure, AWS, Google Cloud, and others to gain a bigger piece of the global cloud market pie.

Businesses from small size to large are facing never-before-seen challenges:

  1. Scale-up — how to grow capacities dynamically with the demand
  2. Cost of operations/development — how to track operational expenses on current infrastructure and foresee costs of development with new tools
  3. Return on investment with technology change — how to justify the value gained by upgrading to new technology

In many cases, businesses that had been operating on old infrastructures have outgrown their data storage-retrieval capacity. Typical examples of this would be a transactional system of record that originally shipped with a database, is way past its lifetime, and increasingly proves the need for its retirement. Data in such a database or data warehouse is still valuable, and that makes the transition to the cloud more challenging.

We are always aligned with client businesses, trying to find the right candidate technology that is effective in alleviating the pain points while leaving little or no impact to the BI (business intelligence) layer. Snowflake Database is one such tool/technology. It’s been making waves recently by promising high performance, concurrency, and simplicity.

This blog is an objective review of Snowflake DB and its basics — for enthusiasts to get started.

Snowflake basics and high-level architecture

Snowflake is a cloud-based data warehouse-as-a-cloud-service (SaaS for DB) that requires no hardware or software installation. The maintenance and tuning of cloud infrastructure is handled by Snowflake. It is based on a new SQL database engine with unique features and advantages over more traditional data warehousing technology approaches.

On a high-level, the architecture can be divided into three layers:

  • Data Storage: When data is loaded into Snowflake, it is stored in a proprietary compressed columnar format. The format is optimized for storage on Snowflake cloud and the database objects (tables, views, etc.) are only accessible to the users through SQL queries written in a notebook-like interface. This happens at the end of queries that operate on the loaded data.
  • Query Processing: SQL queries written on the front end are processed in this layer. Queries are processed at warehouse level and each warehouse consists of one computer cluster made up of several node computers. The clusters are 1:1 with warehouses and are not shared among warehouses. This isolates warehouses in case of degradation of performance and makes it easy for fixing infrastructure issues.
  • Cloud Services: This acts as the front-end layer for Snowflake, offering various services like user authentication and login, parsing of queries, query optimization, and so on.

Loading and unloading data

Snowflake DB is an all-cloud application with both endpoints (ingress and egress) in cloud applications. To upload or unload data, a Stage needs to be created. This Stage connects Snowflake with two cloud storage options besides Snowflake hosted data itself. The two options are AWS S3 bucket and Azure Blob Storage.

For Azure Blob option, a Storage Account with relevant permissions must be provisioned prior to creating a Stage.

After connecting with the right cloud service, data in the Stage can be copied into the Snowflake database and schema of choice via simple COPY INTO command.

Quick glance at Snowflake DB UI

When logged in as a customer, Snowflake bears visual similarities to popular SQL Server Management Studio and cloud Notebooks. Here are some of the major sections:

  1. Menu Bar: This contains different menus
  2. Databases: Select/configure databases
  3. Shares: Menu available for administrators to enable data sharing within the organization
  4. Warehouses: Menu to create/configure and implement data warehouses (compute clusters)
  5. Worksheets: Notebook-like feature to write and execute SQL code
  6. History: Record of every query run in last 24 hours
  7. Partner Connect: Menu for connecting with other partner services like Chartio and Sigma
  8. Help: Links to documentation and downloads for connecting with other tools
  9. User: Menu for configuring user role and other preferences
  10. Database Object Browser: This shows all database objects within the selected data warehouse
  11. Context: This opens up a selection box for Role, Data Warehouse, Database, and Schema
  12. Worksheet: This resembles a SQL file and is used for data related and configuration related SQL code
  13. Preview: This tab shows query results and data preview through appropriate SQL query and also shows errors and history

Snowflake DB in a typical Azure Analytics architecture

Connecting to BI layer

Snowflake connects very easily with BI using a specialized ODBC connection. Once you get the driver for Windows/Mac/Linux from Snowflake Repository, you will need to set up the DSN onyour machine and you’re good to go! Following is an example of NYC Citi Bike data in Snowflake visualized by Power BI dashboard:

Power BI connection with Snowflake
Power BI dashboard accessing the Citi Bike Data

Why Snowflake?

There are several advantages and attractive features with Snowflake including:

  • Combine data from different sources in the same cloud data warehouse, data consolidation
  • Modern data warehouse — can connect to Azure, AWS, or self-hosted data platform
  • Can handle structured and unstructured data natively (has a JSON native format)
  • Scale and compute are elastic
  • Data science ready with connection to Spark and Python — for Azure, Databricks, and Azure Notebooks
  • Can readily connect with Power BI and any other BI tool — with special ODBC DSN
  • All cloud-born applications with focus on data and without worry of infrastructure
  • Snowflake is currently offering a free trial with $400 credits!

Next steps

The cloud can be challenging, so take a deep breath and:

  1. Evaluate the feasibility and build use cases for Snowflake
  2. Discuss the future investment in Snowflake DB as a tool and as a technology for intermediate to long-term analytical solutions
  3. Call Microsoft partners such as Hitachi Solutions to guide your business on the adventurous digital transformation journey

--

--