Latest new Snowflake features — Sep 2022

Ilyesmehaddi
8 min read · Sep 12, 2022


Opinions expressed in this post are solely my own and do not represent the views or opinions of my employer.

Introduction

The Data Cloud is growing rapidly across clouds and regions, with more and more enhancements and relevant new features. Indeed, Snowflake takes customer feedback and requests into account to progress in this direction, allowing customers to accelerate their business and reduce time-to-market.

Some of these were announced at the Snowflake Summit 2022, such as:

Unistore & Hybrid Tables : for transactional workloads in addition to analytical workloads.

Native Application Framework : for developing and deploying native apps very easily.

Snowpipe Streaming : for ingesting data up to 10x faster than Snowpipe, with support for Kafka connectors.

Iceberg Tables (with Apache Iceberg) : the open-source table format is now supported.

Materialized Tables : write declarative code for the transformation, and Snowflake handles the incremental refresh to materialize the pipeline.

Data Replication : in addition to databases, more objects are now supported, such as users, roles, …

Snowpark for Python.

Streamlit Application.

External Tables for On-prem Data : connecting to on-premises data using storage compatible with the Amazon S3 REST API (Dell ECS, Pure Storage, or MinIO).

Data Engineering

Snowflake offers many features for Data Engineering, like Tasks, Streams, Snowpark with multiple languages, and Snowpipe, a serverless service for data streaming and the CDC (Change Data Capture) approach.

What’s new ?

In order to monitor data ingestion in Snowpipe, especially tracking errors, Snowflake announced a preview of Snowpipe error notifications for Google Cloud and Microsoft Azure. Error notifications can be pushed to either the Google Cloud Platform Pub/Sub or Microsoft Azure Event Grid cloud messaging service.

In addition, task error notifications can send a description via cloud messaging when errors are encountered during a run.
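As a sketch, here is how Snowpipe error notifications can be wired up on Azure. All object names are illustrative, and the Event Grid endpoint and tenant ID are placeholders to fill in for your environment:

```sql
-- Hypothetical example: push Snowpipe load errors to an Azure Event Grid topic.
CREATE NOTIFICATION INTEGRATION pipe_errors_int
  ENABLED = TRUE
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AZURE_EVENT_GRID
  DIRECTION = OUTBOUND
  AZURE_EVENT_GRID_TOPIC_ENDPOINT = '<topic-endpoint>'
  AZURE_TENANT_ID = '<tenant-id>';

-- Attach the integration to a pipe so that load errors trigger notifications.
CREATE PIPE my_db.raw.events_pipe
  AUTO_INGEST = TRUE
  ERROR_INTEGRATION = pipe_errors_int
  AS COPY INTO my_db.raw.events FROM @my_db.raw.events_stage;
```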

Users can execute transformations more easily with the public preview of task support for directed acyclic graphs (DAGs).

A DAG is a series of tasks with a single root task and additional tasks organized by their dependencies. Until now, users were limited to task trees, in which each task had at most one parent task.

In a DAG, the tree becomes a graph: each non-root task can depend on several parent tasks and can have several child tasks that depend on it.
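A minimal sketch of such a DAG (task and procedure names are illustrative): two loads fan out from a scheduled root task, and a final task waits on both parents — the part that a plain task tree could not express:

```sql
-- Root task: runs on a schedule.
CREATE TASK extract_task
  WAREHOUSE = etl_wh
  SCHEDULE = '60 MINUTE'
AS CALL extract_raw_data();

-- Two children of the root task.
CREATE TASK load_orders WAREHOUSE = etl_wh AFTER extract_task
AS CALL load_orders();
CREATE TASK load_items WAREHOUSE = etl_wh AFTER extract_task
AS CALL load_items();

-- A non-root task with TWO parents: this is what turns the tree into a DAG.
CREATE TASK merge_facts WAREHOUSE = etl_wh AFTER load_orders, load_items
AS CALL merge_fact_tables();

ALTER TASK merge_facts RESUME;  -- resume leaf tasks first, then the parents
```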

Snowflake Scripting is now available. It allows users to create scripts and stored procedures in SQL by extending SQL with control structures and statements, such as conditionals and loops.
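For illustration, a small anonymous Snowflake Scripting block combining a loop and a conditional:

```sql
EXECUTE IMMEDIATE $$
DECLARE
  total INTEGER DEFAULT 0;
BEGIN
  -- Sum the integers 1..5 with a FOR loop.
  FOR i IN 1 TO 5 DO
    total := total + i;
  END FOR;
  -- Branch on the result with IF/ELSE.
  IF (total > 10) THEN
    RETURN 'total is large: ' || total;
  ELSE
    RETURN 'total is small: ' || total;
  END IF;
END;
$$;
```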

Why does it matter ?

Snowflake customers need to track and monitor their workloads. Error notifications let them spot errors and take quick action to correct them, reducing monitoring time and improving data quality and key business performance indicators.

These features (Snowflake Scripting and DAGs) give users more flexibility to create complex scripts without needing other languages like JavaScript. This makes Snowflake accessible to the largest number of people, without requiring additional programming skills.

Data Science & ML (Snowpark)

Snowpark enables Data Engineering and Data Science workloads. Snowpark uses a DataFrame data structure and offers functions similar to Spark’s. The most important difference is that with Snowpark, all compute stays in Snowflake, without moving data outside your Snowflake environment.

Until now, data developers could use SQL, native Java UDFs, or in some cases Java/Scala to implement pipelines or ML models.

What’s new ?

Snowpark API for :

  • Java is generally available on AWS,
  • Scala and Java UDFs are generally available on Azure.

And for Python fans, Snowflake announced that Snowpark for Python is now available as well. This release includes the public preview of :

  • Snowpark for Python API,
  • Python User Defined Functions (UDFs) and User Defined Table Functions (UDTFs),
  • Stored Procedures,
  • and Batch API (Vectorized UDFs)
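To make the Python UDF feature concrete, here is a hypothetical sketch (function name, runtime version, and logic are illustrative): a scalar Python UDF defined in SQL and called like any built-in function:

```sql
-- Hypothetical Python UDF: normalize a name string inside Snowflake.
CREATE OR REPLACE FUNCTION normalize_name(name STRING)
  RETURNS STRING
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.8'
  HANDLER = 'normalize'
AS $$
def normalize(name):
    # Trim whitespace and title-case the name; pass NULL through.
    return name.strip().title() if name else None
$$;

SELECT normalize_name('  jane DOE ');  -- 'Jane Doe'
```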

Additional updates coming :

  • Snowflake Worksheets for Python (in private preview) : enables users to develop pipelines, ML models and applications directly in Snowsight.
  • Large Memory Warehouses (in development) : empowers users to execute complex Machine Learning operations such as feature engineering and model training on large datasets using Python.

Why does it matter ?

For data use cases, in addition to SQL, each company adopts one or more different languages like Python, Java and Scala. With Snowflake, each of these companies can use their favorite language. This makes it easy to migrate existing workloads to Snowflake, like existing ML models in Python for instance.

Performance and optimization

Scalability is one of the key features and something that sets Snowflake apart from its competitors. Snowflake enables 3 dimensions of scaling :

  • Scale ACROSS : many workloads, each with its own compute resource (virtual warehouse)
  • Scale UP (one cluster, vertical scaling) : single-query performance (more data, higher complexity), by adding more servers to the same cluster
  • Scale OUT (multi-cluster, horizontal scaling) : multiple concurrent users (more queries simultaneously), by adding more clusters
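These three dimensions translate into simple warehouse settings. A sketch, with illustrative warehouse names:

```sql
-- Scale UP: resize the warehouse for single-query performance.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Scale OUT: multi-cluster warehouse for concurrency (Enterprise edition and higher).
ALTER WAREHOUSE analytics_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';

-- Scale ACROSS: a dedicated warehouse per workload.
CREATE WAREHOUSE reporting_wh WAREHOUSE_SIZE = 'MEDIUM';
```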

What’s new ?

As we saw before, Scale UP means vertical scaling to accelerate queries by resizing the virtual warehouse (increasing its size).

Snowflake’s query acceleration service is in public preview for Enterprise editions and higher, providing more flexibility to scale and reach high performance with the same virtual warehouse size (without manually resizing the virtual warehouse) and scale to additional compute nodes.
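Enabling the service is a warehouse-level setting. A minimal sketch (warehouse name is illustrative):

```sql
-- Enable query acceleration on an existing warehouse.
ALTER WAREHOUSE analytics_wh SET
  ENABLE_QUERY_ACCELERATION = TRUE
  QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;  -- cap on additional serverless compute
```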

Now, you can also run write-intensive DML (Data Manipulation Language) workloads much faster with low latency.

Beginning on AWS, new t-shirt sizes (5XL and 6XL) are in private preview, along with Map Search improvements (Search Optimization Service).

Why does it matter ?

To simplify the scale up for complex queries that process a lot of data, Snowflake has automated the vertical scalability to avoid the manual process.

Governance

Many features for Governance are available on Snowflake to manage and control access to the data. One of the most relevant is Snowflake’s native data classification.

What’s new ?

Snowflake announced that Snowflake’s native data classification is now generally available on AWS and Azure, and soon on Google Cloud.


Indeed, Snowflake deployed a new model with improved accuracy, and added support for single-variable column classification and output of all column results.

In fact, organizations need to identify and classify information as “personal” to know where it resides in their Snowflake account and to track it with system tags.

Snowflake’s existing data masking feature lets users create a masking policy that is manually applied to a column, protecting the data according to the querying role.

Now, in order to simplify data protection efforts, Snowflake announced its native tag-based masking policies in public preview.

Why does it matter ?

Unlike standard data masking, a tag-based masking policy relies on object tagging: the masking policy is set on a tag rather than on a specific column. This gives users broad capabilities to manage access and simplifies data governance.
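The mechanics above can be sketched as follows (all object names are hypothetical): one policy bound to one tag protects every column that carries the tag:

```sql
-- Hypothetical sketch: a single policy attached to a tag masks all tagged columns.
CREATE TAG governance.tags.pii;

CREATE MASKING POLICY governance.policies.mask_pii
  AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '***MASKED***' END;

-- Bind the policy to the tag instead of to individual columns.
ALTER TAG governance.tags.pii SET MASKING POLICY governance.policies.mask_pii;

-- Any string column carrying the tag is now masked automatically.
ALTER TABLE crm.public.customers
  MODIFY COLUMN email SET TAG governance.tags.pii = 'email';
```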

Marketplace

Snowflake Marketplace lets providers share data as well as applications: an app store for data and applications. To track and monitor how consumers use their content, providers rely on Provider Studio analytics. For instance, they have visibility into active consumers, views, requests, database mounts, and query executions. These metrics and analytics help providers improve their products.

What’s new ?

Snowflake now offers more aggregated metrics and analytics to help providers improve and optimize their performance in the Marketplace (in private preview).

A new schema called DATA_SHARING_USAGE is also available; it stores all the metadata, metrics, and analytics related to the products and their content. This schema contains views that providers can query with simple SQL.
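As a sketch, a provider could summarize recent consumption per listing like this (the view and column names are assumptions based on the DATA_SHARING_USAGE schema; verify them against your account):

```sql
-- Daily consumption per listing over the last 30 days.
SELECT listing_name,
       listing_display_name,
       event_date,
       SUM(jobs) AS total_jobs
FROM snowflake.data_sharing_usage.listing_consumption_daily
WHERE event_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY 1, 2, 3
ORDER BY event_date DESC;
```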

As you probably know, one of the purposes of the Data Cloud is to democratize access to content, whether it is data or applications. With the ability to publish content, providers can monetize standardized products to self-service buyers.

Now, the Provider Studio has a new tab named “Learn”, which gives you more information about the Marketplace, how to become a data provider, and the related terms and conditions to know beforehand. You can also find the most relevant listings in the “Most Popular Listings” section.

Why does it matter ?

More metrics and analytics means more understanding of how a provider’s product is consumed and what the impact of each product is.

Data Replication and Failover

This feature enables replicating databases between Snowflake accounts (within the same organization) and keeping the database objects and stored data synchronized. Database replication is supported across regions and across cloud platforms.

What’s new ?

Snowflake Account Replication and Failover capabilities are also available in public preview. This replication covers not only databases but also metadata and integrations, making business continuity truly turnkey. In addition, auto-fulfillment makes cross-region and cross-cloud sharing easy.
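A sketch of how account-level replication can be set up with a failover group (organization, account, and object names are hypothetical):

```sql
-- On the primary account: replicate account objects to a DR account.
CREATE FAILOVER GROUP my_failover_group
  OBJECT_TYPES = USERS, ROLES, WAREHOUSES, RESOURCE MONITORS, DATABASES
  ALLOWED_DATABASES = sales_db
  ALLOWED_ACCOUNTS = myorg.dr_account
  REPLICATION_SCHEDULE = '10 MINUTE';

-- On the DR account: create the replica, then promote it during a failover.
CREATE FAILOVER GROUP my_failover_group
  AS REPLICA OF myorg.primary_account.my_failover_group;
-- ALTER FAILOVER GROUP my_failover_group PRIMARY;  -- promote during failover
```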

Why does it matter ?

Combined with the upcoming Client Redirect feature (generally available soon), users will be able to recover their account and client connections in seconds and perform failover across public clouds. Snowflake customers don’t need to create multiple accounts in different regions.

Global Data Cloud

The Data Cloud is available on the three main public cloud providers, AWS, Azure, and Google Cloud, in several regions grouped into three global geographic segments (North/South America, Europe/Middle East, and Asia Pacific).

Regions let your organization choose where your data is geographically stored and also determine where your compute resources are provisioned.

What’s new ?

Snowflake is now available in : UK South (London) and Central India (Pune) on Azure, South America (São Paulo) and Asia Pacific (Osaka) on AWS, and U.S. East 4 (N. Virginia) on Google Cloud.

Why does it matter ?

More regions mean a bigger Data Cloud and more possibilities for Snowflake’s customers, especially given the different legislation across countries and regions.

Summary

The Snowflake Data Cloud is growing incredibly, with many new features. Some of them were announced at the Snowflake Summit 2022. In this blog, I wanted to present the new ones that came after.

New features are coming soon!




Ilyesmehaddi

EMEA Senior Data Cloud Architect | AI & GenAI Expert @Snowflake Data Cloud