Snowflake Summit 2022: Summary of New Features announced

Umesh Patel
Published in CodeX
Jun 15, 2022 · 6 min read

Snowflake Data Cloud has delivered a number of groundbreaking innovations in the last twelve months that have changed how customers use data and extended the value of the Snowflake platform. The Snowflake Data Cloud is built upon seven pillars:

  1. Industry Alignment: Snowflake aligns with industry verticals and their use cases. You will see the Financial Services Data Cloud, Retail, Telco, Media, Healthcare, etc., where you can find data, workloads, and use cases relevant to the respective industry.
  2. Scalability and Concurrency: Snowflake supports all sources, all data (structured, semi-structured, and unstructured formats), and all workloads (machine learning, analytics, data applications, collaboration), with no limit on concurrency thanks to its multi-cluster shared data architecture. It offers near-unlimited scalability for data and compute to run “any” workload.
  3. Global: Snowflake is global, multi-cloud, and multi-region. That means data and other Snowflake objects can be replicated cross-cloud and cross-region. You can have federated data, and Snowgrid allows you to treat it all as one cloud.
  4. Self-Managed: Snowflake is smart; behind the scenes, it works for you. It is one platform that just works: it manages everything behind the scenes, such as security, infrastructure, and optimization, so you can focus on your workload rather than on the infrastructure needed to make it work.
  5. Programmability: Snowflake supports various programming languages such as Python, Java, and Scala, along with ANSI SQL, to run your workload. You can write UDFs in the language of your choice and use them in your SQL. For example, you can call a Python UDF and a Java UDF in the same SQL statement to process column values from a Snowflake table (see the sketch after this list). You can also apply logic to a set of rows using a UDTF.
  6. Marketplace: Snowflake has a community-driven marketplace where you can share not only data but applications as well. It also allows you to monetize the data and applications you develop, much like the Apple App Store and Google Play, but for the enterprise. This is a great opportunity for partners to develop applications and sell them to customers.
  7. Governance: Snowflake has a number of features related to data governance, such as Dynamic Data Masking, row-level policies, encryption, data classification, tagging, lineage, access history, and compliance. These features are enabled automatically, no matter where you access the data from, so your data is not only secure but also adheres to the compliance rules defined by your organization.
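
As a small illustration of the programmability pillar, here is a minimal sketch of a Python UDF and a Java UDF used together in a single ANSI SQL statement. The function names, the sales table, and its columns are hypothetical.

```sql
-- Hypothetical Python UDF: normalize a product name.
CREATE OR REPLACE FUNCTION py_clean_name(name STRING)
  RETURNS STRING
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.8'
  HANDLER = 'clean'
AS
$$
def clean(name):
    return name.strip().lower() if name is not None else None
$$;

-- Hypothetical Java UDF: mask all but the last four characters of an ID.
CREATE OR REPLACE FUNCTION java_mask_id(id STRING)
  RETURNS STRING
  LANGUAGE JAVA
  HANDLER = 'MaskId.mask'
AS
$$
class MaskId {
    public static String mask(String id) {
        if (id == null || id.length() <= 4) return id;
        return "****" + id.substring(id.length() - 4);
    }
}
$$;

-- Both UDFs, written in different languages, used in one SQL statement.
SELECT py_clean_name(product_name) AS product,
       java_mask_id(customer_id)   AS customer
FROM   sales;
```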

Here is a summary of all the innovations and features announced across those pillars:

  1. Hybrid Tables: Snowflake was built for analytics workloads and stores data in columnar format. Not anymore! The Unistore workload, aka hybrid tables, allows you to store transactional and analytical data in one place. This enables fast single-row ingestion and lookups, and provides data integrity using primary and foreign keys to eliminate duplicate rows. You can run analytical queries as well, including joining a hybrid table with a “regular” table. This lets you reduce the number of data stores (i.e., one for OLTP and another for the analytics workload) and avoids building a pipeline to migrate data from OLTP sources into Snowflake: everything lives in one place, i.e., Snowflake. This is another step toward removing silos. Please note that you can use the same warehouse you were already using for the analytics workload (see the sketch after this list).
  2. Account Replication: Snowflake already allows you to replicate your data cross-cloud and cross-region. Now it also allows you to replicate other account objects such as warehouses, users, roles, etc., so you can seamlessly redirect your production site to a DR site in case of disaster. Not only that, you can now also replicate pipelines: if you have a pipeline that loads data from blob storage such as S3, that pipeline replicates seamlessly to another region without duplicating data from the source (see the sketch after this list).
  3. Snowpipe Streaming: So far, customers have used Snowpipe to ingest data automatically when new data arrives in cloud storage. Snowpipe Streaming allows ingesting data 10x faster than Snowpipe: it writes rowsets directly instead of loading files from a stage, thereby reducing latency. It has out-of-the-box support for the Kafka connector, which means that with a single change in the configuration file you can reduce the latency of your streaming data by switching to Snowpipe Streaming. Snowpipe Streaming is a serverless feature. In addition, Snowflake also announced Materialized Tables, which let you build a declarative pipeline that defines when and how incremental data maintenance is done.
  4. Snowpark for Python: You can build data engineering pipelines and machine learning models using the programming language of your choice, without worrying about setting up a separate cluster for each language, and without worrying about library versions and dependencies. With Snowpark, you can run SQL, Java, Python, and Scala code in the same warehouse and scale it anytime as your workload requires, so you can control cost. It supports scalar UDFs and tabular UDTFs that can operate on a DataFrame. Of course, you can use a Jupyter notebook as a client to work with DataFrames, but you can also choose the IDE of your choice and run your process anywhere you like. You can also do ML training using open-source libraries available via Anaconda (see the sketch after this list). Furthermore, Snowsight now has a Python-based worksheet that eliminates the need for an additional IDE, so you can start building code faster.
  5. Native Application Framework: You can build an application in Python on Snowflake by creating a new first-class object called “Streamlit”, which runs natively in Snowflake, and you can list that application in the Snowflake Marketplace and monetize it. This removes the friction and expertise otherwise required for building and selling apps, such as standing up infrastructure and building authentication, security, and governance.
  6. Iceberg Tables: For those who cannot load data into Snowflake for compliance reasons, you now have the option to keep data in your own custody (i.e., in your own storage) by using truly open-source Apache Iceberg tables. They provide CRUD operations and other open-source Iceberg features, in addition to foundational Snowflake features such as time travel and governance. All you have to do is create a storage volume object pointing to your blob storage (much like a Snowflake storage integration object) and use that storage volume when creating the table. Your data is stored in that blob storage along with the Iceberg metadata, and the Iceberg data itself is stored in Parquet format (see the sketch after this list).
  7. Streamlit Application: Streamlit increases collaboration between data science and business teams while leveraging the power of Snowpark for model development and production. With this feature, you can create a first-class Streamlit object (“CREATE STREAMLIT …”) from your Python code, which then runs inside Snowflake. This allows you to build, deploy, and securely share your Streamlit apps all within Snowflake (see the sketch after this list).
  8. Cost Governance: Snowflake now allows you to create a resource group to which you can add all the Snowflake objects you want to track for cost governance or chargeback to your customers or lines of business. Furthermore, you can create and associate a budget with that resource group, and define rules and actions for when it goes over budget.
  9. Governance: A new feature called tag-based masking allows you to create a masking policy based on a tag rather than on a column; the policy then applies to all the columns associated with that tag. So if you add a new table or column, all you need to do is assign the tag, and that column’s data will be protected by the policy you defined (see the sketch after this list). A number of other features were announced as well, such as column-level lineage and a Governance UI.
  10. External Tables for On-Prem Data: If your data resides on-premises and you are using storage that is compatible with the S3 API, such as Dell ECS, Pure Storage, or MinIO, then that storage can be used for external tables in Snowflake, which makes it easier to load that data into Snowflake (see the sketch after this list).
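
Below are a few hedged sketches of how some of these announcements look in practice. For the hybrid tables in item 1, Unistore was announced in preview, so the exact syntax may differ; the orders and customers tables are hypothetical, with customers standing in for a “regular” table.

```sql
-- Hypothetical hybrid table with a primary key for fast single-row operations
-- (preview feature at the time of writing; syntax may change).
CREATE HYBRID TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    status      STRING,
    amount      NUMBER(10, 2)
);

-- OLTP-style access: single-row insert and point lookup.
INSERT INTO orders VALUES (1001, 42, 'NEW', 129.99);
SELECT * FROM orders WHERE order_id = 1001;

-- Analytical query joining the hybrid table with a regular table.
SELECT c.region, SUM(o.amount) AS revenue
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id
GROUP BY c.region;
```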
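
For the account replication in item 2, the capability is exposed through replication and failover groups. Here is a sketch under the preview syntax as announced; the group, database, and account names are hypothetical.

```sql
-- On the source account: group the account objects that should fail over together.
CREATE FAILOVER GROUP prod_failover
  OBJECT_TYPES = USERS, ROLES, WAREHOUSES, RESOURCE MONITORS, DATABASES
  ALLOWED_DATABASES = sales_db
  ALLOWED_ACCOUNTS = myorg.dr_account
  REPLICATION_SCHEDULE = '10 MINUTE';

-- On the target (DR) account: create the secondary copy of the group ...
CREATE FAILOVER GROUP prod_failover
  AS REPLICA OF myorg.prod_account.prod_failover;

-- ... and, in a disaster, promote it so clients can be redirected to the DR site.
ALTER FAILOVER GROUP prod_failover PRIMARY;
```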
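
Item 4 is primarily a client-side Python (Snowpark) API, but the server-side piece can be illustrated in SQL: a Python UDF that pulls an open-source package from the Anaconda channel and runs in the same warehouse as the rest of your SQL. The function, table, and column names are hypothetical.

```sql
-- Hypothetical Python UDF using an Anaconda-provided package (numpy).
CREATE OR REPLACE FUNCTION price_zscore(price FLOAT, mean FLOAT, stddev FLOAT)
  RETURNS FLOAT
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.8'
  PACKAGES = ('numpy')
  HANDLER = 'zscore'
AS
$$
import numpy as np

def zscore(price, mean, stddev):
    # numpy is resolved from the Anaconda channel; no manual dependency setup.
    if not stddev:
        return None
    return float(np.round((price - mean) / stddev, 4))
$$;

-- The UDF runs next to the data, in the same warehouse as the query itself.
SELECT product_id,
       price_zscore(price, AVG(price) OVER (), STDDEV(price) OVER ()) AS z
FROM   products;
```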
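
For the Iceberg tables in item 6, the “storage volume” surfaces as an external volume object that points at your own blob storage and is referenced when the table is created. This was announced in preview, so treat the option names and syntax below as assumptions; the bucket, role, and table names are hypothetical.

```sql
-- Hypothetical external volume over customer-owned storage
-- (preview feature; exact parameter names may differ).
CREATE EXTERNAL VOLUME my_iceberg_vol
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'my-s3-location'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://my-company-bucket/iceberg/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-access'
      )
    );

-- Iceberg table whose Parquet data and metadata live in that storage.
CREATE ICEBERG TABLE customer_events (
    event_id   INT,
    event_type STRING,
    event_ts   TIMESTAMP
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_iceberg_vol'
  BASE_LOCATION = 'customer_events/';
```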
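
For the Streamlit object in items 5 and 7 (“CREATE STREAMLIT …”), here is a hedged sketch. The feature was announced rather than generally available, so the parameters shown (a stage holding the app code, a main file, and a query warehouse) are assumptions about its eventual shape, and all names are hypothetical.

```sql
-- Hypothetical: the Python app (streamlit_app.py) has already been uploaded
-- to a stage, e.g. with PUT file://streamlit_app.py @my_app_stage from a client.

-- Create the first-class Streamlit object that runs the app inside Snowflake
-- (announced syntax; option names may differ in the released feature).
CREATE STREAMLIT sales_dashboard
  ROOT_LOCATION   = '@my_db.my_schema.my_app_stage'
  MAIN_FILE       = '/streamlit_app.py'
  QUERY_WAREHOUSE = analytics_wh;

-- Share it like any other securable object.
GRANT USAGE ON STREAMLIT sales_dashboard TO ROLE business_analysts;
```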
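
The tag-based masking in item 9 builds on existing masking policies and object tags; the sketch below uses hypothetical role, table, and column names.

```sql
-- A masking policy that reveals values only to a privileged role.
CREATE MASKING POLICY mask_pii AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to a tag instead of to individual columns.
CREATE TAG pii COMMENT = 'Personally identifiable information';
ALTER TAG pii SET MASKING POLICY mask_pii;

-- Any column carrying the tag is protected automatically, including new ones.
ALTER TABLE customers MODIFY COLUMN email SET TAG pii = 'email';
ALTER TABLE customers MODIFY COLUMN phone SET TAG pii = 'phone';
```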
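
For item 10, S3-compatible on-premises storage is reached through an external stage with a custom endpoint, which an external table can then read. The capability was announced as a preview, so the URL scheme, endpoint, and other names below are assumptions for illustration.

```sql
-- Hypothetical stage over S3-compatible on-prem storage (e.g. Dell ECS or MinIO).
CREATE STAGE onprem_stage
  URL = 's3compat://datalake-bucket/events/'
  ENDPOINT = 'storage.mycompany.internal'
  CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>');

-- External table over the on-prem Parquet files; the data stays where it is.
CREATE EXTERNAL TABLE onprem_events (
    event_ts   TIMESTAMP AS (VALUE:event_ts::TIMESTAMP),
    event_type STRING    AS (VALUE:event_type::STRING)
)
  LOCATION     = @onprem_stage
  FILE_FORMAT  = (TYPE = PARQUET)
  AUTO_REFRESH = FALSE;  -- refresh metadata manually with ALTER EXTERNAL TABLE ... REFRESH

SELECT event_type, COUNT(*) AS events
FROM   onprem_events
GROUP BY event_type;
```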

Apart from this, a number of enhancements and performance improvements were made to the platform that help you save costs: for example, 10% better performance on AWS, 10% improvement for write-heavy workloads, and further improvements in storage compression. Snowflake also announced better performance for the Geometry data type, which allows faster local geo calculations and simplifies migrations from legacy data warehouses to Snowflake. All of this is applied automatically, without you making any changes whatsoever.

In summary, Snowflake Data Cloud is a single product/service (software as a service) where you can put all of your data and run all use cases, for all users, securely.

Disclaimer: The opinions expressed in this post are my own and not necessarily those of my employer (Snowflake).

Umesh Patel
Principal Sales Engineer @ Snowflake, Trusted Advisor, SnowPro Advanced Architect