Snowflake Summit 2023: Summary of New Features Announced

Umesh Patel
Jun 30, 2023


This year's Snowflake Summit was bigger than ever. Snowflake announced several new features and delivered again on its strategy of breaking down silos.

To give a little background: Snowflake started by breaking down departmental (HR, Finance, etc.), data (structured and semi-structured), and technology silos. Building on that, Snowflake's architecture opened up new ways for customers to collaborate (Data Sharing/Marketplace) cross-cloud and cross-region with their customers and partners, which started breaking down business silos. From there, Snowflake added programmatic functionality to break down development silos, allowing developers to run various programming languages on a single engine by bringing applications to the data. And now, with the features announced at Summit, the mission is to break down application silos.

The Snowflake Data Cloud is a single platform that eliminates data silos, simplifies data architectures, and enables governance across all workloads. It now opens the scope to all data (structured, semi-structured, and unstructured), all users (analysts, developers, ML engineers, data engineers, and data scientists), all popular languages (SQL, Python, Scala, Java), and all workloads (AI/ML, applications, cybersecurity, data engineering, data warehouse, data lake, data lakehouse, and transactional). And all with state-of-the-art security and governance in a managed platform. It has become a very powerful platform that can be used by everyone.

Snowflake Data Platform

Secondly, the Snowflake Data Cloud allows you to build, run, distribute, and monetize applications. Snowflake has a strong, active, and growing developer community across the world, and several of the announced features help developers build applications in the language of their choice, so all developers can take advantage of them.

Third, you can do machine learning in Snowflake. You can build powerful ML models at scale and manage ML data, model pipelines, and deployments in Snowflake. Snowflake also makes it easier for analysts and business users to get more insight using ML-powered functions.

Here are the main new features Snowflake announced during the Summit:

1. Snowpark Container Services

Snowpark Container Services allows Snowflake compute infrastructure to run any workload, including full-stack applications, secure hosting of LLMs, robust model training, and more, securely within Snowflake. Developers register, deploy, and run containers and services on Snowflake-managed infrastructure. Container images built by developers using their tools of choice can include code in any programming language (e.g., Python, R, C/C++, React, any framework) and can be executed using configurable hardware options, including GPUs through a partnership with NVIDIA. Snowpark thereby expands support for more efficient machine learning development and execution. You can bring most containers that run on any platform into Snowflake as a lift-and-shift. The idea behind Snowpark Container Services is that you can build anything you like and bring the code near the data.
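As a rough sketch, deploying a container could look like the following. The pool, stage, and spec file names are illustrative, and the syntax follows the preview announcement, so it may change:

```sql
-- Sketch only: create a GPU-backed compute pool, then run a containerized
-- service from a spec file uploaded to a stage (all names are hypothetical).
CREATE COMPUTE POOL my_gpu_pool
  MIN_NODES = 1
  MAX_NODES = 1
  INSTANCE_FAMILY = GPU_NV_S;

CREATE SERVICE my_llm_service
  IN COMPUTE POOL my_gpu_pool
  FROM @my_stage
  SPECIFICATION_FILE = 'llm_service_spec.yaml';
```

The specification file itself is a YAML document describing the container image, endpoints, and resources, much like a trimmed-down Kubernetes pod spec.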

Snowpark Container Service in Snowpark

2. Snowflake Native Application Framework

It helps developers create, distribute/share, and monetize applications. As a provider or developer, you can write apps using Snowflake features such as Snowpark, Streamlit, Container Services, Snowpipe Streaming, etc., and then distribute them to Snowflake's customer base through Snowflake Marketplace, privately or publicly. You can monetize with your own charging model, such as a monthly subscription, consumption-based pricing, or one-time fees. Consumers can then install the app into their own Snowflake account, paid with their own Snowflake credits, in a secure way — the consumer does not have to move data, and the developer (or provider) does not have to manage infrastructure. Very neat! Some use cases for building an app include data curation and enrichment, advanced analytics, connectors (loading data from various sources into Snowflake), cost and governance, data clean rooms, etc. There are already dozens of native apps published on Snowflake Marketplace from Capital One, Matillion, LiveRamp, and others.
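The provider and consumer sides of this flow can be sketched roughly as follows. The package, stage, and version names are illustrative, and the statements follow the framework as announced, so details may differ:

```sql
-- Provider side (sketch): bundle the app code into an application package.
CREATE APPLICATION PACKAGE my_app_pkg;

-- After uploading the manifest, setup script, and app files to a stage,
-- register them as a version of the package.
ALTER APPLICATION PACKAGE my_app_pkg
  ADD VERSION v1 USING '@my_app_pkg.code.files';

-- Consumer side (sketch): install the app in their own account,
-- typically after getting it from the Marketplace.
CREATE APPLICATION my_app
  FROM APPLICATION PACKAGE my_app_pkg
  USING VERSION v1;
```

The key design point is that the application code runs inside the consumer's account next to their data, while the provider's source can remain hidden.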

Snowflake Native Application

With the use of Snowpark Container Services, you can bring your existing container to run in Snowflake, see below:

Snowpark Container Service with Native Apps

3. Machine Learning, LLM, and Generative AI

With the various machine learning features in Snowflake, you can get value out of your data more easily and accelerate time to production using a simplified end-to-end ML workflow, moving ML models to production with less friction.

Model Development: Data scientists can now create ML models using familiar APIs and frameworks for feature engineering and training in Snowpark. Using the Snowpark ML API, you bring ML to the data. Or you can simply bring your own existing model to run in Snowflake.

Snowflake Model Deployment Flow

ML Ops: Today, the majority of Snowflake customers train models on Snowflake data using third-party solutions. Snowflake is committed to providing a complete ML workflow, with seamlessly integrated model management, deployment, and observability. Many customers get LLMs from various providers and adapt them to their use case by fine-tuning them with data from Snowflake. Customers who want to deploy those models in Snowflake for inference can now do so with the Snowflake Model Registry (Private Preview). You can add and drop model versions and tag them using model metadata. Snowflake will also provide a Feature Store later to simplify feature management.

ML-Powered Functions: Snowflake has SQL functions powered by machine learning models in preview. These functions are intended for analysts looking to make decisions with more accurate predictions, without building a full ML pipeline. They cover Forecasting, Anomaly Detection, and Contribution Explorer, with more coming later. They can be invoked directly from SQL, are easy to use and robust, deliver insights quickly, require no complex infrastructure, and are highly scalable.
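For example, the forecasting function might be used roughly like this. The table and column names are hypothetical, and the syntax follows the preview documentation, so it may evolve:

```sql
-- Sketch: train a forecasting model on a time-series table (names illustrative),
-- then forecast future values, all from SQL with no ML pipeline to build.
CREATE SNOWFLAKE.ML.FORECAST sales_forecast(
  INPUT_DATA        => SYSTEM$REFERENCE('TABLE', 'daily_sales'),
  TIMESTAMP_COLNAME => 'sale_date',
  TARGET_COLNAME    => 'revenue'
);

-- Predict the next 14 days.
CALL sales_forecast!FORECAST(FORECASTING_PERIODS => 14);
```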

LLM: There are four ways you can use LLMs in Snowflake:

  1. Today, you can use third-party LLMs on Snowflake data using Streamlit. There is a very good example: Snowchat.
  2. Later, you will be able to bring open-source LLMs and deploy them using Snowpark Container Services, with a UI built in Streamlit.
  3. Snowflake-managed (built-in) LLMs: Snowflake announced Document AI, which lets you easily extract content/data from documents. You can see an overview of the demo here.
  4. Snowflake partners can provide their own LLMs using Snowpark Container Services. One partner, SAS, has done exactly that with SAS Viya; you can see a demo here.
LLM in Snowflake

4. Iceberg Tables

Snowflake now has a single Iceberg table type, with the option to specify the catalog implementation (Snowflake-managed or customer-owned, e.g., AWS Glue) for transaction metadata, while storing data externally (S3, ADLS Gen2, GCS) in an open format based on the Iceberg specification. This gives you fast performance for both managed and unmanaged Iceberg data. Iceberg tables are first-class tables, meaning you can take full advantage of the Snowflake platform for governance, performance, marketplace/data sharing, etc. They also interoperate with the wider Iceberg ecosystem.
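Creating one could look roughly like this. The table, column, and volume names are illustrative, and the syntax follows the preview announcement, so it may change:

```sql
-- Sketch: an Iceberg table whose data files live in your own cloud storage
-- (via a pre-configured external volume), with Snowflake managing the catalog.
CREATE ICEBERG TABLE customer_events (
  event_id NUMBER,
  event_ts TIMESTAMP_NTZ,
  payload  VARCHAR
)
  CATALOG = 'SNOWFLAKE'              -- Snowflake-managed catalog
  EXTERNAL_VOLUME = 'my_s3_volume';  -- data in your S3/ADLS/GCS bucket
```

Because the data sits in open Iceberg format in your own bucket, other Iceberg-aware engines can read it too.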

Snowflake Iceberg Table

5. Streamlit in Snowflake

Streamlit is an open-source Python library that allows you to create interactive web applications for data science and machine learning projects. It simplifies the process of building and deploying custom web interfaces by providing a user-friendly framework.

With this new feature, you can run your Streamlit code in Snowflake. You can deploy Streamlit code to Snowflake’s secure and reliable infrastructure in one click using existing governance. You can also package Streamlit into a Snowflake Native Application for external distribution.
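Deploying a staged Streamlit app could be sketched like this. The stage, file, and warehouse names are hypothetical, and the syntax follows the announced feature, so it may differ:

```sql
-- Sketch: register a Streamlit app from files uploaded to a stage
-- (all object names are illustrative).
CREATE STREAMLIT sales_dashboard
  ROOT_LOCATION   = '@my_db.my_schema.streamlit_stage'
  MAIN_FILE       = '/app.py'
  QUERY_WAREHOUSE = my_wh;
```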

Several other features were announced, a few of them are here:

Git Integration: Native Git integration lets you view, run, edit, and collaborate within Snowflake on code that lives in Git repos. It delivers seamless version control, CI/CD workflows, and better testing controls for pipelines, ML models, and applications.
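A rough sketch of how this might be used, with hypothetical names and syntax based on the preview announcement:

```sql
-- Sketch: link a Git repo into Snowflake (the API integration and repo
-- are illustrative), sync it, and run a script straight from a branch.
CREATE GIT REPOSITORY my_repo
  API_INTEGRATION = my_git_api_integration
  ORIGIN = 'https://github.com/my-org/my-project.git';

ALTER GIT REPOSITORY my_repo FETCH;

EXECUTE IMMEDIATE FROM @my_repo/branches/main/deploy.sql;
```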

Snowflake Performance Index (SPI): Snowflake now tracks the performance improvement of your workload over time.

Top-K Pruning for Queries: Enables you to retrieve only the most relevant rows from a large result set by rank. Additional pruning reduces the need to scan entire data sets, enabling faster queries (e.g., SELECT ... FROM my_table ORDER BY abc LIMIT 10).
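In other words, a query of this shape (table and column names illustrative) can now skip micro-partitions whose value ranges cannot possibly contain the top rows:

```sql
-- Sketch: with ORDER BY + LIMIT, the engine can prune partitions that
-- cannot contribute to the top 10 amounts, instead of scanning everything.
SELECT order_id, amount
FROM orders
ORDER BY amount DESC
LIMIT 10;
```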

Budgets: A budget defines a spending limit on compute costs for a group of Snowflake objects over a specific time interval. This allows you to monitor warehouse usage as well as serverless usage (auto-clustering, Snowpipe, etc.).
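The preview interface could be sketched roughly as follows. The budget and warehouse names are hypothetical, and the calls follow the preview documentation, so they may change:

```sql
-- Sketch: create a budget, set a monthly credit limit, and attach
-- a warehouse to it (all names are illustrative).
CREATE SNOWFLAKE.CORE.BUDGET my_budget();

CALL my_budget!SET_SPENDING_LIMIT(1000);  -- credits per month

CALL my_budget!ADD_RESOURCE(
  SYSTEM$REFERENCE('WAREHOUSE', 'my_wh', 'SESSION', 'APPLYBUDGET')
);
```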

Warehouse Utilization: A single metric that gives customers visibility into actual warehouse utilization and can show idle capacity. This will help you better estimate the capacity and size of warehouses.

Geospatial Features: the GEOMETRY data type, switching spatial reference systems using ST_TRANSFORM, invalid shape detection, and many new functions for GEOMETRY and GEOGRAPHY.
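For example, reprojecting a point between spatial reference systems might look like this (the coordinates and SRIDs are illustrative):

```sql
-- Sketch: convert a UTM zone 33N point to WGS 84 longitude/latitude.
SELECT ST_TRANSFORM(
         TO_GEOMETRY('POINT(500000 4649776)', 32633),  -- source SRID
         4326                                          -- target SRID (WGS 84)
       ) AS wgs84_point;
```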

There are many more to come in the next couple of months. Snowflake never stops innovating; it listens to customers, understands their needs and problems, focuses on what matters to them, and delivers products that customers love.

Disclaimer: The opinions expressed in this post are my own and not necessarily those of my employer (Snowflake).



Umesh Patel

Principal Sales Engineer @ Snowflake, Trusted Advisor, SnowPro Advanced Architect