Snowflake AI Data Cloud

To be successful with your AI projects you must have a solid data foundation, which Snowflake provides. The Snowflake AI platform is built on five layers (see below): Storage; Compute (CPU/GPU); AI; Horizon (governance and discovery); and Distribution. The Snowflake Data Cloud platform has now transitioned to the Snowflake AI Data Cloud platform, which brings enterprise AI capabilities using Cortex AI and Snowflake ML. This happens in collaboration with Nvidia, adopting Nvidia AI Enterprise software to integrate NeMo Retriever microservices into Snowflake Cortex AI. With this, business users can efficiently build and leverage AI applications. In addition, Snowflake customers can build native apps powered by Snowpark Container Services, which uses a set of pre-built AI containers, and Nvidia AI Enterprise.

Snowflake AI Data Cloud

Snowflake Cortex AI is a fully managed service offering large language models (LLMs) and vector search capabilities, enabling business users to efficiently develop and utilize AI-powered applications. A range of LLMs is available to suit different use cases, including options from Nvidia, Reka, Mistral, Meta, Gemma, and Snowflake Arctic, with more to come. Snowflake Cortex AI has the following components:

Cortex Analyst allows businesses to securely build applications on top of their analytical data. This is the “talk to your data” feature that generates SQL against your data in Snowflake.

Cortex Search harnesses state-of-the-art retrieval and ranking technology so users can build applications against documents and other text-based datasets, with hybrid search (vector and text search) as a service.

Cortex Guard is an LLM-based input and output safeguard that filters and flags harmful content (violence, hate, criminal activities, etc.) from data and other assets. This makes Snowflake models safe and usable.

Cortex Studio (Private Preview) is a no-code interactive interface for AI development that helps productize AI applications and further democratize models over enterprise data.

Cortex Fine-Tuning enhances the performance of large language models (LLMs) and provides a more personalized experience. Fine-tuned models can then be used through Cortex AI functions just like the pre-trained models. You can fine-tune LLMs either through Cortex Studio or programmatically through SQL functions.
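As a concrete taste of the SQL route, Cortex exposes LLM functions such as SNOWFLAKE.CORTEX.COMPLETE that can be called from any query. The sketch below only builds such a statement in Python; the model name and prompt are illustrative, and actually running it assumes an active Snowpark session (not shown):

```python
# Hedged sketch: building a call to the Cortex COMPLETE LLM function.
# The model name and prompt below are illustrative placeholders.

def cortex_complete_sql(model: str, prompt: str) -> str:
    """Build the SQL text for a SNOWFLAKE.CORTEX.COMPLETE call."""
    escaped = prompt.replace("'", "''")  # basic single-quote escaping
    return f"SELECT SNOWFLAKE.CORTEX.COMPLETE('{model}', '{escaped}')"

sql = cortex_complete_sql("mistral-large", "Summarize our Q2 sales notes.")
print(sql)
# With an active Snowpark session, this statement would run as:
#   session.sql(sql).collect()
```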

Snowflake Cortex AI Architecture

Snowflake ML

Snowflake ML offers a set of capabilities for end-to-end machine learning: you can use Cortex ML functions or build a fully custom ML model. With this, data scientists can easily and securely develop and productionize scalable features and models without any data movement or governance trade-offs. It also offers MLOps capabilities, so users can discover, manage, and govern their features, models, and metadata across the entire ML lifecycle.

Feature Store allows you to create, store, manage, and serve consistent ML features for model training and inference, with continuous, automated refreshes on batch or streaming data.

Model Registry allows users to govern the access and use of all types of AI models and their metadata, enabling personalized experiences and cost-saving automation.

ML Lineage allows you to trace the usage of features, datasets, and models across the end-to-end ML lifecycle.

ML Modeling for feature engineering and model training with familiar Python frameworks.

Snowpark Container Services for CPU and GPU processing to train models using notebooks.

Snowflake Notebooks provide a cell-by-cell development interface to explore and get insights from your data. No configuration is required; you can collaborate, integrate with GitHub, and rely on full RBAC-based data security. Notebooks support Python, SQL, and Markdown cells.

Snowpark pandas API is an enterprise-grade distributed pandas that gives you the ease and flexibility of the native pandas experience with the power of Snowflake compute, and lets you run open-source ML models that only support pandas. All you do is the following:

import modin.pandas as pd
import snowflake.snowpark.modin.plugin

df = pd.DataFrame(
    [
        [1, "John"],
        [2, "Mary"],
    ],
    columns=["ID", "NAME"],
)
df
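For comparison, here is the same DataFrame built with vanilla pandas; locally, the only change is the import, which is exactly the point of the Snowpark pandas API:

```python
# Same example with vanilla pandas; only the import differs from the
# Snowpark pandas version above.
import pandas as pd

df = pd.DataFrame(
    [
        [1, "John"],
        [2, "Mary"],
    ],
    columns=["ID", "NAME"],
)
print(df.shape)  # -> (2, 2)
```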
Snowflake ML Architecture

Polaris Catalog:

Polaris Catalog is an open standard, with the source code to be released soon, that lets any REST-compatible engine read and write Iceberg tables, managing them for all engines in one place. It will be available hosted by Snowflake, with interoperability across Snowflake (including Horizon) and other engines; because it is open source, you can also host it in your own environment. This is great if you are concerned about catalog lock-in!
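Because Polaris exposes the Iceberg REST catalog protocol, any REST-capable client should be able to connect. Below is a hedged sketch using PyIceberg; the endpoint URI and credential values are hypothetical placeholders, not a documented Polaris endpoint:

```python
# Hedged sketch: client configuration for an Iceberg REST catalog.
# The URI and credential values are hypothetical placeholders.
catalog_config = {
    "type": "rest",
    "uri": "https://polaris.example.com/api/catalog",  # placeholder endpoint
    "credential": "<client-id>:<client-secret>",       # placeholder secret
}
print(catalog_config["type"])

# With PyIceberg installed and a live endpoint, this would connect:
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("polaris", **catalog_config)
#   catalog.list_namespaces()
```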

Snowflake Polaris Catalog (Open Source)

Internal Marketplace: You can browse data products, such as datasets and applications, published by your organization for internal use, as well as third-party products your organization has approved for use.

Snowflake Trail: It allows developers to monitor, troubleshoot, debug, and take action on pipelines, apps, user code, and compute utilization.

Delta Direct (Private Preview): It allows you to continuously and cost-effectively access your Delta Lake tables as Iceberg tables for “bronze” and “silver” layers, without all of the requirements of Universal Format (UniForm). It is a great way to use the Snowflake compute engine to access data in Delta tables. Similarly, Parquet Direct allows you to use Iceberg without rewriting or duplicating Parquet files, even as new Parquet files arrive.

Private, Public Preview, and GA…GA…GA

Many product features were announced in private preview, public preview, or GA, such as Document AI, Document AI Pipeline (Public Preview), Dynamic Tables, Serverless Tasks Flex (Private Preview), Iceberg Tables, Universal Search, Internal Marketplace (Private Preview), and Native Apps (GA). Expect many more features to go GA this year.

A couple of things worth mentioning: several product enhancements improve query performance and cost optimization without you doing anything. There are improvements in execution time for SELECT .. ORDER BY .. LIMIT .. queries, predicate pushdown with secure views, queries on non-clustered tables, queries with joins on wide build-side rows, and SHOW commands. In addition, there are improvements in compilation time for queries extracting from VARIANT types, queries with subqueries, materialized views with many micro-partitions, DML statements, and queries with many SQL expressions. Query compilation occurs in the cloud services layer, so it is subject to cloud services cost, which is billed only for usage beyond 10% of your daily warehouse credit consumption. These enhancements will reduce your cost.
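To illustrate the cloud services billing rule mentioned above (my understanding: cloud services usage is billed only for the portion exceeding 10% of daily warehouse compute usage), here is a small worked example with made-up credit figures:

```python
# Worked example of the 10% cloud services allowance. All credit
# figures are illustrative, not real billing data.

def billable_cloud_services(compute_credits: float, cloud_services_credits: float) -> float:
    """Cloud services credits billed after the 10%-of-compute allowance."""
    allowance = 0.10 * compute_credits
    return max(0.0, cloud_services_credits - allowance)

print(billable_cloud_services(100.0, 8.0))   # within allowance -> 0.0
print(billable_cloud_services(100.0, 15.0))  # exceeds allowance -> 5.0
```

So a faster compiler directly shrinks the cloud services portion that could spill past the allowance.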

And lastly, if you care, you can have Dark mode for Snowsight!

In summary, Snowflake has introduced numerous innovations to handle workloads in a highly cost-effective manner. These advancements are primarily focused on AI and machine learning, aiming to help you develop and distribute applications that can generate new revenue streams. One key takeaway is the importance of establishing a solid data strategy before embarking on any AI project. I had the opportunity to talk to several Snowflake customers and found it very interesting how they innovate on the Snowflake AI Data Cloud. You can too!

Best of luck with your innovative endeavors.

Disclaimer: The opinions expressed in this post are my own and not necessarily those of my employer (Snowflake).
