End-to-End ML on Snowflake

At Snowflake Summit 2023, Snowflake launched a host of new features for data professionals. The most significant announcements came from the Snowpark side, including Snowpark Container Services and Snowpark ML. With the launch of Snowpark ML, let’s discuss how to integrate the Snowpark ML API with Fosfor Decision Cloud’s Insight Designer, using Nurse Attrition (HR Analytics) as an example use case.

What is Snowpark ML?

Snowpark ML is the Python library and underlying infrastructure for end-to-end ML workflows in Snowflake. It unifies data pre-processing, feature engineering, model training, and deployment into a single, easy-to-use Python library.

Snowpark ML consists of two primary components:

  • Snowpark ML Modeling
  • Snowpark ML Operations

Snowpark ML Modeling: Feature engineering and model training

With the Snowpark ML Modeling API, you can perform feature engineering and model training without moving data out of Snowflake, using popular ML frameworks such as scikit-learn, XGBoost, and LightGBM. Because the API closely mirrors the frameworks you already know, you can develop models intuitively in Snowflake.

Furthermore, by keeping your entire ML workflow within Snowflake’s security and governance perimeter, you can confidently work with sensitive data while adhering to compliance and regulatory requirements. The Snowpark ML Modeling API also lets you harness the performance and scalability of Snowflake’s virtual warehouses, so you can handle large datasets and complex machine learning tasks with ease. Together, these features let data scientists and ML practitioners leverage Snowflake’s robust infrastructure while maintaining data integrity and security throughout the machine learning process.

Source: Snowflake event session on Snowpark ML

Snowpark ML Operations: Model Management

For Snowpark ML Operations, the Snowpark Model Registry enables scalable, secure deployment and management of models in Snowflake. It lets users register, version, and manage machine learning models seamlessly, with expanded support for deploying deep learning models from TensorFlow and PyTorch and open-source LLMs from Hugging Face to Snowpark Container Services (which includes GPU compute pools). The registry serves as a centralized hub for models, allowing easy versioning, metadata inclusion, and efficient deployment into production environments. It now builds on a native Snowflake model entity with built-in versioning support, role-based access control, and streamlined management for both SQL and Python users. Snowpark ML’s focus on operational efficiency extends to every stage of the machine learning lifecycle, empowering MLOps engineers to manage models and confidently deploy them into warehouse and containerized environments.

Snowpark ML Ops (Source: Snowflake quickstart guide)

Snowpark ML on Fosfor Decision Cloud’s (FDC) Insight Designer

FDC’s Insight Designer is a cutting-edge DSML (Data Science and Machine Learning) platform designed to seamlessly bring together data personas such as data scientists, ML engineers, model quality controllers, MLOps engineers, and model governance officers, fostering collaboration on any AI use case. It accelerates every stage of the ML lifecycle, from data preparation to model development, deployment, scoring, and monitoring, all powered by the underlying Snowflake data infrastructure. The platform’s intuitive interface offers a unified, user-friendly environment for driving impactful insights and outcomes in the ever-evolving landscape of data science and machine learning.

FDC Insight Designer integrates Snowpark ML seamlessly, offering data scientists a tailored environment for advanced analytics experiments and model development. Leveraging Snowflake’s Data Cloud and compute layer, Snowpark ML within the Insight Designer enables efficient data transformation and machine learning model training without exporting data. The integration is reinforced by pre-built Snowpark ML templates: ready-to-use Jupyter notebooks equipped with all Snowflake and Snowpark ML dependencies, so data scientists can launch a notebook, import the Snowpark ML APIs, preprocess data, train models, and deploy them directly within Snowflake, all inside the FDC ecosystem. On top of that, the Insight Designer provides ML monitoring capabilities for models deployed in Snowflake. The synergy between Snowpark ML and FDC Insight Designer significantly simplifies the machine learning lifecycle, letting data scientists focus on innovation and insights rather than logistical complexities.

Let’s understand how to actually do it using the FDC Insight Designer:

1. Launch the Snowpark ML template

The Insight Designer features ready-made templates, each including an IDE like Jupyter Notebook or Visual Studio Code, a selected programming language with necessary dependencies (including those for Snowpark ML), and a choice of computing resources, such as CPU or GPU, with various size options. Upon launching a template, the IDE opens in a new container with the specified configurations, allowing users to start coding without requiring manual setup.
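Inside a launched notebook, the first step is typically to open a Snowpark session against the user's Snowflake account. Here is a minimal sketch, assuming the `snowflake-snowpark-python` package from the template's dependencies; the parameter values and the `get_session` helper are illustrative placeholders, not part of the template itself:

```python
# Sketch: opening a Snowpark session from a template notebook.
# All parameter values are placeholders; real credentials come from
# the user's Snowflake account.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "role": "<role>",
    "warehouse": "<virtual_warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

def get_session(params):
    """Create a Snowpark session; the import is lazy so the sketch
    can be read without the package installed."""
    from snowflake.snowpark import Session
    return Session.builder.configs(params).create()

# session = get_session(connection_parameters)  # uncomment inside the template
```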

2. Preprocessing and Training using Snowpark ML

Exploring the integration of Snowpark ML on the Insight Designer, let’s delve into the practical aspects of preprocessing and training machine learning models. The process begins with importing Snowpark ML and building a pipeline that contains all the preprocessing steps along with the definition and initialization of an XGBoost model. Calling the familiar .fit() method triggers model training; under the hood, Snowpark ML creates a stored procedure that trains the model on the specified data inside the user’s Snowflake account. After training completes, the fitted model or pipeline object is returned to the Insight Designer.
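Because Snowpark ML's Modeling API deliberately mirrors scikit-learn, the scikit-learn sketch below is structurally the same code a user would write. In Snowpark ML only the import paths change (e.g. `snowflake.ml.modeling.preprocessing` and `snowflake.ml.modeling.xgboost.XGBClassifier`) and estimators additionally take `input_cols`, `label_cols`, and `output_cols` arguments. The feature names, synthetic data, and the gradient-boosting stand-in for XGBoost are assumptions for illustration:

```python
# scikit-learn sketch of the preprocessing + boosted-tree pipeline described
# above. In Snowpark ML, the same .fit() call runs as a stored procedure
# inside Snowflake instead of locally. Data and columns are illustrative.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for XGBoost

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))            # e.g. TENURE, OVERTIME, SATISFACTION
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # e.g. an ATTRITION label

pipe = Pipeline([
    ("scaler", StandardScaler()),             # preprocessing step
    ("model", GradientBoostingClassifier()),  # model definition
])

pipe.fit(X, y)           # in Snowpark ML, training happens inside Snowflake
preds = pipe.predict(X)  # one prediction per input row
```

The fitted `pipe` object is what comes back to the Insight Designer after the stored procedure finishes.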

3. Registering, Deploying and Scoring the ML Model

Once the model is ready, it is taken into production. Let’s now go through the operational aspects of Snowpark ML. The process begins with creating a model registry in Snowflake, where model artifacts are registered along with pertinent details. A chosen model can then be deployed directly from the registry; notably, during deployment Snowpark ML automatically creates a Python User-Defined Function (UDF) within Snowflake. After this, the model can be scored by calling the .predict() method on it. This workflow demystifies model registration, deployment, and scoring, giving users an efficient and accessible way to manage machine learning models.
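A hedged sketch of this flow with the Snowpark Model Registry API follows. The model name, version name, and scoring dataframe are hypothetical, and the registry calls need a live Snowpark session, so they are wrapped in a function rather than executed here:

```python
# Sketch: registering and scoring a trained pipeline with the Snowpark
# Model Registry (snowflake-ml-python). Names are illustrative; a live
# Snowpark session is required to actually run this.
def register_and_score(session, fitted_pipeline, scoring_df):
    """Log a model version in the registry and score a Snowpark DataFrame."""
    from snowflake.ml.registry import Registry  # lazy import for the sketch

    reg = Registry(session=session)
    model_version = reg.log_model(
        fitted_pipeline,
        model_name="NURSE_ATTRITION_MODEL",  # hypothetical name
        version_name="V1",
        comment="XGBoost attrition pipeline",
    )
    # Scoring runs inside Snowflake; predictions return as a new column.
    return model_version.run(scoring_df, function_name="predict")
```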

4. Monitoring Snowpark ML Models

After the model is operationalized, the top priority becomes monitoring it as it is integrated with the business. FDC’s Insight Designer takes machine learning monitoring to the next level by seamlessly integrating with Snowpark ML models, ensuring continuous vigilance over model performance. This capability goes beyond model deployment and extends to comprehensive tracking of data quality, performance drift, feature or data drift, label drift, prediction drift, and more.

Through constant surveillance, the platform enables proactive detection of deviations in these critical metrics, allowing data scientists and ML practitioners to address potential issues promptly. This ensures that models remain aligned with evolving data patterns and objectives, enhancing overall robustness and reliability. In essence, the Insight Designer’s ML monitoring feature, in conjunction with Snowpark ML models, offers a comprehensive solution for maintaining the health and integrity of machine learning systems throughout their operational lifecycle.
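One common metric behind feature-drift tracking of this kind is the Population Stability Index (PSI), which compares a feature's production distribution against its training baseline. The minimal, self-contained sketch below is a generic illustration of the technique, not Insight Designer's internal implementation, and the 0.2 alert threshold in the test is a common rule of thumb rather than a product setting:

```python
# Minimal Population Stability Index (PSI) sketch for feature-drift
# detection: bin the baseline feature, then compare bin proportions
# between baseline and production samples.
import math

def psi(baseline, production, bins=10):
    """PSI = sum((q_i - p_i) * ln(q_i / p_i)) over equal-width bins."""
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = sum(v > e for e in edges)  # index of the bin v falls into
            counts[idx] += 1
        # tiny epsilon keeps the logarithm defined for empty bins
        return [(c + 1e-6) / (len(values) + 1e-6 * bins) for c in counts]

    p = proportions(baseline)
    q = proportions(production)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

stable = [i / 100 for i in range(100)]           # baseline feature values
shifted = [0.5 + i / 200 for i in range(100)]    # drifted production values
drift_score = psi(stable, shifted)               # large value signals drift
```

In practice a monitoring system computes such a score per feature on a schedule and raises an alert when it crosses a threshold.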

Conclusion

In summary, Snowpark ML and Fosfor Decision Cloud’s Insight Designer form a powerful duo for data science practitioners. The Insight Designer makes the machine learning process more accessible, from getting data ready to training models and deploying them in Snowflake. It also works hand in hand with Snowpark ML Ops, simplifying how models are deployed and keeping an eye on essentials like data quality, production features, and predictions. This collaboration ensures everything runs smoothly and securely, making it a go-to solution for teams who want a hassle-free and reliable machine learning journey.

Click here to learn more about how the combination of Fosfor Decision Cloud’s Insight Designer and Snowflake can help you get more value from your data with less effort.
