How to Use Snowpark ML Model Registry in Snowflake

Soonmo Seong
4 min readFeb 5, 2024


As the era of Snowpark ML Modeling begins, we developed a linear regression model in a Snowpark-native way and saw how easy ML modeling on tables can be. As we build more models, we need to manage several versions of multiple models. The Snowpark ML Model Registry gives us handy version control and model deployment. Beyond Snowpark ML Modeling itself, the core ML frameworks are also supported.

We are going to follow these steps to put the Model Registry into an ML workflow:

  • Creating a session
  • Building a linear regression model
  • Evaluating the model
  • Registering the model to Model Registry
  • Inferencing with the registered model
  • Summary

Creating a session

We are going to stick to Snowpark, as in my previous linear regression blog. Let’s create a session as below. Prompting with getpass.getpass("Enter password:") keeps the password out of the notebook and protects us from security issues.

import getpass
from snowflake.snowpark.session import Session

from snowflake.ml.modeling.linear_model import LinearRegression
from snowflake.ml.modeling.preprocessing import OneHotEncoder
from snowflake.ml.registry import registry
from snowflake.ml._internal.utils import identifier
import snowflake.snowpark.functions as F

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from snowflake.ml.modeling.metrics import mean_absolute_error

# Session
connection_parameters = {
    "account": "xxxxxxxxxxxx",
    "user": "xxxxxxxxxxxx",
    "role": "ACCOUNTADMIN",
    "warehouse": "COMPUTE_WH",
    "database": "SNOWPARK_ML_MODELING",
    "schema": "PUBLIC"
}

# Establish and configure the connection.
connection_parameters["password"] = getpass.getpass("Enter password:")
session = Session.builder.configs(connection_parameters).create()

Building a linear regression model

The code below fits a linear regression model on the table used before.

# reading a table
df = session.table("NYC_ZIP_INCOME")

# feature engineering
ohe = OneHotEncoder(input_cols=['ZIP'], output_cols=['ZIP_OHE'], drop_input_cols=True)
transformed_df = ohe.fit(df).transform(df)
input_columns = transformed_df.columns[:-1]
label_columns = transformed_df.columns[-1]
output_columns = 'PREDICTED_ANNUAL_INCOME'

# model fitting
regr = LinearRegression(
    input_cols=input_columns,
    label_cols=label_columns,
    output_cols=output_columns
)
regr.fit(transformed_df)
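As a side note, one-hot encoding simply turns each distinct ZIP value into its own 0/1 indicator column. A minimal local sketch of the same idea with plain pandas (the toy data here is made up for illustration):

```python
import pandas as pd

# toy data standing in for the NYC_ZIP_INCOME table
df = pd.DataFrame({"ZIP": ["10001", "10002", "10001"], "YEAR": [2023, 2023, 2024]})

# pd.get_dummies plays the role of OneHotEncoder: one indicator column per ZIP value
encoded = pd.get_dummies(df, columns=["ZIP"], prefix="ZIP_OHE")
print(encoded.columns.tolist())  # ['YEAR', 'ZIP_OHE_10001', 'ZIP_OHE_10002']
```

Snowpark ML’s OneHotEncoder does the same expansion, but pushes the work down to Snowflake instead of running it locally.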

We confirm that the fitted model predicts the total annual income that people living in ZIP code 10001 will earn in 2025.

# inferencing without the Model Registry
def annual_income_predictor(zip_code, year):
    # create a Snowpark dataframe
    input_df = session.create_dataframe([(str(zip_code), year)], schema=['ZIP', 'YEAR'])
    # one-hot encode the input
    transformed_input = ohe.transform(input_df)
    # predict
    prediction = regr.predict(transformed_input)
    return prediction.select('PREDICTED_ANNUAL_INCOME').show()

# result = 3040481376.845398
annual_income_predictor(10001, 2025)

transformed_df = regr.predict(transformed_df)
transformed_df.select(F.col(label_columns), F.col(output_columns)).show()

Evaluating the model

Mean absolute error (MAE) is a simple and effective metric for evaluating regression models, whether XGBoost, random forest, or, as here, a linear model.

mae = mean_absolute_error(df=transformed_df,
                          y_true_col_names=label_columns,
                          y_pred_col_names=output_columns)
print(f"Mean Absolute Error: {mae}")
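Under the hood, MAE is just the mean of the absolute residuals. A minimal pure-Python sketch of the same computation, with toy numbers rather than our model’s predictions:

```python
# toy true and predicted values, for illustration only
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]

# mean absolute error: average of |y_true - y_pred| over all rows
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(mae)  # about 16.67
```

Snowpark ML’s mean_absolute_error computes the same quantity, but over the columns of a Snowpark dataframe inside Snowflake.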

Registering the model to Model Registry

The Snowpark Model Registry is table-shaped, so it feels familiar and is easy to use. Logging a model takes a few required arguments plus optional ones that store environment dependencies and useful notes.

Sample input data can be registered along with a fitted model, which helps others understand how the model was built and how the data was prepared. So we are going to register sample data; slicing the dataframe is enough.

# Get sample input data to pass into the registry logging function
sample_data = transformed_df.select(input_columns).limit(100)

As below, we can set the database, schema, model name, model version, and the model itself.

# database and schema
db = identifier._get_unescaped_name(session.get_current_database())
schema = identifier._get_unescaped_name(session.get_current_schema())

# model name
model_name = "ZIPCODE_ANNUAL_INCOME_PREDICTION"

# create a registry and log the model
model_registry = registry.Registry(session=session, database_name=db, schema_name=schema)

# log the fitted model
model_ver = model_registry.log_model(
    model_name=model_name,
    version_name='V0',
    model=regr,
    sample_input_data=sample_data
)

Furthermore, an evaluation metric and a comment can be added. Let’s check whether the model we made is registered correctly.

# evaluation metric
model_ver.set_metric(metric_name="mean_abs_err", value=mae)

# comments
model_ver.comment = "This is the first iteration of our ZIPCODE ANNUAL INCOME PREDICTION model."

# check
model_registry.get_model(model_name).show_versions()

This is the table-like Model Registry in Snowflake. Even though it only recently became generally available, it looks friendly because its output reads like a pandas DataFrame.
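Since the registry listings come back as pandas DataFrames, you can slice them like any other. The frame below is a made-up stand-in for the real show_versions() output, used only to illustrate the filtering:

```python
import pandas as pd

# hypothetical stand-in for model_registry.get_model(model_name).show_versions()
versions = pd.DataFrame({
    "name": ["V0", "V1"],
    "comment": ["first iteration", "second iteration"],
    "is_default_version": [True, False],
})

# ordinary pandas filtering works on the registry output
default = versions[versions["is_default_version"]]
print(default["name"].tolist())  # ['V0']
```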

Inferencing with the registered model

The Model Registry changes how we run inference on the model. get_model and run make life easier.

zip_code = 10001
year = 2025

# create a Snowpark dataframe
input_df = session.create_dataframe([(str(zip_code), year)], schema=['ZIP', 'YEAR'])
# one-hot encode the input
test_df = ohe.transform(input_df)

# inference with model registry
model_ver = model_registry.get_model(model_name).version('v0')
result = model_ver.run(test_df, function_name="predict")
result.show()

You can clean up the Model Registry if you no longer need the model.

# clean up
model_registry.delete_model(model_name)
# see it is empty
model_registry.show_models()

Summary

In this blog, we went over the Snowpark Model Registry in Snowflake. This new feature changes how models are deployed and served for inference, yet it isn’t difficult to adopt with basic Python programming skills.
