Snowflake’s MLPF:A Game-Changer in ML Forecasting

Gowtham Madeshwaran
BI3 Technologies
Published in
6 min readNov 1, 2023

Data and business analysts are crucial for translating data insights into actionable recommendations. They build predictive models to identify trends, patterns, and anomalies. However, model development often requires advanced statistical and machine-learning knowledge, which can be a hurdle. Analysts skilled in SQL may face challenges fine-tuning models without expertise in programming languages like Python, impacting their analytical effectiveness.

For targeted purposes, Snowflake unveils Machine Learning Powered Functions (MLPFs), a suite of SQL functions. These functions seamlessly execute machine learning models. As of August 2023, three MLPFs cater to time-series data and are in Public Preview

  • Forecasting: Users harness the power of forecasting to predict metrics using historical data. This technique serves as a go-to for estimating future sales, projecting demand for specific Stock-Keeping Units (SKUs) or anticipating website visit volumes over time.
  • Anomaly Detection: This technique uses both supervised and unsupervised learning techniques to highlight anomalous numbers. This could be helpful in situations when you wish to spot increases in your cloud spending, spot unusual data in logs, and more.
  • Contribution Explorer: Utilizing Contribution Explorer, users can do a root cause analysis to identify the key contributors to a specific statistic of interest.

Forecasting, driven by historical data, is your key to precise predictions. It’s the go-to strategy for anticipating future sales, understanding SKU demand, and foreseeing website traffic trends.

Please follow the below steps to experience the power of forecasting in Snowflake’s MLPF.

Step 1: Crafting the Efficient Snowpark Warehouse

Whether it’s a Standard or Snowpark-optimized warehouse, they both deliver exceptional performance. But when you’re tackling memory-intensive tasks like ML training through a stored procedure on a single node, Snowpark-optimized warehouses shine as your top choice for excellence.

You can craft a Snowpark-optimized warehouse with ease by executing this query in the snow-sight portal.

CREATE OR REPLACE WAREHOUSE snowpark_optmized_wh WITH
WAREHOUSE_SIZE='MEDIUM'
WAREHOUSE_TYPE='SNOWPARK-OPTIMIZED'
MAX_CONCURRENCY_LEVEL=1;

Create a role named “ANALYST” and grant access on the schema where the data is stored. Here, the data is stored in the path “SNOW_ML_LEARNING.SNOW_ML_SCHEMA.COVID_TRENDS_VIEW”

//CREATE ROLE - ANALYST
CREATE OR REPLACE ROLE analyst;

//To grant access on the Database - Snow_Ml_Learing
GRANT USAGE ON DATABASE SNOW_ML_LEARNING TO ROLE analyst;

//To grant access on the Schema - Snow_Ml_Schema
GRANT USAGE ON SCHEMA SNOW_ML_SCHEMA TO ROLE analyst;

To work with snowflake forecast functions, the source data should have,

  1. A timestamp column
  2. A target value column representing some quantity of interest at each timestamp.

For easy understanding purposes, let’s consider the Covid analysis dataset as the source which was already loaded into the view COVID_TRENDS_VIEW.

This is the mock view of how the data looks like. It has three columns, namely Date, State_Or_Union_Teritory and Confirmed_Cases with the data from 2022–04–01 to 2022–10–25.

Step 2: Creating the First Forecast Model

Snowflake Machine Learning Powered Functions takes care of most of the data science best practices with respect to hyper-parameter tunning and missing data adjustments.

Let's try to create a single-series forecast model. As mentioned earlier, input data should have both a timestamp and a target column. Below is the Query to create a single-series forecast model that predicts COVID cases for future dates.

CREATE OR REPLACE SNOWFLAKE.ML.FORECAST Covid_Case_Prediction (

INPUT_DATA => SYSTEM$QUERY_REFERENCE('SELECT TO_TIMESTAMP_NTZ(DATE) AS

Timestamp_Date,SUM(CONFIRMED_CASES) AS CONFIRMED_CASES FROM COVID_TRENDS_VIEW GROUP BY 1'),

TIMESTAMP_COLNAME => 'Timestamp_Date',

TARGET_COLNAME => 'CONFIRMED_CASES'
);

Sample data contains State_Or_Union_Territories column. To forecast the Covid Cases for each state or Union territory in the dataset, specify it as Series Column.

CREATE OR REPLACE SNOWFLAKE.ML.FORECAST Covid_Case_Prediction (

INPUT_DATA => SYSTEM$QUERY_REFERENCE('SELECT TO_TIMESTAMP_NTZ(DATE) AS

Timestamp_Date,STATE_OR_UNION_TERITORY,CONFIRMED_CASES FROM COVID_TRENDS_VIEW'),

SERIES_COLNAME => 'STATE_OR_UNION_TERITORY',

TIMESTAMP_COLNAME => 'Timestamp_Date',

TARGET_COLNAME => 'CONFIRMED_CASES'
);

Date column’s datatype is converted into timestamp_ntz , in order to use that column in the forecast function

VIEW or TABLE can also be passed as input by using SYSTEM$REFERENCE(‘VIEW,’MY_VIEW’) or SYSTEM$REFERENCE(‘TABLE’,’MY_TABLE’) respectively.

Use SHOW SNOWFLAKE.ML.FORECAST to confirm whether the model got created successfully or not.

Step 3: Formulating and Displaying the Predictions

Let’s use the trained Covid_Case_Prediction model to create predictions for the demand of the next 30 days. It will generate 30 steps forward per series.

CALL Covid_Case_Prediction!FORECAST( FORECASTING_PERIODS => 30);

The resulting output looks like this,

TS: Which represents the Timestamp for the forecast prediction

Forecast: The output/prediction made by the model

Lower/Upper_Bound: Separate columns that specify the prediction interval

Prediction interval can be customized by passing prediction_interval as a part of configuration object that ranges between 0 to 1. Larger value providing wider range between lower and upper bounds.

CALL Covid_Case_Prediction!FORECAST( FORECASTING_PERIODS => 30, config_object=>{'prediction_interval':0.85});

Note: If you pass data with exogenous variables (additional features) to train your ML model, you need to pass future values for those variables while calling the forecast function.

Run the below Query, immediately after the above one, to visualize the predictions for the next 30 days.

(SELECT DATE,SUM(CONFIRMED_CASES) AS CONFIRMED_CASES,NULL AS FORECAST
FROM COVID_TRENDS_VIEW GROUP BY 1)
UNION ALL
(SELECT TS::DATE AS DATE,NULL AS CONFIRMED_CASES,FORECAST AS FORECAST
FROM TABLE(RESULT_SCAN(-1)));

The resulting chart should looks similar to this,

Step 4: Feature Importance

Delving into how various columns or features influence model outcomes is a pivotal facet of the model-building journey. This exploration unveils key drivers, fostering an intuitive grasp of the process. It empowers us to fine-tune our approach by incorporating predictive columns or pruning those with limited value, ultimately steering us toward more precise conclusions.

Utilizing the explain_feature_importance method as illustrated below, the forecasting MLPF enables you to determine the importance of a feature.

CALL Covid_Case_Prediction!explain_feature_importance();

The output of this call is shown below.

We must keep in mind that this strategy aids in determining how external variables affect the anticipated result. You might think about removing a certain external variable from the training dataset if its score is zero or very low.

Conclusion

In summary, the journey of constructing a machine-learning forecast within Snowflake has showcased the platform’s remarkable capabilities and adaptability. Its prowess in handling extensive datasets, coupled with the reliability of its machine-learning algorithms, has empowered us to craft precise and enlightening forecasts. Moreover, Snowflake’s effortless integration with various tools and platforms amplifies its usability and effectiveness. The Machine Learning Powered Functions (MLPF) have further accentuated Snowflake’s potential as an indispensable instrument within the data science arsenal, capable of steering substantial business insights and decision-making.

About Us

Bi3 has been recognized for being one of the fastest-growing companies in Australia. Our team has delivered substantial and complex projects for some of the largest organizations around the globe, and we’re quickly building a brand that is well-known for superior delivery.

Website: https://bi3technologies.com/

Follow us on,
LinkedIn: https://www.linkedin.com/company/bi3technologies
Instagram:
https://www.instagram.com/bi3technologies/
Twitter:
https://twitter.com/Bi3Technologies

--

--