Snowflake Machine Learning: An Easy Button

While machine learning can be complex, it offers powerful tools for solving challenging problems and making sense of large, intricate datasets. To deploy ML, companies need best practices, processes, and tooling to streamline and manage the lifecycle of machine learning (ML) models. This brings many challenges: the complexity of ML models, infrastructure scalability, versioning of models and data, governance, and regulatory compliance. On top of that, analysts who only know SQL face a steep learning curve when they are pulled into data science projects.

Snowflake makes this easy with a new feature called ML-Powered Functions. These functions let you detect patterns in your data using built-in models provided by Snowflake. You can train a model on your own data and then generate predictions based on the trends in that data, all from SQL. You do not need separate, complex infrastructure, and you get accurate results fast, with scale and security. The ML-Powered Functions available today are Forecasting, Anomaly Detection, and Contribution Explorer, with more on the way.

Snowflake Machine Learning Features

Let's see how easy it is to get started. In this example, I use electricity usage data to predict the electricity consumption of my house. I use pge.com data, which you can download as a CSV file if you live in Northern California. Here is the hourly usage data file if you do not have one.

  1. Load data into Snowflake: Upload CSV to your internal stage using Snowsight or SnowSQL.
  2. Use Snowflake's INFER_SCHEMA to create the table and load the data:
use role sysadmin;
use schema demodb.ml;
use warehouse demo_wh;

LIST @mldata;

-- PARSE_HEADER = true reads the column names from the header row;
-- SKIP_HEADER is not supported alongside it, so it is omitted here
CREATE OR REPLACE FILE FORMAT csv_ff
  TYPE = csv
  PARSE_HEADER = true
  ERROR_ON_COLUMN_COUNT_MISMATCH = false;

CREATE OR REPLACE TABLE pge_data
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION => '@mldata/pge_elec_data.csv',
        FILE_FORMAT => 'csv_ff'
      )
    ));

DESC TABLE pge_data;

-- load data
COPY INTO pge_data FROM @mldata/pge_elec_data.csv
FILE_FORMAT = (FORMAT_NAME= 'csv_ff') MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE;
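The forecasting step below reads from a view named pge_data_vw with one row per day (a timestamp column and a daily usage total). The original post does not show its definition; here is a minimal sketch, assuming the loaded table has per-interval USAGE_DATE and USAGE columns (the column names are illustrative, so adjust them to whatever INFER_SCHEMA produced for your file):

```sql
-- Hypothetical: roll the hourly PG&E readings up into one row per day.
CREATE OR REPLACE VIEW pge_data_vw AS
SELECT
  TO_TIMESTAMP_NTZ(usage_date) AS usage_date,  -- forecasting needs a timestamp column
  SUM(usage)                   AS daily_usage
FROM pge_data
GROUP BY 1;
```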

  3. Train a model on your dataset and generate a forecast

-- train model with your data
CREATE OR REPLACE snowflake.ml.forecast
electric_usage(input_data => SYSTEM$REFERENCE('VIEW', 'pge_data_vw'),
timestamp_colname => 'USAGE_DATE',
target_colname => 'DAILY_USAGE'
);
-- get the predictions for next thirty days
CALL electric_usage!forecast(30);

-- put the actuals and the predictions together
SELECT usage_date as ts,
daily_usage as actuals,
null as forecast,
null as lower_bound,
null as upper_bound
FROM pge_data_vw
UNION ALL
SELECT ts, null as actuals, forecast, lower_bound, upper_bound
FROM TABLE(RESULT_SCAN(-1))
ORDER BY 1;

-- change the Snowsight output to a chart and add the actuals and forecast columns

-- see which features are auto generated by the forecast algorithm
call electric_usage!explain_feature_importance();
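If you want to gauge model quality before trusting the predictions, the model object also exposes cross-validated error metrics (method name per the Snowflake forecasting docs; the exact output columns vary by version):

```sql
-- Evaluation metrics (e.g., MAE, MAPE) computed during training
call electric_usage!show_evaluation_metrics();
```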
Forecasting output

  4. Enhance your forecast with Snowflake Marketplace data. Get weather data from the Marketplace, and use temperature and cloud cover to see their impact on solar generation and electricity usage. Make sure you create a database from the following Marketplace dataset and name it GLOBAL_WEATHER_DATA so the views below can reference it.

Snowflake Marketplace Dataset
-- create view from marketplace data based on my area
create or replace view daily_weather_vw as select
date_valid_std as wdate,
postal_code,
avg_cloud_cover_tot_pct as cloud_cover,
avg_temperature_air_2m_f as temperature
from global_weather_data.standard_tile.history_day
where postal_code = '94506';


-- join pge data with weather data based on date
create or replace view pge_weather_vw as select
pge.*,
w.*
from pge_data_vw as pge left join daily_weather_vw as w
on pge.usage_date = w.wdate;


-- create view to train the model based on past data, including weather
create or replace view pge_weather_data_vw as select
usage_date,
daily_usage,
cloud_cover,
temperature
from pge_weather_vw;


-- create view based on weather forecast data from marketplace
create or replace view weather_forecast_data_vw as select
to_timestamp_ntz(date_valid_std) as daily_timestamp,
AVG_CLOUD_COVER_TOT_PCT as cloud_cover,
AVG_TEMPERATURE_AIR_2M_F as temperature
from global_Weather_data.standard_tile.forecast_day
where postal_code = '94506';

-- check the data
select * from pge_weather_data_vw limit 10;
select * from weather_forecast_data_vw limit 10;

-- Train your model
create or replace snowflake.ml.forecast
electric_usage_weather(
input_data => SYSTEM$REFERENCE('VIEW', 'pge_weather_data_vw'),
timestamp_colname => 'usage_date',
target_colname => 'daily_usage'
);
-- Get predictions
call electric_usage_weather!forecast
(input_data => SYSTEM$REFERENCE('VIEW', 'weather_forecast_data_vw'),
timestamp_colname => 'daily_timestamp');

-- show output in chart

select usage_date as ts,
daily_usage as actuals,
null as forecast,
null as lower_bound,
null as upper_bound
from pge_weather_data_vw
union all
select ts,
null as actuals,
forecast,
lower_bound,
upper_bound
from table(result_scan(-1))
order by 1;
Forecast Output in Snowsight Charts

In addition, if your data contains anomalies, you can easily detect them with a similar function and exclude them from training to get the most accurate results. There are many use cases; one is forecasting your own Snowflake credit usage:

create or replace table snowflake_credit_history as
select to_timestamp_ntz(usage_date) as usage_date,
sum(credits_billed)::number as credits
from snowflake.account_usage.metering_daily_history
group by all;

--create forecasting
create or replace snowflake.ml.forecast snow_credits
(
INPUT_DATA => SYSTEM$REFERENCE('TABLE','snowflake_credit_history'),
TIMESTAMP_COLNAME => 'usage_date',
TARGET_COLNAME => 'credits'
);

call snow_credits!FORECAST(FORECASTING_PERIODS => 30);
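The anomaly detection function mentioned earlier follows the same create-then-call pattern. A minimal sketch against the same credit history table (the model name is illustrative, and in practice you would score new data rather than the training window):

```sql
-- Train an unsupervised anomaly detection model on the credit history
-- (LABEL_COLNAME => '' means no labeled anomalies are provided)
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION credit_anomalies(
  INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'snowflake_credit_history'),
  TIMESTAMP_COLNAME => 'usage_date',
  TARGET_COLNAME => 'credits',
  LABEL_COLNAME => ''
);

-- Flag unusual days; rows with IS_ANOMALY = TRUE are the outliers
CALL credit_anomalies!DETECT_ANOMALIES(
  INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'snowflake_credit_history'),
  TIMESTAMP_COLNAME => 'usage_date',
  TARGET_COLNAME => 'credits'
);
```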

Machine learning has become more accessible and user-friendly on the Snowflake platform thanks to ML-Powered Functions. Any SQL developer can do ML in Snowflake.

And this is not all: Snowflake also allows you to create your own ML model, or bring any ML model, with a complete ML lifecycle, all in Snowflake.

Snowpark ML API

Happy Machine Learning!

Disclaimer: The opinions expressed in this post are my own and not necessarily those of my employer (Snowflake).
