No Data Science Team? No Problem

Kenny Nagano
5 min read · Oct 3, 2023

Snowflake ML-Powered Anomaly Detection and Contribution Explorer (Part 2)

Intro

ML-powered functions are Snowflake's easy button for using machine learning without a data science team. Snowflake has introduced three of them: Time Series Forecasting, Anomaly Detection, and Contribution Explorer. This is part two of a series on this public preview feature in Snowflake. Part one covers time series forecasting in detail; this part covers Anomaly Detection and Contribution Explorer. If you would like to read about time series forecasting, here is a link to part one.

https://medium.com/@kenny.nagano/no-data-science-team-no-problem-6794e1293478

Machine learning (ML) is a powerful tool for gaining insights from data, but it can be daunting to get started, especially if you don’t have a team of data scientists. Snowflake makes it easy to use ML without any prior experience, with its ML-powered functions.

Two of these functions that Snowflake provides out of the box are anomaly detection and contribution explorer. Anomaly detection can be used to identify outliers in your data, which could be indicative of fraud, errors, or other problems. Contribution explorer can be used to find the dimensions and values that have the greatest impact on a given metric, which can help you understand your data better and make better decisions.

Anomaly Detection

Anomaly detection works by training a machine learning model on your historical data. The model then learns to identify the typical patterns in your data. Once the model is trained, you can use it to identify new data points that fall outside of those patterns, which could be anomalies.

To use anomaly detection in Snowflake, you first need to create a model. You can do this using the `CREATE SNOWFLAKE.ML.ANOMALY_DETECTION` statement, which takes the name of the model and arguments that identify the training data, including the timestamp column and the target column.

Once you have created a model, you can use it to detect anomalies in your data by calling the model's `DETECT_ANOMALIES` method with a reference to the data you want to analyze and the same timestamp and target column names. The method returns a table with one row per input timestamp containing the observed value, the model's forecast, the lower and upper bounds of the prediction interval, and an `IS_ANOMALY` flag.

A row is flagged as an anomaly when its observed value falls outside the prediction interval. You can adjust how strict the detector is with the `prediction_interval` setting, a number between 0 and 1: values closer to 1 widen the interval and flag fewer rows. For example, you might lower it from the default of 0.99 to 0.95 to surface more candidate anomalies.

Training, Using, Viewing, Deleting, and Updating Models

Use CREATE SNOWFLAKE.ML.ANOMALY_DETECTION to create and train a model. The model is trained on the dataset you provide.

CREATE SNOWFLAKE.ML.ANOMALY_DETECTION mydetector(...);

See the ANOMALY_DETECTION page in the Snowflake documentation for complete details about the SNOWFLAKE.ML.ANOMALY_DETECTION constructor. For examples of creating a model, see Detecting Anomalies in the same documentation.

SNOWFLAKE.ML.ANOMALY_DETECTION runs using limited privileges, so by default it does not have access to your data. You must therefore pass tables and views as references, which pass along the caller’s privileges. You can also provide a query reference instead of a reference to a table or a view.
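As a minimal sketch of what passing a reference looks like, the statements below train a detector on a view. The table `historical_sales_data`, its columns `sale_ts`, `sales`, and `store_id`, and the view name are all hypothetical, and the sketch assumes `sale_ts` is a TIMESTAMP_NTZ column.

```sql
-- Hypothetical source table: historical_sales_data(sale_ts, sales, store_id)
CREATE OR REPLACE VIEW view_with_training_data AS
  SELECT sale_ts, sales
  FROM historical_sales_data
  WHERE store_id = 1;

-- Train the model. The view is passed as a reference so the model
-- reads it with the caller's privileges.
CREATE SNOWFLAKE.ML.ANOMALY_DETECTION mydetector(
  INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'view_with_training_data'),
  TIMESTAMP_COLNAME => 'sale_ts',
  TARGET_COLNAME => 'sales',
  LABEL_COLNAME => ''  -- empty string: the training data has no labeled anomalies
);
```

To pass a query instead of a view, `SYSTEM$QUERY_REFERENCE('SELECT ...')` can take the place of the `SYSTEM$REFERENCE` call.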

To detect anomalies, call the model’s <model_name>!DETECT_ANOMALIES method:

CALL mydetector!DETECT_ANOMALIES(...);
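Filled in, the call might look like the sketch below. It assumes the `mydetector` model created earlier and a hypothetical view `view_with_data_to_analyze` that has the same `sale_ts` and `sales` columns as the training data, with timestamps later than the training period.

```sql
-- Score new data with the trained model.
CALL mydetector!DETECT_ANOMALIES(
  INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'view_with_data_to_analyze'),
  TIMESTAMP_COLNAME => 'sale_ts',
  TARGET_COLNAME => 'sales',
  CONFIG_OBJECT => {'prediction_interval': 0.95}  -- optional; narrower interval flags more rows
);

-- Keep only the flagged rows from the result of the CALL above.
SELECT ts, y, forecast, lower_bound, upper_bound
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE is_anomaly;
```

The second statement has to run immediately after the `CALL` in the same session, because `LAST_QUERY_ID()` refers to the most recent query.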

To view a list of your models:

SHOW SNOWFLAKE.ML.ANOMALY_DETECTION;

To remove a model:

DROP SNOWFLAKE.ML.ANOMALY_DETECTION <model_name>;

To update a model, delete it and train a new one. Models are immutable and cannot be updated in place.
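A retrain can be sketched as a drop followed by a fresh create. The view `view_with_new_training_data` and the column names are hypothetical stand-ins for your refreshed training data.

```sql
-- Remove the old model.
DROP SNOWFLAKE.ML.ANOMALY_DETECTION mydetector;

-- Train a replacement with the same name on newer data.
CREATE SNOWFLAKE.ML.ANOMALY_DETECTION mydetector(
  INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'view_with_new_training_data'),
  TIMESTAMP_COLNAME => 'sale_ts',
  TARGET_COLNAME => 'sales',
  LABEL_COLNAME => ''
);
```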

Contribution Explorer

Contribution Explorer helps you understand why a metric changed. Unlike anomaly detection, it does not require you to create a model first. You give it two sets of rows, a control set (for example, last month) and a test set (for example, this month), and it looks for the segments of your data whose metric changed the most between the two.

To use Contribution Explorer in Snowflake, you call the `SNOWFLAKE.ML.TOP_INSIGHTS` table function. It takes four arguments: an object of categorical dimensions, an object of continuous dimensions, the metric you are investigating, and a Boolean label that marks each row as test data (true) or control data (false).

The function returns one row per segment, where a segment is a combination of dimension values listed in the `CONTRIBUTOR` column. Alongside it you get the metric for the segment in the control and test sets (`METRIC_CONTROL` and `METRIC_TEST`), plus `SURPRISE` and `RELATIVE_CHANGE` columns that quantify how much the segment changed.

You can sort by these columns to identify the dimensions and values that have the biggest impact on your metric, which can help you understand your data better and make better decisions.
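In the Snowflake documentation, Contribution Explorer is exposed as the `SNOWFLAKE.ML.TOP_INSIGHTS` table function, which compares a control set of rows against a test set. The query below is a sketch of that call shape, not a definitive recipe. The table `daily_sales`, its columns `ds`, `country`, `channel`, `unit_price`, and `sales`, and the cutoff date are all hypothetical.

```sql
-- Hypothetical source table: daily_sales(ds, country, channel, unit_price, sales)
WITH input AS (
  SELECT
    {'country': country, 'channel': channel} AS categorical_dimensions,
    {'unit_price': unit_price} AS continuous_dimensions,
    sales AS metric,
    ds >= '2023-09-01'::DATE AS label  -- TRUE = test period, FALSE = control period
  FROM daily_sales
)
SELECT res.*
FROM input, TABLE(
  SNOWFLAKE.ML.TOP_INSIGHTS(
    input.categorical_dimensions,
    input.continuous_dimensions,
    CAST(input.metric AS FLOAT),
    input.label
  )
  OVER (PARTITION BY 0)  -- send every row to a single invocation
) res
ORDER BY res.surprise DESC;
```

Sorting by `surprise` puts the segments that deviated most from expectation at the top of the result.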

Examples

Anomaly detection

  • A bank can use anomaly detection to identify fraudulent transactions.
  • A retail company can use anomaly detection to identify products that are running out of stock.
  • A manufacturing company can use anomaly detection to identify machines that are about to fail.

Contribution explorer

  • A marketing company can use contribution explorer to identify the marketing campaigns that are driving the most sales.
  • A product company can use contribution explorer to identify the features that are most important to their customers.
  • A healthcare company can use contribution explorer to identify the factors that are most likely to lead to patient churn.

Getting Started

If you are interested in getting started with Snowflake’s ML functions, the best place to start is the Snowflake documentation. The documentation provides detailed information on how to use each function, as well as examples.

Here are a few additional tips for getting started:

  • Start with a small dataset. This will help you learn how to use the functions and get a feel for the results.
  • Use a variety of data types. The ML functions can be used with a variety of data types, including numerical, categorical, and temporal data.
  • Experiment with different parameters.
