ML, sip, repeat: Fueling SQL analysts with coffee and Machine Learning

You know you work at a data company when snacking is a machine learning problem.

On an ordinary day at Snowflake, the aroma of coffee fills the air, computers hum with activity, and the snack bins are brimming. This sunny atmosphere is no accident; it’s the result of ML models the Snowflake Workplace team uses to forecast office attendance. But here’s the twist: those predictions didn’t require the help of an ML engineer or a data scientist. Instead, the forecasts come from a mountain of attendance data and the efforts of SQL-savvy people who know that data well.

Note: This example Streamlit app uses synthetic data.

Machine learning is, arguably, one of the most transformational technologies of our time. But its benefits are often limited to teams with machine learning engineers and data scientists. That is why my team developed Snowflake Cortex ML-Based Functions. We wanted to give SQL analysts the same superpowers as machine learning practitioners.

As someone who used to live in SQL and spent hours understanding the ins and outs of a dataset (what questions it could answer, where its limits were), I am a firm believer that analysts and SQL-savvy professionals are often the ones most intimately familiar with their data and the problems they are trying to solve. They are experts in their domain, well positioned to be high-impact decision makers!

Arming this cohort with Snowflake Cortex ML-Based Functions grants them the ability to perform machine learning as easily as they can filter and aggregate data in SQL. In turn, this can unclog bottlenecks formed by under-resourced data science and ML engineering teams — and ultimately help analysts and their teams operate more independently and efficiently.

We’re already seeing this come to life with Snowflake customers and internal Snowflake teams alike. The use cases range from predicting office attendance, to predicting sales for grocery retail and distribution locations across the US, to monitoring daily data ingestion for anomalies. Business analytics teams, Data Center of Excellence teams, data engineering teams and even backlogged data science teams are using these functions to accelerate their adoption of machine learning. They’re also using them to make accurate decisions without the typical overhead of building new machine learning models.

And we’re just getting started! Today, we have two functions nearing general availability (announcement coming soon): Forecasting and Anomaly Detection. We also have a new Classification function entering private preview today. (Reach out to your account team if you’re interested in trying it out!)

Ultimately, though, our goal is to create a suite of functions that addresses the core problems analysts solve on a regular basis, such as forecasting, anomaly detection and classification, and to make that suite highly performant. This means continuing to increase memory limits, speed up training and prediction, and make these functions more robust to every flavor of data and problem type analysts face.

We do recognize that Snowflake Cortex ML-Based Functions are part of a larger ecosystem of tools that abstract away much of the complexity of ML. Their uniqueness, therefore, lies not in the algorithms they leverage or the problems they address. Instead, their uniqueness lies in their ease of use, and the fact that they are built on the Snowflake platform. That means you can scale from one to millions of ML-powered insights with the elasticity and near-zero operations of Snowflake’s engine. And it also means you get Snowflake’s consistent data governance across function inputs and outputs. Most of all, these functions have native compatibility with the larger Snowflake ecosystem — including your data.

I personally get excited about how this unlocks a world in which you can run the query below, wherever you work with your Snowflake data today, and get forecasts:

SELECT * FROM my_model.forecast(input_data => (SELECT * FROM feature_data));

Note: Our syntax is evolving! This is an example of how we might simplify our syntax. For current ML function syntax, see our documentation.

Or, even better, a world in which you’re working with Snowflake Copilot to talk to your data — and when you decide to forecast sales or attendance or compute spend, you’re served Snowflake Cortex forecasts that stand on the shoulders of years of ML research and expertise, without the cost of a full ML team hitting your bottom line.

In the long term, large language models may subsume traditional machine learning and our Snowflake Cortex ML-Based Functions. For now, however, these functions stand as a tool for democratizing access to a highly impactful technology — to a group of people who are extremely well positioned to be change makers for their organizations.

Don’t just take my word for it (I’m biased). Try Snowflake Cortex ML-Based Functions out in this quickstart. Or come to one of our Snowflake offices and have a snack. There’s a tiny bit of machine learning magic behind those brimming bins!
