Price Optimization Using Snowflake Notebooks and Streamlit in Snowflake
Co-Authors: Marie Coolsaet and Shriya Rai
Tasty Bytes Overview
The fictitious company Tasty Bytes is one of the largest food truck networks in the world, with localized menu options spread throughout 15 food truck brands globally. Tasty Bytes aims to achieve 25% year-over-year sales growth over 5 years. To help attain this goal, they must find a way to maximize sales and enhance customer satisfaction.
In this blog, we’ll delve into price optimization for their diversified food truck brands to inform their pricing and promotions by leveraging Snowflake’s powerful features such as:
- Machine Learning with Snowpark in Snowflake Notebooks: Train and deploy an ML model to understand how menu-item demand changes with varying price
- Streamlit in Snowflake: Build a user-friendly monthly pricing application that uses the deployed ML model to inform pricing strategies
Snowpark in Snowflake boosts machine learning workflows by providing familiar DataFrame syntax for data manipulation, allowing data scientists to use Java, Scala, or Python. The workflow includes a model registry for storing, versioning, and managing machine learning models. This makes it simple to deploy and track models within Snowflake. Snowpark also optimizes compute resources by automatically scaling to handle large data and model training tasks efficiently. The integration of data processing, model management, and compute optimization, all in Snowflake, streamlines the machine learning pipeline.
As a quick overview of the workflow, the truck, menu, and customer data are loaded from an Amazon S3 bucket into raw tables in Snowflake using COPY INTO statements. Using Snowpark, we then aggregate and transform the data as part of feature engineering. The resulting features are used for model training and deployment, which take place in Snowflake Notebooks, and the model's recommendations are finally surfaced in a Streamlit in Snowflake application.
Machine Learning with Snowpark
In the Machine Learning with Snowpark section of this blog, we will train and deploy an ML model that leverages historical menu-item sales data to understand how menu-item demand changes with varying price. Using this trained model, we can recommend the optimal day-of-week prices for all menu items for the upcoming month to our food-truck brands.
The steps below represent a high-level overview of the data exploration, feature engineering, model training and deployment, and model utilization of this process.
Data Exploration
We start exploring our data in a Snowflake Notebook. After installing Snowpark in our Python environment, we import the Snowpark package and the other packages we need, then save a reference to our current Snowflake session.
We create a Snowpark DataFrame by reading the data in the daily menu-item sales view in our Snowflake account using Snowpark's table function.
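A minimal sketch of that setup is below, assuming we are running inside a Snowflake Notebook; the view name is a placeholder rather than the exact object from the quickstart.

```python
# Sketch of the notebook setup; the view name below is illustrative.
from snowflake.snowpark.context import get_active_session
import snowflake.snowpark.functions as F

# Inside a Snowflake Notebook a session already exists; we just grab a reference to it.
session = get_active_session()

# Read the daily menu-item sales view into a Snowpark DataFrame (lazy; no data is pulled yet).
daily_sales_df = session.table("ANALYTICS.DAILY_MENU_ITEM_SALES_V")
```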
With our Snowpark DataFrame defined, we use the show function and specify that we want 5 rows printed; the default is 10 rows if no argument is provided.
We can access the columns attribute of our Snowpark DataFrame to look at all the columns present in our data.
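Continuing the sketch above, the inspection calls look roughly like this:

```python
# Print the first 5 rows; show() defaults to 10 rows when no argument is given.
daily_sales_df.show(5)

# List every column present in the DataFrame.
print(daily_sales_df.columns)
```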
Feature Engineering
In this section, we will build our feature engineering pipeline, which leverages Snowflake compute to perform aggregation and transformation operations, enabling us to utilize native Snowflake performance and scale.
We create the aggregate table by using the Snowpark group_by function to group rows by the specified columns and the Snowpark agg function to compute the average of our aggregated columns.
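A hedged sketch of that aggregation, with illustrative grouping and value columns:

```python
# Group daily sales by menu item and day of week, then average price and demand.
# Column names are illustrative, not the quickstart's exact schema.
agg_df = daily_sales_df.group_by("MENU_ITEM_NAME", "DAY_OF_WEEK").agg(
    F.avg("PRICE").alias("AVG_PRICE"),
    F.avg("DEMAND").alias("AVG_DEMAND"),
)
```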
After further manipulation of the Snowpark DataFrame using Snowpark functions to push all of our complex data transformation onto Snowflake compute, we use a window function to compute rolling averages of the price and price-change columns over time. Window functions allow aggregation over a "moving" set of rows.
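As a rough illustration, assuming a transformed DataFrame transformed_df with MENU_ITEM_NAME, DATE, and PRICE columns (names are ours, not necessarily the quickstart's), a rolling price average could be computed like this:

```python
from snowflake.snowpark import Window

# Rolling average of price over the current row and the four preceding rows,
# computed per menu item in date order.
window_spec = (
    Window.partition_by("MENU_ITEM_NAME")
    .order_by("DATE")
    .rows_between(-4, Window.CURRENT_ROW)
)
transformed_df = transformed_df.with_column(
    "ROLLING_AVG_PRICE", F.avg("PRICE").over(window_spec)
)
```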
Model Training and Deployment
We leverage the Snowflake ML API to perform distributed hyperparameter tuning, training multiple models in parallel to learn how menu-item demand is influenced by price.
The model is evaluated on our prepped test data using metrics from Snowpark ML modeling: mean_squared_error, mean_absolute_error, and r2_score.
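A rough sketch of what that tuning and evaluation can look like with the Snowflake ML API is below; the estimator, parameter grid, and column names are our assumptions, and exact metric signatures may differ slightly across library versions.

```python
from snowflake.ml.modeling.model_selection import GridSearchCV
from snowflake.ml.modeling.xgboost import XGBRegressor
from snowflake.ml.modeling.metrics import mean_absolute_error, mean_squared_error, r2_score

feature_cols = ["PRICE", "ROLLING_AVG_PRICE", "DAY_OF_WEEK_NUM"]  # assumed feature columns
train_df, test_df = transformed_df.random_split([0.8, 0.2], seed=42)

# Distributed grid search: each parameter combination is trained on Snowflake compute.
grid_search = GridSearchCV(
    estimator=XGBRegressor(),
    param_grid={"n_estimators": [100, 300], "learning_rate": [0.1, 0.3]},
    input_cols=feature_cols,
    label_cols=["DEMAND"],
    output_cols=["DEMAND_PREDICTION"],
)
grid_search.fit(train_df)

# Evaluate the tuned model on the held-out test data.
predictions_df = grid_search.predict(test_df)
mse = mean_squared_error(df=predictions_df, y_true_col_names="DEMAND",
                         y_pred_col_names="DEMAND_PREDICTION")
mae = mean_absolute_error(df=predictions_df, y_true_col_names="DEMAND",
                          y_pred_col_names="DEMAND_PREDICTION")
r2 = r2_score(df=predictions_df, y_true_col_name="DEMAND",
              y_pred_col_name="DEMAND_PREDICTION")
```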
We then deploy the model to the model registry by first opening the registry. Opening the registry returns a reference to it, which we use to add new models and obtain references to existing models.
Calling the registry.log_model() function logs the model and does the following:
- Serializes the model and creates a Snowflake model object from a Python object
- Adds metadata, such as a description, to the model as specified in the log_model call
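A hedged sketch of that registration step; the registry location, model name, and version below are illustrative:

```python
from snowflake.ml.registry import Registry

# Open the model registry in a database and schema of our choosing.
registry = Registry(session=session, database_name="TASTY_BYTES", schema_name="ANALYTICS")

# Log the fitted model from the tuning step, along with descriptive metadata.
model_version = registry.log_model(
    grid_search,
    model_name="MENU_ITEM_DEMAND_MODEL",
    version_name="V1",
    comment="Predicts menu-item demand as a function of price",
)
```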
Model Utilization
We create a Snowpark stored procedure that uses the deployed model to predict demand across all candidate prices for each menu item. For example, the candidate prices for a given menu item range from a 50% discount off its base sale price to a 20% markup on it. For a more detailed overview of the implementation, head over to the Price Optimization Using Snowflake Notebooks and Streamlit in Snowflake Quickstart.
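One way such a stored procedure could be sketched, with hypothetical object names and the price-grid construction elided, is shown below; the quickstart's actual implementation differs in its details.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc

@sproc(name="PRICE_RECOMMENDER", replace=True,
       packages=["snowflake-snowpark-python", "snowflake-ml-python"])
def price_recommender(session: Session) -> str:
    from snowflake.ml.registry import Registry

    # Retrieve the registered demand model (registry and model names are hypothetical).
    registry = Registry(session=session, database_name="TASTY_BYTES", schema_name="ANALYTICS")
    demand_model = registry.get_model("MENU_ITEM_DEMAND_MODEL").default

    # Candidate prices per menu item, from a 50% discount to a 20% markup on the base price,
    # are assumed to be prepared in this table.
    candidates_df = session.table("ANALYTICS.MENU_ITEM_PRICE_CANDIDATES")

    # Predict demand for every candidate price and persist the results for downstream use.
    predictions_df = demand_model.run(candidates_df, function_name="PREDICT")
    predictions_df.write.save_as_table("ANALYTICS.PRICE_RECOMMENDATIONS", mode="overwrite")
    return "SUCCESS"
```

The procedure can then be invoked from SQL or with session.call("PRICE_RECOMMENDER").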
Data-Driven Insights
The Peking Truck food-truck brand has seen lower-than-average Wednesday sales for Wonton Soup for the past few months. The brand manager proposed a Wednesday Wonton promotion to increase sales, and to find the best promotion price, Tasty Bytes turned to the deployed price-recommender stored procedure. We investigate by visualizing Wonton Soup Wednesday sales and prices for the last 6 months to understand the correlation between the two.
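As a rough illustration of how that look-back could be charted in the notebook (the view and column names are ours, and the six-month date filter is omitted for brevity):

```python
import altair as alt
import snowflake.snowpark.functions as F

# Pull Wednesday Wonton Soup history into pandas for charting.
wonton_df = (
    session.table("ANALYTICS.DAILY_MENU_ITEM_SALES_V")
    .filter((F.col("MENU_ITEM_NAME") == "Wonton Soup") & (F.col("DAY_OF_WEEK") == "Wednesday"))
    .select("DATE", "PRICE", "DAILY_SALES")
    .to_pandas()
)

# Stack a sales chart above a price chart to eyeball the correlation between the two.
sales_chart = alt.Chart(wonton_df).mark_line().encode(x="DATE", y="DAILY_SALES")
price_chart = alt.Chart(wonton_df).mark_line(color="orange").encode(x="DATE", y="PRICE")
sales_chart & price_chart
```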
There is a drop in sales in October 2022, right as there is an increase in price.
After calling the stored procedure to get the best promotion price for Wonton Soup, we see that the recommended price is lower than the base sale price.
When the recommended lower price is used, the Peking Truck brand sees a spike in sales.
Streamlit in Snowflake
In the Streamlit in Snowflake section of this blog, we see how a Streamlit application supports Tasty Bytes' brand managers in setting monthly pricing. The application uses the price recommender to get the best prices for menu items and to inform food-truck brand pricing strategy. Today, brand managers submit their pricing using spreadsheets; we import that spreadsheet data, run the demand model against it, and surface the results in their Streamlit application, the Monthly Pricing App.
In the Monthly Pricing App, we can view each brand and select the item whose price we want to change. For example, the Smoky BBQ brand has the potential for weekly demand to increase by 4.6% and total weekly profit to increase by 2.8% after setting the new Friday price for the Two Meat Plate to $29 instead of $31.70. The recommended price is even lower, at $27.90.
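A minimal sketch of what such an app can look like in Streamlit in Snowflake is shown below; the table, columns, and widgets are placeholders rather than the quickstart's exact code.

```python
import streamlit as st
import snowflake.snowpark.functions as F
from snowflake.snowpark.context import get_active_session

session = get_active_session()
st.title("Monthly Pricing App")

# Pick a food-truck brand, then load its current and recommended prices (table is hypothetical).
brands = [row[0] for row in session.table("ANALYTICS.PRICING").select("BRAND").distinct().collect()]
brand = st.selectbox("Food truck brand", brands)
pricing_df = session.table("ANALYTICS.PRICING").filter(F.col("BRAND") == brand).to_pandas()

# Let the brand manager edit proposed prices alongside the model's recommendations.
edited_df = st.data_editor(pricing_df)

if st.button("Submit pricing"):
    session.write_pandas(edited_df, "PRICING_SUBMISSIONS", auto_create_table=True, overwrite=True)
    st.success("Pricing submitted.")
```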
Conclusion
In conclusion, Tasty Bytes was able to find a way to maximize sales without hurting customer satisfaction. They used machine learning with Snowpark in Snowflake Notebooks to train and deploy an ML model that infers how menu-item demand changes with varying prices, and they leveraged Streamlit in Snowflake, backed by the deployed ML model, to inform food-truck brand pricing strategy.