Price Optimization Using Snowflake Notebooks and Streamlit in Snowflake

Co-Authors: Marie Coolsaet and Shriya Rai

Tasty Bytes Overview

The fictitious company Tasty Bytes is one of the largest food truck networks in the world, with localized menu options spread throughout 15 food truck brands globally. Tasty Bytes aims to achieve 25% year-over-year sales growth over 5 years. To help attain this goal, they must find a way to maximize sales and enhance customer satisfaction.

Figure 1: Tasty Bytes — Price Optimization

In this blog, we’ll delve into price optimization for their diversified food truck brands to inform their pricing and promotions by leveraging powerful Snowflake features such as Snowpark, Snowflake Notebooks, and Streamlit in Snowflake.

Snowpark boosts machine learning workflows in Snowflake by providing familiar DataFrame syntax for data manipulation and letting data scientists work in Python, Java, or Scala. The workflow includes a model registry for storing, versioning, and managing machine learning models, which makes it simple to deploy and track models within Snowflake. Snowpark also optimizes compute resources by automatically scaling to handle large data volumes and model training tasks efficiently. Integrating data processing, model management, and compute optimization inside Snowflake streamlines the entire machine learning pipeline.

As a quick overview of the workflow, the truck, menu, and customer data are uploaded to raw tables in Snowflake using COPY INTO statements from an Amazon S3 bucket. Using Snowpark, we then aggregate the data and perform feature engineering and further transformations. The aggregated, transformed data is used for model training and deployment in Snowflake Notebooks, and finally feeds a Streamlit in Snowflake application.
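As a minimal sketch of that loading step (the stage and table names below are hypothetical placeholders, not the Quickstart's exact objects), a COPY INTO statement can be issued through the Snowpark session:

```python
from snowflake.snowpark.context import get_active_session

session = get_active_session()  # inside a Snowflake Notebook or worksheet

# An external stage pointing at the S3 bucket is assumed to exist already.
session.sql("""
    COPY INTO raw_pos.menu
    FROM @raw_data_stage/menu/
    FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"')
""").collect()
```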

Figure 2: Price Optimization Architecture

Machine Learning with Snowpark

In the Machine Learning with Snowpark section of this blog, we will train and deploy an ML model that leverages historical menu-item sales data to understand how menu-item demand changes with varying price. Using this trained model, we recommend optimal day-of-week prices for all menu items for the upcoming month to our food-truck brands.

The steps below give a high-level overview of this process: data exploration, feature engineering, model training and deployment, and model utilization.

Data Exploration

We start exploring our data in a Snowflake Notebook. After installing Snowpark in our Python environment, we import the Snowpark package and the other packages we need, then save a reference to our current Snowflake session (line 31 of the cell below).

Figure 3: Importing libraries and getting active session
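A minimal sketch of that setup cell (the package choices beyond Snowpark are illustrative):

```python
# Import Snowpark and grab the session that the Snowflake Notebook already provides.
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark import functions as F
from snowflake.snowpark import Window

session = get_active_session()
```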

We create a Snowpark DataFrame by reading the data in the daily menu-item sales view in our Snowflake account using Snowpark’s table function.

Figure 4: Initializing daily sales DataFrame
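For example (the view name below is a hypothetical stand-in for the one defined in the Quickstart):

```python
# Lazily reference the daily menu-item sales view as a Snowpark DataFrame;
# no data is pulled into the notebook at this point.
daily_sales_df = session.table("analytics.daily_menu_item_sales_v")
```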

With our Snowpark DataFrame defined, we use the show function and specify that we want 5 rows printed. If no argument is provided, the value defaults to 10.

Figure 5: Showing 5 rows of data from the DataFrame

We can access the columns attribute of our Snowpark DataFrame to look at all the columns present in our data.

Figure 6: DataFrame columns
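Both preview steps are one-liners, continuing the hypothetical daily_sales_df from above:

```python
# Print the first 5 rows; show() prints 10 rows when called with no argument.
daily_sales_df.show(5)

# List every column name in the DataFrame.
print(daily_sales_df.columns)
```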

Feature Engineering

In this section, we will build our feature engineering pipeline, which leverages Snowflake compute to perform aggregation and transformation operations, enabling us to utilize native Snowflake performance and scale.

We create the aggregate table by using the Snowpark group_by function to group rows by the specified columns and the Snowpark agg function to compute the average of our aggregated columns.

Figure 7: Sales aggregate table
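A sketch of that aggregation, with illustrative grouping and measure columns rather than the exact Quickstart schema:

```python
# Average price and quantity sold per brand, menu item, date, and day of week.
sales_agg_df = (
    daily_sales_df
    .group_by("BRAND", "MENU_ITEM_NAME", "DATE", "DAY_OF_WEEK")
    .agg(
        F.avg("PRICE").alias("AVG_PRICE"),
        F.avg("QUANTITY_SOLD").alias("AVG_QUANTITY_SOLD"),
    )
)
```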

After further manipulation of the Snowpark DataFrame using Snowpark functions, which pushes all our complex data transformation down to Snowflake compute, we use a Snowflake window function to compute rolling averages of the price and price-change columns over time. Window functions allow aggregation over a “moving” set of rows.

Figure 8: Snowflake window function: create partitions for rolling averages over varied lags
Figure 9: Checking for columns with missing values, replacing null values with 0
Figure 10: Splitting prepped data into training and testing Snowpark DataFrames
Figure 11: Saving training and test data
Figure 12: Saving the full data
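Condensed into one hedged sketch (the window size, split ratio, and table names are illustrative assumptions, not the Quickstart's exact values), these steps might look like:

```python
# Rolling average of price per brand, menu item, and day of week over the preceding rows.
window = (
    Window.partition_by("BRAND", "MENU_ITEM_NAME", "DAY_OF_WEEK")
    .order_by("DATE")
    .rows_between(-4, Window.CURRENT_ROW)
)
prepped_df = sales_agg_df.with_column("PRICE_ROLLING_AVG", F.avg("AVG_PRICE").over(window))

# Replace the nulls introduced by lagged and rolling columns with 0.
prepped_df = prepped_df.fillna(0)

# Split into training and test sets, then persist each, along with the full data.
train_df, test_df = prepped_df.random_split(weights=[0.8, 0.2], seed=42)
train_df.write.save_as_table("demand_training_data", mode="overwrite")
test_df.write.save_as_table("demand_test_data", mode="overwrite")
prepped_df.write.save_as_table("demand_full_data", mode="overwrite")
```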

Model Training and Deployment

We leverage the Snowflake ML API to perform distributed hyperparameter tuning, training multiple models in parallel to learn how menu-item demand is influenced by price.

Figure 13: The .fit() function creates a temporary stored procedure in the background, enabling us to leverage Snowflake compute to train the models without moving the data outside Snowflake
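A hedged sketch of that tuning step using the snowflake-ml-python modeling API; the estimator choice, parameter grid, and column names are assumptions rather than the Quickstart's exact configuration:

```python
from snowflake.ml.modeling.model_selection import GridSearchCV
from snowflake.ml.modeling.xgboost import XGBRegressor

feature_cols = ["AVG_PRICE", "PRICE_ROLLING_AVG", "DAY_OF_WEEK"]  # illustrative features

grid_search = GridSearchCV(
    estimator=XGBRegressor(),
    param_grid={"n_estimators": [100, 300], "learning_rate": [0.1, 0.3]},
    input_cols=feature_cols,
    label_cols=["AVG_QUANTITY_SOLD"],     # demand proxy from the aggregate table above
    output_cols=["DEMAND_PREDICTION"],
)

# fit() runs as a temporary stored procedure on Snowflake compute, so the
# training data never leaves Snowflake.
grid_search.fit(train_df)
```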

The model is tested on our prepped test data and evaluated using metrics from Snowpark ML modeling: mean_squared_error, mean_absolute_error, and r2_score.

Figure 14: Test data and model metrics
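Evaluation can run entirely against Snowpark DataFrames; a sketch using the metrics module of snowflake-ml-python, continuing the illustrative column names above:

```python
from snowflake.ml.modeling.metrics import (
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Score the held-out test data with the tuned model.
predictions_df = grid_search.predict(test_df)

mse = mean_squared_error(df=predictions_df, y_true_col_names="AVG_QUANTITY_SOLD",
                         y_pred_col_names="DEMAND_PREDICTION")
mae = mean_absolute_error(df=predictions_df, y_true_col_names="AVG_QUANTITY_SOLD",
                          y_pred_col_names="DEMAND_PREDICTION")
r2 = r2_score(df=predictions_df, y_true_col_name="AVG_QUANTITY_SOLD",
              y_pred_col_name="DEMAND_PREDICTION")
print(f"MSE: {mse:.3f}  MAE: {mae:.3f}  R2: {r2:.3f}")
```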

We then deploy the model to the model registry by first opening the registry. Opening the registry returns a reference to it, which we use to add new models and obtain references to existing models.

Figure 15: Showing models logged in the registry
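Opening the registry and listing its contents takes only a couple of lines; the database and schema names here are hypothetical:

```python
from snowflake.ml.registry import Registry

# A reference to the registry, used to log new models and fetch existing ones.
registry = Registry(session=session, database_name="TASTY_BYTES", schema_name="ANALYTICS")

registry.show_models()  # returns a pandas DataFrame describing the registered models
```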

Calling the registry.log_model() function logs the model and does the following:

  • Serializes the model and creates a Snowflake model object from a Python object
  • Adds metadata, such as a description, to the model as specified in the log_model call
Figure 16: Logging the demand estimation model
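A sketch of that log_model call (the model name, version, and comment are illustrative):

```python
model_version = registry.log_model(
    grid_search,                                   # the tuned estimator from above
    model_name="DEMAND_ESTIMATION_MODEL",
    version_name="V1",
    comment="Estimates menu-item demand as a function of price",
)
```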

Model Utilization

We create a Snowpark stored procedure that utilizes the deployed model to predict demand at every candidate price for each menu item. For example, the candidate prices for a given menu item range from a 50% discount off its base sale price to a 20% markup on it. For a more detailed overview of the implementation, head over to the Price Optimization Using Snowflake Notebooks and Streamlit in Snowflake Quickstart.
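As a rough illustration of the idea only (the object names, model retrieval, and simplified logic below are assumptions; see the Quickstart for the real implementation):

```python
from snowflake.snowpark import Session
from snowflake.snowpark.types import StringType

def recommend_prices(session: Session) -> str:
    from snowflake.ml.registry import Registry

    # Fetch the default version of the registered demand model.
    registry = Registry(session=session, database_name="TASTY_BYTES", schema_name="ANALYTICS")
    model = registry.get_model("DEMAND_ESTIMATION_MODEL").default

    # One row per menu item and candidate price (50% discount up to 20% markup).
    candidates = session.table("candidate_prices")

    # Predict demand for every candidate price and persist the scored rows.
    scored = model.run(candidates, function_name="predict")
    scored.write.save_as_table("price_recommendations", mode="overwrite")
    return "SUCCESS"

# Register the function as a stored procedure that can be called on demand.
session.sproc.register(
    recommend_prices,
    name="RECOMMEND_PRICES",
    return_type=StringType(),
    packages=["snowflake-snowpark-python", "snowflake-ml-python"],
    replace=True,
)
```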

Data-Driven Insights

The Peking Truck food-truck brand has seen lower-than-average Wednesday sales for Wonton Soup for the past few months. The brand manager proposed introducing a Wednesday Wonton promotion to increase sales, and Tasty Bytes used the deployed price recommender stored procedure to find the best promotion price. We investigate by visualizing Wednesday Wonton Soup sales and prices for the last 6 months to understand the correlation between the two.

Figure 17: 2022 Wednesday sales for Wonton Soup

There is a drop in sales in October 2022, right as there is an increase in price.

Figure 18: 2022 Wednesday Prices for Wonton Soup

After calling the stored procedure to get the best promotion price for Wonton Soup, we see that the recommended price is lower than the base sale price.

Figure 19: Price recommendation for Wonton Soup
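Retrieving that recommendation is a single procedure call, continuing the hypothetical names from the sketch above:

```python
# Run the recommender and pull the recommended price for Wonton Soup.
session.call("RECOMMEND_PRICES")
(
    session.table("price_recommendations")
    .filter(F.col("MENU_ITEM_NAME") == "Wonton Soup")
    .show()
)
```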

When using the recommended lower price, Peking Truck experiences a spike in sales.

Figure 20: 2022 Wednesday sales for Wonton Soup after price recommendation of $7.17 from the base price of $7.97

Streamlit in Snowflake

In the Streamlit in Snowflake section of this blog, we see how a Streamlit application supports Tasty Bytes’ brand managers in deciding monthly pricing. The application uses the price recommender to get the best prices for menu items and to inform food-truck brand pricing strategy. Today, brand managers submit their pricing using spreadsheets, so we import their spreadsheet and run it through the demand model. This data powers their Streamlit application, the Monthly Pricing App.

In the Monthly Pricing App, we can view each brand and select an item whose price we want to change. For example, after setting the new Friday price for the Smoky BBQ brand’s Two Meat Plate to $29 instead of $31.70, weekly demand is projected to increase by 4.6% and total weekly profit by 2.8%. The price recommendation is even lower, at $27.90.

Figure 21: Price change for Brand Smoky BBQ for the Two Meat Plate from $31.70 to $29
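A very small sketch of what such an app might look like in Streamlit in Snowflake; the table name, columns, and widgets are illustrative only, not the Quickstart's actual app:

```python
import streamlit as st
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark import functions as F

session = get_active_session()
st.title("Monthly Pricing App")

# Pick a brand, then review and edit next month's prices for its menu items.
brand = st.selectbox("Brand", ["Smoky BBQ", "Peking Truck"])
pricing = (
    session.table("monthly_pricing")          # hypothetical table of current and recommended prices
    .filter(F.col("BRAND") == brand)
    .to_pandas()
)
edited = st.data_editor(pricing)              # managers adjust the new-price column here

if st.button("Submit prices"):
    session.write_pandas(edited, "monthly_pricing_submissions",
                         auto_create_table=True, overwrite=True)
    st.success("Prices submitted.")
```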

Conclusion

In conclusion, Tasty Bytes found a way to maximize sales without hurting customer satisfaction. They used machine learning with Snowpark in Snowflake Notebooks to train and deploy an ML model that infers how menu-item demand changes with varying prices, and they leveraged Streamlit in Snowflake with the deployed ML model to inform food-truck brand pricing strategy.
