Price Optimization Model

Published in ILLUMINATION · Nov 18, 2022

By Mayssam Naji; Kukeshajanth Kodeswaran; Amit Borundiya

Problem Statement

The goal is to provide an easy-to-use tool for finding optimal price points for selected products and increasing the overall margin for forthcoming campaigns and seasonal sales events. This approach will also serve as the backbone for further optimization of marketing spend and the promotion calendar. It relies on two major components:

  • Sensitivity Model: a model that can predict the demand for a product under given conditions.
  • Optimizer: a tool that uses the sensitivity model to efficiently search for an optimal price for each product.

Architecture

Figure 1: Price Optimization Tool workflow

Data Collection

The data used include product specifications, purchase history, past promotions, special days (specific to the business), and past marketing spend, along with a few engineered features that account for seasonality.

  • Promo Bins: bins that group promotions according to their realistic impact on the business, to better capture the variation across promos (a minimal sketch follows this list).
  • Calendar Data: data related to the US calendar and consumer behavior.
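To make the promo bins and calendar features concrete, here is a minimal feature-engineering sketch in pandas. The file name, column names, bin edges, and special days are illustrative assumptions rather than the project's exact definitions.

```python
import pandas as pd

# Hypothetical daily sales frame; column names, bin edges, and special days
# are illustrative assumptions, not the project's exact definitions.
df = pd.read_csv("daily_product_sales.csv", parse_dates=["date"])

# Promo bins: group raw discount depth into a few levels that reflect how
# strongly a promotion realistically moves demand.
promo_edges = [0.0, 0.05, 0.15, 0.30, 1.0]
promo_labels = ["none_or_light", "moderate", "deep", "clearance"]
df["promo_bin"] = pd.cut(df["discount_pct"], bins=promo_edges,
                         labels=promo_labels, include_lowest=True)

# Calendar features: simple US-calendar and seasonality signals.
df["day_of_week"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
df["is_weekend"] = df["day_of_week"].isin([5, 6]).astype(int)
special_days = pd.to_datetime(["2022-11-25", "2022-11-28"])  # e.g. Black Friday, Cyber Monday
df["is_special_day"] = df["date"].isin(special_days).astype(int)
```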

Sensitivity Model

The sensitivity model aims to predict the demand for each product given a set of features (conditions). As can be seen from the diagram above, the problem appears on the surface to be a time-series problem. Hence, we initially formulated the approach as a time-series problem. However, after experimenting with different approaches, gradient boosting on decision trees (CatBoost) proved to be the most efficient and effective approach.

Figure 2: Data structure

Modeling Approach

The advantage of time-series models is their ability to detect trends and seasonality, giving them a good understanding of the continuity in the data. However, our predictions have to be tailored to each product, and each product has a distinguishing set of attributes and a distinct sell-through curve. Therefore, a time-series model would have to be trained separately on each product's historical data and would only be able to make predictions for each product in isolation. This is a major challenge for e-commerce use cases, as most products are available for a specific period and then removed from the market. Hence, this approach would leave each product with a sparse training set of only 200 to 500 data points (days). These factors made the time-series approach an undesirable formulation for the sensitivity model.

Unlike time-series models, CatBoost treats each data point independently, which preserved the full dataset and gave us flexibility in how it was formulated. However, the model does not have an inherent sense of continuity at the product level. Hence, feature engineering played a major role in overcoming this challenge. We designed a diverse set of features that give the model a good understanding of each product's performance and of the overall website behavior. We separated our features according to the following structure:

Figure 3: Feature groups

Feature Importance

There are four major families of features:

  • Macro-inherent features: a set of features that depend on the day of the prediction and describe the status of the website from a pricing perspective. They also give a glimpse of consumer purchasing behavior.
  • Macro-historical features: features that describe the performance of the e-commerce business in the previous year.
  • Micro-inherent features: a set of features that describe each product, its intended consumer, its pricing and discount, and its quality.
  • Micro-historical features: the features that enable the model to understand each product's sell-through curve. They are built on historical sell-through, both long-term (yearly) and short-term (latest few months), giving the model a long-short memory of performance.

With this structure, we give the model a good understanding of the products and their sell-through, as well as the overall expected performance of the business.
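To illustrate this setup, below is a minimal CatBoost training sketch. The feature names, hyperparameters, and the train_df/valid_df frames are assumptions chosen to mirror the four feature families, not the production configuration.

```python
from catboost import CatBoostRegressor, Pool

# Illustrative feature lists following the four families above; train_df and
# valid_df are assumed to be prepared frames with one row per product-day.
categorical_features = ["product_category", "promo_bin", "day_of_week"]
numeric_features = ["price", "discount_pct", "marketing_spend",
                    "units_sold_last_30d", "units_sold_same_period_last_year"]
feature_cols = categorical_features + numeric_features
target = "units_sold"

train_pool = Pool(train_df[feature_cols], label=train_df[target],
                  cat_features=categorical_features)
valid_pool = Pool(valid_df[feature_cols], label=valid_df[target],
                  cat_features=categorical_features)

model = CatBoostRegressor(
    iterations=2000,
    learning_rate=0.05,
    depth=8,
    loss_function="RMSE",
    early_stopping_rounds=100,
    verbose=200,
)
model.fit(train_pool, eval_set=valid_pool)

# Feature importances, comparable to Figure 4.
importances = model.get_feature_importance(train_pool)
for name, score in sorted(zip(feature_cols, importances), key=lambda x: -x[1]):
    print(f"{name}: {score:.1f}")
```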

Figure 4: Feature Importance

Finally, to validate the results of the sensitivity model, we measured its performance in predicting the demand for each product over a given time frame, and we aggregated the predicted units sold to monitor its performance at the overall e-commerce level.

Overall Demand Predictions

Over a given test period, the model makes predictions within an error margin of roughly 15%. Furthermore, to measure the generalizability of the model, we sliced various time segments from our dataset and trained and tested the model on each.
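A simplified sketch of this sliced evaluation follows, reusing the illustrative df, feature_cols, and categorical_features from the earlier snippets; the segment boundaries are placeholders. The idea is to aggregate the predicted units per segment and compare them against the actual totals.

```python
import numpy as np
from catboost import CatBoostRegressor

def aggregate_percent_error(model, test_df, feature_cols):
    """Percent error on total predicted demand over one test segment."""
    predicted_total = model.predict(test_df[feature_cols]).sum()
    actual_total = test_df["units_sold"].sum()
    return abs(predicted_total - actual_total) / actual_total * 100

# Illustrative segment boundaries; each segment is held out in turn.
segments = [("2021-01-01", "2021-03-31"),
            ("2021-04-01", "2021-06-30"),
            ("2021-07-01", "2021-09-30")]

errors = []
for start, end in segments:
    test_mask = df["date"].between(start, end)
    train_part, test_part = df[~test_mask], df[test_mask]
    m = CatBoostRegressor(iterations=1000, verbose=False)
    m.fit(train_part[feature_cols], train_part["units_sold"],
          cat_features=categorical_features)
    errors.append(aggregate_percent_error(m, test_part, feature_cols))

print(f"mean segment error: {np.mean(errors):.1f}%")
```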

Figure 5: Model predictions vs. real data over the inference period
Figure 6: Sensitivity model’s error distribution over various test segments

Product Level Analysis

We also investigated the model's accuracy when making predictions for each product over the selected time segment. Below is an example of the predicted demand for a single product.

Figure 7: Sensitivity model’s predictions for a single product

Figure 8: Distribution of error percentages vs. product count

Moreover, when we investigate the distribution of the percent error for each product over the selected time segment, we see that the model makes stable predictions for most of the products. After examining the products with high error margins, we found that they are outliers.

Price Elasticity

Aside from its applicability in price optimization, the sensitivity model was used to cluster the products into different price elasticities. The sensitivity model was given a set of products, along with the rest of the features, to simulate demand for each product. The pricing of these products was then gradually decreased to generate different demand simulations.

Figure 9: Simulated impact of 10 USD price reduction for different products

Hence, another use case of the sensitivity model is price elasticity clustering. The historical data does not contain a large set of unique price points per product, due to business constraints on pricing. The sensitivity model offers a solution to this problem by generating reliable demand simulations at different price points. We monitored the simulated change in demand with respect to price change and were able to identify different elasticity clusters, as shown below:

Figure 10: Price elasticity according to price change
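The sketch below shows one way such simulations and the clustering could be implemented. It reuses the illustrative model and feature_cols from the earlier snippets; products_df, the price steps, and the number of clusters are assumptions, not the project's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def simulate_demand_curve(model, product_frame, price_deltas):
    """Re-predict demand for one product at gradually reduced prices."""
    demands = []
    for delta in price_deltas:
        scenario = product_frame.copy()
        scenario["price"] = scenario["price"] - delta
        demands.append(float(model.predict(scenario)[0]))
    return np.array(demands)

# products_df: hypothetical one-row-per-product frame with current features.
price_deltas = np.arange(0, 31, 5)  # 0 to 30 USD reductions, illustrative
curves = np.vstack([
    simulate_demand_curve(model, products_df.loc[[pid], feature_cols], price_deltas)
    for pid in products_df.index
])

# Relative change in simulated demand versus the undiscounted baseline serves
# as an elasticity signature, then products are grouped into clusters (cf. Figure 10).
baseline = curves[:, [0]]
signature = (curves - baseline) / np.clip(baseline, 1e-6, None)
elasticity_cluster = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(signature)
```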

Optimizer

As mentioned in the problem statement, there are two major objectives (though more can be added): demand and margin. Hence, the problem was formulated as a multi-objective optimization problem. Multi-objective optimization helps the business make optimal decisions in the presence of trade-offs between two or more conflicting objectives.

In the present scenario, we needed to maximize the margin along with maximizing the sell-through. Various optimization strategies exist, namely grid search, random search, and Bayesian search. Due to the large search space, Bayesian optimization is the right choice for getting results in finite time and with less computational expense. The basic idea is not to pick points in the search space completely at random but instead to use information from prior runs to choose better points.

The present state-of-the-art optimization algorithm is the Tree-structured Parzen Estimator (TPE). TPE is an iterative process that uses the history of evaluated points in the search space to build a probabilistic model, which in turn suggests the next set of points to evaluate. A few advantages of TPE are the following:

  1. TPE supports various variable types in the parameter search space, e.g., uniform, log-uniform, quantized log-uniform, normally distributed real values, and categorical.
  2. It is considerably more computationally efficient than conventional methods.

We chose Optuna as our optimization framework, mainly because it provides multi-objective TPE and prunes unpromising trials for faster results. Both of these features helped us reach our own objectives of providing faster and computationally cheaper solutions to large search-space problems. During development, we faced several challenges in terms of time and search-space complexity.
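Below is a minimal multi-objective sketch with Optuna. The price bounds and the predict_demand and compute_margin functions are placeholders standing in for the real sensitivity model and margin calculation; each trial returns a (margin, demand) pair, and Optuna tracks the Pareto front in study.best_trials.

```python
import optuna

# Placeholder economics; in the real tool, predict_demand wraps the CatBoost
# sensitivity model and compute_margin uses actual product costs.
price_bounds = {pid: (60.0, 90.0) for pid in range(200)}  # illustrative bounds

def predict_demand(prices):
    # Toy demand curve standing in for the sensitivity model's predictions.
    return sum(100.0 - 0.5 * p for p in prices.values())

def compute_margin(prices, demand_units):
    avg_price = sum(prices.values()) / len(prices)
    return demand_units * (avg_price - 40.0)  # 40 USD placeholder unit cost

def objective(trial):
    prices = {pid: trial.suggest_float(f"price_{pid}", low, high)
              for pid, (low, high) in price_bounds.items()}
    demand = predict_demand(prices)
    margin = compute_margin(prices, demand)
    return margin, demand  # both objectives are maximized

study = optuna.create_study(
    directions=["maximize", "maximize"],
    # Multi-objective TPE: recent Optuna versions support it via TPESampler;
    # older versions expose optuna.samplers.MOTPESampler instead.
    sampler=optuna.samplers.TPESampler(seed=42),
)
study.optimize(objective, n_trials=500)
print(len(study.best_trials), "Pareto-optimal price combinations found")
```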

Search Space Time Complexity Reduction

There are around 2,000 products in total, and for markdown we need to optimize around 200 of them. On average, each product's search space is the price range between its lower-bound and upper-bound price, usually 20 to 30 dollars wide. Our initial approach also accounted for cannibalization between similar products caused by their price changes.

For this, we need a search space covering combinations of prices across products. A brute-force search over 200 products, even at 1-dollar decrements, would require evaluating roughly 20²⁰⁰ points (assuming a 20-dollar range per product).

Using MOTPE, we were able to search the space more efficiently; still, to get a good result we needed 8 to 9 hours of optimization on an ml.c5.9xlarge instance. To reduce the search time, we added an initial warm-up stage that optimizes a small batch of products at a time (5 products per search) for a fixed number of iterations (500). For 200 products, this stage runs 40 times and takes about 37 minutes in total (roughly 55 seconds per run).

By doing this, we could find better initialization price points for each product. We take two prices from the warm-up run, one that yields a high margin and one that yields high demand. We then initialize the optimizer with these two prices and the upper-bound price. When we ran the optimizer with this setup, we achieved the same or better performance than the initial approach in about 2 hours overall. As shown in Figure 11, the search space of the initial approach (left) is very scattered, while the new approach produces a much more directed search.
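A sketch of this warm-start initialization, building on the previous snippet, is shown below. The seed price dictionaries are placeholders; in practice they would come from the 5-product warm-up searches described above. Optuna's enqueue_trial ensures these seed points are evaluated before the sampler's own suggestions.

```python
# Placeholder warm-start prices; in practice these come from the warm-up runs
# (one margin-favoring and one demand-favoring price per product, plus the
# upper-bound price).
margin_favoring = {pid: 85.0 for pid in price_bounds}
demand_favoring = {pid: 65.0 for pid in price_bounds}
upper_bound = {pid: bounds[1] for pid, bounds in price_bounds.items()}

study = optuna.create_study(
    directions=["maximize", "maximize"],
    sampler=optuna.samplers.TPESampler(seed=42),
)
# enqueue_trial makes Optuna evaluate these seed points first, directing the
# search toward promising regions from the start.
for seed_prices in (margin_favoring, demand_favoring, upper_bound):
    study.enqueue_trial({f"price_{pid}": p for pid, p in seed_prices.items()})
study.optimize(objective, n_trials=2000)
```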

Figure 11: Optimizer's search space

Streamlit Application

Figure 12: Final Price Optimization Tool’s interface

Conclusion

By breaking the problem down into a sensitivity model and an optimizer, we were able to develop a tool that sets an optimal price point for each product. The tool is easy for the business team to use and relies on state-of-the-art algorithms to make its predictions. Moreover, the presented formulation makes use of historical data and takes into consideration external factors that have an impact on the business. Finally, the current approach can be applied to use cases beyond price optimization, such as marketing expenditure optimization and price elasticity analysis.
