Retail Pricing using Optimization

Riya Kulshrestha
Published in Analytics Vidhya · 10 min read · Nov 18, 2019

Traditionally, marketers relied on intuition for most pricing decisions, paying little attention to customer behavior, market trends, the effect of promotions and holidays, and how these factors shape the sensitivity of products to their price. With the advent of high computational capabilities that support the analysis of huge volumes of data, most businesses now leverage big data technologies to optimize pricing decisions, with the aim of offering more competitive prices while meeting clearance, revenue, or margin targets.

Deciding the correct discount % is not a simple task

Optimal pricing, or deciding on the optimal discount percentage, is a challenging problem for several reasons. One is the complex structure of the pricing model, which often includes multiple variables, such as list prices, discounts, and special offers, that need to be optimized. Another is the complexity of demand and profit forecasting, which makes it difficult to evaluate new pricing strategies accurately. Finally, selecting the best modeling and optimization techniques adds a technical challenge to the process.

Described below is a detailed price optimization strategy that takes into account consumer behavior, holidays, competitor pricing, the effect of cannibals, and the effectiveness of promotions, and, most importantly, describes how to decide the prices that maximize a certain metric while keeping another metric above a certain minimum. For example, a retailer might want to maximize the sell-through of winter wear during summer while keeping the margin above a minimum of 20%.

I have also included the relevant code in some of the sections to assist with the implementation.

To speed up processing on big data, it is convenient to code as many steps as possible in PySpark. Many PySpark libraries are still being developed and improved. However, the libraries for preprocessing (common calculations and aggregations) and general linear modeling are mature and can be used very well. Hence we shall do all the initial steps in PySpark and the optimization step in Python, since there are no completely tested PySpark libraries for optimization that cater to the requirements of our case yet.

For users who don’t have a Spark setup, or for whom the size of the data isn’t a big concern, the same functionality can be implemented in Python with the corresponding changes in syntax.

The process

Sanity Check on the data

There are some basic checks that you need to run whenever you are dealing with retail data.

· Data fields like transaction id, store id, cost price, promotion price, out-the-door price and discount percentage should always be positive

· The time and date fields can be checked for 12 distinct months and appropriate hours of purchase (0–23 or 1–12, depending on the clock type in the data)

Other fields can be checked similarly, depending on the data at hand. While you check the data for sanity, you also get a proper understanding of the unique values in the data. Any issues should be fixed, and the data should be cleaned accordingly before proceeding; a minimal sketch of such checks is shown below.
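For illustration, here is a minimal PySpark sketch of such checks. The column names (CostPrice, PromotionPrice, OutTheDoorPrice, DiscountPercent, TransDate, TransTimestamp) are hypothetical placeholders for the corresponding fields in your transaction table:

from pyspark.sql import functions as F

# Count negative values in fields that should always be positive (hypothetical column names)
for col in ['CostPrice', 'PromotionPrice', 'OutTheDoorPrice', 'DiscountPercent']:
    negative_rows = df_transaction.filter(F.col(col) < 0).count()
    print(col, 'has', negative_rows, 'negative values')

# Date/time sanity: distinct months present and the range of purchase hours
df_transaction.select(F.countDistinct(F.month('TransDate')).alias('distinct_months'),
                      F.min(F.hour('TransTimestamp')).alias('min_purchase_hour'),
                      F.max(F.hour('TransTimestamp')).alias('max_purchase_hour')).show()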

Roll-up

Data in retail is generally available at a date/timestamp level. In order to see the trends in sales and make the data usable for modeling and forecasting, it should be rolled up from the date level to the day or year-week level. This generally calls for the following two steps:

· Join the data available in the various tables based on a primary key (which is generally the SKU id or the SKU id + store number). This mainly calls for joining the transaction data table with the category data, store data, inventory data, calendar (holiday) data, promotions data, competitor data and other relevant data sets

Below is a sample join of the transaction and calendar tables to include the holiday/special events in the analytical dataset:

# Left join keeps transactions that fall on non-holiday dates as well
df_analytical = (df_transaction
                 .join(df_calendar,
                       df_transaction.trans_date == df_calendar.calendar_date,
                       how='left')
                 .drop(df_calendar.FiscalWeekNumber)
                 .drop(df_calendar.FiscalYear))

· Once joined, the data can be rolled up to the decided level while ensuring that the aggregated values are a good representation of the actual values. For example, for the average price in a day (generally for online sales) or in a week, a weighted average usually gives a better representation than a simple average (a sketch of this follows the roll-up code below). Similarly, volume sold should be summed while rolling up

from pyspark.sql import functions as F

df_analytical = (df_analytical.groupBy('StoreId',
                                       'SkuId',
                                       'Year',
                                       'WeekNo',
                                       'HolidayName',
                                       'HolidayFlag')
                 .agg(F.sum('VolumeSold').alias('total_volume_sold'),
                      F.sum('RegularPrice').alias('total_regular_price'),
                      F.sum('PromotionPrice').alias('total_promotion_price'),
                      F.sum(F.abs(F.col('UnitCost') * F.col('UnitsSold'))).alias('total_volume_cost'),
                      F.sum('ShippingCost').alias('total_shipping_cost'),
                      F.count('*').alias('total_transaction_count')))
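As mentioned above, a volume-weighted average price is usually a better representation after the roll-up than a simple average. A minimal sketch, assuming RegularPrice is a per-unit price, VolumeSold is the units sold per transaction, and df_joined is a hypothetical name for the joined, pre-rollup data:

# Volume-weighted average regular price at the rolled-up level
df_weighted_price = (df_joined.groupBy('StoreId', 'SkuId', 'Year', 'WeekNo')
                     .agg((F.sum(F.col('RegularPrice') * F.col('VolumeSold')) /
                           F.sum('VolumeSold')).alias('weighted_avg_regular_price')))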

EDA

EDA involves analyzing the analytical dataset(s) to summarize their main characteristics. One of the main motives of exploratory data analysis in this process is to decide the level of modeling, i.e. at which level of the retail hierarchy the models should be developed in terms of the category data (SKU/brand/category/section etc.). One way to do this is to check the data sparsity and price range at various hierarchical levels.

· Data sparsity check: For each of the levels (SKU, brand, category and so on), calculate the percentage of nulls in the data for each of the models (i.e. nulls in a column relative to the total records). This will require rolling up the data to a different level each time. Finally, the data sparsity can be compared across levels to decide at which level the maximum percentage of the data has sparsity in the lower range

· Price gap analysis: This refers to the jump in price when the data is rolled up from one level to another. To ensure that relevant information isn’t lost, choose the level at which the higher price gaps affect the fewest models

records = df_analytical.count()
dtypes = dict(df_analytical.dtypes)
eda_stats = []

for col in df_analytical.columns:
    col_null_count = df_analytical.select(F.col(col)).filter(F.col(col).isNull()).count()
    col_distinct = df_analytical.select(F.col(col)).filter(F.col(col).isNotNull()).distinct().count()
    if dtypes[col] in ['int', 'double']:
        stats = df_analytical.select(F.stddev(col).alias('stddev'),
                                     F.mean(col).alias('mean'),
                                     F.min(col).alias('min'),
                                     F.max(col).alias('max'),
                                     F.sum(col).alias('sum')).collect()[0]
        neg_val_count = df_analytical.filter(df_analytical[col] < 0).count()
        fraction_negative = neg_val_count / float(records)
        sum_value = stats.sum
    else:
        stats = None
        neg_val_count = None
        fraction_negative = None
        sum_value = None
    eda_stats.append({'column': col,
                      'null_count': col_null_count,
                      'distinct_count': col_distinct,
                      'fraction_negative': fraction_negative,
                      'sum': sum_value})
Plotted using the EDA statistics at various hierarchy levels

Here is a sample analysis for a user that requires basic clearance and in-store pricing:

Data Sparsity Check:

At the category level, 50% of the models have data for 79–105 year-weeks, i.e. low data sparsity. The corresponding figure is 28% at the category-brand level. We take both of these levels forward for the price-gap analysis.

Price-Gap Analysis:

At the category level there is less data in the higher price-gap range (9% in the 80–100% gap bucket) and more in the lower price-gap range (~26%). Hence the category level can be chosen for modeling the clearance/in-store data. More data in the lower price-gap range is desired so that rolling up doesn’t create much variation in the values and the models are more precise (since the items grouped together are similar).

Cannibalization

Cannibalization is a familiar dynamic. It refers to the reduction in sales (units or dollars) of a company’s existing products due to the introduction of a new product or of similar existing products. A common example is the way sales of a product of brand A are eaten up by a similarly priced product of brand B. Cannibalization is a very prominent effect that is often overlooked while modeling for price optimization. Including at least the price/discount percentage of the top 5 cannibals of each product in the model can reveal a lot about this effect through the coefficients and push the price in the appropriate direction during optimization.

How to get the top 5 cannibals for our products?

If we are modeling at the SKU level, for any SKU that is a denim, its possible cannibals can be denims from different brands. Hence if the product hierarchy for SKU1 is

Apparel -> Bottomwear -> BrandA -> SKU1

To get the cannibals for SKU1, we move one level up in the hierarchy, i.e. to the brand level, and calculate the correlation between units sold and discount percentage for all possible combinations of SKUs across brands. For SKUs to be cannibals, the correlation between one SKU’s discount and the other’s units sold should be negative. The combinations with negative correlation are filtered, the top n cannibals are picked based on the absolute correlation values, and the discount percentage of each cannibal is used as a feature during modeling.

Attached below is a snapshot of the process to get the correlation between the cannibal keys. The number of cannibals to be considered can then be filtered accordingly:

import pandas as pd

# Roll the data up to the modeling level and its parent level
df_AD = (df_analytical.groupBy('key_one_above_model', 'key_model_level', 'year_week')
         .agg(F.sum('tot_otd_price').alias('tot_otd_price'),
              F.sum('tot_reg_price').alias('tot_reg_price'),
              F.sum('tot_volume_sold').alias('tot_volume_sold'))
         # discount derived from out-the-door vs regular price (adjust if already present)
         .withColumn('discount_percent',
                     1 - F.col('tot_otd_price') / F.col('tot_reg_price')))

unique_cat_for_correlation = [row['key_one_above_model'] for row in
                              df_analytical.select('key_one_above_model').distinct().collect()]
df1 = df_AD.filter(df_AD['key_one_above_model'].isin(unique_cat_for_correlation))

# Self-join: pair each model-level key with the other keys under the same parent
dummy1 = (df1.withColumnRenamed('key_model_level', 'Cannibals')
          .select('key_one_above_model', 'Cannibals', 'year_week', 'discount_percent'))
dummy2 = df1.select('key_one_above_model', 'key_model_level', 'year_week', 'tot_volume_sold')
df2 = (dummy2.join(dummy1, (dummy2.key_one_above_model == dummy1.key_one_above_model) &
                           (dummy2.year_week == dummy1.year_week) &
                           (dummy2.key_model_level != dummy1.Cannibals))
       .drop(dummy1.key_one_above_model).drop(dummy1.year_week))

# How many weeks each pair was sold together (can be used to filter weak pairs)
count_df = (df2.groupBy('key_one_above_model', 'key_model_level', 'Cannibals')
            .agg(F.count('year_week').alias('Weeks_sold_together')))

# Collect the pairs together so the key/cannibal ordering stays consistent
pairs = count_df.select('key_model_level', 'Cannibals').collect()
product_combinations = [(row['key_model_level'], row['Cannibals']) for row in pairs]

# Correlation between one key's volume and the paired key's discount
correlation_data = []
for cat_id_volume, cat_id_discount in product_combinations:
    df_corr = df2.filter((df2['Cannibals'] == cat_id_discount) &
                         (df2['key_model_level'] == cat_id_volume))
    correlation_data.append({'cat_id_volume': cat_id_volume,
                             'cat_id_discount': cat_id_discount,
                             'corr_value': df_corr.stat.corr('discount_percent', 'tot_volume_sold')})
corr_result = pd.DataFrame(correlation_data,
                           columns=['cat_id_volume', 'cat_id_discount', 'corr_value'])

Clustering

This step can be used in two ways: 1) to cluster stores with similar product elasticities and customer behavior together, which reduces the number of models and tackles the problem of data sparsity, so that instead of modeling each key in each store, each key is modeled in a store cluster; 2) in case computational capability isn’t a hurdle, the cluster number can be used as a feature to give the model a direction to learn something about similar stores.

k-means clustering can be used to decide the clusters on the available data; a minimal sketch of how this could be set up is given below.
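Here is a minimal sketch using PySpark’s KMeans, assuming hypothetical store-level features (average discount percentage and average weekly volume) have already been aggregated into a DataFrame df_store_features:

from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans

# Assemble and scale the (hypothetical) store-level features
assembler = VectorAssembler(inputCols=['avg_discount_percent', 'avg_weekly_volume'],
                            outputCol='features_raw')
scaler = StandardScaler(inputCol='features_raw', outputCol='features')
df_features = assembler.transform(df_store_features)
df_features = scaler.fit(df_features).transform(df_features)

# Fit k-means and attach a cluster id to every store
model = KMeans(k=5, seed=42, featuresCol='features', predictionCol='store_cluster').fit(df_features)
df_clustered = model.transform(df_features)

The number of clusters (k here) can be tuned by comparing the within-cluster variation or silhouette scores for a range of values.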

This thesis can provide detailed insight into how stores can be clustered based on consumer behavior and product elasticity, and how the clusters can be used further for pricing, leading to higher revenue/profits.

Modeling

A price elasticity model is generally preferred so that the coefficients can be used to design the optimization equations and to check intuitively whether the right features are given the right weight.

Price elasticity gives the percentage change in quantity demanded in response to a one percent change in price. When you regress the log of volume against the log of price, the coefficient on log(price) is the elasticity, i.e. the percentage change in volume for a one percent change in price. For example, an elasticity of -2 means a 1% price increase is expected to reduce volume by about 2%.

You might have to standardize/normalize the values before fitting the model and perform other preprocessing steps to ensure the data follows the assumptions of linear regression.

While trying out models you can start with a general linear regression (OLS) and then try ridge/ lasso models depending upon the need for variable selection.

When you are done with the model selection and tuning, the coefficients from the model give you the values of the equation.

Hence, your end equation for each product will look something like:

log(volume) = price_elasticity * log(price) + β2 * (cannibal1_price) + β3 * (cannibal2_price) + β4 * (holiday_flag1) + … further terms depending upon the variable importance … equation 1
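To make this concrete, here is a minimal sketch of fitting such a log-log model with PySpark’s LinearRegression. The feature column names (avg_price, cannibal1_discount, cannibal2_discount, holiday_flag) are hypothetical placeholders for the features prepared in the earlier steps; setting regParam greater than 0 with elasticNetParam 0 or 1 gives ridge or lasso respectively:

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Log-transform the target and the price (hypothetical column names)
df_model = (df_analytical.withColumn('log_price', F.log('avg_price'))
            .withColumn('log_volume', F.log('tot_volume_sold')))

features = ['log_price', 'cannibal1_discount', 'cannibal2_discount', 'holiday_flag']
assembler = VectorAssembler(inputCols=features, outputCol='features')
train = assembler.transform(df_model).select('features', 'log_volume')

# regParam=0.0 is plain OLS; with regParam > 0, elasticNetParam=0 -> ridge, 1 -> lasso
lr = LinearRegression(featuresCol='features', labelCol='log_volume',
                      regParam=0.0, elasticNetParam=0.0)
model = lr.fit(train)

# The coefficient on log_price is the price elasticity used in equation 1
coefficients = dict(zip(features, model.coefficients.toArray()))
price_elasticity = coefficients['log_price']
intercept = model.intercept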

Optimization

Optimizing the prices for a goal requires you to mold the above equation accordingly. For example, in order to maximize revenue the equation will be:

Revenue = ∑ price * volume

Revenue = ∑ price * e^(price_elasticity * log(price) + β2 * (cannibal1_price) + β3 * (cannibal2_price) + ∑ βi * variable_i) …… from equation 1

The equations for constraints and business rules can be designed in the same way :

Constraint 1: margin > 30%

margin = 1 - (cost/revenue) = 1 - (∑(volume * cost) / ∑(volume * price)) > 30% …… constraint 1

The above equation needs to be maximized. This is where we switch to Python, with the analytical dataset (AD) obtained in the previous steps. Most of the optimization algorithms in Python are written for minimization; adding a ‘-’ sign to the equation turns the problem into minimization of the negative, i.e. maximization.

Below is a snapshot of how to optimize once the objective function, constraints and bounds have been defined as per the business requirement. The data structure for defining the constraints and bounds differs for each library, and it is advisable to go through the documentation while coding any technique.

import numpy as np
from scipy.optimize import minimize

# func: objective (negative revenue), con: constraints, bounds: per-product price limits
result = minimize(func, x0=cost_price, method='SLSQP', constraints=con,
                  bounds=[(cost_price[i], MRP[i]) for i in range(len(modeling_AD))])
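For concreteness, here is a minimal sketch of how func and con above might be built from the model coefficients. All the arrays (price_elasticity, intercept, cost_price, MRP) are hypothetical stand-ins for the outputs of the modeling step, with the cannibal and holiday terms folded into the intercept for brevity:

import numpy as np

# Hypothetical per-product model outputs and business inputs (for illustration only)
price_elasticity = np.array([-1.8, -2.1])   # coefficients on log(price) from equation 1
intercept = np.array([6.0, 5.5])            # intercept plus the fixed cannibal/holiday terms
cost_price = np.array([10.0, 12.0])         # unit cost, also the lower price bound
MRP = np.array([25.0, 30.0])                # upper price bound

def predicted_volume(prices):
    # volume = e^(intercept + elasticity * log(price)), from equation 1
    return np.exp(intercept + price_elasticity * np.log(prices))

def func(prices):
    # negative revenue, so that minimizing it maximizes revenue
    return -np.sum(prices * predicted_volume(prices))

def margin_constraint(prices):
    # constraint 1: 1 - cost/revenue - 0.30 must be non-negative for an 'ineq' constraint
    volume = predicted_volume(prices)
    return 1 - np.sum(volume * cost_price) / np.sum(volume * prices) - 0.30

con = [{'type': 'ineq', 'fun': margin_constraint}]

These definitions can then be passed to minimize exactly as in the snippet above; in practice the arrays come from the per-product model coefficients rather than being hard-coded.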

Gradient-based optimization is one of the most widely used approaches here. A few things to take care of while using such an algorithm:

· The gradient can be specified manually or left for the algorithm to approximate numerically. It is always better to derive and specify it manually

· The tol parameter should be tuned carefully since it tells the optimizer when to stop: if the objective/gradient changes by less than tol across consecutive iterations, the optimizer assumes the minimum has been reached and returns it

· The equation should be convex for the optimization to stop at a global minimum rather than converging to a local minimum. If the equation is not convex, basin-hopping optimization can be used (see the sketch after this list)

· For the optimization to give a significant result, the models, and especially the elasticity values, should be estimated very accurately. The reason is evident from the partial derivative of the equation we are optimizing
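As a sketch of the basin-hopping option mentioned above, scipy.optimize.basinhopping can wrap the same SLSQP set-up; func, con, cost_price and MRP are the same hypothetical objects as before:

from scipy.optimize import basinhopping

# Each local search uses the same constrained SLSQP minimization as before
minimizer_kwargs = {'method': 'SLSQP', 'constraints': con,
                    'bounds': [(cost_price[i], MRP[i]) for i in range(len(cost_price))]}
result = basinhopping(func, x0=cost_price, minimizer_kwargs=minimizer_kwargs, niter=50)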

Once the prices are obtained, a comparative analysis can be done to showcase how revenue is predicted to jump with the optimized prices compared to the ones from traditional pricing methods. A call should also be taken on the price revision cadence. In the case of in-store pricing, the effort and labor required to replace paper price tags call for a weekly or biweekly refresh. In the case of online pricing, however, the prices can be changed multiple times a day. The new data can be appended whenever a refresh is required and the models can be refreshed accordingly.

Here is a detailed read on optimization as a technique.

BOGO or 50% off? Which variant is more effective?

The revenue generated from both of the above promotions is equivalent for the retailer. However, customer propensity varies greatly between the two. At times we might need to choose which of the two variants is more effective and capitalize on the psychology of discounts.

This can be done by adding a flag for the promotion type applied to the model features and comparing the resulting coefficients to analyze which promotion is more effective; a small sketch is given below.
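A minimal sketch, assuming a hypothetical promotion_type column on the analytical dataset:

# One flag per promotion variant; their model coefficients can then be compared
df_analytical = (df_analytical
                 .withColumn('bogo_flag', (F.col('promotion_type') == 'BOGO').cast('int'))
                 .withColumn('half_off_flag', (F.col('promotion_type') == '50_PERCENT_OFF').cast('int')))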

Retail is a very wide domain but these techniques will surely get the user close to the target.
