Price optimization with Python (Part 1: Demand forecasting)
There are tons of information about why price optimization is important, but I had a hard time finding a detailed algorithmic description of how to implement it.
This project is about Deliveries prices optimization (or Services that go with sales), but you can use it for any retail area. The main workflow can be divided into 3 large parts. That’s why I decided to break this article into 3 pieces.
The first one gives us an idea of how we will sell if the prices doesn’t change.
The second one is about demand elasticities — I estimate sales volume functions wrt prices. But not only. I also calculate cross-elasticities of demand of Goods depending on Service prices. You can use the same approach for associated items, in case you are to solve the regular retail price optimization problem.
And the third (and the most important) part would be maximization itself. Here we have to implement the profit function (arguments for the function would be all types of costs, goods prices, forecasted As-Is demand, elasticities, and cross-elasticities). Besides, there might be linear and non-linear constraints.
First of all, let’s take a look at the dataset.
Every service has a delivery Zone and Weight Range. I consider every unique combination as a particular Service. Time to visualize them.
ts = filtered_delivery.groupby([
'ServiceID',
'ZoneID',
'WeightRange',
pd.Grouper(key='Date',
freq='W')])[['OrderNbr']].count().unstack().transpose().sort_values('Date').reset_index().drop(columns='level_0').set_index('Date')
ts.columns = [str(a)+'_'+str(b)+'_'+str(c)
for a,b,c in
zip(ts.columns.get_level_values(0),
ts.columns.get_level_values(1),
ts.columns.get_level_values(2),
)]ts.plot(legend=False)
As we can see from the graph, several services were influenced by pandemic much more than others. I’ve used a simple trick to decide, what time series have to be shortened by cutting the pandemic section out — I checked if the number of orders from April to June does not differ significantly from the number of orders for the previous three months. Apparently, more accurate methods exist, e.g. you can forecast weekly sales for the pandemic period and compare prediction with the actual values.
max_fluct = 2compare['pandemic'] = ts[(ts.index>pd.to_datetime('2020-04-01'))&
(ts.index<pd.to_datetime('2020-07-01'))].sum()
compare['quarter_ago'] = ts[(ts.index>pd.to_datetime('2020-01-01'))&
(ts.index<pd.to_datetime('2020-04-01'))].sum()short = compare[(compare['pandemic']>max_fluct*compare['quarter_ago'])|
(compare['pandemic']<1/max_fluct*compare['quarter_ago'])]
long = compare[(compare['pandemic']<max_fluct*compare['quarter_ago'])&
(compare['pandemic']>1/max_fluct*compare['quarter_ago'])]short_ts = ts[ts.index<pd.to_datetime('2020-04-01')][short.index]
long_ts = ts[long.index]
We’re all set for forecasting!
I’ve tried two different approaches to solve the forecasting problem — regression models to predict weekly demand for every type of delivery service and time series. And it is no surprise that the latter worked better, because of the nature of the data and the problem. Finally, I utilized TBASE time series forecasting method and its open-source implementation from
https://github.com/intive-DataScience/tbats
because it is entirely automated (and I had quite a lot of time series with a given level of granularity) and showed the best accuracy on my data (MAPE < 10%).
class forecast:
def __init__(self, X, n=52):
self.X = X
#extend the dataset for n weeks ahead
ix = pd.date_range(start=self.X.index.max() + timedelta(1),
end=self.X.index.max() + timedelta(n*7), freq='W')
self.test_X = pd.DataFrame(index=ix)
def make_predictions(self):
for col in self.X.columns:
y_train = self.X[col].values
#Create estimator
estimator = TBATS(seasonal_periods=[52.14]) #Fit model
fitted_model = estimator.fit(y_train) #Forecast steps ahead
y_forecasted = fitted_model.forecast(steps=self.test_X.shape[0])
self.test_X[col] = y_forecasted
That’s it for the first part. Please, find the Second one here.