Using Sklearn To Measure The Price Elasticity of Demand for Beef

Alex FitzGerald
Published in CodeX
6 min read · Jul 6, 2022

With high inflation and a potential recession looming, businesses have to grapple with tough decisions about how to price their products. The price elasticity of demand (PED) is a valuable measurement to consider when re-pricing a product because it anticipates the demand response to some change in price.

Sklearn (scikit-learn), Python’s machine learning library, provides data scientists with easy-to-use tools for evaluating PED. In this post, I will walk through how to build a simple linear regression model to evaluate PED for beef using Sklearn. After training the model, we’ll evaluate its performance, discuss what the metrics mean in terms of PED, and see how we can use that information to make better pricing decisions. I used beef for my analysis, but the process and formulas are generally applicable to any product.

Price Elasticity of Demand (PED)

PED is a measure of how sensitive demand for a given product is to the price of that product. Price and demand are often well suited to linear regression because, for many products over a moderate price range, the relationship is approximately linear, with increases in price associated with reductions in demand. The slope of this relationship is not the same for all products, hence the desire to measure how strongly a particular product’s demand is tied to its price.

Image: Types of price elasticity of demand. Source: https://learnbusinessconcepts.com/wp-content/uploads/Types-of-Price-Elasticity-of-Demand-LearnBusinessConcepts.com-.png

At one extreme of the PED spectrum are products with inelastic demand, such as utilities, prescription drugs, and tobacco products. These are necessities or addictions that are bought regardless of price. At the other extreme are highly elastic products, such as cars and washing machines, where customers can defer purchases if prices rise.
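The spectrum described above is usually expressed through the magnitude of PED. By the standard textbook convention, demand is called elastic when the absolute value of PED is greater than 1, inelastic when it is less than 1, and unit elastic at exactly 1. A minimal sketch of that rule follows; the function name `classify_ped` is hypothetical and not part of any library.

```python
import math


def classify_ped(ped: float) -> str:
    """Label a PED value using the textbook convention on its magnitude."""
    magnitude = abs(ped)
    if math.isclose(magnitude, 1.0):
        return "unit elastic"
    if magnitude > 1.0:
        return "elastic"
    return "inelastic"


# Illustrative values only, not measurements for any real product.
for value in (-0.2, -1.0, -2.5):
    print(value, classify_ped(value))
```

The sign is ignored on purpose: PED is normally negative, so only its size tells us how strongly demand reacts to price.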

We’ll be working with beef, whose demand elasticity falls somewhere between these extremes. Beef data is found here. Our task is to build a linear regression model that describes the PED of beef and use it to predict demand based on price.
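The rest of the walkthrough assumes the data is already loaded into a Pandas data frame named `beef_df` with `Price` and `Quantity` columns. A minimal sketch of that loading step is shown here, assuming the data is available as CSV text. The three rows are invented placeholders so the snippet runs on its own; they are not the beef data used in this post.

```python
import io

import pandas as pd

# Placeholder rows, invented so the snippet runs; they are NOT the article's data.
csv_text = """Price,Quantity
100,25.0
150,22.5
200,20.0
"""

# In practice, pass the path of your downloaded file instead of the StringIO object.
beef_df = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks before modeling: column types and missing values.
print(beef_df.dtypes)
print(beef_df.isna().sum())
```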

Importing Relevant Sklearn Packages

Begin by importing into your notebook the Sklearn packages we’ll need for training and evaluating our linear regression model. Sklearn makes these classes available so you can automate much of the effort involved in linear regression analysis.

#1. Import train_test_split from sklearn's model_selection module to split our training and testing data
from sklearn.model_selection import train_test_split
#2. Import LinearRegression from sklearn's linear_model module to build our model
from sklearn.linear_model import LinearRegression
#3. Import the metrics functions from sklearn to evaluate the results
from sklearn.metrics import mean_squared_error, r2_score

Select Dependent and Independent Variables

Remember the concept of PED is that demand fluctuates to some degree in response to price. From this formulation, we derive our dependent variable (demand) and our independent variable (price). Access these demand and price series in our Pandas data frame and save them off to variables to be used later.

#Separate our Pandas data frame into dependent and independent variables
#Independent variable
X = beef_df[['Price']]
#Dependent variable
y = beef_df['Quantity']

Splitting Testing & Training Data

Linear regression models should be evaluated on their ability to predict outcomes with new and unseen data. Evaluating this way replicates the scenario we’d find ourselves in as the manager of a business who wants to predict sales performance at some new and previously untested price. For this reason, it’s critical that we hold out some of our data during training in a test set that we can later use to evaluate the model’s performance.

#Perform the train and test split on our dependent and independent variables
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
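To make the effect of `test_size=0.3` concrete, here is a small sketch on ten made-up rows. The names `demo_df`, `X_demo`, and `y_demo` are hypothetical and used only so the example does not overwrite the variables from the walkthrough.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten made-up rows, only to show the resulting split sizes.
demo_df = pd.DataFrame({"Price": range(100, 200, 10),
                        "Quantity": range(30, 20, -1)})
X_demo = demo_df[["Price"]]
y_demo = demo_df["Quantity"]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# 30% of the rows are held out for testing, the rest are used for training.
print(X_tr.shape, X_te.shape)
# Features and target keep matching row labels after the shuffle.
print(X_te.index.equals(y_te.index))
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the test set on every run.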

Train Our Model & Make Predictions

Using Sklearn’s LinearRegression class, we will instantiate our model and then fit it to the training data we created in the previous step.

Using the trained model, we’ll then make predictions about demand using unseen data in the test split. The result will be our predicted demand values.

#Instantiate our linear regression
lr = LinearRegression()
#Fit the model to our training data
lr.fit(X_train,y_train)
#Make predictions
train_predictions = lr.predict(X_train)
test_predictions = lr.predict(X_test)
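As a sanity check on what fitting does, the sketch below fits the same class to made-up, noise-free data generated from a known line. If the fit works as expected, the learned slope and intercept should match the values used to generate the data. The names `demo_prices`, `demo_quantity`, and `demo_lr` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up, noise-free data generated from Quantity = 30 - 0.05 * Price.
demo_prices = pd.DataFrame({"Price": np.arange(100, 300, 20)})
demo_quantity = 30 - 0.05 * demo_prices["Price"]

demo_lr = LinearRegression().fit(demo_prices, demo_quantity)

# coef_ holds one slope per feature; intercept_ is the predicted demand at price 0.
print(round(demo_lr.coef_[0], 3), round(demo_lr.intercept_, 3))
```

Real data is noisy, so the fitted line will only approximate the underlying relationship, which is why the evaluation metrics matter.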

Interpreting Results

We’ll be looking for the following indicators of our model’s performance and measurements of the resulting PED line.

  1. An R-squared close to 1 shows that our model explains most of the variation in demand using price as our input
  2. Comparing the R-squared values of our train and test splits indicates how well the model generalizes from seen to unseen data; a large gap suggests overfitting
  3. Root mean square error (RMSE) indicates the typical size of our prediction errors, in units of demand
  4. The coefficient describes the change in demand associated with a one-unit change in price

#Import evaluation packages
from sklearn.metrics import mean_squared_error, r2_score
#RMSE is the square root of the mean squared error
print(f' Train RMSE {round(mean_squared_error(y_train,train_predictions) ** 0.5,3)}')
print(f' Train R2 {round(r2_score(y_train,train_predictions),3)}')
print()
print(f' Test RMSE {round(mean_squared_error(y_test,test_predictions) ** 0.5,3)}')
print(f' Test R2 {round(r2_score(y_test,test_predictions),3)}')
print()
print(f'Price coefficient {round(lr.coef_[0],3)}')
[OUT]
Train RMSE 0.548
Train R2 0.909

Test RMSE 0.611
Test R2 0.882

Price coefficient -0.047

  • Our R-squared value is close to 1 for both the train and test sets, indicating that our model fits the data well and generalizes to unseen data
  • RMSE tells us that our model’s predictions are typically off by about 0.55–0.61 units of demand
  • The coefficient says that for each one-unit increase in price, predicted demand for beef drops by 0.047 units
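To show what these two metrics measure, here is a sketch that computes RMSE and R-squared by hand with NumPy. The `actual` and `predicted` arrays are made-up values for illustration and are unrelated to the beef results above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted demand values, for illustration only.
actual = np.array([20.0, 22.0, 19.0, 25.0])
predicted = np.array([21.0, 21.5, 18.0, 24.0])

residuals = actual - predicted

# RMSE: square the errors, average them, then take the square root.
rmse = np.sqrt(np.mean(residuals ** 2))

# R-squared: 1 minus the share of variation in demand left unexplained.
r2 = 1 - np.sum(residuals ** 2) / np.sum((actual - actual.mean()) ** 2)

print(rmse, r2)
# The hand-rolled values agree with Sklearn's metric functions.
print(np.isclose(rmse, np.sqrt(mean_squared_error(actual, predicted))))
print(np.isclose(r2, r2_score(actual, predicted)))
```

Because squaring weights large misses more heavily, RMSE is best read as a typical error size rather than a plain average of absolute errors.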

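Now, we’ll plot our prediction line to visualize the relationship between price and demand for beef. The linear relationship is clear. Visually, our line best resembles example #4 from the graphic above, “elastic demand”, although the apparent steepness of a plotted line depends on the axis scales, so the PED value calculated in the next section is the more reliable guide.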

#Import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
#Create our scatterplot with demand on the x-axis and price on the y-axis
fig, ax = plt.subplots(figsize=(10,5))
ax.scatter(y, X)
#Overlay the model's predicted demand at each observed price
ax.plot(lr.predict(X), X, color='k')
ax.set_title("Beef PED W/ Linear Regression")
ax.set_xlabel('Demand')
ax.set_ylabel('Price')
#Save the figure; the Images/ folder must already exist
plt.savefig('Images/Beef_pd_wlinearregression.png', bbox_inches='tight');

Calculating PED

Now that we have our linear regression model and its attributes, we can solve for PED. From our coefficient (-0.047), we know the relationship between changes in price and demand in unit terms but the PED formula requires percentage change. To get a percentage change, we’ll select two price points along our PED line, predict their corresponding demand values and use those points as variables in the equation below.
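In its standard two-point form, the PED formula can be written as follows, where $P_1$ and $P_2$ are the two selected price points and $Q_1$ and $Q_2$ are the demand values predicted at each. The symbols are chosen here for illustration.

```latex
\mathrm{PED} = \frac{(Q_2 - Q_1)/Q_1}{(P_2 - P_1)/P_1}
```

The numerator is the percentage change in demand and the denominator is the percentage change in price, both measured relative to the first point.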

#Import numpy for array handling
import numpy as np
#Select two price points along our PED line and put them in array format
price_point_1 = np.array(200)
price_point_2 = np.array(220)
#Estimate demand at these points. Reshape each array to 2D and take the first predicted value
demand_estimate_1 = lr.predict(price_point_1.reshape(1, -1))[0]
demand_estimate_2 = lr.predict(price_point_2.reshape(1, -1))[0]
#Place price points and demand estimates into our PED formula
PED = ((demand_estimate_2 - demand_estimate_1)/(demand_estimate_1))/((price_point_2 - price_point_1)/(price_point_1))
#Print results
print(f'PED {round(PED,2)}')
[OUT]
PED -0.46

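Our PED is -0.46, meaning a 10% increase in price leads to roughly a 4.6% decrease in demand. PED will almost always be negative because demand generally decreases when price increases and increases when price decreases. What determines whether a product is elastic is the magnitude of the ratio: demand is considered price elastic when the absolute value of PED is greater than 1 and price inelastic when it is less than 1. With an absolute value of 0.46, we have found that demand for beef is relatively price inelastic around these price points: it responds to price, but less than proportionally.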
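One property worth keeping in mind is that PED is not constant along a straight demand line. For a fitted one-feature linear model, the two-point formula measured from a starting price P simplifies to slope × P / Q, so the result depends on where on the line you measure it. The sketch below illustrates this on a made-up demand curve; the helper `ped_at` and the data are hypothetical and unrelated to the beef model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up linear demand curve: Quantity = 100 - 2 * Price (not the beef data).
prices = np.arange(5, 45, 5).reshape(-1, 1)
quantities = 100 - 2 * prices.ravel()
demo_model = LinearRegression().fit(prices, quantities)


def ped_at(model, price):
    """PED at a given price for a fitted one-feature linear model: slope * P / Q."""
    quantity = model.predict(np.array([[price]]))[0]
    return model.coef_[0] * price / quantity


# The same line is inelastic at low prices and elastic at high prices.
for p in (10, 25, 40):
    print(p, round(ped_at(demo_model, p), 2))
```

This is why it helps to choose price points close to the prices you are actually considering when computing PED from a linear model.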

Common Mistakes

Wrap the independent variable(s) in double brackets to create a data frame. Our dependent variable needs only single brackets because Sklearn accepts a single column of data (a series) as the target variable for linear regressions. Since the independent variables can span multiple columns, or features, Sklearn expects a two-dimensional input such as a data frame and not a series.

#Dependent variable
y = beef_df['Quantity']
#Independent variable
X = beef_df[['Price']]
print(f' Dependent variable type is {type(y)}')
print(f' Independent variable type is {type(X)}')
[OUT]
Dependent variable type is <class 'pandas.core.series.Series'>
Independent variable type is <class 'pandas.core.frame.DataFrame'>
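To see what happens when the double brackets are forgotten, the sketch below passes a one-dimensional series as the feature input. In the Sklearn versions I am aware of, `fit` rejects this with a `ValueError`. The data frame `toy_df` holds made-up rows for illustration only.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up rows, for illustration only.
toy_df = pd.DataFrame({"Price": [100, 150, 200, 250],
                       "Quantity": [25.0, 22.5, 20.0, 17.5]})

try:
    # Single brackets give a 1-D Series, which fit() rejects as a feature matrix.
    LinearRegression().fit(toy_df["Price"], toy_df["Quantity"])
    fit_error = None
except ValueError as err:
    fit_error = err

print(type(fit_error).__name__)
# Double brackets keep the column dimension; single brackets drop it.
print(toy_df[["Price"]].shape, toy_df["Price"].shape)
```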

When making predictions with Sklearn’s .predict method for a single data point X, it’s critical to put the X value in an array format. Additionally, you have to reshape the array to two dimensions and then take the first value of the result.

#Select two price points along our PED line and put them in array format
price_point_1 = np.array(200)
price_point_2 = np.array(220)
#Estimate demand at these points. Reshape each array to 2D and take the first predicted value
demand_estimate_1 = lr.predict(price_point_1.reshape(1, -1))[0]
demand_estimate_2 = lr.predict(price_point_2.reshape(1, -1))[0]
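An alternative that avoids the reshape step is to pass the new price points as a small data frame with the same column name used in training. This also sidesteps the feature-name warning that recent Sklearn versions emit when a model fitted on a data frame is given a bare array. The sketch below uses made-up training rows, and the names `toy_df`, `toy_lr`, and `new_prices` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training rows that follow Quantity = 30 - 0.05 * Price.
toy_df = pd.DataFrame({"Price": [100, 150, 200, 250],
                       "Quantity": [25.0, 22.5, 20.0, 17.5]})
toy_lr = LinearRegression().fit(toy_df[["Price"]], toy_df["Quantity"])

# One row per price point, under the same column name used in training.
new_prices = pd.DataFrame({"Price": [200, 220]})
demand_estimates = toy_lr.predict(new_prices)

# predict returns one demand estimate per row, in the same order.
print(demand_estimates)
```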

Conclusion

When considering re-pricing a product, it’s important to understand the PED for that product so you can anticipate how consumer demand will react to the change. Sklearn provides easy-to-use modeling tools for building linear regression models and using those models to predict demand given new price inputs.

Alex FitzGerald
Writer for CodeX

Founder, data scientist, and personalization expert with hands-on experience founding and growing a subscription business to a successful exit.