STEP BY STEP PURCHASE PROBABILITY ANALYSIS OF CERTAIN MARKET SEGMENTS WITH PYTHON

Jun 25, 2022

This article covers the next phase of marketing analysis after the market segmentation analysis in the previous article: purchase probability analysis for each market segment.

The data comes from a minimarket transaction database. The dataset has already gone through data wrangling, so that process is not discussed again here.

If you have not read the previous article, please read it at the link below first. It will give you a better understanding of the steps in this article.

https://medium.com/@miradzji/step-by-step-customer-segmentation-analysis-with-python-82be7bf38f13

Note that this article only covers transactions for one product: chocolate candy bars sold in certain minimarkets.

Here are the steps:

1. DATASET ANALYSIS

2. IMPORT PACKAGES

In this step, we import the packages used for data handling, machine learning modeling, and plotting.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import pickle
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
import matplotlib.axes as axs
import seaborn as sns
sns.set()

3. LOAD DATASET AND MARKET SEGMENTATION MODELS

In the previous article, we saved three files for the market segmentation models: skalar.pickel, pca.pickel, and kmeans_pca.pickel.

The next step is to read the dataset with the code below:

df_purchase = pd.read_csv("data/purchase data.csv")

Load the saved market segmentation models with this code:

scalar = pickle.load(open('skalar.pickel', 'rb'))
pca = pickle.load(open('pca.pickel', 'rb'))
kmeans_pca = pickle.load(open('kmeans_pca.pickel', 'rb'))
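As a minimal sketch of the same loading pattern, the snippet below saves and reloads a stand-in object inside a `with` block, so the file handle is closed even if loading fails. The dictionary and the temporary file name are hypothetical placeholders, not the fitted scaler, PCA, or K-means objects from the previous article. Only unpickle files that you created yourself or that you trust, because pickle.load can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model object (made-up values).
fake_model = {"mean": [1.0, 2.0], "scale": [0.5, 0.25]}

# Write the object to a temporary file, mirroring how a model would be saved.
path = os.path.join(tempfile.mkdtemp(), "example_model.pickle")
with open(path, "wb") as f:
    pickle.dump(fake_model, f)

# Reload it; the `with` block closes the file handle automatically.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == fake_model)
```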

4. DETERMINE THE MARKET SEGMENT FOR EACH ROW IN THE PURCHASE.CSV DATASET

Some of the features in purchase.csv are the same as those used to build the market segmentation models: 'Sex', 'Marital status', 'Age', 'Education', 'Income', 'Occupation', and 'Settlement size'.

These features are passed through the saved models (standardization, then PCA, then K-means) with the code below:

feature = df_purchase[['Sex', 'Marital status', 'Age', 'Education', 'Income', 'Occupation', 'Settlement size']]
df_purchase_seg_std = scalar.transform(feature)
df_purchase_seg_pca = pca.transform(df_purchase_seg_std)
purchase_seg_kmeans_pca = kmeans_pca.predict(df_purchase_seg_pca)

The result is an array containing the market segment cluster of each row, stored in a variable named purchase_seg_kmeans_pca.
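To make the final prediction call less of a black box: a fitted K-means model assigns each row to the cluster whose centre is nearest. The sketch below reproduces that idea with NumPy on made-up two-dimensional points. The centres and points are illustrative assumptions, not values from the article's models.

```python
import numpy as np

# Hypothetical cluster centres in a 2-D component space (4 segments).
centres = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])

# Hypothetical transformed customers (the kind of array pca.transform returns).
points = np.array([[0.2, -0.1], [2.8, 0.3], [0.1, 2.9], [3.2, 3.1], [2.9, 0.1]])

# Squared Euclidean distance from every point to every centre: shape (5, 4).
distances = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)

# Each point gets the index of its nearest centre, one label per row.
labels = distances.argmin(axis=1)
print(labels)  # [0 1 2 3 1]
```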

5. ADDING MARKET SEGMENT CLUSTER TO PURCHASE.CSV DATASET

After determining the market segments, the next step is to copy df_purchase into a new dataframe, df_purchase_predictor, and add the segment array to it as a new feature named 'segment', using the code below:

df_purchase_predictor = df_purchase.copy()
df_purchase_predictor['segment'] = purchase_seg_kmeans_pca
df_purchase_predictor

Here's a screenshot of the result of the code above:

6. ADDING DUMMY SEGMENT

The segment feature is categorical: its values are labels, so they cannot be compared with one another as numbers. It can therefore be converted into several indicator (dummy) features, one per category. You can use the code below:

segment_dummies = pd.get_dummies(purchase_seg_kmeans_pca, prefix='segment', prefix_sep='_')
segment_dummies

The result of the code above is:

According to the picture above, there are four new features called segment_0, segment_1, segment_2, and segment_3.

Their values are based on the segment feature of the dataframe. For example, if segment is 2, then segment_2 is filled with 1 and the other features are filled with 0.
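A small sketch of this encoding on five made-up segment labels is shown below. The labels are hypothetical. Passing dtype=int is an addition to the article's call: recent pandas versions return True/False by default, and dtype=int forces the 1/0 values described above.

```python
import pandas as pd

# Hypothetical segment labels for five customers.
segments = [2, 0, 1, 3, 2]

# dtype=int forces 0/1 output; recent pandas versions default to True/False.
dummies = pd.get_dummies(segments, prefix="segment", prefix_sep="_", dtype=int)
print(dummies)
```

Each row contains exactly one 1, in the column that matches its segment label.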

7. COMBINE FEATURES SEGMENT_0, SEGMENT_1, SEGMENT_2, SEGMENT_3 WITH THE DATAFRAME

This step adds the new dummy features to the dataframe df_purchase_predictor, which makes the machine learning modeling easier.

df_purchase_predictor = pd.concat([df_purchase_predictor, segment_dummies], axis=1)
df_pa = df_purchase_predictor
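One detail worth knowing about pd.concat with axis=1 is that it matches rows by index label, not by position. That works here because both frames carry the same default index. The sketch below shows the pattern on a made-up three-row frame; the data is hypothetical.

```python
import pandas as pd

# Hypothetical mini version of the purchase data with a segment column.
purchases = pd.DataFrame({"Incidence": [0, 1, 0], "segment": [2, 0, 2]})

# One indicator column per segment value that occurs in the data.
dummies = pd.get_dummies(purchases["segment"], prefix="segment", dtype=int)

# axis=1 places the frames side by side, matching rows on their index labels.
combined = pd.concat([purchases, dummies], axis=1)
print(combined)
```

Note that only segments present in the data get a column, so this toy frame has segment_0 and segment_2 but no segment_1 or segment_3.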

8. MAKE PURCHASE PROBABILITY MODEL WITH LOGISTIC REGRESSION

Create a machine learning model that estimates purchase probability using the logistic regression algorithm.

The Incidence feature is the effect (Y) and price is the factor (X).

There are five price features, so the factor (X) is the mean of those five features.

Y = df_pa['Incidence']
X = pd.DataFrame()
X['mean_price'] = (df_pa['Price_1'] + df_pa['Price_2'] + df_pa['Price_3'] + df_pa['Price_4'] + df_pa['Price_5']) / 5
model_purchase = LogisticRegression(solver='sag')
model_purchase.fit(X, Y)
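The row-wise mean of the five price columns can also be written more compactly with DataFrame.mean(axis=1). The sketch below uses made-up prices for two rows; the values are hypothetical and not taken from the dataset. One difference to keep in mind: mean(axis=1) skips missing values by default, while adding the columns by hand returns NaN for any row with a missing price.

```python
import pandas as pd

# Hypothetical prices for two shopping trips (not taken from the dataset).
df = pd.DataFrame({
    "Price_1": [1.0, 1.5], "Price_2": [2.0, 1.5], "Price_3": [2.0, 2.5],
    "Price_4": [2.0, 2.0], "Price_5": [3.0, 2.5],
})

price_cols = ["Price_1", "Price_2", "Price_3", "Price_4", "Price_5"]

# axis=1 averages across the five price columns within each row.
X = df[price_cols].mean(axis=1).to_frame(name="mean_price")
print(X)
```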

9. DISPLAY COEFFICIENT OF THE MACHINE LEARNING MODEL

To display the coefficient of the fitted model, you can use the code below:

model_purchase.coef_

The result is:

array([[-2.34838627]])

The negative coefficient means that X and Y are negatively related: the higher the value of X (price), the lower the probability that Y (a purchase) occurs.
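To see what a negative slope does to the predicted probability, the sketch below evaluates the logistic curve at a few prices. The slope is the rounded coefficient shown above. The intercept is a hypothetical value chosen for illustration, because the article does not report model_purchase.intercept_.

```python
import math

def purchase_probability(price, intercept, coef):
    """Logistic curve: P(purchase) = 1 / (1 + exp(-(intercept + coef * price)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * price)))

coef = -2.35      # slope reported in the article (rounded)
intercept = 3.0   # hypothetical value; not reported in the article

prices = (0.5, 1.0, 1.5, 2.0)
probs = [purchase_probability(p, intercept, coef) for p in prices]
for price, prob in zip(prices, probs):
    print(f"price {price:.2f} -> purchase probability {prob:.3f}")
```

Whatever the intercept, a negative slope makes the probability fall monotonically as the price rises.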

10. PRICE ELASTICITIES TO PURCHASE PROBABILITY

Let's turn to the price features in the dataset to decide which simulated prices to use.

Why simulate prices? The prices observed in the data are unevenly spaced, with gaps between them, so an elasticity graph drawn from them would not display the price elasticity well. An evenly spaced range of simulated prices gives a clearer curve.

Let's look at the descriptive statistics of the prices in the dataset with the following code:

df_pa[['Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5']].describe()

The result is:

11. MAKE DUMMY DATA FOR SIMULATION

price_range = np.arange(0.5, 3.5, 0.01)
price_range

The code above creates dummy price data. The minimum price is $0.50, the maximum is just under $3.50 (np.arange excludes the stop value), and the increment is $0.01.

We use a small increment (0.01) so that the movement of the elasticity across prices is displayed clearly and in detail.

Here's the result:

The array consists of 300 values and can be used to predict the purchase probability (Y). It is stored in a dataframe called df_price_range:

df_price_range = pd.DataFrame(price_range)
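A quick check of the simulated grid is sketched below. It confirms the number of points and shows that the stop value 3.5 is not part of the array. The np.linspace line is an alternative, not what the article uses; it fixes the number of points and both endpoints explicitly, which avoids surprises from floating-point steps.

```python
import numpy as np

# Same grid as in the article: the stop value 3.5 itself is not included.
price_range = np.arange(0.5, 3.5, 0.01)
print(len(price_range), price_range[0], round(price_range[-1], 2))

# Alternative with an explicit number of points and explicit endpoints.
price_grid = np.linspace(0.5, 3.49, 300)
print(np.allclose(price_range, price_grid))
```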

12. CREATE Y PROBABILITIES FROM DATASET DF_PRICE_RANGE

Y_pr = model_purchase.predict_proba(df_price_range)
Y_pr

The Y probabilities come from the purchase model fitted earlier. Here's the result:

The array has two columns: the first column is the probability of class 0 (no purchase) and the second column is the probability of class 1 (purchase).

Here we only use the second column and store it in a variable named purchase_pr, using the code below:

purchase_pr = Y_pr[:, 1]
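The layout of that two-column array can be sketched with NumPy alone. The probabilities below are made up for illustration. In scikit-learn the column order follows the model's classes_ attribute, so for an Incidence coded as 0/1 the second column is the purchase probability.

```python
import numpy as np

# Hypothetical purchase probabilities for three price points.
p_buy = np.array([0.80, 0.55, 0.20])

# predict_proba-style layout: column 0 = P(no purchase), column 1 = P(purchase).
Y_pr = np.column_stack([1.0 - p_buy, p_buy])

# Every row sums to 1, and column 1 holds the purchase probability.
purchase_pr = Y_pr[:, 1]
print(Y_pr.sum(axis=1), purchase_pr)
```

Selecting the column with Y_pr[:, 1] gives the same values as the longer Y_pr[:][:, 1].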

13. CALCULATING PRICE ELASTICITIES

pe = model_purchase.coef_[:, 0] * price_range * (1 - purchase_pr)
pe

The code above is the formula for calculating the price elasticity, and the result is:

The results are saved in a dataframe named df_price_elasticities with a new feature named "Mean_PE":

df_price_elasticities = pd.DataFrame(price_range)
df_price_elasticities = df_price_elasticities.rename({0: 'Price point'}, axis=1)
df_price_elasticities['Mean_PE'] = pe
df_price_elasticities

This is the result:

14. ANALYZING PRICE ELASTICITY

As in the previous step, there are 300 rows and 2 columns, but by default not all rows are displayed. Use the code below to display the whole dataframe:

pd.options.display.max_rows = None
df_price_elasticities

Here is the result:

15. DISPLAY PRICE ELASTICITY GRAPH

plt.figure(figsize=(9, 6))
plt.plot(price_range, pe, color='grey')
plt.xlabel('Price')
plt.ylabel('Elasticities')
plt.title('Price Elasticities of Purchase Probability')

The code above displays the data as a graph. The X axis is the price and the Y axis is the elasticity.

16. ANALYZE PRICE ELASTICITY GRAPH

Price elasticity measures the percentage change in the purchase probability of the product for a 1% change in price.
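To connect this definition to the code in step 13: elasticity is (dp/dPrice) * (Price/p), where p is the purchase probability. For a logistic model, dp/dPrice = coef * p * (1 - p), so the elasticity simplifies to coef * Price * (1 - p), which is the expression used to compute pe. The sketch below checks that closed form against a numerical derivative. The slope is the rounded coefficient from step 9; the intercept is a hypothetical value, since the article does not report it.

```python
import math

def prob(price, intercept, coef):
    """Purchase probability under a one-feature logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * price)))

def elasticity_formula(price, intercept, coef):
    # Closed form used in the article: coef * price * (1 - p).
    return coef * price * (1.0 - prob(price, intercept, coef))

def elasticity_numeric(price, intercept, coef, h=1e-6):
    # Definition: (dp / dprice) * (price / p), with a central difference for dp/dprice.
    dp = (prob(price + h, intercept, coef) - prob(price - h, intercept, coef)) / (2 * h)
    return dp * price / prob(price, intercept, coef)

coef, intercept = -2.35, 3.0   # the intercept is a hypothetical value
price = 1.1
closed = elasticity_formula(price, intercept, coef)
numeric = elasticity_numeric(price, intercept, coef)
print(round(closed, 4), round(numeric, 4))
```

The two numbers agree to several decimal places, which is what the derivation predicts.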

If the absolute value of the elasticity is above 1 (the purchase probability changes by more than the price does, in percentage terms), the purchase probability is elastic; if it is below 1, the purchase probability is inelastic.

The graph shows that the elasticity decreases (becomes more negative) as the price increases. In other words, the higher the price, the lower the demand for the product.

The main purpose of the graph is to show in detail how strongly a price increase affects the probability of purchase at each price point.

The graph in step 15 shows that the purchase probability is inelastic for prices between 0.5 and 1.1, and becomes elastic once the price of the product rises above 1.1.

It also shows that the elasticity is negative, because the coefficient of the logistic regression model is negative (-2.35).

At a price of $1.10, the elasticity is about -0.69: a 1% increase in price lowers the probability of buying by about 0.69%.

The price can be increased as long as the absolute value of the price elasticity (Mean_PE) stays below 1; once it exceeds 1, we should hold the price.
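The price at which demand switches from inelastic to elastic can also be read off programmatically instead of from the table. The sketch below is a minimal version of that lookup. It rebuilds an elasticity curve from the rounded coefficient and a hypothetical intercept, so the printed threshold is illustrative only; with the real model you would use the pe array from step 13 directly.

```python
import numpy as np

coef = -2.35      # slope reported in the article (rounded)
intercept = 3.0   # hypothetical; substitute model_purchase.intercept_[0]

price_range = np.arange(0.5, 3.5, 0.01)
purchase_pr = 1.0 / (1.0 + np.exp(-(intercept + coef * price_range)))
pe = coef * price_range * (1.0 - purchase_pr)

# Index of the first price point whose elasticity magnitude reaches 1.
# np.argmax returns the first True; this assumes at least one such point exists.
first_elastic = int(np.argmax(np.abs(pe) >= 1))
threshold_price = price_range[first_elastic]
print(round(threshold_price, 2))
```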

17. PRICE ELASTICITY ANALYSIS FOR MARKET SEGMENT 1 (CAREER FOCUSED)

Now let's analyze the price elasticity of segment 1 (Career Focused). First, filter the dataframe to segment 1 only, as in the code below:

df_pa_segment_1 = df_pa[df_pa['segment'] == 1]

18. DETERMINING FACTOR X ON EVENT Y IN MARKET SEGMENT 1 DATAFRAME (CAREER FOCUSED)

Y = df_pa_segment_1['Incidence']
X = pd.DataFrame()
X['Mean_price'] = (df_pa_segment_1['Price_1'] + df_pa_segment_1['Price_2'] + df_pa_segment_1['Price_3'] + df_pa_segment_1['Price_4'] + df_pa_segment_1['Price_5']) / 5

The code sets Y (the effect) to the Incidence feature of the dataframe df_pa_segment_1 and X (the factor) to the mean of its five price features.

19. LOGISTIC REGRESSION MODEL OF MARKET SEGMENT 1 (CAREER FOCUSED)

model_incidence_seg_1 = LogisticRegression(solver='sag')
model_incidence_seg_1.fit(X, Y)
model_incidence_seg_1.coef_

The coefficient of the logistic regression model is:

array([[-1.71232127]])

In absolute value, this coefficient is smaller than the coefficient for the whole dataset (-2.35), which suggests that this segment is less sensitive to price than the market as a whole.

20. Y PROBABILITY OF MARKET SEGMENT 1

Y_pr_seg_1 = model_incidence_seg_1.predict_proba(df_price_range)
purchase_pr_seg_1 = Y_pr_seg_1[:, 1]
pe_seg_1 = model_incidence_seg_1.coef_[:, 0] * price_range * (1 - purchase_pr_seg_1)
df_price_elasticities['pe_segment_1'] = pe_seg_1
df_price_elasticities

The code predicts the Y probabilities for the dataframe df_price_range, calculates the price elasticity, and stores it in a feature named pe_segment_1.

21. COMPARING THE PRICE ELASTICITY OF MARKET SEGMENT 1 (CAREER FOCUSED) WITH GRAPH

plt.figure(figsize=(9, 6))
plt.plot(price_range, pe, color='grey')
plt.plot(price_range, pe_seg_1, color='green')
plt.xlabel('Price')
plt.ylabel('Elasticities')
plt.title('Price Elasticities of Purchase Probability')

Here is the graph:

The graph shows that for segment 1 (Career Focused) the elasticity reaches -1.01 at a price of $1.39. Raising the price above that point makes the purchase probability fall more than proportionally.

22. PRICE ELASTICITY ANALYSIS FOR MARKET SEGMENT 2 (FEWER OPPORTUNITY)

df_pa_segment_2 = df_pa[df_pa['segment'] == 2]
Y = df_pa_segment_2['Incidence']
X = pd.DataFrame()
X['Mean_price'] = (df_pa_segment_2['Price_1'] + df_pa_segment_2['Price_2'] + df_pa_segment_2['Price_3'] + df_pa_segment_2['Price_4'] + df_pa_segment_2['Price_5']) / 5
model_incidence_seg_2 = LogisticRegression(solver='sag')
model_incidence_seg_2.fit(X, Y)
model_incidence_seg_2.coef_

The coefficient of the logistic regression model is:

array([[-3.64025805]])

In absolute value, this coefficient is larger than the coefficient for the whole dataset (-2.35), which suggests that this segment is more sensitive to price than the market as a whole.

23. Y PROBABILITY OF MARKET SEGMENT 2

Y_pr_seg_2 = model_incidence_seg_2.predict_proba(df_price_range)
purchase_pr_seg_2 = Y_pr_seg_2[:, 1]
pe_seg_2 = model_incidence_seg_2.coef_[:, 0] * price_range * (1 - purchase_pr_seg_2)
df_price_elasticities['pe_segment_2'] = pe_seg_2
df_price_elasticities

The code predicts the Y probabilities for the dataframe df_price_range, calculates the price elasticity, and stores it in a feature named pe_segment_2.

The result shows that for segment 2 (Fewer Opportunity) the elasticity reaches -1.02 at a price of $1.27. Raising the price above that point makes the purchase probability fall more than proportionally.

24. COMPARING THE PRICE ELASTICITY OF MARKET SEGMENT 2 (FEWER OPPORTUNITY) WITH GRAPH

plt.figure(figsize=(9, 6))
plt.plot(price_range, pe, color='grey')
plt.plot(price_range, pe_seg_1, color='green')
plt.plot(price_range, pe_seg_2, color='blue')
plt.xlabel('Price')
plt.ylabel('Elasticities')
plt.title('Price Elasticities of Purchase Probability')

Here is the graph:

25. PRICE ELASTICITY ANALYSIS OF SEGMENT 3 (WELL OFF)

df_pa_segment_3 = df_pa[df_pa['segment'] == 3]
Y = df_pa_segment_3['Incidence']
X = pd.DataFrame()
X['Mean_price'] = (df_pa_segment_3['Price_1'] + df_pa_segment_3['Price_2'] + df_pa_segment_3['Price_3'] + df_pa_segment_3['Price_4'] + df_pa_segment_3['Price_5']) / 5
model_incidence_seg_3 = LogisticRegression(solver='sag')
model_incidence_seg_3.fit(X, Y)
Y_pr_seg_3 = model_incidence_seg_3.predict_proba(df_price_range)
purchase_pr_seg_3 = Y_pr_seg_3[:, 1]
pe_seg_3 = model_incidence_seg_3.coef_[:, 0] * price_range * (1 - purchase_pr_seg_3)
df_price_elasticities['pe_segment_3'] = pe_seg_3

26. PRICE ELASTICITY ANALYSIS OF SEGMENT 0 (STANDARD)

df_pa_segment_0 = df_pa[df_pa['segment'] == 0]
Y = df_pa_segment_0['Incidence']
X = pd.DataFrame()
X['Mean_price'] = (df_pa_segment_0['Price_1'] + df_pa_segment_0['Price_2'] + df_pa_segment_0['Price_3'] + df_pa_segment_0['Price_4'] + df_pa_segment_0['Price_5']) / 5
model_incidence_seg_0 = LogisticRegression(solver='sag')
model_incidence_seg_0.fit(X, Y)
Y_pr_seg_0 = model_incidence_seg_0.predict_proba(df_price_range)
purchase_pr_seg_0 = Y_pr_seg_0[:, 1]
pe_seg_0 = model_incidence_seg_0.coef_[:, 0] * price_range * (1 - purchase_pr_seg_0)
df_price_elasticities['pe_segment_0'] = pe_seg_0
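The per-segment steps repeat the same pattern (filter, fit, predict, compute elasticity), so they can be wrapped in a loop. The sketch below shows one possible shape for that loop under several assumptions, all of them mine rather than the article's: the data is simulated, the segment names and "true" parameters are hypothetical, and a small Newton's-method fit in NumPy stands in for scikit-learn's LogisticRegression so that the example runs without the dataset. Unlike scikit-learn's default, this stand-in applies no regularization, so its coefficients would differ slightly from the article's even on the same data.

```python
import numpy as np

def fit_logistic_1d(x, y, n_iter=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton's method (no regularization)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta  # [intercept, slope]

rng = np.random.default_rng(0)
price_range = np.arange(0.5, 3.5, 0.01)

# Hypothetical "true" (intercept, slope) per segment, used only to simulate data.
true_params = {"segment_a": (3.0, -1.7), "segment_b": (5.0, -3.6)}

slopes = {}
elasticities = {}
for name, (b0, b1) in true_params.items():
    # Simulate mean prices and purchase incidence for this segment.
    x = rng.uniform(1.5, 2.8, size=5000)
    y = (rng.uniform(size=x.size) < 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))).astype(float)

    # Fit, predict over the simulated price grid, and compute the elasticity curve.
    intercept, slope = fit_logistic_1d(x, y)
    purchase_pr = 1.0 / (1.0 + np.exp(-(intercept + slope * price_range)))
    slopes[name] = slope
    elasticities[name] = slope * price_range * (1.0 - purchase_pr)

print({name: round(value, 2) for name, value in slopes.items()})
```

With the real data, the body of the loop would filter df_pa by segment and call LogisticRegression as in the steps above; the dictionary of elasticity curves then feeds directly into one plotting loop.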

27. COMPARING THE PRICE ELASTICITY OF ALL MARKET SEGMENTS WITH GRAPH

plt.figure(figsize=(9, 6))
plt.plot(price_range, pe, color='grey')
plt.plot(price_range, pe_seg_1, color='green')
plt.plot(price_range, pe_seg_2, color='blue')
plt.plot(price_range, pe_seg_3, color='red')
plt.plot(price_range, pe_seg_0, color='yellow')
plt.legend(['Whole Market', 'Career Focused', 'Fewer Opportunity', 'Well Off', 'Standard'])
plt.xlabel('Price')
plt.ylabel('Elasticities')
plt.title('Price Elasticities of Purchase Probability')

Here is the graph:

The picture above shows that the inelastic price range is almost the same for all segments: roughly $0.5 to $1.1.

The Fewer Opportunity segment is the most inelastic in the $0.5 to $1.1 price range. This may be because it is the largest segment.

Or perhaps the Fewer Opportunity segment likes the product so much that a price increase within a certain range does not reduce the probability of buying.

Once the price rises above $1.1, the Fewer Opportunity segment becomes the segment most sensitive to price increases, and its price elasticity of purchase probability falls below the whole-market average.

The Standard segment reaches an elasticity of 1.01 in absolute value at a price of $1.24, which makes it the second most sensitive segment to a price increase. However, its price elasticity of purchase probability is still above the whole-market average.

The Well Off segment reaches an elasticity of 1.00 in absolute value at a price of $1.47, which makes it the segment least sensitive to price increases. The effect on this segment's purchase probability is not significant; this segment might be very loyal to the product.

Dataset : https://github.com/miradzji/customer_segmentation_da

That's all for this time, thanks.


Written by Rahmat Taufiq Sigit