Calculating Potential Customer Revenue Using Rule Based Classification

Barış Cengiz
3 min readMar 13, 2022

--

In this project the main goal is to calculate potential customer revenue without using any machine learning libraries or pre-built functions/methods.

The Data

First, let’s examine the data in the Persona dataset. This dataset contains customer data from a variety of countries.

Image-1 Persona DataFrame Info

As it can be seen in Image-1, Persona Dataset has 5 columns with 5000 entries. Clearly this is a clean dataset with no missing values. Price and age are the numerical features. Source, sex and country are the categorical features. All features are self explanatory except SOURCE. Source feature is the information of customers’ smart-phone operating system.

Image-2 df.head()

Source feature has 2 unique values: ios and android as it is shown in Image-3

Image-3 Unique Values of Source Feature

Investigate the data further:

Image-4 Average Customer Spending by Country

Customers are from 6 different countries. Average spending by country can be seen in Image-4.

Image-5 Average Customer Spending by Source
Image-6 Average Customer Spending by Country-Source

Pre-Processing the Data

To make the data a bit more clear, aggregate the data as shown below.

Image-7 Agg.
Image-8 agg_df First Five Entries

Now one way to continue is adding age categories to the dataset. This will help us predict a new customers revenue. Age intervals can be defined as followed:

0–18: Underaged,

19–23: College students (Possibly),

24–30: Young adults,

31–40: Possibly young parents

40–66: 66 being the oldest age in the dataset.

Image-9 First 5 entries with the age_cat feature

Now it’s time to add the persona feature to the dataset.

Ex. Persona: “BRA_ANDROID_MALE_41_66”

Image-10 First 5 entries of the persona_df

Essentially we have grouped customers by country, source, gender and age interval then calculated average spending for each group.

Next step is creating segments for average spendings. Let’s divide the average spending into 4 segments: A,B,C,D (Descending)

Mean, max and sum values for each segment

Our dataset is ready to use for predicting revenues of new customers.

Prediction

Function to predict the revenue of a new customer

pred_price function takes DataFrame, country, source, sex and age inputs to return predicted segment and price for a new customer.

Examples:

So we can expect a potential 41.8 revenue value from a 33 year-old Turkish female android user and a potential of 32.8 revenue value from a 35 year-old French ios user.

Thanks for reading.

Barış Cengiz

--

--

Barış Cengiz

Avionics System engineer learning and sharing about data science, machine learning and artificial intelligence.