Starbucks coffee discounts
The project provides data for analyzing how different kinds of offers, such as discounts or adverts, affect the behaviour of consumers. The main question is which person should receive which offers to maximize their spending.
The goal is to provide each customer with an ideal set of offers. Different approaches can be taken for that; I had a deeper look at two of them.
The first approach uses a neural network. I feed it the personal data provided (age, income and gender) as well as the personal history of each individual. Since the duration of each offer differs, and there are timeframes where no offer is active, the outcome I am trying to predict is the average amount of money spent per hour, given the current status of a person (that is, which offers that person currently has).
The second approach is a purely statistical model. I divide the data into several age and income bins and then look at the spending of each group depending on the offer type they are provided with.
The main metric for the neural network is the mean squared error. Since this is a regression model, it is the standard loss to use.
To rate the model, I use the R2 score, which is 1 for a perfect model and can be negative for a very bad model. A constant model that always predicts the mean of the targets gets a score of 0.
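To make the behaviour of the two metrics concrete, here is a small sketch using scikit-learn's `mean_squared_error` and `r2_score`. The target values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical spend-per-hour targets, for illustration only.
y_true = np.array([0.0, 1.0, 2.0, 3.0])

perfect = y_true.copy()                          # predicts every value exactly
constant = np.full_like(y_true, y_true.mean())   # always predicts the mean
bad = y_true[::-1].copy()                        # predicts the values in reverse order

r2_perfect = r2_score(y_true, perfect)     # 1 for a perfect model
r2_constant = r2_score(y_true, constant)   # 0 for the constant mean model
r2_bad = r2_score(y_true, bad)             # negative: worse than the constant model

mse_constant = mean_squared_error(y_true, constant)
```

A score close to 0 therefore means the model is barely better than always predicting the average.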
Data Exploration and Visualization
Starbucks released a dataset that can help to answer this question. It contains the data of 17,000 customers, 10 different offers and over 300,000 events.
The personal data given is age, gender and income. 2,175 of the people in the dataset did not provide any personal data. The distribution of the other 14,825 is shown in this graphic.
There are about 8,500 males, 6,100 females and 200 people with “other” as gender. The general distribution looks as expected, with older people having on average a higher income than younger ones.
The offers in the dataset can be separated into three groups: informational, discount and bogo (buy one get one free). There are 2 informational offers, 4 discounts and 4 bogos. Each offer has a duration for which it is valid, a difficulty and a reward (for the informational offers the reward is 0).
There are also four types of events recorded: “Offer received”, “Offer viewed”, “Offer completed” and “Transaction”. The goal is to find out how much an individual will spend depending on the offers they received and viewed. The distribution of offers can be seen below:
The number of times each offer was received does not differ much, which is a good starting point.
What is quite interesting is that the view rates differ a lot, from ~35% for offer 4 up to >95% for offers 1, 5, 6 and 8. Since offer 4 has the highest difficulty, this raises the question of whether there is some sort of preview of the offers that is not recorded in the data. Unfortunately I cannot answer that.
Looking at the completion of offers, there is quite a big difference as well (informational offers cannot be completed). It looks like the main influence on total completions is the difficulty, with the exception of offer 6, which has many completions although it is rather difficult. In addition, the offers with a lower reward seem to have a lot more completions than the ones with higher rewards, which is quite interesting.
My first approach to analyzing the data was to train a model that takes into account not only personal data like income, age and gender, but also the personal history of each individual.
I split the data so that each new status of a person (new offer viewed, offer expired or offer completed) serves as a new datapoint. Over time this yields a step-by-step history for each person.
I split the income and age into seven bins each. After that, I normalized the data between 0 and 1 for each column. Columns of the same kind were normalized together with the same values in order to keep them consistent: all columns referring to time, all columns referring to money, and all columns counting how often an offer was viewed, is open or was completed.
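Normalizing several columns with the same values can be sketched as follows. This is a minimal illustration, not the code used in the project: the helper `scale_together` and the column names are hypothetical, and the numbers are made up.

```python
import pandas as pd

def scale_together(df, columns):
    """Min-max scale several columns to [0, 1] using one shared min and max."""
    block = df[columns].astype(float)
    lo = block.min().min()   # smallest value across all listed columns
    hi = block.max().max()   # largest value across all listed columns
    out = df.copy()
    if hi > lo:
        out[columns] = (block - lo) / (hi - lo)
    else:
        out[columns] = 0.0   # all values identical, nothing to scale
    return out

# Hypothetical feature columns: hours an offer was active, money spent meanwhile.
features = pd.DataFrame({
    "hours_offer_1": [0.0, 24.0, 48.0],
    "hours_offer_2": [12.0, 0.0, 96.0],
    "spent_offer_1": [0.0, 5.0, 20.0],
    "spent_offer_2": [10.0, 0.0, 40.0],
})

scaled = scale_together(features, ["hours_offer_1", "hours_offer_2"])
scaled = scale_together(scaled, ["spent_offer_1", "spent_offer_2"])
```

With a shared scale, 48 hours on offer 1 stays half as large as 96 hours on offer 2, which per-column scaling would not preserve.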
The data was split into training and testing data, with 25% of the users serving as the testing group.
Using this data I trained a neural network to predict how much money per hour a certain person would spend, considering their current status (which offers are viewed and active), personal details (age, gender, income) and personal history (how much was already spent during which offer).
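Because every user contributes many datapoints, splitting by user keeps all rows of one person on the same side of the split. One way to do this is scikit-learn's `GroupShuffleSplit`; whether the project used this class is an assumption, and the arrays below are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in: 8 users with 3 datapoints each.
user_ids = np.repeat(np.arange(8), 3)
X = np.arange(len(user_ids) * 2, dtype=float).reshape(-1, 2)
y = np.linspace(0.0, 1.0, len(user_ids))

# test_size refers to the share of users (groups), not of rows.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=user_ids))

train_users = set(user_ids[train_idx])
test_users = set(user_ids[test_idx])
```

No user appears in both sets, so the test score is measured on people the model has never seen.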
The neural network is implemented using a scikit-learn MLPRegressor. The input vector has 69 entries: 7 income bins, 7 age bins and 3 gender bins; for each of the 10 offers, the number of times it was viewed, the number of times it was completed, the time period it was active, the money spent while it was active and how many times it is open; and finally the time spent and the money spent while no offer is active (7 + 7 + 3 + 5 × 10 + 2 = 69).
The output is the average amount of money spent per hour given the current status.
In order to get good results I did a grid search over three parameters:
learning rates: 0.01, 0.005, 0.001
layers: (100, 100), (50), (50, 50), (50, 50, 50), (25)
activation: tanh, relu
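The grid above could be expressed with scikit-learn's `GridSearchCV` roughly as follows. This is a sketch under assumptions: the data is random stand-in data with 69 features, "learning rate" is mapped to the `learning_rate_init` parameter, and `max_iter`, `cv` and the scoring choice are picked here only to keep the example fast.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data: 69 input features scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.random((120, 69))
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(120)

param_grid = {
    "learning_rate_init": [0.01, 0.005, 0.001],
    "hidden_layer_sizes": [(100, 100), (50,), (50, 50), (50, 50, 50), (25,)],
    "activation": ["tanh", "relu"],
}

# max_iter and cv are assumed values, kept small so the sketch runs quickly.
search = GridSearchCV(
    MLPRegressor(max_iter=50, random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=2,
)

with warnings.catch_warnings():
    # Such short training runs usually do not converge; silence that warning.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X, y)

best_params = search.best_params_
n_candidates = len(search.cv_results_["params"])  # 3 * 5 * 2 combinations
```

On the real data, the rows of one user should stay in the same fold, for example by passing a group-aware splitter as `cv`.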
The statistical model only takes into account those people who provided personal data. The data is divided by the three genders and into 7 age and 7 income groups:
age: <30, 30–41, 42–53, 54–65, 66–77, 78–89, >89
income: <43k, 43k–55k, 56k–68k, 69k–81k, 82k–94k, 95k–107k, >107k
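The groups listed above can be assigned with `pandas.cut`. The sketch below assumes hypothetical column names `age` and `income`, and assumes that each bin includes its lower edge (so an income of exactly 43,000 falls into the second group); the text does not specify how the boundaries are handled.

```python
import numpy as np
import pandas as pd

# Bin edges chosen to match the listed groups; lower edge inclusive (assumption).
AGE_EDGES = [0, 30, 42, 54, 66, 78, 90, np.inf]
AGE_LABELS = ["<30", "30-41", "42-53", "54-65", "66-77", "78-89", ">89"]
INCOME_EDGES = [0, 43_000, 56_000, 69_000, 82_000, 95_000, 108_000, np.inf]
INCOME_LABELS = ["<43k", "43k-55k", "56k-68k", "69k-81k", "82k-94k", "95k-107k", ">107k"]

def add_groups(profile):
    """Return a copy of the profile table with age_group and income_group columns."""
    out = profile.copy()
    out["age_group"] = pd.cut(out["age"], bins=AGE_EDGES, labels=AGE_LABELS, right=False)
    out["income_group"] = pd.cut(out["income"], bins=INCOME_EDGES, labels=INCOME_LABELS, right=False)
    return out

# Made-up example rows.
profile = pd.DataFrame({
    "age": [25, 30, 41, 42, 89, 90],
    "income": [30_000, 43_000, 55_999, 107_000, 108_000, 120_000],
})
grouped = add_groups(profile)
```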
Due to certain correlations, especially between age and income, not all combinations have the same amount of data, as the plot below shows:
Of course, the more data there is available, the more accurate the model can be assumed to be for each respective group.
Results and Refinement of Neural Network
Since the first neural network model did not work well, with an R2 score of 0.025, I tried to refine it.
As a first step I removed the outliers. This resulted in a slightly worse model, with an R2 score of 0.022.
In a further step I removed the people who did not give any personal background and focused on the datapoints for which some personal history already exists.
The downside of this was that the available dataset shrank to 1/4 of its original size. The result was a slightly better model with an R2 score of 0.071, which is still not good at all. It does look, however, as if the model could improve over time with a growing personal history.
Results of statistical model
After the neural network did not give the results I was hoping for, I looked at the problem from a different angle and built a statistical model. For that, I removed the users with no personal data and split the rest into groups: 7 age groups, 7 income groups and 3 genders.
I also split the offers into three groups: bogo, informational and discount.
Then I could look at how much each group spent per unit of time under each type of offer, and how much they spent when they had not received an offer.
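The per-group spending rate can be computed with a pandas `groupby`. The table layout below is an assumption made for this sketch: one row per user and period, with hypothetical columns for the group labels, the offer type active in that period ("none" if no offer), the hours the period lasted and the amount spent in it. All numbers are made up.

```python
import pandas as pd

# Made-up periods: users 1 and 2 share a group, user 3 is in another one.
periods = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 3, 3],
    "gender":       ["F", "F", "F", "F", "M", "M"],
    "age_group":    ["54-65"] * 6,
    "income_group": ["82k-94k"] * 6,
    "offer_type":   ["none", "discount", "none", "bogo", "none", "bogo"],
    "hours":        [100.0, 40.0, 50.0, 50.0, 100.0, 50.0],
    "amount":       [10.0, 12.0, 5.0, 10.0, 20.0, 5.0],
})

group_keys = ["gender", "age_group", "income_group"]

# Total money divided by total time, per group and offer type.
totals = periods.groupby(group_keys + ["offer_type"])[["amount", "hours"]].sum()
totals["spend_per_hour"] = totals["amount"] / totals["hours"]
rates = totals["spend_per_hour"].unstack("offer_type")

# Number of distinct users behind each group, as a rough measure of reliability.
n_users = periods.groupby(group_keys)["user_id"].nunique()
```

In this toy table the female group spends 0.1 per hour without an offer and 0.3 per hour with a discount; combinations that never occur show up as missing values in `rates`.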
The downside, however, is that for some combinations there are many different users available as reference, while for others there are few to none, especially for the “other” gender. The charts below show the distribution of datapoints, and hence are also a measure of how well the model will perform for each group.
It can be seen that the distribution varies a lot between groups. For high incomes combined with young age in particular, barely any data is available. For the oldest group the data is very sparse as well. This of course has a negative impact on the prediction quality of the model in these areas.
Some examples for certain groups are shown below:
While older females with a high income spend about 3 times as much when they receive a discount, very old females with a medium income don’t seem to be affected at all by any offer.
Males with a rather high income and aged between 54 and 65 seem mostly influenced by informational offers, but in general by all kinds of offers, whereas males with the same income aged between 78 and 89 seem to rather reject offers and spend even less if they receive offers, no matter what kind.
To tackle this problem, a statistical model seems to be much better than a neural network. Maybe the dataset was too short and too small, and having a lot more people and a much longer timeframe (and therefore much more personal history) would improve the predictions, but that would have to be evaluated. For the given dataset the statistical model served best, although it does have some weaknesses.
One weakness is that the amount of data available differs a lot between groups, especially for people who identify as neither female nor male.
The second shortcoming is that it only works if personal data is available.
If personal data is available, the model provides a good starting point. If not, the offers that were completed most often would probably be a good starting point.
Maybe there is a better way to include personal history and make predictions more personal than just being based on membership in a certain group. This could then be combined with the statistical model to get even better results. A neural network might also work if the timespan and group size get bigger.
Maybe a way could also be found to separate the users into groups not primarily by personal data, but rather by their spending under different offer types. This could then be used for future predictions. A downside of that approach would be the cold start problem, but that is where the statistical model could come in. With a growing personal history, the weights would then be shifted from the statistical model to the one that categorizes by personal history.
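The idea of shifting weights could look like the following sketch. It is purely illustrative and was not part of the project: the function `blend_prediction`, the weighting formula n / (n + k) and the constant `k` are all assumptions.

```python
def blend_prediction(group_rate, personal_rate, n_history, k=10.0):
    """Blend a group-based and a history-based spend-per-hour estimate.

    n_history counts the datapoints available for this person; k (hypothetical)
    controls how quickly the weight moves towards the personal estimate.
    """
    weight_personal = n_history / (n_history + k)
    return (1.0 - weight_personal) * group_rate + weight_personal * personal_rate

# Cold start: no history yet, so only the statistical (group) estimate counts.
cold_start = blend_prediction(group_rate=0.2, personal_rate=0.6, n_history=0)

# With as many datapoints as k, both estimates count equally.
halfway = blend_prediction(group_rate=0.2, personal_rate=0.6, n_history=10)

# With a long history, the personal estimate dominates.
long_history = blend_prediction(group_rate=0.2, personal_rate=0.6, n_history=1000)
```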