Starbucks Customer Analysis and Best Offers
Starbucks Data Science Challenge
Introduction
Today, I will be taking a look at some data sets provided by Starbucks to Udacity for the Data Science program.
I thought it would be interesting to explore the demographics of customers enrolled in Starbucks' rewards program, and to build a predictive model of which offers would be successful or effective.
An effective offer is one that is both VIEWED and COMPLETED by the customer.
The data cleaning portions will not be explained in this post, but they can be found in the Github repository.
I. Data Sets:
The data is contained in three files:
- portfolio.json — containing offer ids and meta data about each offer (duration, type, etc.)
- profile.json — demographic data for each customer
- transcript.json — records for transactions, offers received, offers viewed, and offers completed
Here is the schema and explanation of each variable in the files:
portfolio.json
- id (string) — offer id
- offer_type (string) — type of offer ie BOGO, discount, informational
- difficulty (int) — minimum required spend to complete an offer
- reward (int) — reward given for completing an offer
- duration (int) — time for offer to be open, in days
- channels (list of strings)
profile.json
- age (int) — age of the customer
- became_member_on (int) — date when customer created an app account
- gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
- id (str) — customer id
- income (float) — customer’s income
transcript.json
- event (str) — record description (ie transaction, offer received, offer viewed, offer completed)
- person (str) — customer id
- time (int) — time in hours since start of test. The data begins at time t=0
- value — (dict of strings) — either an offer id or transaction amount depending on the record
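The value field is the trickiest to work with, since its keys differ by event type. Below is a minimal sketch of flattening it with pandas; the records are illustrative (real ids are long hash strings), and it assumes the dataset's quirk that received/viewed events use the key "offer id" while completions use "offer_id".

```python
import pandas as pd

# Illustrative transcript records, not actual data
transcript = pd.DataFrame([
    {"person": "a1", "event": "offer received",  "time": 0,
     "value": {"offer id": "bogo_1"}},
    {"person": "a1", "event": "offer viewed",    "time": 6,
     "value": {"offer id": "bogo_1"}},
    {"person": "a1", "event": "transaction",     "time": 12,
     "value": {"amount": 14.25}},
    {"person": "a1", "event": "offer completed", "time": 12,
     "value": {"offer_id": "bogo_1", "reward": 5}},
])

# Flatten the value dict into separate columns; note the key is
# "offer id" for received/viewed events but "offer_id" for completions
transcript["offer_id"] = transcript["value"].apply(
    lambda v: v.get("offer id", v.get("offer_id")))
transcript["amount"] = transcript["value"].apply(lambda v: v.get("amount"))
```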
II. Exploratory Data Analysis
1. Offers Sent out
Offers sent out are of 3 types: BOGO (buy one, get one), informational, and discount.
We can see that discount offers are the least offered by Starbucks at 20%, while the other two types are tied at 40% each. Informational offers do not require any action from the user, while the other two do.
2. Age, Income and Gender Distribution of Customers
There were some very interesting findings here that I will attempt to wrangle and explain.
The simplest finding is the gender distribution: most rewards customers are male at about 58%, with females second at about 40%, and other genders at 2%.
The next 2 plots will be analyzed simultaneously.
Interestingly, most Starbucks rewards customers lie in the 50–59 age group (I personally expected the 20–39 range!). However, the plot above it shows income levels also increasing with each age group.
A likely explanation is that as income increases (and income tends to increase with age), customers become more likely to enroll in the rewards program.
3. Distribution of Offers Received by Customers
There are a total of 10 offers sent out by Starbucks, and it can be seen that they are equally distributed.
This is crucial later, so that our predictive model does not skew toward any particular offer simply because it is sent out more often.
4. Completed and Viewed Offers
Offer events in the data set are of three types, as explained above. Now we will look at the relation between viewing an offer and completing it.
We can see that the two are highly correlated: offers with high completion counts are also highly viewed.
Offers 2 and 7 are the exceptions, since they are informational offers, and cannot technically be completed, just viewed.
III. Linear Regression Model
I started by building a preliminary model to look at coefficient weights and identify indicators that a customer would spend more.
Let us focus on general features rather than specific offer ones. The three strongest positive indicators that a customer would spend more are (in descending order):
- Number of discount offers received
- Female Gender
- Informational Offers Received
On the other hand, the three strongest negative indicators that a user would spend more are:
- Male Gender
- Year
- Total Offers Received
Interestingly, and to summarize:
- Female customers are willing to spend more than male customers on Starbucks products.
- Discount offers are strongly linked to customer expenditure.
- Members enrolled recently are less likely to spend more.
- Customers who received a larger number of offers are likely to spend less money on products.
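Coefficient inspection like the above can be sketched as follows. The data here is synthetic and the feature names only mirror the post's (the real feature matrix comes from the cleaned DataFrame); the coefficients are fabricated for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in features; names mirror the post, values are fake
features = ["discount_offers_received", "gender_F", "total_offers_received"]
X = rng.normal(size=(n, 3))

# Fabricated target: spend rises with the first two, falls with the third
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=n)

model = LinearRegression().fit(X, y)

# Rank features by coefficient, strongest positive indicator first
ranked = sorted(zip(features, model.coef_), key=lambda p: p[1], reverse=True)
for name, coef in ranked:
    print(f"{name:28s} {coef:+.2f}")
```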
IV. Classification Predictive Models
As mentioned before, the goal of these predictive models is to predict if an offer would be effective when sent out to a certain customer based on their demographic.
As mentioned in the introduction, an offer is assumed to be effective when it is, both, viewed and completed by a customer.
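The viewed-and-completed label can be derived from the event log roughly as below; the event records are illustrative, and the real pipeline (see the repository) works on the cleaned transcript instead.

```python
import pandas as pd

# Illustrative event log; one row per (person, offer, event)
events = pd.DataFrame({
    "person":   ["a1", "a1", "a2", "a2", "a2"],
    "offer_id": ["o1", "o1", "o1", "o2", "o2"],
    "event":    ["offer viewed", "offer completed",
                 "offer completed",          # completed but never viewed
                 "offer viewed", "offer viewed"],
})

# One indicator column per event type for each (person, offer) pair
flags = (events.assign(dummy=1)
               .pivot_table(index=["person", "offer_id"],
                            columns="event", values="dummy",
                            aggfunc="max", fill_value=0))

# Effective = viewed AND completed
flags["effective"] = ((flags["offer viewed"] == 1) &
                      (flags["offer completed"] == 1)).astype(int)
```

A completion without a view (e.g. a customer who spent enough by coincidence) is labeled ineffective, which is exactly the distinction the models below try to learn.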
1. Naive Classifier
I started by building a Naive Classifier, which assumes that all offers are effective, and ended up with the following metrics:
Precision: 0.4216387505658669
Recall: 1.0
f1-score: 0.5931728442236658
Misclassification Rate: 0.578361249434133
The misclassification rate is about 58%, meaning the classifier is wrong nearly 60% of the time. As expected, the f1-score of ~0.59 is also very close to flipping a coin.
The Naive Classifier served as a base case to compare performance against.
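A quick sketch of why the baseline looks the way it does: when every offer is predicted effective, recall is 1 by construction and precision collapses to the positive-class rate. The labels below are fabricated with a ~42% positive rate to echo the numbers above.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

rng = np.random.default_rng(42)

# Fabricated labels with ~42% positives, roughly matching the post
y_true = (rng.random(10_000) < 0.42).astype(int)
y_pred = np.ones_like(y_true)          # naive: every offer is "effective"

precision = precision_score(y_true, y_pred)   # equals the positive rate
recall = recall_score(y_true, y_pred)         # 1.0 by construction
f1 = f1_score(y_true, y_pred)
error = np.mean(y_pred != y_true)             # misclassification rate
print(precision, recall, f1, error)
```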
2. Logistic Regression Classifier
Next, I moved on to a linear classifier, knowing that there is some non-linearity in the data.
The following was the result:
Best parameters for model are:
{'penalty': 'l1', 'C': 1.0}
precision recall f1-score support
0 0.79 0.77 0.78 6322
1 0.70 0.73 0.72 4723
micro avg 0.75 0.75 0.75 11045
macro avg 0.75 0.75 0.75 11045
weighted avg 0.75 0.75 0.75 11045
True Positives: 3456
True Negatives: 4851
False Positives: 1471
False Negatives: 1267
Classification Error: 0.2478949751018561
Sensitivity: 0.7317383019267415
Even though the model is linear, it still achieved a fairly low classification error of ~25%, with an f1-score of 0.75.
This suggested that a model able to handle non-linear decision boundaries would perform better, and that was the next step.
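The search over the logistic regression's parameters can be sketched as below. The parameter grid matches the result reported above (penalty and C); the data is a synthetic stand-in, and note that the l1 penalty requires a compatible solver such as liblinear.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the offer/customer feature matrix
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# l1 penalty requires a solver that supports it, e.g. liblinear
grid = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grid={"penalty": ["l1", "l2"], "C": [0.1, 1.0, 10.0]},
    scoring="f1", cv=5)
grid.fit(X_tr, y_tr)

print("Best parameters:", grid.best_params_)
print("Held-out f1:", grid.score(X_te, y_te))  # scores with the f1 metric
```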
3. Random Forest Classifier
Next, I moved on to test an RFC (Random Forest Classifier), and as expected, it provided much better results than the linear model.
Best parameters for model are:
{'n_estimators': 200, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_features': 'auto', 'max_depth': 8}
precision recall f1-score support
0 0.79 0.80 0.79 6307
1 0.73 0.71 0.72 4738
micro avg 0.76 0.76 0.76 11045
macro avg 0.76 0.76 0.76 11045
weighted avg 0.76 0.76 0.76 11045
True Positives: 3382
True Negatives: 5047
False Positives: 1260
False Negatives: 1356
Classification Error: 0.23684925305568127
Sensitivity: 0.7138032925284931
The classification error was reduced to 23.7%, and the f1-score went up to 0.76.
However, all of this was done with RandomizedSearchCV, which tests a random sample of parameter combinations rather than performing an exhaustive search.
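The randomized search can be sketched as below; the parameter distributions are illustrative stand-ins covering the same hyperparameters tuned above, on synthetic data.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Sample a fixed number of combinations (n_iter) instead of trying all
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 250),
        "max_depth": randint(4, 12),
        "min_samples_leaf": randint(1, 8),
    },
    n_iter=10, scoring="f1", cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```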
The following results were obtained with a grid search (GridSearchCV), which is exhaustive, and the model improved once again.
Best parameters for model are:
{'max_depth': 9, 'max_features': 'sqrt', 'min_samples_leaf': 4, 'min_samples_split': 2, 'n_estimators': 150}
precision recall f1-score support
0 0.79 0.81 0.80 6304
1 0.74 0.71 0.72 4741
micro avg 0.77 0.77 0.77 11045
macro avg 0.76 0.76 0.76 11045
weighted avg 0.77 0.77 0.77 11045
True Positives: 3363
True Negatives: 5119
False Positives: 1185
False Negatives: 1378
Classification Error: 0.232050701674966
Sensitivity: 0.7093440202488926
The classification error was further reduced to 23.2% and the f1-score was raised to 0.77.
The next figure shows which features tend to be important when classifying whether an offer will be effective or not:
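A figure like this can be produced from the fitted forest's feature_importances_ attribute; the sketch below uses synthetic data, and the feature names are illustrative stand-ins for the real columns.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in data; in the real analysis X holds the offer/customer features
X, y = make_classification(n_samples=1000, n_features=6,
                           n_informative=3, random_state=0)
names = ["difficulty", "duration", "reward", "age", "income", "year"]

forest = RandomForestClassifier(n_estimators=150, max_depth=9,
                                random_state=0).fit(X, y)

# Impurity-based importances, normalized to sum to 1
importances = (pd.Series(forest.feature_importances_, index=names)
                 .sort_values(ascending=False))
print(importances)
```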
V. Conclusion
The most difficult challenge was actually the data cleaning and the combination of offer, transaction, and customer information. Once all of those features were represented in a single DataFrame, the subsequent analysis was much easier.
After putting all the data together, we moved on to creating a base case to compare with (Naive Classifier). And then we used a linear model, and a non-linear model.
We then looked at the feature weights in an attempt to understand which offers tend to do better.
The take away from this analysis:
- Female Customers tend to spend more than the rest of the customers.
- Discount offers lead to more customer expenditure
- The most important features of an offer were the following:
a. Difficulty
b. Duration
c. Reward
That suggested that an offer tends to be effective due to its own attributes, rather than being linked to a certain customer demographic. However, age and income also came in 6th and 7th place.
- Surprisingly, informational offers tend to be most effective.
- Social media, as a channel, tends to be more successful than other channels. It is worth remembering that all offers are sent by email, so it is hard to gauge that channel's actual effectiveness.
For a more technical and in-depth analysis, as well as to see all the data wrangling steps, please visit the Github repository.