Which demographic groups respond best to Starbucks Offers?
A coffee a day keeps the grumpy away. My favorite coffee drink is a latte, and my go-to coffee place is Starbucks, where I can work, listen to cool music, have a nice conversation, and enjoy a cup of coffee. When in a hurry, I use the Starbucks mobile app to order my drink and earn rewards.
In this article, we will analyze a data set that contains simulated data mimicking customer behavior on the Starbucks rewards mobile app to determine which demographic groups respond best to which offer type.
1 Project Definition
Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. Moreover, not all users receive the same offer. Every offer has a validity period before the offer expires. For the sake of simplicity, this dataset has only one product.
In order to solve the question of interest:
Question: Which demographic groups respond best to which offer type?
We will use the given transactional data showing user purchases made on the app that includes the timestamp of purchase and the amount of money spent on a purchase. This transactional data also has a record for each offer that a user receives as well as a record for when a user views the offer. There are also records for when a user completes an offer.
However, there are a few things to watch out for in this data set. Customers do not opt into the offers that they receive; in other words, a user can receive an offer, never actually view the offer, and still complete the offer. For example, a user might receive the “buy 10 dollars get 2 dollars off” offer, but never open the offer during its 10-day validity period. The customer spends 15 dollars during those ten days. There will be an offer completion record in the data set; however, the customer was not influenced by the offer because the customer never viewed it.
1.1 Problem Statement
The purpose of this project is to build a Machine Learning (ML) model that determines which demographic group responds best to which offer type.
This goal can be achieved by following these strategies:
· Exploring and visualizing the data.
· Pre-processing the data.
· Applying quick data analysis to the cleaned, pre-processed data.
· Scaling the numerical features.
· Trying several supervised learning models.
· Evaluating the models using the chosen metrics (accuracy and F1-score), then choosing the best supervised learning model among them.
· If the results need to be improved, implementing GridSearchCV to find the best parameters and improve the performance of the chosen model.
1.2 Metrics
Since our training data is nearly balanced in terms of the distribution of the target class, performance metrics like precision, recall, and F1-score are well suited for evaluating a model. The F1-score is the harmonic mean of precision and recall, and it provides a single measure of how well the predictive model is making predictions.
2 Analysis
2.1 Data Exploration
The data is contained in three files:
· portfolio.json — containing offer ids and meta data about each offer.
· profile.json — demographic data for each customer.
· transcript.json — records for transactions, offers received, offers viewed, and offers completed.
In the following sub-sections, we will go over these files to understand the data and how to utilize it to answer our question.
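For illustration, the three files can be loaded with pandas; the snippet below is a minimal sketch that reads a tiny inline sample in the same line-delimited JSON format (the real file paths and exact schema are assumptions):

```python
import io
import pandas as pd

# Illustrative only: the real portfolio.json / profile.json / transcript.json
# are line-delimited JSON files; this inline sample just mimics the format.
sample = io.StringIO(
    '{"id": "offer_a", "offer_type": "bogo", "difficulty": 10, "reward": 10, "duration": 7}\n'
    '{"id": "offer_b", "offer_type": "discount", "difficulty": 7, "reward": 3, "duration": 7}\n'
)
portfolio = pd.read_json(sample, orient="records", lines=True)
print(portfolio.shape)
```

For the actual files, the same `pd.read_json(path, orient="records", lines=True)` call would be used with the file path in place of the `StringIO` sample.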
2.1.1 Portfolio
The portfolio dataset contains the following metadata about each offer:
· offer_type (string) — type of offer, i.e., BOGO, discount, or informational.
· difficulty (int) — minimum required spend to complete an offer.
· reward (int) — reward given for completing an offer.
· duration (int) — time for offer to be open, in days.
The snapshot below shows the head of this dataset.
As we can see, there are more columns included in the dataset such as channels and offer_id. The portfolio dataset has 10 rows and 6 columns, which means there are 10 offers.
To make sure there are 10 unique offers, we run the following command.
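A minimal sketch of that check on a hypothetical miniature of the portfolio data (column names assumed):

```python
import pandas as pd

# Hypothetical miniature of the portfolio dataset.
portfolio = pd.DataFrame({
    "offer_id": ["a", "b", "c"],
    "offer_type": ["bogo", "discount", "informational"],
})

# Count the distinct offer ids to confirm each row is a unique offer.
n_offers = portfolio["offer_id"].nunique()
print(n_offers)
```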
Even though there are 10 unique offers, they fall into only 3 offer types: BOGO, discount, and informational.
2.1.2 Profile
The profile dataset includes the following demographic data for each customer:
· age (int) — age of the customer.
· gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F.)
· income (float) — customer’s income.
The dataset head shows 2 additional columns, customer_id and became_member_on, in addition to the columns listed above. It also shows that some customers are 118 years old, which is not realistic. The customers who are 118 years old have no gender or income values, which might indicate that 118 is a placeholder for NULL values.
Let’s take a look at the number of missing values in the profile dataset.
Both gender and income columns have 2175 missing values. Now, let’s take a look at how many members are 118 years old.
There are 2175 members with an age value of 118, which matches the number of missing gender and income values. This indicates that we can drop these rows, as they do not include any useful information for the model. Before deleting the rows, let’s take a look at the rows where the age equals 118.
We find that both the gender and income values are NaN in these rows. Now we can proceed by deleting them.
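The clean-up can be sketched as follows on a toy profile frame (the rows and column names are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy profile data mimicking the pattern described above:
# age 118 rows carry no gender or income.
profile = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "age": [35, 118, 60],
    "gender": ["F", np.nan, "M"],
    "income": [72000.0, np.nan, 55000.0],
})

# Drop the placeholder rows; this also removes the missing gender/income values.
clean = profile[profile["age"] != 118].reset_index(drop=True)
```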
2.1.3 Transcript
The transcript dataset includes records for transactions, offers received, offers viewed, and offers completed. The most important columns for investigating our main question are:
· event (str) — record description (i.e., transaction, offer received, offer viewed, etc.)
· person (str) — customer id.
· time (int) — time in hours since start of test. The data begins at time t=0.
· value — (dict of strings) — either an offer id or transaction amount depending on the record.
Possible event values are offer completed, offer received, offer viewed, and transaction, as shown below.
The percentage of event distribution is demonstrated below.
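A sketch of how that distribution can be computed with pandas, using a toy transcript:

```python
import pandas as pd

# Toy transcript with the four event types described above.
transcript = pd.DataFrame({
    "event": ["transaction", "offer received", "offer viewed",
              "transaction", "offer completed", "transaction"],
})

# Share of each event type, as a percentage.
dist = transcript["event"].value_counts(normalize=True) * 100
print(dist)
```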
2.2 Data Visualization
The income histogram for the profile dataset shows that most users have an income between $50K and $75K.
The gender distribution chart demonstrates that Starbucks has more male customers than female customers. The difference is more than 2K customers.
The age distribution chart shows clearly that the majority age group is 50–60.
When describing this dataset statistically, we find out that the mean age is 54 years old.
Moving on to visualizing the event distribution in the transcript dataset, we find that most events are of the ‘transaction’ type, followed by ‘offer received’.
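The kind of histogram used above can be sketched with matplotlib on made-up income values (the data, bin count, and labels are purely illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up incomes standing in for the profile dataset's income column.
incomes = pd.Series([45000, 52000, 61000, 58000, 73000, 68000, 95000])

fig, ax = plt.subplots()
ax.hist(incomes, bins=5)
ax.set_xlabel("income")
ax.set_ylabel("count")
ax.set_title("Income distribution")
```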
3 Methodology
3.1 Data Pre-processing
In the transcript dataset, we find out that the value column includes data about offer id, amount, and reward. Therefore, we extract these values for better analysis.
Then, we clean up the duplicates in offer id and offer_id by merging them into one column called offer_id.
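The extraction and merge can be sketched like this; the ‘offer id’ / ‘offer_id’ key names follow the inconsistency described above, while the toy rows are made up:

```python
import pandas as pd

# Toy transcript where 'value' is a dict with inconsistent keys.
transcript = pd.DataFrame({
    "event": ["offer received", "transaction", "offer completed"],
    "value": [{"offer id": "a"}, {"amount": 12.5}, {"offer_id": "a", "reward": 2}],
})

# Expand the dict column into separate columns.
vals = pd.DataFrame(transcript["value"].tolist())
transcript = pd.concat([transcript.drop(columns="value"), vals], axis=1)

# Merge the duplicated 'offer id' / 'offer_id' columns into one.
transcript["offer_id"] = transcript["offer_id"].fillna(transcript["offer id"])
transcript = transcript.drop(columns="offer id")
```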
Next, we merge portfolio and transcript datasets to find out the number and type of offers received, viewed, or completed with a transaction.
Then, we extract the completed transactions after an offer has been received and viewed.
Since the different offers have different features and different sequences of completion, e.g., there are no rewards for informational offers, we split the transcript data by offer type for easier analysis.
Within each offer type, the responded_offer flag is used to filter out the offers that were successfully viewed and completed by users. For BOGO and discount offers, a responded offer should be one with an ‘offer completed’ event, while for informational offers, a ‘transaction’ can be considered a successful offer.
Next, we split the customers who only viewed the offers without making a transaction and the customers who only received the offer without viewing it.
After separating the different cases of customers, the following steps will first focus on customers who finish the transaction after receiving the offer and customers who only view the offer without making any transaction.
As for informational offers, an offer can only be considered responded to, under the influence of the offer, when the transaction is completed within the offer’s duration.
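The responded logic for an informational offer can be sketched for a single customer’s event sequence; the times (in hours) and the 3-day duration are made-up values:

```python
import pandas as pd

# Toy event sequence for one customer and one informational offer.
events = pd.DataFrame({
    "event": ["offer received", "offer viewed", "transaction"],
    "time":  [0, 6, 48],  # hours since start of test
})
duration_hours = 3 * 24  # a hypothetical 3-day informational offer

received = events.loc[events["event"] == "offer received", "time"].iloc[0]
viewed = events.loc[events["event"] == "offer viewed", "time"]
tx = events.loc[events["event"] == "transaction", "time"]

# Responded: the offer was viewed after being received, and a transaction
# happened after the view but before the offer expired.
responded = bool(
    len(viewed) > 0 and len(tx) > 0
    and viewed.iloc[0] >= received
    and tx.iloc[0] >= viewed.iloc[0]
    and tx.iloc[0] <= received + duration_hours
)
```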
3.1.1 Feature Engineering
After basic data processing, we will check whether any columns can be used to create new features.
First, we generate a new column for the length of customer’s membership using became_member_on column in profile dataset.
Second, we calculate the number of offers received by each user.
Third, we remove the transactions that are not related to received offers.
Fourth, we calculate the time elapsed between received offers.
Finally, we merge the temporary data created above together, then drop the missing values in the gender column, and split the channel column to multiple categorical variables.
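The membership-length feature can be sketched as follows; the YYYYMMDD integer encoding of became_member_on and the reference date are assumptions:

```python
import pandas as pd

# Toy profile rows; became_member_on is assumed to be a YYYYMMDD integer.
profile = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "became_member_on": [20170715, 20180301],
})

# Parse the integer dates and compute membership length in days
# relative to an assumed reference date.
profile["became_member_on"] = pd.to_datetime(
    profile["became_member_on"], format="%Y%m%d"
)
reference = pd.Timestamp("2018-08-01")
profile["membership_days"] = (reference - profile["became_member_on"]).dt.days
```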
3.2 Implementation
In order to figure out which factors affect the customer’s decision to respond to an offer, we build a Machine Learning (ML) model to predict whether the customer will respond to different types of offers or not.
We use the ‘offer_responded’ flag in the dataset to build a model that predicts whether the customer will respond to a certain offer or not. For this case, we choose a basic tree model as a baseline, which helps explain the feature importance better, so that we can get some insight into which factors affect the customer’s behavior most.
3.3 Model Implementation Preparation
First, we set the target column to offer_responded and the feature variables to all columns in the bogo_offer dataset other than [‘person’,’offer_id’,’offer_responded’,’offer_type’].
Second, we split the data into training and test sets.
Finally, we create a function to execute the model for different offer types.
3.4 Initialize the Model Baseline
We are going to use the default parameters for the baseline model. Then, we will tune the parameters in the next steps if needed.
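The baseline comparison can be sketched on synthetic data; the real feature matrix differs, and `run_model` is a hypothetical helper:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the per-offer feature matrix and target flag.
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

def run_model(model, name):
    """Fit a baseline model with default parameters and report metrics."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name}: accuracy={accuracy_score(y_test, pred):.3f}, "
          f"f1={f1_score(y_test, pred):.3f}")
    return model

tree = run_model(DecisionTreeClassifier(random_state=42), "DecisionTree")
forest = run_model(RandomForestClassifier(random_state=42), "RandomForest")
```

The same helper can then be reused for the BOGO, discount, and informational datasets.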
3.4.1 BOGO Model
As shown below, the accuracy of both models is good for an initial implementation, but the F1 score is lower than 80%, which may be improved by tuning in the next steps. Although the Decision Tree Classifier’s precision and F1 scores are a little higher than the Random Forest Classifier’s, it is not a big problem to send offers to some people who will not respond in the end. Therefore, we can still select the Random Forest Classifier, as it has slightly better accuracy.
3.4.2 Discount Offer Model
As shown below, the Random Forest Classifier performs slightly better than the Decision Tree Classifier.
3.4.3 Informational Offer Model
Similarly for the informational offer model, the Random Forest Classifier performs slightly better than the Decision Tree Classifier.
3.5 Refinement
This section attempts to tune the parameters of the initial model to result in better performance. In the tuning section, we start by using GridSearch to search for parameters that are likely to yield better model performance.
Then, we use optimized parameters to rerun the model in the previous steps.
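A GridSearchCV sketch on synthetic data; the parameter grid here is illustrative, not the actual search space used in the project:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the per-offer training data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Small illustrative grid; the real search space is an assumption here.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid, scoring="f1", cv=3,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_
print(search.best_params_)
```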
Lastly, we compare the results with the initial model’s results and find that, after using the tuned parameters, the test accuracy slightly improved from 0.833 to 0.838 and the F1 score increased from 0.759 to 0.779.
We repeat the same steps for the discount offer data and find that, after using the tuned parameters, the test accuracy slightly improved from 0.872 to 0.873 and the F1 score increased from 0.814 to 0.816.
Finally, we repeat the same steps for the informational offer. The comparison shows that, after using the tuned parameters, the test accuracy slightly improved from 0.748 to 0.753, while the F1 score slightly decreased from 0.681 to 0.678.
4 Results
4.1 Model Evaluation and Validation
In this section, we will investigate the results to figure out which factors affect whether customers will respond to offers or not.
By looking at the charts below, we find that for all three offer types, the most important factor affecting whether a user responds to an offer is the length of membership. That is, the longer the customer has been a Starbucks member, the more likely he or she is to respond to a received offer. The second and third most important factors are age and income. Lastly, the number of offers received also strongly affects the response.
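Reading feature importances out of a fitted random forest can be sketched like this; the feature names are hypothetical stand-ins for the real columns:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with stand-in feature names (the real columns differ).
X, y = make_classification(n_samples=300, n_features=4, random_state=1)
feature_names = ["membership_days", "age", "income", "offers_received"]

model = RandomForestClassifier(random_state=1).fit(X, y)

# Rank features by their importance in the fitted forest.
importances = (
    pd.Series(model.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
print(importances)
```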
5 Conclusion
5.1 Reflection
This project is trying to answer the following questions:
· What factors mainly affect whether a customer uses an offer? Should the company send out the offer or not?
· How likely is a customer to open and use an offer sent to them? Are there any common characteristics among the customers who take the offer?
The results of the project show that it is feasible to use an ML model to predict whether a customer will respond to an offer, and the model also reveals the main factors, such as length of membership, age, and income, that strongly affect the likelihood of a customer responding to an offer.
5.2 Improvement
Future work might involve more experiments in the feature engineering step to see whether any new features can improve the model, as well as removing some features to see how that affects model performance.
Moreover, this analysis focuses mostly on customers who complete a transaction after receiving an offer; there could be more insight into the other cases, where the customer completes transactions regardless of the offer.
Finally, unsupervised learning algorithms can be implemented to cluster the customers based on the given information, to find the specific characteristics in each group of customers who will be more likely to respond to a specific offer.
To check out the code of this analysis, visit my GitHub repo.
If you enjoyed reading this article, please recommend and share it to help others find it!