A Data Scientist's View of Starbucks Promotion Offer Data

Martin Wang
8 min read · Apr 16, 2020


Introduction

Starbucks is one of the biggest coffee companies and coffeehouse chains in the world. With the growth of the internet and smartphones, people’s lives have become increasingly intertwined with these technologies, and Starbucks has taken advantage of this by sending promotions through them. By July 2013, over 10% of in-store purchases were made on mobile devices using the Starbucks app, and that number keeps growing. (Source) Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one, get one free). Some users might not receive any offer during certain weeks. The data for this project simulate customer behavior on the Starbucks rewards mobile app based on these original offers. The goal of this project is to help Starbucks better understand its customers through the following two questions:

1. Which demographic groups respond best to which offer type? (Statistical Application)

2. What are the top 5 features that influence those offer reactions? (Machine Learning Application)

EDA (Exploratory Data Analysis)

There are three data sets in JSON format: portfolio.json (containing offer ids and metadata about each offer, such as duration and type), profile.json (demographic data for each customer), and transcript.json (records of transactions, offers received, offers viewed, and offers completed). These were loaded into offer, customer, and transaction data frames for easier manipulation.
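As a rough sketch of this loading step (assuming the files use the line-delimited JSON layout of the original Starbucks data set), the three data frames can be created with pandas:

import pandas as pd

# Load the three JSON files (assumed to be line-delimited records)
offer = pd.read_json('portfolio.json', orient='records', lines=True)
customer = pd.read_json('profile.json', orient='records', lines=True)
transaction = pd.read_json('transcript.json', orient='records', lines=True)

print(offer.shape, customer.shape, transaction.shape)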

Here is the schema and explanation of each data frame:

Offer data set with 10 records:

• id (string) — offer id

• offer_type (string) — type of offer, i.e., BOGO, discount, informational

• difficulty (int) — minimum required spend to complete an offer

• reward (int) — reward given for completing an offer

• duration (int) — time for offer to be open, in days

• channels (list of strings)

Customer data set with 17000 records:

• age (int) — age of the customer

• became_member_on (int) — date when customer created an app account

• gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)

• id (str) — customer id

• income (float) — customer’s income

Fig 1. Customer Gender Distribution

As Fig 1 shows, there are slightly more male customers than female customers in the gender distribution.

Fig 2. Customer Registering Distribution

In Fig 2, the customer registration distribution is not stable across years. Registrations saw a first jump around 2016, then remained flat until mid-2017, when a second jump occurred. In 2018, however, the number of registrations declined slightly. Overall, customer registrations trended upward from 2013 to 2019.
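A minimal sketch of how the distribution in Fig 2 can be reproduced, assuming became_member_on is stored as an integer such as 20170826:

import pandas as pd
import matplotlib.pyplot as plt

# Parse the integer registration date (e.g. 20170826) into a datetime
customer['became_member_on'] = pd.to_datetime(customer['became_member_on'], format='%Y%m%d')

# Count registrations per year and plot the distribution
customer['became_member_on'].dt.year.value_counts().sort_index().plot(kind='bar')
plt.xlabel('Registration year')
plt.ylabel('Number of customers')
plt.show()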

Transaction data set with 306534 records:

• event (str) — record description (i.e., transaction, offer received, offer viewed, etc.)

• person (str) — customer id

• time (int) — time in hours since start of test. The data begins at time t=0

• value (dict of strings) — either an offer id or a transaction amount, depending on the record

Part I: Which demographic groups respond best to which offer type? (Statistical Application)

There are three different aspects to this question: demographic groups, responses, and offer types. These correspond to the customer, offer, and transaction data frames, so it is necessary to merge the three data sets together. Before merging, however, some data cleaning is needed. For the offer data set, the values in the channels column were expanded into new columns (as shown in Fig 3), and one-hot encoding was applied to the offer_type column. For the customer data set, the null values in the gender column were replaced with ‘NA’, and the null values in income were filled with the mean value. For the transaction data set, the dictionary strings in the value column were expanded into new columns (as shown in Fig 4). After other routine data manipulations, the final merged data set was obtained (as shown in Fig 5).

Fig 3. Channels Column Manipulation
Fig 4. Expanding Dictionary Strings
Fig 5. Merged Data Set
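A minimal sketch of these cleaning and merging steps, assuming the data frames are named offer, customer, and transaction as above (the exact column names, such as 'offer id', are assumptions that may need adjusting):

import pandas as pd

# Offer data: expand the channels list into one indicator column per channel (Fig 3)
channel_dummies = offer['channels'].str.join('|').str.get_dummies()
offer = pd.concat([offer.drop(columns='channels'), channel_dummies], axis=1)

# One-hot encode the offer_type column
offer = pd.concat([offer, pd.get_dummies(offer['offer_type'])], axis=1)

# Customer data: fill missing gender with 'NA' and missing income with the mean
customer['gender'] = customer['gender'].fillna('NA')
customer['income'] = customer['income'].fillna(customer['income'].mean())

# Transaction data: expand the value dictionaries into new columns (Fig 4)
value_expanded = pd.json_normalize(transaction['value'])
transaction = pd.concat([transaction.drop(columns='value'), value_expanded], axis=1)

# Merge everything into one data set (Fig 5); the offer id keys in the value
# dictionaries vary by event and may need unifying before this merge
merged = (transaction
          .merge(customer, left_on='person', right_on='id', how='left')
          .merge(offer, left_on='offer id', right_on='id', how='left',
                 suffixes=('_customer', '_offer')))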

Since the responses and the demographic groups are related to the offer type, and the offer type does not change throughout the receiving, viewing, and completing process, we should study each offer type individually: BOGO, discount, and informational offers. In other words, each offer type has its own responses and demographic groups. Because an informational offer can only be viewed, there is no analysis of its completion. The results are shown below (Table 1).
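A sketch of the kind of aggregation behind Table 1, using the merged data frame from the sketch above (column names are assumptions):

# Summarise the demographics of responding customers per offer type and event
summary = (merged[merged['event'].isin(['offer viewed', 'offer completed'])]
           .groupby(['offer_type', 'event'])
           .agg(count=('person', 'size'),
                mean_age=('age', 'mean'),
                mean_income=('income', 'mean')))
print(summary)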

For the BOGO offer, the customers who completed the offer are younger, have higher incomes, and show a more even gender distribution than the customers who only viewed the offer. For the discount offer, the customers who completed the offer are also younger and have higher incomes than the customers who viewed it. Overall, the offer viewed rate is higher than the offer completed rate for both the BOGO and discount offers.

Fig 6. Offer Viewed Distribution
Fig 7. Offer Completed Distribution

Overall, Fig 6 shows that the numbers of viewed BOGO and discount offers are much higher than the number of viewed informational offers, almost twice as high in both cases. However, Fig 7 shows that the difference between the numbers of completed BOGO and discount offers is small, around 2,000 offers.

As a result, different offer types do draw different responses from different demographic groups. For example, although the numbers of completed BOGO and discount offers are close to each other, the offer completed rates and gender distributions are very different. More detailed responses and demographics for each offer type are given in Table 1.

Part II: What are the top 5 features that influence those offer reactions? (Machine Learning Application)

Model Selection

Broadly speaking, machine learning models fall into two categories: unsupervised learning and supervised learning. Since the data for this project are fully labeled, supervised learning is used here. A random forest model was selected because one of its biggest advantages is that it provides a reliable feature importance estimate (Pros and cons of random forests), which is a perfect fit for this question.

— The Algorithm behind a Random Forest

The algorithm behind a random forest is a combination of decision trees. In the ‘forest’, each decision tree depends on the values of a random vector sampled independently, with the same distribution for all trees. (Top 5 Predictive Analytics Models and Algorithms)
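As a toy illustration of this idea (not the project's model), each tree in a fitted scikit-learn random forest can be inspected individually, and the forest's prediction is the aggregate vote of its trees:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# estimators_ holds the individual decision trees, each trained on a bootstrap sample
tree_votes = np.array([tree.predict(X[:5]) for tree in forest.estimators_])
print(tree_votes)             # predictions of each tree
print(forest.predict(X[:5]))  # majority vote of the forest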

Model Building

In order to achieve the best modeling performance, the data sets were inner merged to reduce null values. (Shown in Fig 8)

Fig 8. Data Merging Process

After dropping unhelpful columns, two random forest models were built, one targeting offer viewed and the other offer completed. The data then went through a train test split, and the models were fitted on the training data.
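A minimal sketch of this step for the offer viewed model, assuming the cleaned, inner-merged data frame is called model_data and that offer_viewed and offer_completed are binary target columns (both names are assumptions):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Features are everything except the two assumed target columns
X = model_data.drop(columns=['offer_viewed', 'offer_completed'])
y = model_data['offer_viewed']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

viewed_model = RandomForestClassifier(random_state=42)
viewed_model.fit(X_train, y_train)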

Model Improvement by Hyperparameter Tuning with Cross-Validation

By running a grid search, both models could be optimized with the best hyperparameters. The initial hyperparameter settings are listed in Table 2.

After the grid search with cross-validation, the best combination of hyperparameters for the offer viewed model is shown in Fig 9. The same process and initial hyperparameter settings were applied to the offer completed model; its best combination of hyperparameters is shown in Fig 10.

Fig 9. Best Hyperparameter Combination for the Offer Viewed Model
Fig 10. Best Hyperparameter Combination for the Offer Completed Model
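A sketch of the tuning step, continuing from the model-building sketch above; the parameter grid below is illustrative, not the exact initial settings from Table 2:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid; the actual initial settings are those listed in Table 2
param_grid = {
    'n_estimators': [100, 200, 500],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
}

grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)            # best hyperparameter combination (cf. Fig 9 / Fig 10)
viewed_model = grid.best_estimator_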

Metrics

The metrics used to measure model performance are the coefficient of determination R² and the classification report for each model. These metrics are shown in Table 3 for both the before and after model improvement conditions. Comparing before and after, the improvement is significant.
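A sketch of how these metrics can be computed with scikit-learn, continuing the assumed names from the sketches above:

from sklearn.metrics import classification_report, r2_score

y_pred = viewed_model.predict(X_test)

# Coefficient of determination on the predicted labels, plus per-class precision/recall/F1
print('R^2:', r2_score(y_test, y_pred))
print(classification_report(y_test, y_pred))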

Machine Learning Results

As mentioned in the previous question, there are two types of offer reactions: viewed and completed. In order to find the top 5 influential features for each reaction, two random forest models were built, one dedicated to each. The final results are shown in Fig 11 and Fig 12. The top 5 influential features for the offer viewed reaction are ‘time’, ‘age’, ‘income’, ‘social’ and ‘difficulty’. The top 5 influential features for the offer completed reaction are ‘time’, ‘age’, ‘income’, ‘duration’ and ‘informational’.

Fig 11. Top 5 Influential Features for the Offer Viewed Reaction
Fig 12. Top 5 Influential Features for the Offer Completed Reaction
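A sketch of how the rankings in Fig 11 and Fig 12 can be extracted from a fitted random forest, again using the assumed names from the sketches above:

import pandas as pd

# Rank features by the forest's impurity-based importance and keep the top 5
importances = pd.Series(viewed_model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(5))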

Although these influential features were selected by machine learning models for different offer reactions, it is striking that the first three influential features for each reaction are exactly the same. So, it is definitely worth digging a little deeper into those three features: ‘time’, ‘age’ and ‘income’, shown in Table 4 with their mean and median values.
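The mean and median values behind Table 4 can be obtained with a simple aggregation (a sketch, reusing the merged data frame from Part I):

# Mean and median of the three shared top features, per offer reaction (cf. Table 4)
stats = (merged[merged['event'].isin(['offer viewed', 'offer completed'])]
         .groupby('event')[['time', 'age', 'income']]
         .agg(['mean', 'median']))
print(stats)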

Conclusion

— Answers to the two initial questions

· For different offer types, customers from different demographic groups do respond differently.

· The customers’ demographics changed dramatically from offer viewed to offer completed for both BOGO and discount offers.

· ‘time’, ‘age’ and ‘income’ are the most important features influencing customers’ offer reactions.

· Starbucks could definitely make better decisions in the future based on the results of this project if it wants to improve offer reactions for different customers. For example, to improve BOGO offer completed reactions, it should send out more offers that are available for at least 410 hours and focus on customers aged around 58 with incomes around $70,000, regardless of gender.

— Difficulties

The original data are limited by null values, which forced me to inner merge the data sets for the machine learning part. If there were far fewer null values, I could have kept using the merged data from the first question, which contains many more records. That could lead to a better machine learning application.

— Future improvement

This project mainly studies the customers who viewed and completed offers. A future improvement could focus on the customers who did not react to those offers. There may be a pattern among those specific customers that would provide a bigger picture for this project.

The detailed GitHub repo code for this analysis can be found here.
