Starbucks rewards experiment: using data to determine which kind of offers excite people

Kirsten Young
11 min read · Dec 1, 2022

--

Project Overview

This project is the capstone of Udacity’s Data Science Nanodegree course. Starbucks has provided simulated data from a series of offers they are promoting to their customers on their rewards mobile app over a period of 30 days. They have 3 types of offers:

  • Buy-one-get-one (BOGO) — a user needs to spend a certain amount to get a reward equal to that threshold amount.
  • Discount — a user gains a reward equal to a fraction of the amount spent.
  • Informational — there is no reward, but neither is there a requisite amount that the user is expected to spend.

Not all customers receive the same offer. Certain customers may not be influenced by a certain offer they receive. Some users may not even look at the offers they receive, but will spend money in the offer’s validity period regardless.

Problem Statement

The aim of this project is to identify which groups of people are most responsive to each type of offer, and how best to present each type of offer. This will be done by breaking up the problem into 3 key areas:

  1. Which customers?

The customers will need to be segmented based on a combination of their features and behaviours. The popular unsupervised machine learning clustering model, KMeans, will be used for this.

  2. Which offers?

We need to determine which offers were the most influential, i.e. out of all the offers sent, how many actually drove the customers to make a purchase? And how much money did each offer convince the customers to spend?

  3. Which modes?

Which forms of advertising or presenting the offers work best? Where are customers most likely to see the offer as soon as it is posted? How accessible is the offer? How well does the offer provide a call to action? How easy is it to go out and complete the offer once a customer has viewed it?

Metrics

In order to answer the questions posed above, the following metrics will be used:

  • A silhouette score will be used to evaluate the quality of clusters resulting from the segmentation algorithm, i.e. how dense and well-separated the clusters are.
  • A response rate will need to be calculated for each offer and each customer segment to determine its success:

Response rate = number of offers completed / number of offers sent

This metric is not applicable to the informational offers, therefore another metric will be used:

  • Influenced spend will indicate the amount of money that the offer influenced the customer to spend during its validity period:

Influenced spend = Spend in validity period / number of offers sent

  • Time to view can be used to determine which platforms people are using the most and where they are most likely to see the offer as soon as it is posted:

Time to view = offer viewed − offer sent

  • Time to complete will be calculated to see how quickly the various offers urge the customer to go out and make purchases:

Time to complete = offer completed − offer viewed
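These metrics are simple ratios and differences. As a sketch (the function and variable names here are my own, not from the project code, and times are assumed to be in hours):

```python
def response_rate(n_completed, n_sent):
    """Fraction of offers sent that were completed."""
    return n_completed / n_sent

def influenced_spend(spend_in_validity, n_sent):
    """Average spend each sent offer influenced."""
    return spend_in_validity / n_sent

def time_to_view(t_viewed, t_sent):
    """Hours between an offer being sent and being viewed."""
    return t_viewed - t_sent

def time_to_complete(t_completed, t_viewed):
    """Hours between an offer being viewed and being completed."""
    return t_completed - t_viewed
```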

Data Exploration and Visualisation

The simulation was run over a period of ~30 days. The 3 datasets have been cleaned and merged in order to provide some high level insights:

Offers

  • Figure 2A — Ten unique offers were sent out to customers over 30 days. Roughly the same volume of each offer was sent. A customer is therefore equally likely to receive any of the 10 offers.
  • Figure 2B — Each of the 10 offers belongs to one of the 3 offer types. The total numbers of BOGOs and discounts sent out are the same, while informational offers make up only about half of those amounts. A customer is therefore less likely to receive an informational offer than a BOGO or discount.

Customers

  • Figure 2C — Starbucks has a customer base with a normal age distribution. Customer ages range from 18 to 101, with an average of about 54 years. It is interesting to see that there is a portion of older people using the mobile app, as these technological marketing schemes are generally aimed at younger people.
  • Figure 2D — Starbucks customers earn on average 65,400 dollars, with a slightly right-skewed distribution.
  • Figure 2E — The majority of the customers have only joined the rewards mobile app in the last year, but there is a small group of loyal customers who have been members for over 3 years.
  • Figure 2F — Males make up most of the customer base, and there is a small group of people who identify as ‘other’.

Customer spending behaviour

  • Figure 2G — Customers spend about 12 dollars in an average transaction; however, the majority of transactions are smaller, at less than 5 dollars.
  • Figure 2H — Within the 30 days of the simulation, customers have spent on average a total of just over 100 dollars each.
  • Figure 2I — Using the metric of total transactions we can see how frequently a customer is making purchases. Most customers have made between 1–12 purchases during the period of simulation.

Data Pre-processing

A “trick” in the data is that an offer can have an ‘offer complete’ timestamp even though the customer was not actually influenced by the offer — they may have only viewed it after spending enough money to complete it, or might not have seen the offer at all. We therefore need to determine whether the data in the ‘offer complete’ column is actually valid, by introducing a new column called ‘offer_success’. We consider the offer successful if the customer viewed the offer before it was completed, and viewed it within the validity period.

This requires some special pivoting and manipulation of the data to get it into the right format. Something that looks like the figure below:

A customer may spend money, without actually knowing there was a special offer on what they purchased. We need to determine how much of their spend was actually influenced by the offer. If the customer spent money after viewing the offer, and within the validity period of that offer, we can assume that it was actually influenced by the offer. An influenced_spend column is introduced here to accommodate this.
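The validity check described above can be sketched in pandas. The column names and values below are illustrative only (not the actual schema), with times in hours since the start of the simulation:

```python
import pandas as pd

# One illustrative row per offer sent to a customer.
offers = pd.DataFrame({
    "time_sent":      [0,   0,   168],
    "time_viewed":    [6,   120, 200],
    "time_completed": [24,  96,  250],
    "duration":       [168, 168, 96],   # validity period in hours
})

expires = offers["time_sent"] + offers["duration"]

# An offer counts as a success only if it was viewed before it was
# completed, and that view happened inside the validity period.
offers["offer_success"] = (
    (offers["time_viewed"] <= offers["time_completed"])
    & (offers["time_viewed"] <= expires)
)
```

The second row fails the check because the customer only viewed the offer after unknowingly completing it.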

We also calculate the time_to_view and time_to_complete metrics as part of this pre-processing phase.

For the clustering algorithm we need to look at customers who have spent at least some money over the 30 days and ensure their age and income are not missing. This leads us to a dataset that looks like this:

We are going to select Age, Total transactions and Income as the 3 features for the clustering model. The features are transformed to be approximately normally distributed and scaled using sklearn’s StandardScaler.
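A minimal sketch of this step, using a log transform to reduce skew (the exact transformation in the project may differ) on made-up sample values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: age, total transactions, income.
X = np.array([
    [25.0, 3.0, 40_000.0],
    [54.0, 8.0, 65_000.0],
    [70.0, 12.0, 90_000.0],
])

# log1p reduces right skew (e.g. in income); StandardScaler then gives
# each feature zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(np.log1p(X))
```

Scaling matters for KMeans because the algorithm is distance-based: without it, the income column (tens of thousands) would dominate age and transaction counts.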

Implementation

A fundamental step in clustering is to determine the optimal number of clusters for the data. We will do this by making use of the elbow method and the silhouette method.

Figure 3A — The Elbow method plots the distortion and inertia for each value of k. The point at the “elbow” of the plot represents the optimal number of clusters i.e. when distortion/inertia start decreasing linearly — in this case, the elbow is at k=3 clusters.
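The elbow curve comes from fitting KMeans at a range of k values and recording the inertia at each one; a small sketch on synthetic blobs standing in for the scaled customer features:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Three well-separated synthetic blobs in place of the real features.
X = np.vstack([rng.normal(c, 0.3, size=(50, 3)) for c in (0.0, 3.0, 6.0)])

# Inertia (within-cluster sum of squares) for each candidate k;
# the "elbow" is where the curve stops dropping sharply.
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(1, 7)
]
```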

Next we will use the Silhouette method — which is a visual representation of how well our data points have been classified.

Figure 3B — Looking at the 4 plots above, I think that k = 3 looks the most ideal. All 3 clusters are above the average silhouette score of 0.29, and they are closer in size and thickness than the clusters in the other plots. There are also fewer data points that fall below zero.

For now, based on a combination of the elbow method and silhouette analysis, 3 clusters will be chosen to segment the customer base.
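The average silhouette score behind these plots can be computed directly with sklearn; again, synthetic blobs stand in for the real scaled features here:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in for the scaled customer features.
X = np.vstack([rng.normal(c, 0.3, size=(50, 3)) for c in (0.0, 3.0, 6.0)])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Mean silhouette coefficient over all points: close to 1 means dense,
# well-separated clusters; near 0 means overlapping clusters.
score = silhouette_score(X, labels)
```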

After fitting the model to the data points, we see a trend in the 3 clusters:

  • Cluster 0 represents older customers who earn moderately, and make purchases frequently
  • Cluster 1 represents younger customers who earn moderately and make purchases semi-frequently
  • Cluster 2 represents older customers who earn a lot but don’t make a lot of purchases

These cluster labels are merged with the other datasets so that we can start seeing results.

Refinement

The silhouette score should ideally be closer to 1 as this means the clusters are dense and well-separated. A silhouette score of 0.29 means there might be some overlapping clusters. A future improvement here could be selecting different features in the dataset to find more optimal clusters.

Results

A response rate is calculated for each offer, determined by the number of offers that were successful versus the number of offers that were sent out.
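With an offer_success flag on each offer sent, the per-cluster, per-offer-type response rate reduces to a groupby; the records below are illustrative only:

```python
import pandas as pd

# Illustrative records: one row per offer sent to a customer.
sent = pd.DataFrame({
    "cluster":       [0, 0, 1, 1, 2, 2],
    "offer_type":    ["bogo", "discount", "bogo", "discount", "bogo", "discount"],
    "offer_success": [True, True, False, True, True, False],
})

# The mean of a boolean success flag is exactly completed / sent,
# i.e. the response rate per (cluster, offer type) group.
response = (
    sent.groupby(["cluster", "offer_type"])["offer_success"]
        .mean()
        .rename("response_rate")
)
```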

  • Figure 4A: Customers across the 3 clusters respond more positively to discounts than BOGOs.
  • Starbucks should continue to aim their discount offers at all 3 groups.
  • BOGOs are the least effective for Cluster 1. This could be because they are the lower-income group and can’t afford to make large purchases. Starbucks could limit those offers, and rather push the discounts for Cluster 1.
  • They should continue BOGOs for Cluster 0 and Cluster 2, as these are quite successful among older groups of people.
  • Figure 4B — Discounts are encouraging more spend than the other 2 offer types, especially in the higher-income groups.
  • The informational offers do not influence as much spend as the discounts and BOGOs — Starbucks could consider reducing the volumes of these types of offers.

Next we will look at the effectiveness of the various channels in advertising the offers. Since the channels are grouped together in the data, it is difficult to see the direct influence each channel has on how quickly an offer was viewed. We will therefore keep the channels grouped and draw insights from that.

  • Figure 4C — A combination of email, mobile, social and web is by far the most effective way of getting customers to respond to and complete offers. The remaining 3 combinations of channels show roughly the same response rate. We will need to dig deeper to see if this is dependent on offer type.
  • Figure 4D — For bogos, Starbucks should make use of as many channels as possible (ideally all 4). Bogos are not as effective at bringing in money as discounts, but if customers are more likely to see the offers, the response rate might improve.
  • Figure 4E — No surprise that the combination of all 4 channels leads to the best response for discounts. The inclusion of a ‘social’ channel provides a 20% improvement in response rate from just email, mobile and web.
  • Figure 4F — For informational offers, influenced spend is the only metric we can use to compare the channels.
  • We see that there is not much of a difference in spend from using social or web in addition to email and mobile.
  • Unlike the other 2 offer types, informational offers have not utilised all 4 channels — this could be a big factor in why the spend is lower for these offers — people may just not be seeing them in time.
  • Figure 4G — Time to view shows us where customers are most likely to see an offer soonest.
  • What we can see is that the more platforms are used, the more likely a customer is to see the offer quickly.
  • As expected, the offers that were sent with only an email and web had an average time_to_view of over 50 hours.
  • Adding mobile to that took the time_to_view to just under 50 hours.
  • The most effective method of getting views was including a social platform which took the time_to_view to about just over a day.
  • Figure 4H — Time to complete can be used to determine how effective the offer is at driving action from those who have viewed it, and how easy it is to go out and complete the offer once it has been viewed.
  • Email and web get the customer to complete the offer in an average of 5 days. This is ineffective for offers with shorter validity periods.
  • Adding mobile to the offer gets customers purchasing within 4 days of seeing it.
  • Adding a social element is the most effective, getting the customer to make the purchase within about 3 days of seeing it.

Reflection

This project involved a lot of logical thinking in how to merge 3 distinct datasets in such a way that told a story and enabled us to draw insights into the customers, the offers and how to make those offers more successful.

Customers were grouped together based on an unsupervised machine learning technique — this required some additional research and learning about the elbow method and silhouette plots.

Once the customers were segmented it was easy to see which offers they responded best to and which offers successfully influenced them to spend money. Discounts are the most successful offers that Starbucks provides. They get the best response from customers and encourage the most spend regardless of the customer segment.

The method of presenting the offers also greatly affects which offers would be successful — the ones that can be viewed quickly and drive the customers to action are the ones that make you money. Advertising offers on social platforms was by far the most effective method and should not be overlooked.

Improvement

Gender is an important factor when it comes to rewards and spending behaviour. Unfortunately the clustering algorithm that was used is not able to consider categorical features such as gender. In future iterations of this project, other algorithms and techniques could be incorporated which consider additional features.

We also only used 3 features in determining the clusters, and our silhouette score ended up being slightly on the lower end. There is plenty of room for improvement here by testing other combinations of features and obtaining a higher score.
