Starbucks: Exploratory Data Analysis on Mobile App Data from Starbucks
Analysis of Starbucks mobile app data to understand customer behavior and increase the effectiveness of offers
Introduction
Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. The dataset we are going to analyse contains simulated data that mimics customer behavior on the Starbucks rewards mobile app.
Not all users receive the same offer, and that is the challenge we are going to solve today.
You can find the complete code for this project at the link below.
This post is divided into five parts:
- Business Questions
- Data Analysis and Cleaning
- Answers
- Conclusion
- Future Improvements
1. Business Questions
We will try to answer the following business questions:
- How much do we lose because of the offers?
- What kind of customers often complete an offer without viewing it?
- How is income distributed across customer types?
2. Data Analysis and Cleaning
Let's start with some data cleaning and visualization.
The data is contained in three files:
- portfolio.json — containing offer ids and metadata about each offer (duration, type, etc.)
- profile.json — demographic data for each customer
- transcript.json — records for transactions, offers received, offers viewed, and offers completed
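The three files can be loaded with pandas. This is a minimal sketch, assuming each file stores one JSON record per line (so `orient="records", lines=True` applies); the one-line records below are hypothetical stand-ins, and with the real data you would pass the file paths instead.

```python
import io

import pandas as pd

# Hypothetical one-record samples standing in for the three files
# (one JSON object per line).
portfolio_src = io.StringIO(
    '{"id": "o1", "offer_type": "bogo", "difficulty": 10, "reward": 10, '
    '"duration": 7, "channels": ["email", "mobile"]}\n'
)
profile_src = io.StringIO(
    '{"id": "c1", "age": 55, "became_member_on": 20170715, '
    '"gender": "F", "income": 112000.0}\n'
)
transcript_src = io.StringIO(
    '{"person": "c1", "event": "offer received", "time": 0, '
    '"value": {"offer id": "o1"}}\n'
)

# With the real data, pass paths instead, e.g. pd.read_json("portfolio.json", ...).
portfolio = pd.read_json(portfolio_src, orient="records", lines=True)
profile = pd.read_json(profile_src, orient="records", lines=True)
transcript = pd.read_json(transcript_src, orient="records", lines=True)

for name, df in [("portfolio", portfolio), ("profile", profile), ("transcript", transcript)]:
    print(name, df.shape)
```

The nested `value` dicts and `channels` lists stay as Python objects in their columns, which is why both need extra cleaning later.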
A. Portfolio DataFrame:
Portfolio contains a reference list of all promotion categories and sub-types offered by Starbucks.
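The portfolio cleaning comes down to one rename and two one-hot encodings. A minimal pandas sketch, assuming `channels` holds a list of channel names per offer; the three offers are hypothetical, and keeping the original `offer_type` column next to its dummies (handy for grouping later) is a choice made here.

```python
import pandas as pd

# Hypothetical offers with the columns touched by the cleaning steps.
portfolio = pd.DataFrame({
    "id": ["o1", "o2", "o3"],
    "offer_type": ["bogo", "discount", "informational"],
    "channels": [["email", "mobile"], ["web", "email"], ["mobile"]],
})

# Rename id -> offer_id so it cannot be confused with the customer id.
portfolio = portfolio.rename(columns={"id": "offer_id"})

# One-hot encode the list-valued channels column: join each list into
# "email|mobile", then split it back into 0/1 indicator columns.
channel_dummies = portfolio["channels"].str.join("|").str.get_dummies(sep="|")

# One-hot encode offer_type (cast to int because newer pandas returns bools).
type_dummies = pd.get_dummies(portfolio["offer_type"]).astype(int)

portfolio_clean = pd.concat(
    [portfolio.drop(columns="channels"), channel_dummies, type_dummies], axis=1
)
print(portfolio_clean)
```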
The portfolio data needs to be converted into a more convenient format for data analysis.
Steps to clean this portfolio dataframe:
- Rename the id column to offer_id.
- One-hot encode the channels column.
- One-hot encode the offer_type column.
B. Profile DataFrame:
Profile contains demographic data for each customer. This data set has 5 columns and 17,000 rows. The columns are as follows:
- age (int): age of the customer.
- became_member_on (int): date when the customer created an app account.
- gender (str): gender of the customer (M: Male, F: Female, O: Others).
- id (str): customer id.
- income (float): customer’s income.
A first look at the data shows that:
- The data set has no duplicated rows.
- The data set has 2,175 missing values in each of the gender and income columns.
- Customer ages range from 18 to 101, apart from 2,175 customers registered with age 118. I consider this specific age an outlier, because something is clearly wrong with these 2,175 rows in the data set.
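The checks above and the profile cleaning steps listed for this dataframe can be sketched in pandas. This is a minimal version on four hypothetical customers, assuming became_member_on stores dates as yyyymmdd integers (e.g. 20170715). Because gender has three values, encoding it as one 0/1 indicator column per value is an assumption about what "numeric 0s and 1s" means.

```python
import numpy as np
import pandas as pd

# Four hypothetical customers; c2 mimics the faulty rows (age 118, no gender/income).
profile = pd.DataFrame({
    "id": ["c1", "c2", "c3", "c4"],
    "age": [55, 118, 75, 33],
    "became_member_on": [20170715, 20170804, 20180426, 20160501],
    "gender": ["F", None, "M", "O"],
    "income": [112000.0, np.nan, 100000.0, 57000.0],
})

# Checks: duplicated rows, missing gender/income, and the suspicious age 118.
print("duplicated rows:", profile.duplicated().sum())
print(profile[["gender", "income"]].isna().sum())
print("age 118 rows:", (profile["age"] == 118).sum())

# Replace age 118 with NaN so it is treated as missing.
profile["age"] = profile["age"].mask(profile["age"] == 118)

# Turn the yyyymmdd integer into a real date.
profile["became_member_on"] = pd.to_datetime(
    profile["became_member_on"].astype(str), format="%Y%m%d"
)

# Drop rows with no gender, income or age.
profile = profile.dropna(subset=["gender", "income", "age"])

# Assumption: one 0/1 indicator column per gender value (gender_F, gender_M, gender_O).
gender_dummies = pd.get_dummies(profile["gender"], prefix="gender").astype(int)
profile = pd.concat([profile, gender_dummies], axis=1)

# Start year and month of membership for later analysis.
profile["start_year"] = profile["became_member_on"].dt.year
profile["start_month"] = profile["became_member_on"].dt.month
print(profile)
```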
Steps to clean this dataframe:
- Replace the age value 118 with NaN.
- Convert the became_member_on column into a readable date format.
- Drop rows with no gender, income, or age data.
- Convert the gender column values to numeric 0s and 1s.
- Add start year and start month columns (for further analysis).
The age data shows a few unusual values such as 118, which is highly unlikely; these are the values replaced with NaN and dropped in the steps above.
C. Transcript DataFrame:
Transcript records a customer’s transaction history, the number and sub-types of individual offers sent to a customer, and provides the data needed to analyze individual customer behavior. The transcript data was collected over the course of one month.
This data set has 4 columns and 306,534 rows. The columns are as follows:
- event (str): record description (e.g. transaction, offer received, offer viewed, etc.)
- person (str): customer id.
- time (int): time in hours since the start of the test.
- value (dict of strings): can hold the values of ‘offer id’, ‘amount’, ‘reward’ and/or ‘difficulty’.
Steps to clean this dataframe:
- Rename the person column to customer_id.
- Extract information from the value column to create new columns for the offer id, reward, and transaction amount.
- Fill NA values with 0.
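A minimal pandas sketch of these transcript steps, on four hypothetical events. Two details are assumptions rather than part of the original steps: the offer id key is accepted under both `offer id` and `offer_id` spellings (worth checking against your copy of the data), and the NA-to-0 fill is limited to the numeric columns so that missing offer ids stay NaN instead of becoming 0.

```python
import pandas as pd

# Hypothetical events; value holds a dict whose keys depend on the event type.
transcript = pd.DataFrame({
    "person": ["c1", "c1", "c1", "c2"],
    "event": ["offer received", "offer viewed", "transaction", "offer completed"],
    "time": [0, 6, 12, 18],
    "value": [
        {"offer id": "o1"},
        {"offer id": "o1"},
        {"amount": 12.5},
        {"offer_id": "o2", "reward": 5},
    ],
})

# Rename person -> customer_id.
transcript = transcript.rename(columns={"person": "customer_id"})

# Expand each dict into columns; keys missing from a dict become NaN.
values = pd.DataFrame(transcript["value"].tolist(), index=transcript.index)

# Assumption: merge the two possible spellings of the offer id key into one column.
if "offer id" in values.columns:
    spaced = values.pop("offer id")
    if "offer_id" in values.columns:
        values["offer_id"] = values["offer_id"].fillna(spaced)
    else:
        values["offer_id"] = spaced

# Fill NA with 0 in the numeric columns only (amount, reward).
numeric_cols = values.select_dtypes("number").columns
values[numeric_cols] = values[numeric_cols].fillna(0)

transcript_clean = pd.concat([transcript.drop(columns="value"), values], axis=1)
print(transcript_clean)
```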
3. Answers to the Business Questions:
I. How much was the loss because of offers?
From the visualization above, we get the following numbers:
- Discount: total 5,391, loss USD 17,802
- BOGO: total 4,616, loss USD 31,230
There are 8 offers of these two types, and most of the loss comes from the BOGO offers.
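The post does not spell out how the loss is computed. One plausible reading, assumed in this sketch, is that the loss is the reward paid on offers completed without ever being viewed, since those customers would likely have bought anyway. The events and offer ids below are hypothetical, and the sketch ignores timing (a view recorded after the completion still counts as viewed).

```python
import pandas as pd

# Hypothetical, already-cleaned offer events.
events = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c3", "c3"],
    "offer_id": ["o1", "o1", "o2", "o1", "o2"],
    "event": ["offer viewed", "offer completed", "offer completed",
              "offer completed", "offer completed"],
    "reward": [0, 10, 5, 10, 5],
})
offer_types = pd.DataFrame({"offer_id": ["o1", "o2"], "offer_type": ["bogo", "discount"]})

# (customer, offer) pairs that were viewed at least once.
viewed = events.loc[events["event"] == "offer viewed", ["customer_id", "offer_id"]].drop_duplicates()

# indicator=True marks completions with no matching view as "left_only".
completed = events[events["event"] == "offer completed"]
flagged = completed.merge(viewed, on=["customer_id", "offer_id"], how="left", indicator=True)
not_viewed = flagged[flagged["_merge"] == "left_only"]

# Count and total reward of unviewed completions per offer type.
loss = (
    not_viewed.merge(offer_types, on="offer_id")
    .groupby("offer_type")["reward"]
    .agg(total="count", loss="sum")
)
print(loss)
```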
II. Which customers tend to complete an offer without viewing it?
The visualization also confirms our previous assumption about Female customers: they tend to complete offers even without viewing them, and their average spending per transaction (USD 16.3) is higher than that of Male and Other customers.
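Average spending per transaction by gender is a join plus a group-by. A minimal sketch with hypothetical values, assuming the transactions have already been filtered out of the cleaned transcript and that the profile's id column holds the customer id.

```python
import pandas as pd

# Hypothetical transaction events (already filtered to event == "transaction").
transactions = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c3", "c3"],
    "amount": [18.0, 14.0, 9.0, 11.0, 10.0],
})
# Hypothetical cleaned profile rows.
profile = pd.DataFrame({"id": ["c1", "c2", "c3"], "gender": ["F", "M", "M"]})

# Mean transaction amount per gender, rounded to one decimal.
avg_spend = (
    transactions.merge(profile, left_on="customer_id", right_on="id")
    .groupby("gender")["amount"]
    .mean()
    .round(1)
)
print(avg_spend)
```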
III. Relation between average income and customer type
Looking at the visualization, there is not much difference in average income between customers who complete the offer without viewing it and those who view it first. Overall, though, customers who complete the offer without viewing it first have a higher average income.
This is quite intuitive, since we usually assume that people with a higher income care less about offers (they have more money anyway, so they can afford the purchase).
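The comparison behind this answer can be sketched as a pivot of average income by offer type and customer type. The per-completion flag `viewed_first` and all values below are hypothetical; in practice the flag would come from matching offer views to completions in the cleaned transcript.

```python
import pandas as pd

# Hypothetical completed offers, flagged by whether the offer was viewed first.
completions = pd.DataFrame({
    "id": ["c1", "c2", "c3", "c4"],
    "offer_type": ["discount", "discount", "bogo", "bogo"],
    "viewed_first": [True, False, True, False],
})
profile = pd.DataFrame({
    "id": ["c1", "c2", "c3", "c4"],
    "income": [60000.0, 80000.0, 50000.0, 70000.0],
})

merged = completions.merge(profile, on="id")
merged["customer_type"] = merged["viewed_first"].map(
    {True: "viewed", False: "completed without viewing"}
)

# Rows: offer type; columns: customer type; cells: mean income.
avg_income = merged.pivot_table(
    index="offer_type", columns="customer_type", values="income", aggfunc="mean"
)
print(avg_income)
```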
4. Conclusion
Based on the analysis, there are several things we can conclude.
- With unplanned offers, we can “lose” up to USD 49,032 of revenue in a month (USD 17,802 from discount plus USD 31,230 from BOGO offers), or USD 588,384 in a year if that month is representative. Targeted marketing of our promotions is therefore very important and plays a huge role.
- Female customers tend to spend more than Male customers: the average spending per transaction is USD 16.3 for Female customers compared to USD 10.4 for Male customers. Female customers also tend to complete the offer even without viewing it first, so we might want to be more careful when sending offers to them.
- Overall, customers who complete the offer without viewing it first have a higher average income, especially for discount offers, where those who complete the offer without viewing it and those who view it have an average income of USD 71,060 and USD 67,642 respectively.
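The yearly figure in the first conclusion is a straight extrapolation of the monthly one. A quick check using the numbers reported in the Answers section, under the assumption that the one-month test period is representative of every month:

```python
# Monthly losses reported in the Answers section (USD).
monthly_loss = {"discount": 17_802, "bogo": 31_230}

total_month = sum(monthly_loss.values())
# Assumption: every month looks like the test month.
total_year = total_month * 12

print(f"monthly: USD {total_month:,}, yearly: USD {total_year:,}")
# prints "monthly: USD 49,032, yearly: USD 588,384"
```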
5. Future Improvements
Based on the results of the analysis, we recommend the following for future work:
- We need to be more careful in sending offers, especially BOGO offers, which account for USD 31,230 of the loss in this experiment. One option is to stop sending BOGO offers to customers who buy more than 2 cups per transaction on average: they tend to buy more than 2 cups even without the offer, so the BOGO offer does not seem very important to them.
- Send fewer offers to Female customers, especially discount offers. The data shows that the average spending of Female customers is USD 16.3 per transaction, so it makes little sense to send them an offer with a “difficulty” of USD 10, since they would reach it anyway. Raising the minimum purchase for their offers to USD 20 or USD 25 would work better.
- We might want to customize the “difficulty” based on each customer’s income level, so that customers with a higher income get a higher “difficulty” as well.
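One way to act on the last recommendation is to split customers into income bands and assign each band a difficulty. A minimal sketch, where the three equal-sized bands (via `pd.qcut`) and the USD 7 / 10 / 20 difficulty levels are hypothetical choices, not values derived from the data:

```python
import pandas as pd

# Hypothetical customer incomes.
profile = pd.DataFrame({
    "id": ["c1", "c2", "c3", "c4", "c5", "c6"],
    "income": [35000.0, 48000.0, 61000.0, 72000.0, 90000.0, 110000.0],
})

# Split customers into three equal-sized income bands.
profile["income_band"] = pd.qcut(profile["income"], q=3, labels=["low", "mid", "high"])

# Hypothetical difficulty (minimum spend in USD) per band.
difficulty_by_band = {"low": 7, "mid": 10, "high": 20}
profile["difficulty"] = profile["income_band"].astype(str).map(difficulty_by_band)
print(profile)
```

Quantile bands keep the groups balanced in size; fixed income thresholds would be an equally valid design if the business prefers explicit cut-offs.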