Offer What and To Whom?

Starbucks Promotion User Behavior Analysis

Yufei
7 min read · Aug 7, 2020

As customers, we often receive offers from the brands we subscribe to. Sometimes there is a discount, sometimes not. Sometimes we click, sometimes not. Eventually, an offer seems interesting and we buy.

On the other hand, our behaviour as users may be analysed by the company!

Today I am going to analyse simulated data that mimics customer behaviour on the Starbucks rewards mobile app. In this data, Starbucks sends three different types of offer to its customers over a period of time: BOGO (buy one get one free), discount, and informational offers that contain no discount at all. These offers are sent to different customers at different times.

My objective is to find trends in customer behaviour with regard to the different offers, in order to make offers more effective and better targeted: what kind of customers are most interested in which offers?

Data overview

The data is composed of three tables:

portfolio.json

  • id (string) — offer id
  • offer_type (string) — type of offer, i.e. BOGO, discount, informational
  • difficulty (int) — minimum required spend to complete an offer
  • reward (int) — reward given for completing an offer
  • duration (int) — time for offer to be open, in days
  • channels (list of strings)

profile.json

  • age (int) — age of the customer
  • became_member_on (int) — date when customer created an app account
  • gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
  • id (str) — customer id
  • income (float) — customer’s income

transcript.json

  • event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
  • person (str) — customer id
  • time (int) — time in hours since start of test. The data begins at time t=0
  • value (dict of strings) — either an offer id or a transaction amount, depending on the record
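As a rough illustration of the transcript schema, the sketch below loads two made-up records with pandas. It assumes the files are line-delimited JSON and that the offer id key inside `value` is spelled "offer id"; neither detail is stated above, so treat both as assumptions.

```python
import io
import pandas as pd

# Two made-up records in the transcript schema (not real data).
# The "offer id" key spelling is an assumption.
sample = io.StringIO(
    '{"person": "p1", "event": "offer received", "time": 0, "value": {"offer id": "o1"}}\n'
    '{"person": "p1", "event": "transaction", "time": 6, "value": {"amount": 4.5}}\n'
)

# With a real file this might look like (assuming line-delimited JSON):
# pd.read_json("transcript.json", orient="records", lines=True)
transcript = pd.read_json(sample, orient="records", lines=True)
print(transcript)
```

Each `value` cell stays a Python dict after loading, which is why the table needs extra wrangling before modelling.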

The trickiest part is the transcript table, which records each event as time passes. Also, each offer has a validity period once it is sent, which may or may not influence the behaviour that follows in the timeline.

So this table is not intuitive at all. We need to do some wrangling to find clues before implementing the ML model.

Data Wrangling

The data is extremely messy: there is information about offers, information about users, and detailed information about each event. If the event is ‘offer received’, the value holds the offer id; if it is a transaction, the value holds the amount.
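One possible way to untangle the `value` column is to split it into separate columns. This is a minimal sketch on made-up rows, not necessarily the wrangling I used; the key spellings ("offer id", "offer_id", "amount") and the helper `get_offer_id` are assumptions for illustration.

```python
import pandas as pd

# Made-up events; the dict keys are assumed spellings.
transcript = pd.DataFrame({
    "person": ["p1", "p1", "p1"],
    "event": ["offer received", "transaction", "offer completed"],
    "time": [0, 6, 6],
    "value": [{"offer id": "o1"}, {"amount": 4.5}, {"offer_id": "o1", "reward": 2}],
})

def get_offer_id(value):
    # Hypothetical helper: look under either assumed key, else return None.
    return value.get("offer id", value.get("offer_id"))

transcript["offer_id"] = transcript["value"].apply(get_offer_id)
transcript["amount"] = transcript["value"].apply(lambda v: v.get("amount"))
print(transcript[["event", "offer_id", "amount"]])
```

Transactions end up with an amount but no offer id, which is the gap the next step has to deal with.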

To simplify things, my focus is on user behaviour: predict whether or not a user will complete an offer after seeing a BOGO or discount offer, or make a transaction after seeing an informational offer.

My method is to forward fill the offer id onto ‘offer completed’ and ‘transaction’ events that have no offer id assigned, on the condition that the user has received and seen the offer.
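The forward-fill idea can be sketched as follows on a toy event log. This is a simplified illustration under my own assumptions: it carries forward only the id of a viewed offer within each person's timeline and ignores the validity period, so the real wrangling needs more conditions than this.

```python
import pandas as pd

# Toy event log: p1 views the offer before buying, p2 never views it.
events = pd.DataFrame({
    "person":   ["p1", "p1", "p1", "p2", "p2"],
    "time":     [0, 6, 12, 0, 18],
    "event":    ["offer received", "offer viewed", "transaction",
                 "offer received", "transaction"],
    "offer_id": ["o1", "o1", None, "o2", None],
})

events = events.sort_values(["person", "time"], kind="stable").reset_index(drop=True)

# Keep the offer id only on 'offer viewed' rows, then carry it forward per person.
viewed_id = events["offer_id"].where(events["event"] == "offer viewed")
carried = viewed_id.groupby(events["person"]).ffill()

# Fill only the later events that have no offer id of their own.
needs_fill = (
    events["event"].isin(["transaction", "offer completed"])
    & events["offer_id"].isna()
)
events.loc[needs_fill, "offer_id"] = carried[needs_fill]
print(events)
```

In this toy log, the transaction of p1 is attributed to offer o1, while the transaction of p2 stays unattributed because that offer was never viewed.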

After wrangling, the table below shows the events after offers have been received, once per user.

Data Visualisation & Findings

From the table above, we can see that we want to predict a binary label from a number of variables. The variables have been converted to numeric form or to dummy variables.

Each row is an event.

We need the target label to be 0 or 1 in order to implement a machine learning classifier. In my view, logistic regression or random forest fits this kind of situation.
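For readers unfamiliar with dummy variables, here is a small sketch of the encoding step on made-up rows. The column names mirror the tables described earlier, but the values are invented for illustration.

```python
import pandas as pd

# Made-up rows; 'offer completed' is the 0/1 target label.
users = pd.DataFrame({
    "age": [34, 58, 71],
    "income": [42000.0, 88000.0, 65000.0],
    "gender": ["F", "M", "O"],
    "offer completed": [0, 1, 1],
})

# One 0/1 column per gender value; numeric columns are left untouched.
features = pd.get_dummies(
    users.drop(columns="offer completed"), columns=["gender"], dtype=int
)
target = users["offer completed"]
print(features)
```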

Bogo Offer:

For the BOGO offer, y is the number of completed offers (label 1) divided by the number of non-completed offers for the same age. From about age 40, the ratio is over 1.0.
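A ratio like this can be computed with a group-by. The sketch below uses made-up rows and assumes one row per offer with a 0/1 ‘offer completed’ label; note that an age with no non-completed offers would give a division by zero (infinity in pandas).

```python
import pandas as pd

# Made-up rows: one row per offer, labelled 1 if completed.
bogo = pd.DataFrame({
    "age": [30, 30, 30, 45, 45, 45],
    "offer completed": [1, 0, 0, 1, 1, 0],
})

counts = bogo.groupby("age")["offer completed"].agg(completed="sum", total="count")
counts["not_completed"] = counts["total"] - counts["completed"]
counts["ratio"] = counts["completed"] / counts["not_completed"]
print(counts)
```

A ratio above 1.0 means that more offers were completed than not completed at that age.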

The age variable does not separate the offer completion rate very clearly. We can see two small peaks: people in their 50s, and people aged 70 and over. Young people under 35 are less interested.

With regard to income, there is a clear correlation: as income increases, the offer completion rate increases too.

Seniority is the most important variable, as the feature importance table later shows. Seniority is measured back from the most recent member sign-up date in the data, which is 26 July 2018. Customers with seniority between 400 and 1100 days are the most likely to be attracted by those offers, with a denser concentration between 400 and around 800 days.
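Seniority in days can be derived from became_member_on roughly as below. The sketch assumes the integer is encoded as YYYYMMDD, which is not stated in the data description, and uses made-up member ids.

```python
import pandas as pd

# Made-up members; became_member_on is assumed to be an int in YYYYMMDD form.
profile = pd.DataFrame({
    "id": ["p1", "p2"],
    "became_member_on": [20170715, 20180726],
})

joined = pd.to_datetime(profile["became_member_on"].astype(str), format="%Y%m%d")
reference = pd.Timestamp("2018-07-26")  # most recent sign-up date, as stated above
profile["seniority_days"] = (reference - joined).dt.days
print(profile)
```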

Discount Offer:

Age: three slight peaks are observed: around 35 years old, between 50 and 65 years old, and between 75 and 85.

Income: generally, the completion rate increases as income goes up. Peaks appear for incomes between 70000 and 85000, and over 100000.

Seniority: higher offer completion rates appear for seniority between about 400 and 1100 days, with a concentration between 700 and 900 days.

Informational Offer:

Age: the transaction rate decreases with age, with a peak for ages between 20 and 30.

Income: the transaction rate decreases as income rises.

Seniority: new members show more interest than they do in the previous offers. People with between 250 and 1100 days of seniority are most interested, with slightly more offers completed for seniority between 600 and 850 days than in other groups.

Random Forest Model Implementation

As we are trying to predict ‘offer completed’ (or a transaction, for the informational offer) as 0 or 1, this is a typical binary classification problem. After comparing logistic regression with a random forest classifier, I found that the random forest performs slightly better (ROC score of 0.66 vs 0.63 for the BOGO offer).
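A comparison of this kind can be sketched as follows. The data here is synthetic (generated with `make_classification`), so the scores say nothing about the Starbucks data; the sketch also assumes that the ROC score is computed from predicted probabilities, which may differ from how the numbers above were obtained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wrangled offer table.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(
        max_depth=10, n_estimators=100, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class, used to compute the area under the ROC curve.
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, proba)

print(scores)
```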

    # Split, train and fit the model for the BOGO offer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X = bogo_offer.drop(['transaction', 'offer_type', 'offer completed'], axis=1)
    y = bogo_offer['offer completed']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
    clf = RandomForestClassifier(max_depth=10, n_estimators=1200, random_state=0)
    re = clf.fit(X_train, y_train)
    preds_b = re.predict(X_test)

Bogo Offer confusion_matrix & Classification_report

Discount Offer confusion_matrix & Classification_report

Informational Offer confusion_matrix & Classification_report
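For reference, a confusion matrix and classification report can be produced with scikit-learn as sketched below. The labels and predictions are made up, so the numbers are illustrative only.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Made-up true labels and predictions, for illustration only.
y_test = [0, 0, 1, 1, 1, 0]
preds = [0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, preds)
print(cm)
print(classification_report(y_test, preds))
```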

The confusion matrices and classification reports above show ROC scores of around 0.66, 0.65 and 0.58. With grid search, we may be able to improve the performance slightly.
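A grid search could look roughly like this with scikit-learn's `GridSearchCV`. The data is synthetic and the parameter grid is an arbitrary assumption, chosen small so that the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the grid values are illustrative, not tuned.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
param_grid = {"max_depth": [5, 10], "n_estimators": [50, 100]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # pick the combination with the best cross-validated ROC AUC
    cv=3,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```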

Conclusion

From my analysis, the most influential factor for all three offers is seniority. Customers with seniority between 400 and 1100 days are the most attracted by the three offers, with slight differences between offers: those with seniority between 250 and 400 days are also interested in informational offers, those between 400 and 800 days are most interested in BOGO offers, while for the discount offer the most interested are between 700 and 900 days.
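A feature importance ranking can be read from a fitted random forest as sketched below. The model is trained on synthetic data and the feature names are placeholders that I attach for illustration, so the ranking printed here carries no meaning for the real data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder names attached to synthetic columns (illustration only).
feature_names = ["age", "income", "seniority_days", "gender_F", "gender_M", "gender_O"]
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

clf = RandomForestClassifier(max_depth=10, n_estimators=100, random_state=0)
clf.fit(X, y)

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(clf.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```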

Income and age are also important variables, but the trends differ: customers with lower income and lower age tend to be more interested in informational offers, while for BOGO and discount offers the correlation seems to be the inverse. To segment the sending of offers, we may send offers based on these different characteristics and find the most attractive offer(s) for each user. For example, for a young user with 300 days of seniority, we know that the informational offer is the most influential.

With regard to gender, more women than men or customers of other genders are interested in BOGO and discount offers. More men than women or customers of other genders are interested in informational offers. But gender remains less important than the other features.

The random forest performs better than chance on all three offers. We may try different parameters using GridSearch if we want better model performance.
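One simple way to turn this into a targeting rule is to score each user with the three models and pick the offer with the highest predicted probability. The sketch below skips the models and uses a hypothetical table of probabilities with made-up numbers and user labels. Keep in mind that the informational model predicts a transaction rather than an offer completion, so the three scores are not strictly comparable.

```python
import pandas as pd

# Hypothetical predicted probabilities (made-up numbers),
# one column per offer-type model, one row per user.
proba = pd.DataFrame(
    {
        "bogo": [0.40, 0.70],
        "discount": [0.45, 0.65],
        "informational": [0.60, 0.30],
    },
    index=["young_user_300_days", "older_user_700_days"],
)

# For each user, pick the offer type with the highest predicted probability.
best_offer = proba.idxmax(axis=1)
print(best_offer)
```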


Yufei

Country Manager France of an established Asian independent distributor. Let's connect: www.linkedin.com/in/yufeige