Starbucks: Recommending for Your Wishes

Gutelvam Rodrigues
Published in The Startup
Jan 24, 2021 · 13 min read

How Can Starbucks Improve My Experience and Keep Me Engaged?

For many, the answer is simple: the mere fact that Starbucks exists is already something big. The company is committed to innovating and providing the best experience for its customers, offering a pleasant way to engage and reward those who buy its products.

A brief Introduction

Do you know that delicious aroma of warm coffee, the ambient lighting and the famous green symbol on every cup? If you thought of the Starbucks brand, you nailed it. The company has become synonymous with profitability for investors, quality, and excellent customer service.

Having gone public in 1992, the brand grew over the years into the largest chain of coffee shops in the world, with about 20 thousand establishments spread across several countries. With this increasing visibility, Starbucks gained a loyal following, all thanks to differentiators that many competitors did not have.

The big secret is to offer exactly what people would like to receive when they go out to enjoy a coffee or another drink, and that is something Starbucks does very well. Starbucks stores have an extremely cozy, comfortable, well-ventilated environment, with custom lighting and, of course, an internet connection.

Understanding the case

From the data provided for this project by Udacity in partnership with Starbucks, the established objective is to analyze how customers respond to offers such as Discount and BOGO (Buy One Get One). Keep in mind that with this type of incentive, not everyone completes a purchase after receiving an offer, whether by email, mobile, web or social media.

Why does this happen? Many things can impact the decision to purchase a product: for example, a customer may tend to buy only when the offer arrives on a particular day of the week, or a specific group, based on demographic information, may tend to purchase more, and so on.

To start analyzing, it is very important to establish a clear objective. In this case I had the freedom to choose, and I decided to create a system for recommending offers. To complete this task, it was necessary to raise questions to be answered with the data, based on the established objective.

Typically in recommendation systems there are four types of approach, namely: Content-based Recommender Systems, Collaborative Filtering, Knowledge-based Recommender Systems and Hybrid Recommender Systems.

For this particular case, the main idea is to use Collaborative Filtering, because this approach is a better fit for users with historical data: we can compare users, find similarities in their tastes, and with that provide better customization for customers. To carry out the project, the intention is to apply the Surprise library (scikit-surprise), as it makes it possible to test different models quickly.

In this challenge we are going to compare all these models and find which one fits our data best. The descriptions of the methods below are borrowed from the Surprise documentation:

  • SVD — “The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize. When baselines are not used, this is equivalent to Probabilistic Matrix Factorization.”
  • SVDpp — “The SVD++ algorithm, an extension of SVD taking into account implicit ratings.”
  • KNNWithZScore — “A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.”
  • KNNWithMeans — “A basic collaborative filtering algorithm, taking into account the mean ratings of each user.”
  • CoClustering — “A collaborative filtering algorithm based on co-clustering. This is a straightforward implementation of [George:2005].”

The challenge here is to shape the data the way the model expects: a table of users × offers filled with rating values. So how will we create this table? Keep reading and we will find out together.

Defining Questions to Be Answered in the Data Analysis

For the purpose of analyzing the data, I chose the following questions to be answered:

- Which gender is more likely to complete an offer?

- Which gender has the highest salaries?

- Which kind of offer lasts longer?

- How many users have completed offers?

Data Information

These are the datasets that we are going to read, clean, and analyze. Here is the schema and an explanation of each variable in the files:

portfolio.json

  • id (string) — offer id
  • offer_type (string) — type of offer, i.e. BOGO, discount, informational
  • difficulty (int) — minimum required spend to complete an offer
  • reward (int) — reward given for completing an offer
  • duration (int) — time for offer to be open, in days
  • channels (list of strings) — channels through which the offer was sent (web, email, mobile, social)

profile.json

  • age (int) — age of the customer
  • became_member_on (int) — date when customer created an app account
  • gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
  • id (str) — customer id
  • income (float) — customer’s income

transcript.json

  • event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
  • person (str) — customer id
  • time (int) — time in hours since start of test. The data begins at time t=0
  • value — (dict of strings) — either an offer id or transaction amount depending on the record

Cleaning and Exploratory Data Analysis

The first step was to read the data and look for basic information about the datasets.

Profile Dataset

This is an important step. Now that we know the size of the data, I checked how many missing values there are in each dataset. There was missing data only in the Profile dataset, according to the following result:

Looking more closely at the data, I noticed that the records with missing gender and income values also had something odd in the age column: customers registered as 118 years old, which does not seem very common to me. All records with this anomaly have missing values in both columns mentioned (gender and income), so the decision was to remove these records.
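A minimal sketch of that cleaning step (the file path and the assumption that the anomalous age is exactly 118 are mine):

```python
import pandas as pd

# Read the profile data (path and read options are assumptions).
profile = pd.read_json("profile.json", orient="records", lines=True)

# Drop the anomalous registrations: age 118 together with missing
# gender and income (all three occur on the same rows).
profile_clean = profile[profile["age"] != 118].dropna(subset=["gender", "income"])
```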

Still on this dataset, the became_member_on column holds the membership date formatted as int64, so I created another column with the length of membership, inferring that the consumer is still a customer up to the present day. To facilitate further analyses, this period was transformed into days, according to the function below:
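A sketch of the idea, with the helper name and the `member_days` column as placeholders of mine:

```python
import pandas as pd

def add_membership_days(profile: pd.DataFrame) -> pd.DataFrame:
    """Convert became_member_on (e.g. 20170801) to a datetime and add
    the membership length in days, counted up to the present day."""
    profile = profile.copy()
    profile["became_member_on"] = pd.to_datetime(
        profile["became_member_on"].astype(str), format="%Y%m%d"
    )
    profile["member_days"] = (
        pd.Timestamp.today() - profile["became_member_on"]
    ).dt.days
    return profile
```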

So, with the cleaned Profile dataset, we may ask: what does the distribution of customer ages look like? It seems a good amount of the records are between 40 and 80 years old. Interesting, isn't it? A question may come to mind, like "don't younger people like Starbucks?". We can't really know from this data, because only people over 18 are taken into account, probably only those with an active account. To analyze people under 18, other studies with more information would be necessary.

Regarding salary, it seems that the majority is distributed below $80,000 a year. This is interesting to look at in terms of age: older people tend to have higher salaries, which is entirely acceptable since they have had more career time.

Fantastic! When looking at income by gender in this dataset, the normal thing to expect would be for males to have the highest income, but it's really nice to see that the world is moving toward equity.

OK! And who is more likely to complete an offer? As the graph below shows, it looks interesting: males seem to complete slightly more offers than females in this sample.

How many people did complete offers?

That's good! Out of roughly 17k customers, 11,986 completed offers.

Cool, cool cool!

Portfolio Dataset

OK! Now it's time to move on and investigate another dataset. Let's see what the Portfolio can tell us.

“I believe that here we must look carefully, since it’s where our main goal is based”.

When looking at the graph above, it is possible to notice that there is no reward associated with informational offers; likewise, it is clear that offers of the BOGO type tend to give higher rewards than those of the discount type.

“For me this makes a lot of sense, since an informational email tries to keep users informed about new products or some institutional action, but I was surprised by the difference between BOGO and Discount; I didn't know about it.”

Next, let's look at the duration of offers: do discounts last longer than BOGO?

“Well, discount offers tend to last longer than BOGO and informational offers; probably that's why they have a lower reward associated with them compared to BOGO.”

This dataset has an anomaly that can be noted above: the channels column holds its values as lists. To turn this column into useful data, it is necessary to clean it, so the function below was used:
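A sketch of that cleaning, turning each channel into its own binary column (the helper name is mine):

```python
import pandas as pd

def clean_portfolio(portfolio: pd.DataFrame) -> pd.DataFrame:
    """Expand the list-valued `channels` column into one binary
    column per channel (web, email, mobile, social)."""
    portfolio = portfolio.copy()
    channel_dummies = (
        portfolio["channels"]
        .explode()               # one row per (offer, channel) pair
        .str.get_dummies()       # binary indicator column per channel
        .groupby(level=0).max()  # back to one row per offer
    )
    return portfolio.drop(columns="channels").join(channel_dummies)
```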

Transcript Dataset

Now it's time to analyze our last dataset, the transcript. It holds every user record: when the customer received an offer, when he bought something, and so on; you can think of it as a log for each user. In many cases the user received an offer but did not view it; in others the user received and viewed it but did not purchase; and in the last case the user completed the offer, meaning he bought after receiving and viewing it.

In this case the only variable to look at is the distribution of event types. We can see below that completed offers amount to about 20% of transactions; analyzing the completed offers, it is clear that most transactions were a natural process and were not influenced by offers.

As you can see, the event column holds the status of the offer, and the value column holds dictionaries: the offer id if the user received an offer, or, for a transaction event (purchase), the amount that was transacted. Cleaning this dataset is simple: we must extract every offer id inside the value column and keep only the events linked to "offer received", "offer viewed" and "offer completed". For this, the function below was used:
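A sketch of that cleaning (the fallback between the 'offer id' and 'offer_id' key spellings is an assumption about the raw dictionaries):

```python
import pandas as pd

def clean_transcript(transcript: pd.DataFrame) -> pd.DataFrame:
    """Keep only offer-related events and pull the offer id out of
    the `value` dictionaries."""
    offer_events = ["offer received", "offer viewed", "offer completed"]
    df = transcript[transcript["event"].isin(offer_events)].copy()
    df["offer_id"] = df["value"].apply(
        lambda v: v.get("offer id", v.get("offer_id"))
    )
    return df.drop(columns="value")
```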

Prepare Data for Machine Learning Model Using Sci-kit Surprise!

First, to prepare our data we must change its format and create a table like this:

This is where the secret is. The idea is simple: we need to categorize what counts as a positive feeling toward an offer and what does not. In my case I stipulated that if a person receives an offer, views it and completes it, then that person was impacted by the offer (receiving the value 1). If the person receives and views it but does not complete it, the offer had no effect (receiving the value 0). Finally, the cases where the person receives the offer and does not view it, or does not receive it at all, are missing values (receiving NaN). For this purpose, the function below was used.
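A sketch of that labelling and the pivot into a person × offer matrix (the helper names are mine):

```python
import numpy as np
import pandas as pd

def rate_events(events: set) -> float:
    """Map the set of events one user had for one offer to a rating."""
    if {"offer received", "offer viewed", "offer completed"} <= events:
        return 1.0   # received, viewed and completed -> positive
    if {"offer received", "offer viewed"} <= events:
        return 0.0   # viewed but not completed -> no effect
    return np.nan    # never viewed -> unknown

def build_user_offer_table(events_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot the cleaned transcript into a person x offer rating matrix."""
    grouped = events_df.groupby(["person", "offer_id"])["event"].apply(set)
    ratings = grouped.apply(rate_events).reset_index(name="rating")
    return ratings.pivot(index="person", columns="offer_id", values="rating")
```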

The Surprise library cannot handle NaN values, which is why we must remove all missing data.
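A minimal sketch of that step, assuming the pivot table from the previous sketch is called `user_offer_table`: stacking back to long format silently drops the NaN entries, and a Reader with a 0-1 rating scale feeds the frame into Surprise.

```python
from surprise import Dataset, Reader

# Back to long format; stack() drops the NaN entries Surprise cannot handle.
long_df = user_offer_table.stack().reset_index()
long_df.columns = ["person", "offer_id", "rating"]

# Ratings go from 0 (no effect) to 1 (offer completed).
reader = Reader(rating_scale=(0, 1))
data = Dataset.load_from_df(long_df[["person", "offer_id", "rating"]], reader)
```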

Defining Metrics

For a given offer and customer, we want to predict what the customer's feeling about the offer will be, so we can recommend the offers that best fit a positive feeling.

The given problem is a recommendation problem, but it can also be seen as a regression problem. In this way, regression metrics are the best fit for this type of situation, and our goal here is to minimize the RMSE, since it is the most commonly used metric.

RMSE (root mean square error): the measure that calculates the square root of the mean of the squared errors between observed (real) values and predictions (hypotheses).
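Written as a formula, for n predictions ŷ_i against the real values y_i:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```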

Finding best model with Cross Validation

With the data in hand and properly converted into a Surprise object, cross-validation was carried out to find the model that best fits this data:
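The loop over the five candidate algorithms could look roughly like this (the number of folds is my assumption; `data` is the Surprise dataset built above):

```python
from surprise import SVD, SVDpp, KNNWithZScore, KNNWithMeans, CoClustering
from surprise.model_selection import cross_validate

algorithms = {
    "SVD": SVD(),
    "SVDpp": SVDpp(),
    "KNNWithZScore": KNNWithZScore(),
    "KNNWithMeans": KNNWithMeans(),
    "CoClustering": CoClustering(),
}

# Compare the mean RMSE of each algorithm over 5 folds.
for name, algo in algorithms.items():
    results = cross_validate(algo, data, measures=["RMSE"], cv=5, verbose=False)
    print(f"{name}: mean RMSE = {results['test_rmse'].mean():.4f}")
```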

With this, the model with the best performance was SVD, with an RMSE of 0.7991. This cross-validation is a built-in method of the library, so, to try to challenge this value, a search for better hyperparameters for the SVD model was carried out.
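A sketch of that hyperparameter search with Surprise's GridSearchCV (the grid itself is illustrative, chosen only so it contains the winning values reported below):

```python
from surprise import SVD
from surprise.model_selection import GridSearchCV

# Illustrative grid; the exact ranges searched in the project may differ.
param_grid = {
    "n_factors": [4, 8, 16],
    "n_epochs": [15, 20],
    "lr_all": [0.002, 0.005],
    "reg_all": [0.02, 0.1],
}

gs = GridSearchCV(SVD, param_grid, measures=["rmse"], cv=5)
gs.fit(data)

print(gs.best_score["rmse"])   # best cross-validated RMSE
print(gs.best_params["rmse"])  # parameters that achieved it
```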

Model Evaluation and Validation

To get this result, the model was trained with the best parameters obtained from hyperparameter tuning, as follows:

{'n_factors': 4, 'n_epochs': 15, 'lr_all': 0.002, 'reg_all': 0.02}
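A sketch of training and scoring with those parameters (the split ratio and random seed are assumptions):

```python
from surprise import SVD, accuracy
from surprise.model_selection import train_test_split

# Hold out part of the data to measure the test-set RMSE.
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

model = SVD(n_factors=4, n_epochs=15, lr_all=0.002, reg_all=0.02)
model.fit(trainset)

predictions = model.test(testset)
accuracy.rmse(predictions)
```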

We can make predictions for 8012 common users, while for 6132 users we cannot, due to the cold start problem: they are not present in both sets simultaneously. This corresponds to 54.04% of the records.

Justification

Why was our result better with SVD? Probably because of the sparseness of the matrix: the FunkSVD method is not sensitive to missing values. That said, the model has a serious cold-start problem; if we want to improve this system, we should probably use more than one approach, building a hybrid system that adds Content-based and Knowledge-based Recommender Systems to address the cold start.

This model seems to be a good start for a recommendation system. The RMSE on the test set did not show a significant improvement over the cross-validation already provided by the library, which means that even with different parameters the model is robust against small perturbations in the training data, and the RMSE on this data seems good enough to apply to real data.

Predicting

Now that we have our model and our predictions, let's see how to get the predicted information. First of all, you need to know that Surprise does not give us the recommendation itself; it returns predicted ratings (and, for the neighborhood models, the most similar users), so it is necessary to rank the offers for each user based on those estimates. How shall we do it? Here is an example:
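One common pattern, in the spirit of the Surprise FAQ, is to group the test-set predictions by user and keep the offers with the highest estimates (the helper name `get_top_n` is mine):

```python
from collections import defaultdict

def get_top_n(predictions, n=3):
    """Group predictions by user and keep the n offers with the
    highest estimated rating for each one."""
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda pair: pair[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return top_n

top_offers_per_user = get_top_n(predictions, n=3)
```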

And this is how you get the recommendations for a specific user:
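A sketch for a single user, scoring every offer id from the portfolio with the trained model (the helper and the user id are placeholders):

```python
def recommend_for_user(model, user_id, all_offer_ids, n=3):
    """Score every offer for one user with the trained model and
    return the n offers with the highest predicted rating."""
    scored = [(offer_id, model.predict(user_id, offer_id).est)
              for offer_id in all_offer_ids]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Hypothetical usage: the offer ids come from the portfolio dataset.
top_offers = recommend_for_user(model, "some-customer-id", portfolio["id"].tolist())
```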

Oh my gosh!! We made it!

Findings

  • Event types: completed offers amount to about 20% of transactions, so most transactions were a natural process and were not influenced by offers.
  • About 11,986 people completed offers, which is around 80% of the records.
  • The salaries of most customers are distributed below $80,000 a year.
  • Women seem to have higher incomes than men in this sample.
  • Discount offers tend to last longer than BOGO and informational offers.
  • BOGO offers tend to give higher rewards.

Conclusions

This project was a huge challenge. Implementing the model was particularly difficult when it came to writing the conditions for creating the base dataframe, since processing time was the bottleneck; after that, understanding the Surprise library also took some time. What really made me very happy was knowing that the model can be improved by adding other approaches, and that we can get a model really fast: with the default cross-validation we obtained an RMSE of 0.799192 for the SVD model, and with GridSearchCV we were able to improve it to an RMSE of 0.79387. This made me happy because it was the chosen metric.

Of all the models used to create the recommendation system, SVD had the best performance using cross-validation with the Surprise library. Surprise is a good choice and is very interesting for fast projects due to its similarity to scikit-learn: it is easy to use and implement.

However, it is possible to see that it has its shortcomings and difficulties: the recommendation returns the most similar users, and further processing is necessary to obtain the actual recommendations; another point is that many users had few recommendations, due to a low number of ratings or even the lack of records for those users. On the other hand, when implementing a model manually, you are free to make changes and create a more generalizable model, combining several approaches in one system.

Possible model improvements:

- For unknown users, a rank-based or content-based approach should be implemented.

- Decrease data sparsity, since there are few records that actually carry a positive or negative connotation.

- Use hyperparameter tuning with GridSearchCV and more time (increase the number of options).

- Cluster by gender and age before training the model.

- A/B testing to measure and improve the impact of recommendations.

Next steps:

This work can be a gateway to the implementation of other prediction systems, such as:

- Forecast the amount of financial return related to advertising

- More effective grouping of advertisements according to gender and age

- Inventory forecasting

Summary

  • Performed an exploratory data analysis to understand the profile of customers and how they are linked to offers.
  • Preprocessed the data to ensure it was appropriate for the predictive algorithms.
  • Used various models with cross-validation to find which one best fits the data and the users.
  • Created a simple recommendation system with Surprise!

So How Can Starbucks Improve My Experience and Keep Me Engaged?

The system has already taken a step toward making recommendations that interest you, a chance to keep you engaged and, beyond that, to make you feel even more privileged with the premium treatment the brand already offers.

This project was part of the Data Scientist Nanodegree program created by Udacity. To see more about this analysis with Surprise, click here.
