A Data Science Approach to Alcohol Retail Sales

Master Class: Analytics in Action
Columbia Engineering & Business School in Conjunction with 3x3 Insights

Introduction

This project is a collaboration between Columbia Business School, Columbia Engineering, and 3x3 Insights, a company focused on increasing alcohol sales for “corner” retail stores and brands in the US. The goal of the project is to better understand alcohol buyers in order to generate actionable insights that help retailers and brands improve sales and return on ad spend (ROAS).

Data and Approach

We used 8 gigabytes of point-of-sale (POS) data spanning September 2018 through August 2019. Starting with exploratory analysis, we detected and removed data anomalies. We then defined candidate features for the clustering model using data visualizations and histograms. We narrowed these candidates down using Principal Component Analysis and analyzed the selected features with a K-means clustering model, linear regression, and a random forest.
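PCA and K-means are both sensitive to feature scale, so features measured in different units are commonly standardized first. The write-up does not say whether or how this was done, so the sketch below is only an assumption about one reasonable preprocessing step. The two feature columns (average basket value and number of distinct stores visited) and their values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale every feature column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]

# Hypothetical per-customer features: (average basket value, distinct stores visited)
rows = [(12.0, 3.0), (80.0, 1.0), (45.0, 8.0)]
scaled = standardize(rows)
print(scaled)
```

After this step, a feature measured in dollars no longer dominates a feature measured in counts when distances or variances are computed.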

K-means clustering model

The K-means clustering model, which took 13 behavioral features as input, indicated that there are 7 clusters of customers.
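To make the clustering step concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the project's code. It uses two hypothetical features and toy values instead of the 13 behavioral features, and k=2 instead of 7. The write-up does not state how the number of clusters was chosen; comparing the within-cluster sum of squares (`inertia` below) across values of k is one common heuristic, shown here as an assumption.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances; lower means tighter clusters."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical features: (average basket value, share of purchases after 7 pm)
customers = [(12.0, 0.9), (10.0, 0.8), (11.0, 0.85),
             (80.0, 0.1), (95.0, 0.2), (88.0, 0.15)]
labels, centroids = kmeans(customers, k=2)
print(labels, inertia(customers, labels, centroids))
```

In practice the features would be scaled before clustering, and a library implementation would typically be preferred over a hand-written loop.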

Alcohol brands and retailers should never offer a wine discount before Thanksgiving

In terms of purchasing behavior, the festivity wine buyers spent the second most of the 7 groups of alcohol buyers. There are 3 groups of wine lovers, and the festivity wine buyers outspent the other two. They usually buy expensive wine a few days before holidays such as Thanksgiving, Christmas, and Independence Day. Interestingly, this group bought alcohol from many retail stores. It could be that they are not price-sensitive and so buy wine from whichever store they happen to visit, or that they are very selective in picking premium wines and so shop at many stores. The graph below compares the characteristics of festivity buyers and normal wine buyers.

To capture these festivity wine buyers, alcohol brands should offer limited-edition or premium wines right before holidays. They can also increase spending per basket by bundling these premium wines with other premium products and offering them as a gift set.

Wine retailers in office areas should open in the morning

Surprisingly, one group of customers bought wine in the morning. About 60% of their transactions occurred in the morning, they usually bought wine, and they spent on average 40% more than normal wine buyers. These customers may be B2B buyers who purchase wine as gifts for their own customers, possibly to hand over when visiting them later in the day. That would explain why their average spending is higher than that of normal wine buyers and why they appear price-insensitive.
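A time-of-day feature like the morning share mentioned above could be derived from transaction timestamps roughly as follows. This is a sketch under assumptions: the write-up does not define "morning", so the noon cutoff, the `(customer_id, timestamp)` record layout, and the sample transactions are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

MORNING_END_HOUR = 12  # assumption: "morning" means before noon

def morning_share(transactions):
    """Return {customer_id: fraction of transactions made before MORNING_END_HOUR}."""
    total = defaultdict(int)
    morning = defaultdict(int)
    for customer_id, timestamp in transactions:
        total[customer_id] += 1
        if timestamp.hour < MORNING_END_HOUR:
            morning[customer_id] += 1
    return {cid: morning[cid] / total[cid] for cid in total}

# Hypothetical transactions: (customer id, purchase time)
transactions = [
    ("c1", datetime(2019, 3, 4, 9, 15)),
    ("c1", datetime(2019, 3, 5, 10, 40)),
    ("c1", datetime(2019, 3, 8, 18, 5)),
    ("c2", datetime(2019, 3, 4, 20, 30)),
    ("c2", datetime(2019, 3, 6, 21, 10)),
]
shares = morning_share(transactions)
print(shares)
```

The same pattern, with a different hour threshold, would give an "after 7 pm" share for identifying night buyers.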

Wine retailers near office areas should open in the morning to capture these high-spending customers. Wine brands can also advertise expensive labels to this customer group and deliver wines in attractive packaging.

Alcohol brands should advertise cheap alcohol during sports events

The night buyers group bought a wide variety of cheap products, especially before sports events. They have the lowest average spending of all clusters, and they bought heavily after 7 pm. Below is a graph comparing night buyers with all other clusters.

To capture these night buyers, alcohol retailers should stay open at night during sports events. Because this group is price-sensitive, retailers can offer them aggressive discounts on deadstock products. Alcohol brands can also offer cheap alcohol products and advertise them during sports events.

Regression / Random Forest

After the success of the K-means model, we also wanted to study customers' purchasing behavior for specific brands. We therefore ran random forest and linear regression models to predict each customer's wallet share for a given brand. By checking which features had the lowest p-values in the regression, we identified the most significant features for brands including Corona.
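The target variable for these models could be computed along the following lines. The write-up does not define wallet share precisely; this sketch assumes it is a customer's spend on the brand divided by that customer's total spend. The `(customer_id, brand, amount)` record layout and the sample purchases are hypothetical.

```python
from collections import defaultdict

def wallet_share(line_items, brand):
    """Return {customer_id: brand spend / total spend} for the given brand."""
    total = defaultdict(float)
    brand_spend = defaultdict(float)
    for customer_id, item_brand, amount in line_items:
        total[customer_id] += amount
        if item_brand == brand:
            brand_spend[customer_id] += amount
    # Skip customers with zero total spend to avoid dividing by zero.
    return {cid: brand_spend[cid] / total[cid] for cid in total if total[cid] > 0}

# Hypothetical purchases: (customer id, brand, amount spent)
line_items = [
    ("c1", "Corona", 20.0),
    ("c1", "OtherBeer", 60.0),
    ("c2", "WineA", 50.0),
]
corona_share = wallet_share(line_items, "Corona")
print(corona_share)
```

Each customer's share then serves as the value to predict from that customer's behavioral features.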

Corona is less popular among beer lovers despite being a very popular beer

One of the brands we analyzed using regression was Corona, chosen for its ubiquity and transaction frequency. We found that although Corona is preferred at an above-average rate across all customers (compared with other beers), it fares poorly among beer drinkers. In other words, regular beer drinkers do not, on average, particularly favor Corona, probably because they like to drink a diverse range of beer brands. Additionally, Corona is disproportionately purchased in the summer months and underrepresented in the winter months. Lastly, we found that Corona is typically purchased at night, which reinforces what we found earlier about night buyers.

Corona should refrain from targeting its advertising at beer lovers. It should also be prepared to produce more Corona in summer and less in winter.

Conclusion

With point-of-sale data and a collaborative data analytics project like this one, alcohol brands no longer need to spend their budget on broad online marketing tools and wait for someone to buy their products. Instead, they can proactively target specific groups of customers using the clusters from the clustering model, and customize their products and marketing campaigns using the important features from the regression and random forest analysis.
