Exploratory Data Analysis to understand consumer behaviour

In the following post, we explore the dataset of a fictitious meal kit company, Apprentice Chef, and deduce key insights and actionable recommendations that can help the company increase its Revenue.

Photo by Ella Olsson on Unsplash

Apprentice Chef is a gem for those seeking a convenient and healthy meal option. Unlike its counterparts, Apprentice Chef comes with award-winning disposable cookware, which sets it apart and may make it a go-to alternative for the environmentally conscious. Its meals range from USD 10 to USD 23, in line with its competitors both online and offline. A typical Apprentice Chef meal, consisting of proteins, vegetables, carbs, and seasoning, takes about 30 minutes to prepare. Each meal kit comes with a detailed, step-by-step gourmet recipe cherished by novice and veteran chefs alike.

The convenience of the meals doesn't keep them from being economical, sustainable, and a gastronomic delight.

Understanding the dataset

Variables at our disposal

The dataset contains thirty variables that shed light on the company's performance across functions. To understand how each feature affects Revenue, we group the variables by what they reveal about the consumer. The broad categories are:

Demographics: Variables that describe the consumer's demographics
Affinity: Variables that directly impact Revenue, such as purchases made
Behavior: Variables that highlight the consumer's behavior, such as enrollments
Online Behavior: Variables that capture the consumer's online behavior and engagement, such as clicks
Convenience: Variables that offer convenience to the consumer, such as the presence of a refrigerated locker
Inconvenience: Variables that indicate inconvenience caused to the consumer, such as late deliveries


Understanding consumer behavior is a multifaceted process. To simplify it, we segment consumers based on:
1. Rating: Five segments, one for each star rating given.
2. Revenue: Four segments, one for each quartile of the average Revenue generated.
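A minimal sketch of the two segmentations in plain Python, assuming ratings are integers from 1 to 5 and that each customer's average Revenue is available as a number. The helper names `rating_segment` and `revenue_quartile_segments` and the sample values are hypothetical, not taken from the Apprentice Chef dataset; the quartile cut points come from the standard library's `statistics.quantiles`.

```python
import statistics


def rating_segment(rating):
    """Map a 1-5 star rating to one of five rating segments (hypothetical labels)."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    return f"{rating}-star"


def revenue_quartile_segments(revenues):
    """Label each customer Q1..Q4 by the Revenue quartile they fall into."""
    # With n=4, statistics.quantiles returns the three cut points between quartiles.
    q1, q2, q3 = statistics.quantiles(revenues, n=4)
    labels = []
    for value in revenues:
        if value <= q1:
            labels.append("Q1")
        elif value <= q2:
            labels.append("Q2")
        elif value <= q3:
            labels.append("Q3")
        else:
            labels.append("Q4")
    return labels


# Hypothetical average Revenue per customer, for illustration only.
revenues = [100, 200, 300, 400, 500, 600, 700, 800]
print(revenue_quartile_segments(revenues))
# ['Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q3', 'Q4', 'Q4']
print(rating_segment(4))  # 4-star
```

Quartiles keep the four Revenue segments roughly equal in size, which makes comparisons between them less sensitive to a handful of very high spenders.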

Exploratory Data Analysis (EDA)

The primary objective of the EDA is to understand which features impact Revenue.

Dissecting consumer behavior involves examining each variable in isolation as well as in relation to other relevant variables. We carry out the following analyses on the dataset:

Univariate Analysis: Performed to understand the range, dispersion, and outliers of individual variables. Color-coding the segments in the plots adds more meaning to the univariate analysis.
Graphs to be plotted: Histogram, FacetGrid, Boxplot
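The range, dispersion, and outliers that a box plot displays can also be computed directly. This sketch assumes the conventional 1.5 × IQR rule (Tukey's fences, which is also matplotlib's default whisker length) for flagging outliers; the helpers `summarize` and `summarize_by_segment` and the sample values are hypothetical.

```python
import statistics


def summarize(values):
    """Range, dispersion and 1.5*IQR outliers for one numeric variable."""
    # "inclusive" interpolation matches the usual linear percentile definition.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "min": min(values),
        "max": max(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "iqr": iqr,
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


def summarize_by_segment(values, segments):
    """Run summarize() separately for each segment label (rating or quartile).

    Each segment needs at least two values for the quartiles and stdev.
    """
    groups = {}
    for value, segment in zip(values, segments):
        groups.setdefault(segment, []).append(value)
    return {segment: summarize(vals) for segment, vals in groups.items()}


# Hypothetical values for a single variable, for illustration only.
values = [10, 12, 12, 13, 14, 15, 16, 60]
print(summarize(values)["outliers"])  # [60]
```

Running the summary per segment mirrors color-coding the segments in the plots: the same statistics, split by the segment each customer belongs to.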

Bivariate & Multivariate Analysis: Understanding the correlation between variables is imperative before inferring business dependencies.
Graphs to be plotted: Pairplot, Heatmap, Scatterplot with three variables (lmplot)
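The values a correlation heatmap colors in are pairwise Pearson coefficients. A minimal plain-Python sketch follows; the column names `revenue`, `total_meals` and `weekly_plans` are hypothetical stand-ins rather than the dataset's actual headers, and in practice pandas' `DataFrame.corr()` computes the same matrix.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns, for illustration only.
columns = {
    "revenue": [10, 20, 30, 40],
    "total_meals": [1, 2, 3, 4],
    "weekly_plans": [1, 2, 2, 1],
}
matrix = correlation_matrix(columns)
print(matrix["revenue"]["total_meals"])   # 1.0
print(matrix["revenue"]["weekly_plans"])  # 0.0
```

Pearson's coefficient only captures linear relationships, so a near-zero value does not rule out some other kind of dependency; the pairplot helps catch those visually.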

Key inferences

There is a clear difference in behavior between customers who rated the brand favorably and those who did not. Every business touchpoint is an opportunity for the brand to elevate the consumer experience and thus influence the rating given.

Consumers who have rated the brand favorably have far fewer clicks.

It can be assumed that customers who rated the service favorably had to expend less effort, with effort measured as clicks on the site. Alternatively, a lower rating could be a consequence of the consumer not finding a meal of their choice despite spending a lot of time on the site.

Without further qualitative research, we cannot conclusively infer which of the two causes the other.
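The comparison behind this finding can be sketched as a simple group-by: average clicks for each star rating. The helper `mean_clicks_by_rating` and the sample data are hypothetical, and clicks stand in for effort exactly as in the text; like the finding itself, the result shows an association, not a cause.

```python
from collections import defaultdict
from statistics import mean


def mean_clicks_by_rating(ratings, clicks):
    """Average clicks per customer for each star rating.

    Clicks are a rough proxy for effort. This measures association only and
    says nothing about whether clicks drive ratings or the other way round.
    """
    buckets = defaultdict(list)
    for rating, n_clicks in zip(ratings, clicks):
        buckets[rating].append(n_clicks)
    return {rating: mean(values) for rating, values in sorted(buckets.items())}


# Hypothetical ratings and click counts, for illustration only.
ratings = [1, 1, 3, 5, 5]
clicks = [20, 18, 12, 6, 8]
print(mean_clicks_by_rating(ratings, clicks))
# averages: 19 for 1-star, 12 for 3-star, 7 for 5-star
```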

The created segments behave differently only for variables highly correlated with Revenue.

Unsurprisingly, a few variables have a more direct impact on Revenue than others. Among the Affinity variables, some, such as Largest Meals Ordered and Total Meals Ordered, are highly correlated with Revenue, while others, such as Weekly Plans, are much less so. Interestingly, our created segments (both Revenue and Rating) behave differently for the variables that are highly correlated with Revenue. The same cannot be said for less correlated variables. The pairplot below shows this at a glance.

Further investigation with facet grids and box plots reveals disparities in consumer behavior within the created segments, i.e., the data within each segment varies in range, dispersion, and outliers. Along with understanding the variables in isolation, it is essential to view them through this segmented lens to understand consumer behavior effectively.
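One rough way to check whether segments behave differently on a variable is to compare that variable's median across segments, which is what the boxes of a segmented box plot show side by side. The helpers `segment_medians` and `median_spread` and all sample values are hypothetical, and the spread of medians is an illustrative heuristic, not a measure used in the original analysis.

```python
from collections import defaultdict
from statistics import median


def segment_medians(values, segments):
    """Median of one variable within each segment label."""
    groups = defaultdict(list)
    for value, segment in zip(values, segments):
        groups[segment].append(value)
    return {segment: median(vals) for segment, vals in sorted(groups.items())}


def median_spread(values, segments):
    """Gap between the highest and lowest segment median (hypothetical heuristic)."""
    medians = segment_medians(values, segments).values()
    return max(medians) - min(medians)


# Hypothetical Revenue quartile labels and two variables, for illustration only.
segments = ["Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q4", "Q4"]
total_meals = [20, 25, 40, 45, 70, 80, 120, 150]   # strongly tied to Revenue
weekly_plans = [2, 4, 3, 3, 4, 2, 3, 3]            # weakly tied to Revenue
print(median_spread(total_meals, segments))   # 112.5
print(median_spread(weekly_plans, segments))  # 0.0
```

A large spread suggests the segments separate on that variable; a spread near zero matches the pattern described for less correlated variables. Medians ignore within-segment dispersion, so the box plots are still needed to see range and outliers inside each segment.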


Segmentation of customers for improving retention efforts

Customer retention is one of the biggest challenges to increasing revenue in the meal kit industry today. The EDA establishes that customers in different segments interact with the business differently. The retention rate within these segments is bound to differ too, i.e., some customers may be far easier to retain than others. Among those who are hard to retain, customers who contribute heavily to Revenue need immediate attention. To deduce more sophisticated segments, the existing data needs to be enriched with the recency and frequency of meal purchases.
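The current dataset does not include purchase dates, so this sketch assumes a hypothetical order log of `(customer_id, order_date)` pairs from which recency (days since the last order) and frequency (number of orders) are derived per customer. The helper `recency_frequency`, the customer IDs and the dates are all illustrative.

```python
from datetime import date


def recency_frequency(orders, as_of):
    """Per-customer recency (days since last order) and frequency (order count).

    orders: iterable of (customer_id, order_date) pairs from a hypothetical
    order log; as_of: the reference date recency is measured from.
    """
    last_order = {}
    counts = {}
    for customer_id, order_date in orders:
        counts[customer_id] = counts.get(customer_id, 0) + 1
        # Keep the most recent order date seen for this customer.
        if customer_id not in last_order or order_date > last_order[customer_id]:
            last_order[customer_id] = order_date
    return {
        cid: {"recency_days": (as_of - last_order[cid]).days, "frequency": counts[cid]}
        for cid in counts
    }


# Hypothetical order log, for illustration only.
orders = [
    ("c1", date(2024, 1, 5)),
    ("c1", date(2024, 3, 1)),
    ("c2", date(2023, 11, 20)),
]
print(recency_frequency(orders, as_of=date(2024, 3, 31)))
# {'c1': {'recency_days': 30, 'frequency': 2}, 'c2': {'recency_days': 132, 'frequency': 1}}
```

Joined with the existing Revenue quartiles, these two fields would let high-Revenue customers with long gaps since their last order be singled out for retention efforts.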



Prajakta S Parkar
A Slacker’s Guide to Data Analysis

A chubby, tall, curious and intuitive marketeer who recently developed a penchant for data science and behavioural economics