Snack Stats: Conducting EDA & Hypothesis Testing to Uncover How Health and Wellness Consumers Snack

Whenever I tell someone that I’m a Data Analyst at a snack company, their first response is usually along the lines of, “Oh, that’s cool… but what data do you actually have?” From the outside, one could easily assume that we simply sell and ship out snacks. But the truth about SnackNation is that we do so much more.

Our CEO, Sean Kelly, has described SnackNation as a tech-driven data company disguised as a snack company. Our business is twofold: we provide our customers with thoughtful and healthy snack curations, and in return they have the option to review the products they tried and provide us with insight on what they liked or didn’t like about the snacks. The snack brands we partner with are usually smaller emerging brands that benefit greatly from the data we collect and share from our customers.

The original question still stands though: what data DO we even have? And what can we learn from that data? I’m going to give you a glimpse into some general insights we can gather from our health and wellness consumers all across the country and how we use exploratory data analysis (EDA) and hypothesis testing to drive our business and to learn about our snacks, customers, and brand partners.

How do we get our data?

SnackNation offers a variety of snack subscription options for both office (B2B) and home (B2C) customers. Our customers are health and wellness consumers from all across the country who are eager to try the new, cleanly sourced snacks hitting the market. To drive the product insights side of our business, we ask our customers to fill out monthly surveys about the snacks they received. Questions on these surveys range from ‘Have you heard of this product before?’ to ‘Do you eat our snacks while you’re out shopping?’. To incentivize our customers to provide this information, we’ve implemented various reward programs. One such program is Dollar Snack Club, which provides customers with discounted $1 snack boxes every month, conditional on their filling out our product surveys.

We’ve been collecting this data from our customers for years now. For this analysis, I pulled millions of data points from two years of our most recent survey data. Over the years, we’ve made many changes to our survey formats and the questions asked. These changes force the Business Intelligence team to spend an enormous amount of time compiling and making sense of this data every month.

Cleaning Our Data

For the EDA and hypothesis testing I performed, I took steps to clean and standardize the survey data in a Jupyter Notebook using the pandas library. Some major steps I took to clean the data were:

  1. Consolidate duplicate columns. As mentioned earlier, we’ve been collecting surveys for many years and haven’t always kept a consistent survey question format. I had to consolidate duplicate fields that held the same data but were worded differently across survey months (the .fillna() function was my best friend in this process; see the sketch after this list).
  2. Keep records only if the Product Rating and Purchase Intent fields are not null. Since the goal of my analysis was to focus on product feedback trends among different demographic groups, I removed all records that did not have the Product Rating and Purchase Intent fields populated (using the pandas notnull() function). We can still pull useful information from these deleted records for other analyses we perform as a company, but it made sense to remove them for this analysis.
  3. Keep records only if the product being reviewed has 30 or more reviews total. I didn’t want to skew average ratings by including snacks with too few reviews. A common rule of thumb, rooted in the central limit theorem, treats 30 records as the threshold between a ‘small’ and a ‘large’ sample size. This comes into play in hypothesis testing, which I discuss in further detail later in this article.
  4. Populate previously empty fields for long-time customers. Plenty of customers have stayed with us through the many iterations of our surveys, and some of the questions in our current surveys were not included in earlier ones. Since we have unique customer emails available, I chose to fill in previously empty data points for some of our newer survey questions based on a customer’s most recent response. I only did this for activity fields and two demographic fields, since most other customer characteristics would be too variable over the 24-month period being studied (a customer’s age range can change, for example, so I didn’t want to incorrectly populate older records with newer data).
  5. Remove unnecessary fields. Many of the data points we collect from our customers are personal details related to their accounts or whether they’ve opted into our monthly giveaways. Superfluous data points like these were removed, and I kept only fields related to:
  • Ratings (packaging ratings, product ratings, purchase intent)
  • Demographics (gender, education level, etc.)
  • Social media usage (Instagram user, Twitter user, etc.)
  • Activities (hiking, yoga, etc.)
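
Here’s a minimal sketch of what steps 1 through 4 might look like in pandas. The file name and column names are hypothetical stand-ins, not our actual schema:

    import pandas as pd

    # Load the combined survey responses (hypothetical file and columns).
    df = pd.read_csv('survey_responses.csv')

    # Step 1: consolidate duplicate columns -- two wordings of the same
    # question become one field, preferring whichever value is populated.
    df['product_rating'] = df['product_rating'].fillna(df['rate_this_product'])
    df = df.drop(columns=['rate_this_product'])

    # Step 2: keep only records where both key metrics are populated.
    df = df[df['product_rating'].notnull() & df['purchase_intent'].notnull()]

    # Step 3: keep only products with 30 or more reviews.
    review_counts = df.groupby('product_id')['product_rating'].transform('count')
    df = df[review_counts >= 30]

    # Step 4: backfill stable fields (activities, select demographics) for
    # each customer from their most recent response, matched on email.
    stable_cols = ['does_yoga', 'hikes']  # hypothetical field names
    df = df.sort_values('survey_date')
    df[stable_cols] = df.groupby('customer_email')[stable_cols].bfill()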

Data cleaning can be a tedious process, but the steps above ensured that my analysis would be accurate and reliable. Once the cleaning was complete, every review contained only the fields relevant to this analysis, and every product fell into one of a fixed set of snack categories.

Data Distributions

The purpose of this study was to see what kinds of general trends and insights can be gained from our product review data. Before performing any kind of hypothesis testing, I did some exploratory analysis to see what our data looked like and how it was distributed.

As I mentioned before, the Product Rating field is one of our key review metrics. One aspect of this study that I recognized right off the bat was that individual ratings could only take discrete values from 1 through 5. Naturally, that alone won’t give us a useful distribution for testing what kinds of products our customers like (see the first chart below). The key to this analysis was to look at average ratings by product. These were the results:

The two vertical blue lines in the second and third graphs represent the mean and the median of the average product ratings. The first thing that should stand out is that the average ratings are skewed higher than the normal fit line. If our data weren’t skewed, the mean and median lines would overlap. We can therefore conclude that our data is not normally distributed.
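
Both the aggregation and the skew check are quick in pandas and SciPy. Here’s a rough sketch, reusing the hypothetical frame from the cleaning step:

    from scipy import stats

    # Per-product average ratings: individual ratings are discrete 1-5, but
    # the per-product means form a near-continuous distribution.
    avg_ratings = df.groupby('product_id')['product_rating'].mean()

    # A separated mean and median, plus a skew statistic far from zero,
    # both point to an asymmetric (non-normal) distribution.
    print(avg_ratings.mean(), avg_ratings.median())
    print(stats.skew(avg_ratings))

    # D'Agostino-Pearson normality test: a small p-value (e.g. < 0.05)
    # rejects the hypothesis that the data is normally distributed.
    print(stats.normaltest(avg_ratings))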

But what about some of our other review metrics? Another important rating our customers provide is Purchase Intent, which measures a customer’s likelihood to go out and buy the product on the market. This raises an important question: does it matter if a product is highly rated when a customer doesn’t actually want to go out and buy it? That question is exactly why this metric deserves a place in the analysis.

This is how our purchase intent data was distributed:

The more even spread of Purchase Intent ratings from 1 through 5 shows that people are less willing to go out and actually buy the products they’re reviewing. But the third chart also shows that people’s product ratings tend to align closely with how they rate their purchase intent.
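
That alignment is easy to quantify with a quick correlation check (again, a sketch on the hypothetical frame):

    # Pearson correlation between the two metrics; a value near 1 means high
    # product ratings tend to come with high purchase intent.
    print(df['product_rating'].corr(df['purchase_intent']))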

So how do we integrate these two important metrics into our analysis? Separately, neither paints a full picture of customer sentiment, so I chose to blend the two measures into a single composite rating in which each rating receives a weight. This was the resulting distribution:

It’s still not perfect, but we can state that our average composite ratings by product are approximately normally distributed.
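
In code, the blend is just a weighted sum. The weights below are illustrative placeholders rather than the actual values we use:

    # Hypothetical weights for the composite rating -- illustrative only.
    W_RATING, W_INTENT = 0.6, 0.4

    df['composite_rating'] = (
        W_RATING * df['product_rating'] + W_INTENT * df['purchase_intent']
    )

    # Average composite rating per product, the metric used in the
    # hypothesis tests below.
    avg_composite = df.groupby('product_id')['composite_rating'].mean()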

Hypothesis Testing

To study how snacking preferences differ across different types of respondents, I used two-sample t-testing. Specifically, I wanted to measure the differences in average product composite ratings between various populations. As I mentioned before, I pulled in data points related to our respondents’ demographics, social media usage, and activities. Based on these responses, I split respondents into populations by whatever variable I was studying for that test. I then tested whether the difference in their average composite ratings was statistically significant at the 5% significance level (p-value < 0.05), i.e., with 95% confidence.

As an example, I’ll explain how I studied the difference in mean composite ratings between men and women for our Popcorn & Puff snacks. A summary table showed that, on average, women rated our Popcorn & Puff products higher than men did. But was this difference statistically significant? To find out, I first stated the null and alternative hypotheses as follows:

H0: There is no significant difference in the average composite rating for Popcorn & Puff snacks between Men and Women

H1: There is a significant difference in the average composite rating for Popcorn & Puff snacks between Men and Women

After splitting the average composite ratings into two tables (one for women and one for men, both containing only Popcorn & Puff ratings), I ran them through a function I wrote that calculates the test statistic and p-value and reports whether to reject the null hypothesis. The SciPy library conveniently has a function, ttest_ind(), that makes this calculation quick and easy. For the test above, the resulting p-value was less than 0.05. Based on this result, I rejected the null hypothesis in favor of the alternative: there is a statistically significant difference in the average composite rating for Popcorn & Puff snacks between Men and Women.
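
A simplified version of that helper might look like the sketch below. The popcorn_df frame and its column names are hypothetical, and I’ve shown Welch’s variant of the test (equal_var=False), which doesn’t assume equal variances between the two groups:

    from scipy.stats import ttest_ind

    def test_difference(group_a, group_b, alpha=0.05):
        # Welch's two-sample t-test on two sets of average composite
        # ratings; returns the decision at significance level alpha.
        t_stat, p_value = ttest_ind(group_a, group_b, equal_var=False)
        return t_stat, p_value, p_value < alpha

    # Hypothetical usage: popcorn_df holds only Popcorn & Puff reviews.
    women = popcorn_df.loc[popcorn_df['gender'] == 'Female', 'composite_rating']
    men = popcorn_df.loc[popcorn_df['gender'] == 'Male', 'composite_rating']

    t_stat, p_value, reject = test_difference(women, men)
    print(f't = {t_stat:.2f}, p = {p_value:.4f}, reject H0: {reject}')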

The Fun Part: How Our Customers Snack

Based on a plethora of hypothesis tests I ran, I gained some insight into how our customers like to snack. Below is a sample of what I discovered. As a reminder, all of these results should be interpreted in terms of average composite ratings. When I say that a certain population prefers a certain type of snack, it simply means that, on average, they rate products in that category higher than other populations do.

Note: all results are statistically significant at the 5% significance level (p < 0.05)

Conclusions

The results discussed above are just the tip of the iceberg for the types of insights we can extract from our data. On a more granular level, we’re able to give our brand partners specific guidance on which types of customers to target and on what our reviewers liked or disliked about the specific products they supply. Such information is especially valuable to our smaller emerging brands and shapes how many of them approach their business strategies.

Our brand partners aren’t the only ones that benefit from this data, though. SnackNation is able to leverage these insights in ways that help our customers enjoy more of the snacks they love. With the data studied above and other product-specific information such as flavor profiles and key ingredients, we are able to build applications that help us continue to grow our business. For example, we plan to build a recommendation engine that tells customers which SnackNation offerings best fit their needs and tastes.

As SnackNation continues to grow and perfect its survey data collection process, our team will be able to generate analyses and reports more quickly than ever before. This will not only improve SnackNation as a service, but also help consumers snack better and healthier.
