Published in Geek Culture

The perfect internship for people who want to work as a Data Analyst: Analyzing Quantium Analytics data to discover chips brands

[Want to see more Data Science interview preparation? Check out my GitHub profile]

Photo by Jeff Siepman on Unsplash
· The Data
· Data Exploration
· Customer Segments Analysis
· Store Layout Analysis
· Recommendation
· Thank you for reading!

Marketing is a part of a company's business that cannot be overlooked when introducing its products, and it is an extremely important part of the company's journey. Before a marketing story can be told, there are a few processes that we, as part of the company, need to go through. In this article, I would like to share my experience of taking a virtual internship offered by an AI company called Quantium Analytics, which involves analyzing one of Quantium Analytics' category problems: the chips category.

We will work through the problem by collecting the data that supports it, then doing data cleaning, feature engineering, and exploratory data analysis through graphs and charts. The problem comes from one of Quantium Analytics' clients, a large supermarket, and we tackle it by analyzing transactional and customer data. The supermarket wants to change its store layout, product selection, prices, and promotions to satisfy its customers' needs and preferences, and it wants to know what recommendations Quantium Analytics can offer based on these parameters.

The Data

We start the analysis by collecting the two datasets provided by the client, in CSV and xlsx format. We use a few libraries, namely pandas, seaborn, NumPy, and matplotlib, to import the two datasets and extract information from them. I will give snapshots from the Jupyter notebook to show part of the code. There are more than 260k rows of transactional and customer data to analyze.
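As a rough illustration of this loading step, here is a minimal sketch. It uses tiny in-memory stand-ins instead of the client files, and the column names (`LYLTY_CARD_NBR`, `LIFESTAGE`, `PREMIUM_CUSTOMER`, `PROD_NAME`, `PROD_QTY`, `TOT_SALES`) as well as the join on a shared customer key are assumptions made for the example, not something taken from the notebook.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two client files. With the real files you
# would call pd.read_csv(...) on the CSV and pd.read_excel(...) on the xlsx.
customers_csv = io.StringIO(
    "LYLTY_CARD_NBR,LIFESTAGE,PREMIUM_CUSTOMER\n"
    "1000,YOUNG SINGLES/COUPLES,Mainstream\n"
    "1002,RETIREES,Budget\n"
)
transactions_csv = io.StringIO(
    "DATE,STORE_NBR,LYLTY_CARD_NBR,PROD_NAME,PROD_QTY,TOT_SALES\n"
    "2018-10-17,1,1000,Kettle Original 175g,2,10.8\n"
    "2018-10-18,1,1002,Smiths Crinkle Cut 150g,1,2.6\n"
    "2018-10-19,2,1000,Doritos Cheese 170g,1,4.4\n"
)

customers = pd.read_csv(customers_csv)
transactions = pd.read_csv(transactions_csv, parse_dates=["DATE"])

# Attach the customer attributes to every transaction (assumed shared key).
data = transactions.merge(customers, on="LYLTY_CARD_NBR", how="left")
print(data.shape)
```

A left join keeps every transaction even if a customer record is missing, which makes such gaps visible later as missing values.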

Data Exploration

The second step is to clean the data: handling missing values, duplicated rows, inappropriate data types of features/columns, and outliers that are likely to influence our analysis. You can check the code for further information.
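The cleaning steps listed above could look roughly like the sketch below. The helper `clean_transactions`, the column names, and the use of the 1.5 × IQR rule for outliers are all assumptions for illustration; the notebook may handle each case differently.

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop missing and duplicated rows, fix the date
    dtype, and remove quantity outliers with the 1.5 * IQR rule."""
    out = df.dropna().drop_duplicates().copy()
    out["DATE"] = pd.to_datetime(out["DATE"])
    q1, q3 = out["PROD_QTY"].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = out["PROD_QTY"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[keep].reset_index(drop=True)

# Made-up rows: one exact duplicate, one extreme quantity, one missing date.
raw = pd.DataFrame({
    "DATE": ["2018-07-01", "2018-07-01", "2018-07-02", "2018-07-03",
             "2018-07-04", "2018-07-05", "2018-07-06", None],
    "PROD_QTY": [2, 2, 1, 2, 2, 1, 200, 1],
    "TOT_SALES": [6.0, 6.0, 3.0, 6.0, 6.0, 3.0, 650.0, 3.0],
})
clean = clean_transactions(raw)
print(len(raw), "->", len(clean))
```

Whether an extreme value is an error or a genuine bulk purchase is a judgment call, so it is worth inspecting the flagged rows before dropping them.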

After cleaning the data, the next step is feature engineering: looking for features that can be combined with one another, or creating new features from one or more existing ones, to improve our recommendation. Domain expertise plays an essential role in feature engineering. The better you know the data, the better the features you can create. You can see the code for further information.
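Two features used later in this article are the brand and the pack size. A minimal sketch of deriving them from a product name is shown here, assuming a `PROD_NAME` column where the brand is the first word and the pack size is the number before a "g". Both conventions are assumptions for the example.

```python
import pandas as pd

# Made-up product names following the assumed "<Brand> <description> <size>g" pattern.
products = pd.DataFrame({"PROD_NAME": [
    "Kettle Sea Salt 175g",
    "Doritos Corn Chips 170g",
    "Pringles Original 134g",
]})

# Brand: assumed to be the first word of the product name.
products["BRAND"] = products["PROD_NAME"].str.split().str[0]

# Pack size: assumed to be the digits right before a "g" or "G".
products["PACK_SIZE"] = (
    products["PROD_NAME"].str.extract(r"(\d+)\s*[gG]", expand=False).astype(int)
)
print(products)
```

Real product names are usually messier (abbreviated or inconsistent brand spellings, for instance), so a mapping step to normalize brand names would likely be needed on top of this.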

These steps get our data ready for extracting insights. We will analyze the data based on a few business metrics: total sales, drivers of sales, and where the highest sales are coming from.

Total Sales from July 2018 to June 2019

We can see that sales increase over the period from 2018 to 2019, except for a dramatic drop on December 25, 2018, which is a public holiday (Christmas). The graph below zooms in on December, where December 25 has 0 sales.

Total Sales of December 2018
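A day with no sales has no rows at all in the transaction data, so it only shows up once the daily totals are placed on a complete calendar. The sketch below shows one way to do that with made-up values; the column names `DATE` and `TOT_SALES` are assumptions.

```python
import pandas as pd

# Toy transactions around Christmas 2018; the amounts are made up.
txns = pd.DataFrame({
    "DATE": pd.to_datetime(["2018-12-23", "2018-12-24", "2018-12-24", "2018-12-26"]),
    "TOT_SALES": [5.0, 7.5, 2.5, 4.0],
})

daily = txns.groupby("DATE")["TOT_SALES"].sum()

# Reindex onto every calendar day so days without transactions appear as 0.
all_days = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
daily = daily.reindex(all_days, fill_value=0.0)

zero_days = daily[daily == 0].index.strftime("%Y-%m-%d").tolist()
print(zero_days)
```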

Customer Segments Analysis


We start by finding that most customers are older singles/couples, followed by retirees and older families, as shown on the graph. We will then see whether the number of customers in each life stage influences total sales each month, using the business metrics we decided on earlier.

You can see that Budget Older Families, Mainstream Young Singles/Couples, and Mainstream Retirees have significant total sales. The number of customers in these segments has significantly influenced how much these segments spend on chips. However, New Families spend less than the other customer segments for every member type.

On the number-of-customers side, the graph shows that more Mainstream Young Singles/Couples, Mainstream Retirees, and Mainstream Midage Singles/Couples buy chips.
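Both metrics above, total sales and number of customers per segment, can be computed in a single group-by. The following is a small sketch with made-up rows and assumed column names, not the notebook's actual code.

```python
import pandas as pd

# Made-up transactions; column names are assumptions for the example.
data = pd.DataFrame({
    "LYLTY_CARD_NBR": [1, 1, 2, 3, 3, 4],
    "LIFESTAGE": ["RETIREES", "RETIREES", "RETIREES",
                  "NEW FAMILIES", "NEW FAMILIES", "OLDER FAMILIES"],
    "PREMIUM_CUSTOMER": ["Mainstream", "Mainstream", "Mainstream",
                         "Budget", "Budget", "Budget"],
    "TOT_SALES": [6.0, 3.0, 4.0, 2.0, 2.5, 9.0],
})

# One row per (life stage, member type): summed sales and distinct customers.
segments = (
    data.groupby(["LIFESTAGE", "PREMIUM_CUSTOMER"])
    .agg(total_sales=("TOT_SALES", "sum"), customers=("LYLTY_CARD_NBR", "nunique"))
    .sort_values("total_sales", ascending=False)
)
print(segments)
```

Counting distinct customers with `nunique` rather than rows matters here, since one customer usually has several transactions.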

We also investigate the average unit price per transaction by customer life stage and member type, and find that mainstream young singles/couples and midage singles/couples are willing to pay more for chips than the other member types, budget and premium. It is likely that chips are not the snack of choice for those other member types.

We can use a t-test to check whether the difference between midage singles/couples and young singles/couples is statistically significant. The scipy library lets us compare the means of the two groups and test whether they differ. Again, you can check the variables in the Jupyter notebook for further information. We split the data into two groups and look at the p-value to verify the significance of the difference. We get 6.967354233018139e-306, which is far below the usual 0.05 threshold, so the unit price of mainstream young singles/couples is significantly different from that of the non-mainstream midage singles/couples.
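For reference, a two-sample t-test with scipy looks roughly like this. The two arrays are synthetic unit prices generated for the example, not the article's data, and the choice of Welch's variant (`equal_var=False`) is an assumption rather than what the notebook necessarily used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic unit prices for two hypothetical groups (not the real data).
group_a = rng.normal(loc=4.0, scale=0.5, size=500)
group_b = rng.normal(loc=3.7, scale=0.5, size=500)

# Welch's t-test: compares the two means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, significant at 5%: {p_value < 0.05}")
```

With very large samples, even a small price difference produces a tiny p-value, so it is worth reporting the size of the difference alongside the p-value.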

We can also look at the chip brand names that we extracted during the data cleaning process. Young singles/couples tend to choose the Kettle, Doritos, and Pringles brands. Kettle is the top brand with 19.7%, followed by Doritos with 12.2% and Pringles with 11.8%, as depicted on the chart below.

Mainstream Young Singles/Couples

On the other hand, Midage singles/couples also put Kettle as the top brand of choice with 19.3%, followed by Smiths and Doritos with 11.5% and 10.9% respectively.

Midage Singles/Couples
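Brand percentages like the ones above can be obtained by normalizing brand counts within each segment. The sketch below uses made-up rows and hypothetical `SEGMENT` and `BRAND` columns, and it measures share as a percentage of purchase rows, which is an assumption about how the charts were built.

```python
import pandas as pd

# Made-up purchases for two hypothetical segments.
data = pd.DataFrame({
    "SEGMENT": ["young"] * 5 + ["midage"] * 5,
    "BRAND": ["Kettle", "Kettle", "Doritos", "Pringles", "Smiths",
              "Kettle", "Smiths", "Smiths", "Doritos", "Kettle"],
})

# Share of each brand within its segment, as a percentage of that segment's rows.
share = (
    data.groupby("SEGMENT")["BRAND"]
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)
print(share)
```

Weighting by quantity or by sales value instead of row counts would give a different ranking when pack prices vary a lot between brands.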

The pack size analysis shows that 150g is the second most popular pack size among customers after 175g, with Kettle and Pringles being the favorite brands of chips.

Chip Brands and Pack Size

Store Layout Analysis

Total Sales of Trial Store over time
Number of Customers of Trial Store over time

For both business metrics, total sales and number of customers, the trial store shows a significant relation with the control store rather than with the average of the other stores. This suggests that the decision to change the layout will have an impact on total sales and number of customers. We can also check the significance of these metrics by verifying whether the trial store's values fall within the 5% and 95% confidence interval.

Total Sales of Store Sales Comparison
Number of Customers of Store Sales Comparison

You can see that during the trial period from February to April 2019, the total number of customers and total sales lie outside the 5% to 95% confidence interval, which means that the trial store outperformed the control store during that period. The trial brings a significant change, as depicted on the charts for both the number of customers and total sales.
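One way to build such a band is sketched below: scale the control store to the trial store's pre-trial level, measure how far apart the two stores were before the trial, and draw the 5% and 95% bounds around the scaled control. The monthly figures are made up, and both the scaling step and the normal approximation (z = 1.645) are assumptions for illustration, not necessarily what the notebook did.

```python
import numpy as np

# Made-up monthly total sales; seven pre-trial months, then three trial months.
pre_trial = np.array([200.0, 210.0, 190.0, 205.0, 195.0, 202.0, 198.0])
pre_control = np.array([100.0, 104.0, 96.0, 103.0, 97.0, 100.0, 100.0])
trial_period_trial = np.array([240.0, 255.0, 250.0])
trial_period_control = np.array([100.0, 102.0, 99.0])

# Scale the control store so both stores have equal pre-trial totals.
scale = pre_trial.sum() / pre_control.sum()
scaled_pre_control = pre_control * scale
scaled_trial_control = trial_period_control * scale

# Spread of the relative gap between the two stores before the trial.
pct_diff = np.abs(scaled_pre_control - pre_trial) / scaled_pre_control
std_dev = pct_diff.std(ddof=1)

# 5% and 95% bounds around the scaled control (normal approximation).
z = 1.645
lower = scaled_trial_control * (1 - z * std_dev)
upper = scaled_trial_control * (1 + z * std_dev)

outside = (trial_period_trial < lower) | (trial_period_trial > upper)
print(outside)
```

A trial month is flagged when the trial store's value falls outside the band, which is the pattern described above for February to April 2019.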


Recommendation

The analysis is only impactful when real actions follow from the insights extracted from the data. For this chips category problem, we can see that Mainstream Young Singles/Couples and Mainstream Retirees have significantly higher total sales because there are many customers in these segments. The supermarket can take advantage of this by running more promotions for these segments to increase their total sales and number of customers. It would also be a good idea to collaborate with one or two chip brands that drive total sales, such as Doritos and Kettle, which are chosen by Mainstream Young Singles/Couples, on branding promotions. Finally, the trial store has outperformed the control store, so the trial layout is a good idea to implement based on the parameters we analyzed through the business metrics. You can check the repository for this project for detailed code and documentation.

Thank you for reading!

I really appreciate it! 🤗 If you liked the post and would like to see more, consider following me. I post topics related to machine learning and deep learning. I try to keep my posts simple but precise, always providing visualizations and simulations.


