Forage Data Analyst Virtual Internship: Task 1

Talib Izhar
5 min read · Jan 26, 2024

Sharing my experience of the Quantium Virtual Data Analyst Internship on Forage, which consists of three tasks.

Introduction

Task 1: Data Exploration and Customer Analytics.

In this blog post, we analyze customer purchase behavior using the R programming language and several of its packages, working from the transaction and customer behavior datasets. Our goal is to understand the customer segments, their spending patterns, and their preferred brands and pack sizes.

Data Exploration

We start by importing the necessary R packages and loading the transaction and customer behavior datasets. Using functions like head() and str(), we explore the structure and contents of the datasets to understand their dimensions, data types, and variables.

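# Exploring the transactions data
library(dplyr)  # provides %>% and summarize_all()
library(tidyr)  # provides gather()

str(trns_1)  # dimensions, column types, and the first few values

# One row per column, listing the column name and its class
trns_1 %>%
  summarize_all(class) %>%
  gather(variable, class)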

Data Transformation

Next, we perform data transformation tasks such as formatting the date column, handling outliers, cleaning and preprocessing text data in the product name column, and checking for missing dates. These transformations ensure that the data is in a suitable format for analysis.

  • I found out the data type of the DATE column was an integer instead of a Date data type.
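A minimal sketch of that conversion is below. It assumes the integers are Excel-style serial day counts, which the post does not state, so the origin `"1899-12-30"` is an assumption to verify against a known date in the data. The data frame `trns` and its values are hypothetical.

```r
# Hypothetical integer dates as they might look after import
trns <- data.frame(DATE = c(43282L, 43283L, 43646L))
class(trns$DATE)  # integer before conversion

# ASSUMPTION: the integers are Excel serial dates, whose day 0 is 1899-12-30.
# A different source system would need a different origin.
trns$DATE <- as.Date(trns$DATE, origin = "1899-12-30")

class(trns$DATE)  # Date after conversion
range(trns$DATE)  # sanity check that the dates fall in a plausible period
```

Checking the resulting range against the period the data is supposed to cover is a quick way to confirm that the chosen origin is right.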

Exploratory Data Analysis (EDA)

During the EDA phase, we delve deeper into the datasets to uncover insights. We analyze summary statistics, identify outliers, explore product names and pack sizes, check unique values in categorical columns, visualize transaction trends over time, and check for duplicates in the data. I found outliers in the Quantity and Sales columns using a box-and-whisker plot and removed them.
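The outlier step can be sketched as follows. This uses `boxplot.stats()`, which applies the same rule a box plot uses to draw points beyond the whiskers (more than 1.5 times the hinge spread past the hinges). The column name `PROD_QTY` and the values are hypothetical, and the post does not say which exact rule was used to drop rows.

```r
# Hypothetical quantities with one extreme value
trns <- data.frame(PROD_QTY = c(1, 2, 2, 2, 3, 3, 4, 5, 200))

# Values a box plot would draw as individual points beyond the whiskers
outliers <- boxplot.stats(trns$PROD_QTY)$out
outliers

# Keep only the rows whose quantity is not flagged
trns_clean <- trns[!trns$PROD_QTY %in% outliers, , drop = FALSE]
nrow(trns_clean)
```

Calling `boxplot(trns$PROD_QTY)` on the same column draws the plot itself, which is useful for judging whether the flagged values are data errors or genuine bulk purchases.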

I then removed all non-chip products, identified from the product name. While checking the date column, I found that one date was missing; further analysis showed it to be 25 December. The plot shows the sudden increase in sales and the drop in the number of transactions.
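One way to find a missing date is to compare the dates present in the data with a complete daily sequence over the same range. The sketch below uses a short, made-up date range for illustration; the vector `observed` stands in for the unique transaction dates.

```r
# Hypothetical transaction dates with one day absent
observed <- seq(as.Date("2018-12-20"), as.Date("2018-12-31"), by = "day")
observed <- observed[observed != as.Date("2018-12-25")]

# Every calendar day between the first and last observed date
all_days <- seq(min(observed), max(observed), by = "day")

# Days in the full calendar that never appear in the data
missing_days <- all_days[!all_days %in% observed]
missing_days
```

Indexing with `%in%` rather than calling `setdiff()` keeps the result as a `Date` vector, so it prints as a calendar date instead of a day count.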

One duplicate row was found and removed from the transaction dataset.
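A duplicate check along these lines can be done in base R with `duplicated()`, which flags every row that repeats an earlier one. The data frame and column names below are hypothetical.

```r
# Hypothetical transactions where the third row repeats the second
trns <- data.frame(
  TXN_ID    = c(1, 2, 2, 3),
  PROD_NBR  = c(10, 20, 20, 30),
  TOT_SALES = c(3.0, 6.0, 6.0, 4.5)
)

# Count rows that are exact copies of an earlier row
n_dupes <- sum(duplicated(trns))
n_dupes

# Keep the first occurrence of each row
trns_unique <- trns[!duplicated(trns), ]
nrow(trns_unique)
```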

Distribution of Pack Size

The pack size distribution looks consistent: no pack size has a frequency that differs significantly from the other observations.
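The post does not show how pack size was derived. A minimal sketch, assuming the size in grams is the first run of digits in the product name, is below. The product names are illustrative, not taken from the dataset.

```r
# Illustrative product names; the real names may be formatted differently
prod_name <- c("Tyrrells Crisps Lightly Salted 165g",
               "Twisties Cheese 270g",
               "Burger Rings 220g")

# ASSUMPTION: every name contains digits, and the first run is the pack size.
# regmatches() silently drops names with no match, so check lengths afterwards.
pack_size <- as.numeric(regmatches(prod_name, regexpr("[0-9]+", prod_name)))
stopifnot(length(pack_size) == length(prod_name))

# Frequency of each pack size, the basis for a distribution plot
table(pack_size)
```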

Data Analysis

In the data analysis section, we focus on calculating metrics related to customer segments, such as total sales, chips bought per customer, number of customers in each segment, and average sales by customer segment. We use visualizations like bar plots and histograms to compare sales across different customer segments and analyze the chips bought per customer.

I created a column for brand names; there are 26 unique brands in total.
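How the brand column was built is not shown in the post. One common approach, assumed here, is to take the first word of the product name and upper-case it. The product names are illustrative.

```r
# Illustrative product names
prod_name <- c("Tyrrells Crisps Lightly Salted 165g",
               "Twisties Cheese 270g",
               "Burger Rings 220g",
               "Twisties Chicken 270g")

# ASSUMPTION: the brand is the first word of the product name.
# Remove everything from the first space onward, then normalise the case.
brand <- toupper(sub(" .*", "", prod_name))
brand

# Number of distinct brands
length(unique(brand))
```

With real data, the list of unique brands is worth reading by eye, since abbreviations or alternate spellings of the same brand would otherwise be counted separately.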

I printed the unique values in the categorical columns of the customer behavior dataset. After cleaning and exploring both datasets, I merged the customer behavior and transaction tables to make the analysis easier.
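The merge can be sketched with base R's `merge()`. The key column `LYLTY_CARD_NBR` and the sample rows are assumptions for illustration; LIFESTAGE and PREMIUM_CUSTOMER are the segment columns discussed in this post.

```r
# Hypothetical transactions and customer tables sharing a loyalty card key
trns <- data.frame(
  LYLTY_CARD_NBR = c(1001, 1002, 1002, 1003),
  TOT_SALES      = c(6.0, 3.0, 4.6, 7.4)
)
cust <- data.frame(
  LYLTY_CARD_NBR   = c(1001, 1002, 1003),
  LIFESTAGE        = c("YOUNG SINGLES/COUPLES", "OLDER FAMILIES",
                       "MIDAGE SINGLES/COUPLES"),
  PREMIUM_CUSTOMER = c("Mainstream", "Budget", "Budget")
)

# Unique values of a categorical column
unique(cust$PREMIUM_CUSTOMER)

# Left join: keep every transaction, attach the customer's segment
merged <- merge(trns, cust, by = "LYLTY_CARD_NBR", all.x = TRUE)

# Transactions without a matching customer would show up as NA here
sum(is.na(merged$LIFESTAGE))
```

Using `all.x = TRUE` keeps every transaction, so the NA count reveals unmatched customers instead of silently dropping their rows.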

For the analysis, I created four metrics:

1. Customer segments that spend the most.

By premium_customer, Mainstream customers spend the most; by LIFESTAGE, OLDER SINGLES/COUPLES spend the most.

Most sales come from Budget-Midage Singles/Couples, followed by Mainstream-Young Singles/Couples.

2. Chips bought per customer by segment.

The Mainstream-Older Families segment buys the most chips per customer, followed by Mainstream-Young Families.

3. Number of customers in each segment.

The Mainstream-Young Singles/Couples segment has the highest number of customers, which explains the higher sales in this segment. This is not the case for the Budget-Midage segment, where customer count does not explain the sales.

4. Average sales by customer segment.

Mainstream Young Singles/Couples and Midage Singles/Couples tend to spend more per unit and contribute the most to sales.
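The four metrics above can be sketched in base R as follows. The merged table, its column names (`LYLTY_CARD_NBR`, `PROD_QTY`, `TOT_SALES`), and all values are hypothetical, and metric 4 is assumed to mean average price per unit, based on the "spend more per unit" wording.

```r
# Hypothetical merged data: one row per transaction
merged <- data.frame(
  LYLTY_CARD_NBR   = c(1, 1, 2, 3, 3, 4),
  LIFESTAGE        = rep(c("YOUNG SINGLES/COUPLES", "OLDER FAMILIES"), each = 3),
  PREMIUM_CUSTOMER = rep(c("Mainstream", "Budget"), each = 3),
  PROD_QTY         = c(2, 1, 2, 2, 2, 1),
  TOT_SALES        = c(8, 4, 9, 6, 6, 3)
)

# Metric 1: total sales per segment (plus total quantity for later ratios)
seg <- aggregate(cbind(TOT_SALES, PROD_QTY) ~ LIFESTAGE + PREMIUM_CUSTOMER,
                 data = merged, FUN = sum)

# Metric 3: number of distinct customers per segment
cust_n <- aggregate(LYLTY_CARD_NBR ~ LIFESTAGE + PREMIUM_CUSTOMER,
                    data = merged, FUN = function(x) length(unique(x)))
names(cust_n)[3] <- "CUSTOMERS"

seg <- merge(seg, cust_n, by = c("LIFESTAGE", "PREMIUM_CUSTOMER"))

# Metric 2: chips bought per customer
seg$CHIPS_PER_CUSTOMER <- seg$PROD_QTY / seg$CUSTOMERS

# Metric 4 (ASSUMED definition): average price paid per unit
seg$PRICE_PER_UNIT <- seg$TOT_SALES / seg$PROD_QTY

seg[order(-seg$TOT_SALES), ]
```

Keeping all four metrics in one table makes it easy to see whether a segment's sales come from many customers, more units per customer, or a higher price per unit.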

In further analysis, I looked at the brands and pack sizes this segment prefers. Below are the findings:

The Mainstream-Young Singles/Couples segment tends to buy TYRRELLS chips the most and BURGER the least. They prefer the 270g pack size the most and 220g the least. Twisties Cheese is the product sold in the 270g size.
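The post does not say how brand preference was measured. One possible measure, assumed here, is an affinity ratio: a brand's share of the quantity bought by the target segment divided by its share among all other customers. The data, the `SEGMENT` column, and the brand labels are hypothetical.

```r
# Hypothetical purchases by the target segment and by everyone else
merged <- data.frame(
  SEGMENT  = c("target", "target", "target", "other", "other", "other"),
  BRAND    = c("A", "A", "B", "A", "B", "B"),
  PROD_QTY = c(3, 1, 1, 1, 2, 2)
)

# Share of total quantity accounted for by each brand within one group
brand_share <- function(d) {
  totals <- vapply(split(d$PROD_QTY, d$BRAND), sum, numeric(1))
  totals / sum(totals)
}

is_target    <- merged$SEGMENT == "target"
target_share <- brand_share(merged[is_target, ])
other_share  <- brand_share(merged[!is_target, ])

# Ratio above 1: the target segment over-indexes on the brand.
# A brand missing from the other group would give NA here.
affinity <- target_share / other_share[names(target_share)]
sort(affinity, decreasing = TRUE)
```

The same function works for pack sizes by splitting on a pack size column instead of `BRAND`.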

Insights and Recommendations

  • Our analysis identified the segments with the highest sales, the preferred brands and pack sizes of specific customer segments, and trends in spending per pack. Sales also increased significantly just before Christmas. These insights can inform business decisions.
  • The Category Manager can focus on TYRRELLS chips, which Mainstream-Young Singles/Couples tend to buy, by increasing the product's visibility to attract customers in this segment.
  • Maintain sufficient stock to meet the increase in sales just before Christmas.

Stay tuned for the next part of our data analysis journey in the upcoming blog post!

Check out the detailed analysis on GitHub. Any thoughts or suggestions are welcome in the comments, or you can message and connect with me directly on LinkedIn.
