Consumer Behavior and Shopping Habits

Consumer Behavior and Shopping Habits Data Analysis

Vaishnavi Rajput
5 min read · Nov 9, 2023

About Dataset

This dataset is from Kaggle. The link for the dataset is as follows:

The Consumer Behavior and Shopping Habits Dataset offers a deep dive into the intricate world of consumer choices and shopping behaviors. It covers a wide array of variables, including demographics, purchase history, product preferences, shopping frequency, and online/offline shopping habits. This robust dataset is a goldmine for analysts and researchers, allowing them to unravel the complexities of how consumers make decisions. This, in turn, empowers businesses to create laser-focused marketing strategies, fine-tune their product offerings, and elevate the overall customer experience.

In this article, I’ll be working on the Consumer Behavior and Shopping Habits Dataset. We’ll first explore the dataset and its columns using pandas functions. Then we will apply cleaning techniques to handle null values.

I hope you find this article practical and useful in your learning, or your work.

Now let’s dive in!

Importing Libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Load Data

df=pd.read_csv('shopping_behavior_updated.csv')
df

Dataset Glossary (Column-wise)

Customer ID: A unique identifier assigned to each individual customer, facilitating tracking and analysis of their shopping behavior over time.

Age: The age of the customer, providing demographic information for segmentation and targeted marketing strategies.

Gender: The gender identification of the customer, a key demographic variable influencing product preferences and purchasing patterns.

Item Purchased: The specific product or item selected by the customer during the transaction.

Category: The broad classification or group to which the purchased item belongs (e.g., clothing, electronics, groceries).

Purchase Amount (USD): The monetary value of the transaction in United States Dollars (USD), indicating the cost of the purchased item(s).

Location: The geographical location where the purchase was made, offering insights into regional preferences and market trends.

Size: The size specification (if applicable) of the purchased item, relevant for apparel, footwear, and certain consumer goods.

Color: The color variant or choice associated with the purchased item, influencing customer preferences and product availability.

Season: The seasonal relevance of the purchased item (e.g., spring, summer, fall, winter), impacting inventory management and marketing strategies.

Review Rating: A numerical or qualitative assessment provided by the customer regarding their satisfaction with the purchased item.

Subscription Status: Indicates whether the customer has opted for a subscription service, offering insights into their level of loyalty and potential for recurring revenue.

Shipping Type: Specifies the method used to deliver the purchased item (e.g., standard shipping, express delivery), influencing delivery times and costs.

Discount Applied: Indicates if any promotional discounts were applied to the purchase, shedding light on price sensitivity and promotion effectiveness.

Promo Code Used: Notes whether a promotional code or coupon was utilized during the transaction, aiding in the evaluation of marketing campaign success.

Previous Purchases: Provides information on the number or frequency of prior purchases made by the customer, contributing to customer segmentation and retention strategies.

Payment Method: Specifies the mode of payment employed by the customer (e.g., credit card, cash), offering insights into preferred payment options.

Frequency of Purchases: Indicates how often the customer engages in purchasing activities, a critical metric for assessing customer loyalty and lifetime value.

Exploratory Data Analysis

head gives us the first few rows of our dataset.

df.head()

dtypes returns the data type of each column in the DataFrame.

df.dtypes
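Once the dtypes are known, it can be handy to split the columns into numeric and text groups, since the two are usually explored differently. The snippet below is a minimal sketch on a small made-up DataFrame (the `sample` frame and its values are assumptions for illustration, not rows from the Kaggle file); the same call should work on `df`.

```python
import pandas as pd

# Toy stand-in for the shopping data; the values are made up for illustration.
sample = pd.DataFrame({
    "Age": [25, 41, 63],
    "Category": ["Clothing", "Footwear", "Clothing"],
    "Purchase Amount (USD)": [53, 20, 88],
    "Review Rating": [3.1, 4.5, 2.7],
})

# Split the column names by dtype: numeric columns vs. everything else.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric columns:", numeric_cols)
print("Text columns:", text_cols)
```

Numeric columns are natural candidates for describe() and distribution plots, while text columns are better suited to value_counts().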

columns returns the names of all columns in our dataset.

df.columns

info() tells us a lot about our DataFrame, such as the number of rows and columns, the data type of each feature, the non-null count per column, and the memory usage.

df.info()

Now that we have some idea of our dataset, we can dig into the columns and the data they contain.

Explore the columns

Product Category

df['Category'].value_counts()

value_counts returns a Series containing the counts of unique values. The result is in descending order, so the first element is the most frequently occurring one. NA values are excluded by default.
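To make the defaults described above concrete, here is a small sketch on a hand-written Series (the values are made up for illustration). It also shows two optional parameters of value_counts: normalize=True returns fractions instead of counts, and dropna=False keeps missing entries in the tally.

```python
import pandas as pd

# Made-up category values, including one missing entry.
category = pd.Series(
    ["Clothing", "Clothing", "Accessories", "Footwear", "Clothing", None],
    name="Category",
)

counts = category.value_counts()                # raw counts, NA excluded
shares = category.value_counts(normalize=True)  # fractions that sum to 1
with_na = category.value_counts(dropna=False)   # also counts the missing entry

print(counts)
print(shares)
print(with_na)
```

On the real data, df['Category'].value_counts(normalize=True) would give the share of each category directly, which is the same information the pie charts later in the article display.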

Which payment methods are used for online shopping?

df['Payment Method'].value_counts()

Which shipping type do people prefer for their orders?

df['Shipping Type'].value_counts()

Dealing with Missing Values

df.isnull().sum()

isnull().sum() tells us the number of null values in each column. As we can see, we don’t have any null values.
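Since this dataset has no nulls, there is nothing to clean here. For reference, the sketch below shows one common way nulls could be handled if they did appear: fill numeric gaps with the median and text gaps with the most frequent value. The `sample` frame is made up for illustration, and median/mode imputation is only one possible choice, not a rule.

```python
import numpy as np
import pandas as pd

# Made-up rows with one missing value in each column.
sample = pd.DataFrame({
    "Age": [25.0, np.nan, 63.0, 41.0],
    "Payment Method": ["Cash", "PayPal", None, "Cash"],
})

# Count the nulls per column, as in the article.
null_counts = sample.isnull().sum()
print(null_counts)

# Work on a copy so the original rows stay untouched.
cleaned = sample.copy()

# Numeric column: fill with the median of the observed values.
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

# Text column: fill with the most frequent value (the mode).
most_common = cleaned["Payment Method"].mode()[0]
cleaned["Payment Method"] = cleaned["Payment Method"].fillna(most_common)

print(cleaned)
```

Dropping incomplete rows with dropna() is another option when only a few rows are affected.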

Checking the number of unique values in each column

df.nunique()
This is the number of unique values in each column.
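The counts from nunique() also hint at which columns are categorical: a column with only a handful of distinct values is a good candidate for value_counts() or a count plot. Here is a minimal sketch on made-up rows; the `max_levels` threshold is a hypothetical cutoff chosen for this toy example.

```python
import pandas as pd

# Made-up rows for illustration.
sample = pd.DataFrame({
    "Customer ID": [1, 2, 3, 4],
    "Season": ["Fall", "Winter", "Fall", "Spring"],
    "Size": ["M", "L", "M", "M"],
})

unique_counts = sample.nunique()

# Hypothetical cutoff: treat columns with at most this many values as categorical.
max_levels = 3
low_cardinality = unique_counts[unique_counts <= max_levels].index.tolist()

# List the actual values of each low-cardinality column.
levels = {col: sorted(sample[col].unique()) for col in low_cardinality}
print(levels)
```

An identifier such as Customer ID has as many unique values as rows, so it falls outside the cutoff and is skipped.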

Data Visualization

Before we visualize, we will create one more column that divides the Age column into age groups.

df.loc[df['Age']<=19, 'age_group'] = 'teenage'
df.loc[df['Age'].between(20,30), 'age_group'] = 'yadult'
df.loc[df['Age'].between(31,59), 'age_group'] = 'adult'
df.loc[df['Age']>=60, 'age_group'] = 'older_adult'
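The same grouping can also be written in one step with pd.cut. This is a sketch of an alternative, not a replacement for the code above; the `ages` frame is made up for illustration and the bin edges are chosen to reproduce the same boundaries for whole-number ages.

```python
import pandas as pd

# Made-up ages that sit on each boundary of the groups.
ages = pd.DataFrame({"Age": [18, 19, 20, 30, 31, 59, 60, 70]})

# Bins are right-inclusive by default: (0, 19], (19, 30], (30, 59], (59, inf]
bins = [0, 19, 30, 59, float("inf")]
labels = ["teenage", "yadult", "adult", "older_adult"]

ages["age_group"] = pd.cut(ages["Age"], bins=bins, labels=labels)
print(ages)
```

One difference to keep in mind: pd.cut returns an ordered categorical column, so the groups keep their age order in sorts and plots, whereas the df.loc approach produces plain text labels.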

We have grouped the ages into four categories: teenage, yadult (young adult), adult, and older_adult. Let’s see which age group is most likely to shop.

df['age_group'].value_counts().plot(kind='pie', autopct = '%1.1f%%',figsize=(10,8))

plt.show()
Adults are the largest group of shoppers in this dataset. Keep in mind that this group also spans the widest age range (31 to 59).

Let’s see which gender shops more according to our dataset.

colors = ('red', 'blue')

# Create a pie chart of the gender split, then add the title to it
df['Gender'].value_counts().plot(kind='pie', autopct='%1.1f%%', figsize=(10, 8), colors=colors)
plt.title("Male and Female Shopping in %", fontdict={'fontweight': 'bold', 'fontsize': 18})
plt.show()
Surprisingly, men make up the larger share of shoppers in this dataset.

Let’s see which product category sold the most.

plt.figure(figsize=(5, 5))
# A named palette supplies one color per category; a two-color list would have to repeat
sns.countplot(data=df, x='Category', palette='Set2')
plt.title("Most Sold Category")
plt.show()
The Clothing category has the most purchases.

Let’s see the impact of size on purchase amount.

plt.figure(figsize=(6, 4))
sns.swarmplot(x='Size', y='Purchase Amount (USD)', data=df, palette='Set2')
plt.title("Impact of Size on Purchase")
plt.xlabel('Size')
plt.ylabel('Purchase Amount (USD)')
plt.xticks(rotation=45)
plt.show()
The XL size has fewer purchases than the other sizes.
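A swarm plot shows the overall shape, but a small summary table makes the comparison between sizes easier to read. The sketch below groups by Size and reports how many purchases each size has along with the mean and median amount. The `sample` rows are made up for illustration; on the real data the same chain could be applied to `df`.

```python
import pandas as pd

# Made-up purchases for illustration.
sample = pd.DataFrame({
    "Size": ["M", "M", "L", "XL", "S", "M"],
    "Purchase Amount (USD)": [50, 70, 40, 90, 30, 60],
})

# Number of purchases plus mean and median amount per size,
# sorted so the most frequently bought size comes first.
size_summary = (
    sample.groupby("Size")["Purchase Amount (USD)"]
    .agg(["count", "mean", "median"])
    .sort_values("count", ascending=False)
)
print(size_summary)
```

The count column answers "how often is this size bought", while mean and median answer "how much is spent per purchase", which are two separate questions that a single plot can blur together.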

Let’s now see the distribution of payment methods.

df['Payment Method'].value_counts().plot(kind='pie', autopct = '%1.1f%%',figsize=(10,8))

plt.show()

Now let’s visualize how often people shop, as a percentage.

df['Frequency of Purchases'].value_counts().plot(kind='pie', autopct = '%1.1f%%',figsize=(10,8))
plt.title("Percentage of how often people do shopping")
plt.show()

Let’s see the subscription status of customers.

sns.countplot(x=df['Subscription Status'])
plt.show()

These are some of the insights I covered in this article. That’s all for this article, thank you for reading and I hope you learned something new from it!
