Data Visualization 103 - Python for Data Analysis

Kaan ÇUKUR
4 min read · Dec 11, 2022

How can we understand our data with countplot?

Data scientists and analysts spend a great deal of time on data visualization. Everybody has the same question: “Which graph should we use for the analysis?”

While working on data analysis, I noticed that countplot graphs tell us a lot about the data. Of course, a full analysis needs much more than that, but for a first understanding of the data, countplot is my favorite tool at the moment.

In this article, we will examine a dataset containing the intentions of users visiting an online shopping site. Let’s begin.

Data Set

We will import our data set with Pandas; you can find the data set on Kaggle. I will write the code in Google Colab, but you can write it anywhere you want; only the file location will change.

import pandas as pd
from google.colab import files

# Upload the CSV from your machine (Colab only), then read it with Pandas
uploaded = files.upload()
df = pd.read_csv("online_shoppers_intention.csv")
df
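Outside Colab the upload step is not needed, only the file location changes. Here is a minimal sketch of that, assuming the CSV was downloaded from Kaggle and saved locally. The helper name `load_shoppers` and the default path are my own choices for illustration.

```python
from pathlib import Path

import pandas as pd


def load_shoppers(csv_path="online_shoppers_intention.csv"):
    """Read the data set from a local path (hypothetical helper)."""
    path = Path(csv_path)
    if not path.exists():
        # Fail early with a readable message instead of a long traceback
        raise FileNotFoundError(f"Put the Kaggle CSV at: {path.resolve()}")
    return pd.read_csv(path)
```

With the file next to your notebook or script, `df = load_shoppers()` should give the same DataFrame as the Colab version.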

Definition of Data Set

  • Administrative: This is the number of pages of this type (administrative) that the user visited.
  • Administrative_Duration: This is the amount of time spent in this category of pages.
  • Informational: This is the number of pages of this type (informational) that the user visited.
  • Informational_Duration: This is the amount of time spent in this category of pages.
  • ProductRelated: This is the number of pages of this type (product related) that the user visited.
  • ProductRelated_Duration: This is the amount of time spent in this category of pages.
  • BounceRates: The percentage of visitors who enter the website through that page and exit without triggering any additional tasks.
  • ExitRates: The percentage of pageviews on the website that ends at that specific page.
  • PageValues: The average value of the page averaged over the value of the target page and/or the completion of an eCommerce
  • SpecialDay: This value represents the closeness of the browsing date to special days or holidays (e.g. Mother’s Day or Valentine’s Day).
  • The dataset also includes the operating system, browser, region, traffic type, visitor type as returning or new visitors, weekend, and month of the year.

EDA (Exploratory Data Analysis)

We don’t have null values, and I am skipping the statistics part; I will focus directly on the graphs. Let’s have a look at our unique values.

df.nunique()

We can see some basics.

  • The Revenue and Weekend columns are of bool type.
  • There are 3 different visitor types.
  • There are 10 different months, so our data set doesn’t cover the whole year.
  • There are 13 different browsers. Isn’t that a lot? :)
  • There are 6 different SpecialDay values.
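The checks above (types, null values, unique counts) can be collected in one small table. This is only a sketch: the helper `summarize_columns` is my own name, and the `sample` rows are made up so the code runs without the Kaggle file.

```python
import pandas as pd


def summarize_columns(df):
    """Return dtype, missing-value count and unique-value count per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isnull().sum(),
        "n_unique": df.nunique(),
    })


# Tiny made-up sample, only to show the shape of the output
sample = pd.DataFrame({
    "Month": ["Feb", "May", "May", "Nov"],
    "VisitorType": ["Returning_Visitor", "New_Visitor", "Returning_Visitor", "Other"],
    "Weekend": [False, True, False, False],
    "Revenue": [False, False, True, True],
})
summary = summarize_columns(sample)
print(summary)
```

On the real data you would call `summarize_columns(df)` instead of using the sample.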

Let’s start with the analysis of months. For all analyses I will use the same code block; only the feature name will change.

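import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(20, 10))

# Left panel: number of visits per month
plt.subplot(1, 2, 1)
plt.title('Visit counts of Month')
sns.countplot(x='Month', data=df)

# Right panel: the same counts split by whether the visit ended in revenue
plt.subplot(1, 2, 2)
plt.title('Month vs Revenue')
sns.countplot(x='Month', hue='Revenue', data=df)
plt.show()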
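Since the same block is reused for every feature, it could also be wrapped in a small function. This is a sketch, not the code I actually ran: the name `plot_count_and_revenue` is hypothetical and the `sample` rows are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_count_and_revenue(df, feature, target="Revenue"):
    """Draw visit counts for `feature`, and the same counts split by `target`."""
    fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(20, 10))
    sns.countplot(x=feature, data=df, ax=ax_left)
    ax_left.set_title(f"Visit counts of {feature}")
    sns.countplot(x=feature, hue=target, data=df, ax=ax_right)
    ax_right.set_title(f"{feature} vs {target}")
    return fig, (ax_left, ax_right)


# Made-up rows, only so the sketch runs without the Kaggle file
sample = pd.DataFrame({
    "Month": ["Mar", "May", "May", "Nov", "Nov", "Dec"],
    "Revenue": [False, False, False, True, False, True],
})
fig, axes = plot_count_and_revenue(sample, "Month")
```

On the real data, `plot_count_and_revenue(df, "Month")` followed by `plt.show()` should reproduce the two panels, and changing the feature name gives the other graphs.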

As you can see, the visitor count is high in March, May, November, and December. But if you look at May, there is not much revenue compared with the number of visits. This company needs to focus on May :)
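To put a number on "many visits but little revenue", we can compute the share of visits that ended in revenue for each month. A minimal sketch on made-up rows (the `sample` values are not from the real data set):

```python
import pandas as pd

# Made-up rows, only to illustrate the calculation
sample = pd.DataFrame({
    "Month": ["Mar", "May", "May", "May", "May", "Nov", "Nov", "Dec"],
    "Revenue": [False, False, False, False, True, True, False, True],
})

# Revenue is boolean, so its sum is the number of purchases
# and its mean is the share of visits that ended in revenue
by_month = sample.groupby("Month")["Revenue"].agg(
    visits="count",
    purchases="sum",
    revenue_rate="mean",
)
print(by_month)
```

In this toy sample May has the most visits but a revenue rate of only 0.25, which is the kind of gap the right-hand graph points to.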

Now I am wondering: are there any special days in May? Let’s check.

plt.figure(figsize=(15,7))
plt.title("Month vs special days")
sns.countplot(x="Month", hue="SpecialDay", data=df)
plt.show()
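The same counts can also be read as a table, which helps when some bars are too small to see. A sketch with `pd.crosstab` on made-up rows (the `sample` values are assumptions, not the real data):

```python
import pandas as pd

# Made-up rows, only to illustrate the calculation
sample = pd.DataFrame({
    "Month": ["Feb", "May", "May", "May", "Nov", "Nov"],
    "SpecialDay": [0.4, 0.0, 0.6, 1.0, 0.0, 0.0],
})

# Rows are months, columns are SpecialDay values, cells are visit counts
special_by_month = pd.crosstab(sample["Month"], sample["SpecialDay"])

# Share of visits in each month that are close to a special day (SpecialDay > 0)
near_special_share = (sample["SpecialDay"] > 0).groupby(sample["Month"]).mean()

print(special_by_month)
print(near_special_share)
```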

Yep, the visitor count is highest in May, as we already saw. That may be because all the special days are in May. In November the visitor count is also high, but there the tallest bar is SpecialDay 0, so those visits are not close to any special day. Maybe the company could think about running a few offers in November :) Could the visitor type also play a role here?

plt.figure(figsize=(15,7))
plt.title("Month vs VisitorType")
sns.countplot(x="Month", hue="VisitorType", data=df)
plt.show()

I guess we caught a good point. Both returning and new visitor counts increase around special days. Now I am curious which regions the new visitors come from, and whether their number is increasing.

plt.figure(figsize=(15,7))
plt.title("Region vs VisitorType")
sns.countplot(x="Region", hue="VisitorType", data=df)
plt.show()

Hmm… Region 1 has the highest visitor count. That’s fine. But the company needs to focus on regions 2, 3 and 4, on both the new and the returning visitors from these regions.
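Because Region 1 is so much bigger, raw counts make the other regions hard to compare. One option is to look at the share of each visitor type inside a region. A sketch on made-up rows (the `sample` values are assumptions):

```python
import pandas as pd

# Made-up rows, only to illustrate the calculation
sample = pd.DataFrame({
    "Region": [1, 1, 1, 1, 2, 2, 3, 4],
    "VisitorType": [
        "Returning_Visitor", "Returning_Visitor", "Returning_Visitor", "New_Visitor",
        "Returning_Visitor", "New_Visitor", "Returning_Visitor", "New_Visitor",
    ],
})

# normalize="index" turns each row into shares that sum to 1 per region
share = pd.crosstab(sample["Region"], sample["VisitorType"], normalize="index")
print(share)
```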

We could continue with more examples like these, but the main idea is to understand the data with nothing more than countplot graphs. Of course, we also need to look at correlations, statistics, the revenue rate and so on. This data set is also very useful for building a machine learning model.
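As a small pointer for the correlation step, here is one way to rank the numeric columns by their correlation with Revenue. It is only a sketch on made-up rows: the `sample` values are assumptions, and treating Revenue as 0/1 is my own choice.

```python
import pandas as pd

# Made-up rows, only to illustrate the calculation
sample = pd.DataFrame({
    "PageValues": [0.0, 0.0, 12.5, 30.0, 0.0, 45.0],
    "ExitRates": [0.20, 0.10, 0.05, 0.02, 0.15, 0.01],
    "Revenue": [False, False, True, True, False, True],
})

# Keep the numeric columns and add Revenue as 0/1 so it can be correlated
numeric = sample.select_dtypes(include="number").assign(
    Revenue=sample["Revenue"].astype(int)
)

# Pearson correlation of every numeric column with Revenue, strongest first
corr_with_revenue = (
    numeric.corr()["Revenue"].drop("Revenue").sort_values(ascending=False)
)
print(corr_with_revenue)
```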

Last words

Thanks for reading this blog. Your comments and likes will help me grow. If you want to see more content like this, you can follow my Medium profile.

You can find all the source code on my GitHub profile. You can keep in touch with me through my LinkedIn profile.

If I have made any mistakes, please feel free to comment.
