DATA VISUALIZATION: WHAT? WHEN? HOW? -II

Rashmi Manwani
Published in Analytics Vidhya
5 min read · Jul 16, 2021
Data Visualization

Are you thinking of giving up on a problem and starting to look for new information?

Have you hit a wall in your research?

Do you also spend valuable minutes thinking about how to make sense of your data?

If yes, then this article is for you. It will walk you through some visualization techniques that can help you make sense of data, detect trends, identify patterns and more. Continuing from my previous blog, ‘Data Visualization: What? When? How?’, I will try to keep this article “very simple”, just like ‘Khaby Lame’.

Khaby Lame

So, let us get started. In this article, I will focus on three visualization techniques:

  • Histogram
  • Count Plot
  • Bar Plot

To explain these techniques, I have used the ‘Telecom Customer Churn Prediction’ dataset from Kaggle. The dataset looks like this:

Dataset

1. HISTOGRAM:

Library used: Seaborn

A histogram represents data graphically as bars of different heights. Each bar covers a range of values (a bin), and its height shows how many observations fall in that range. Histograms are used for visualizing the frequency distribution of a data variable.

Didn’t understand? Don’t worry, the examples will make it clear 😉

Here, I have plotted a histogram for Monthly Charges.

Histogram: Monthly Charges

Observing the above histogram, we can tell that the Monthly Charges lie in the range (20, 120), and the values on the Y-axis depict the count of customers having Monthly Charges within a specified range.

It shows that nearly 600 customers have their monthly charges in the range of (70, 90). The most common monthly expenditure is around 20, while only a few customers have 120 as their monthly expense.

A univariate histogram is useful for observing the shape of a distribution. But we can also add other parameters to inspect how the distribution varies with another variable.

Histogram: Tenure w.r.t Churn

Here, the plotted variable is tenure. Adding hue=Churn splits the histogram by Churn, so we can compare the tenure distribution of customers who churned with that of customers who did not. As you can see, churn (plotted in orange) becomes less frequent as tenure increases.

Now, what if we want to plot a bivariate histogram?

For that, you need to add the y parameter as your second variable.

Bivariate Histogram: Species Vs Sepal Width

Here, I have taken the Iris dataset from the Seaborn library, with x as Sepal Width and y as Species. We get a bivariate histogram in the form of a heatmap. You can see that the sepal widths of the Virginica species fall roughly in the range (2.5, 3.9), and we can read off the sepal width range for the other species in the same way.

Okay, okay, too much information. Now, it’s time to get the gist of the Histogram:

  • Histograms are used for observing the frequency distribution of numeric data variables.
  • We can inspect how the frequency distribution varies across another variable with the help of the hue parameter.
  • Using both x and y parameters, we can plot a bivariate histogram.

2. COUNTPLOT:

Library Used: Seaborn

A count plot is essentially a histogram for categorical values rather than quantitative ones. It plots the data as bars, with the Y-axis representing the count of records in each category.

Here, I have plotted a count plot for Payment Method:

Count Plot: Payment Method

The X-axis represents the different categories of Payment Method, and the Y-axis has the count of observations for each category. We can infer that the largest number of customers have Electronic Check as their Payment Method.
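A small sketch of a count plot with `sns.countplot`, assuming a column named `PaymentMethod`. The handful of rows is hypothetical and only serves to show that the bar heights are plain category counts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical mini-sample; the real dataset has far more rows.
df = pd.DataFrame({
    "PaymentMethod": (
        ["Electronic check"] * 5
        + ["Mailed check"] * 3
        + ["Bank transfer"] * 2
        + ["Credit card"] * 2
    )
})

fig, ax = plt.subplots()
# One bar per category; the bar height is the number of rows in that category.
sns.countplot(data=df, x="PaymentMethod", ax=ax)

# The same counts as a table, and the most frequent category.
counts = df["PaymentMethod"].value_counts()
most_common = counts.idxmax()
```

`value_counts()` gives the same numbers as the bars, which is handy when you want to quote exact counts next to the plot.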

Count Plot also has the provision of adding another variable using the hue parameter.

Count Plot: Payment Method w.r.t Churn

From the above plot, we can see the counts for each Payment Method split by the Churn variable.
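The hue version could be sketched as follows. The rows and the column names `PaymentMethod` and `Churn` are assumptions for illustration, and `pd.crosstab` is used only to show the same counts as a table.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical mini-sample in which every (PaymentMethod, Churn) pair occurs.
df = pd.DataFrame({
    "PaymentMethod": ["Electronic check"] * 5 + ["Mailed check"] * 4,
    "Churn": ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "Yes"],
})

fig, ax = plt.subplots()
# hue splits every Payment Method bar into one bar per Churn value.
sns.countplot(data=df, x="PaymentMethod", hue="Churn", ax=ax)

# The same information as a table: rows are payment methods, columns are Churn values.
table = pd.crosstab(df["PaymentMethod"], df["Churn"])
```

Each cell of `table` corresponds to one bar in the plot.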

A quick question that may arise: what is the difference between a Count Plot and a Histogram?

  • A count plot is used for categorical values, whereas histograms are for numerical data
  • Count plot works great for discrete data variables and histograms for continuous ranges
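The difference can be sketched without any plotting at all. In the hypothetical mini-sample below (column names and values are made up), the count plot logic counts each distinct category, while the histogram logic first slices the numeric range into bins and then counts per bin.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-sample; column names and values are made up for illustration.
df = pd.DataFrame({
    "PaymentMethod": ["Electronic check", "Mailed check", "Electronic check", "Credit card"],
    "MonthlyCharges": [20.5, 75.0, 89.9, 110.0],
})

# Count plot logic: one count per distinct category.
category_counts = df["PaymentMethod"].value_counts()

# Histogram logic: cut the continuous range into bins, then count per bin.
bin_counts, bin_edges = np.histogram(df["MonthlyCharges"], bins=[20, 70, 120])
```

Here `category_counts` has one entry per payment method, whereas `bin_counts` has one entry per range of charges.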

Now, on to the final plotting technique! 😀

3. BARPLOT:

Library Used: Seaborn

A bar plot is used for visualizing the relationship between a categorical and a numeric data variable. It shows one bar per category, whose height is an aggregate of the numeric variable for that category (the mean, by default).

Barplot: Tenure Vs Payment Method

The above bar plot depicts the average tenure for each Payment Method (tenure in this dataset is measured in months). We can deduce that customers with the Payment Method of Bank Transfer have an average tenure of about 45 months, whereas Electronic Check customers have an average tenure of about 25 months.
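Here is a minimal sketch of such a bar plot with `sns.barplot`. The rows are hypothetical and were picked so that the group averages come out at 45 and 25, mirroring the reading above; the column names `PaymentMethod` and `tenure` are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows, chosen so the group averages are 45 and 25.
df = pd.DataFrame({
    "PaymentMethod": ["Bank transfer"] * 4 + ["Electronic check"] * 4,
    "tenure": [40, 50, 45, 45, 20, 30, 25, 25],
})

fig, ax = plt.subplots()
# One bar per category; by default the bar height is the mean of y in that category.
sns.barplot(data=df, x="PaymentMethod", y="tenure", ax=ax)

# The same averages computed directly with pandas.
means = df.groupby("PaymentMethod")["tenure"].mean()
```

The thin lines Seaborn draws on top of the bars are error bars around the mean, not extra data points.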

Now, let’s try to add the hue parameter to the bar plot:

Barplot: Tenure Vs Payment Method w.r.t Churn

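Each category on the X-axis is now split according to the hue variable, so every Payment Method gets one bar per Churn value. Note that the bar heights are still average tenures, not churn rates. For customers with the payment method of Mailed Check, for example, one of the Churn groups has an average tenure of only about 10 months.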
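With hue, the bar plot effectively averages within every (category, hue) pair. A sketch on hypothetical rows, with assumed column names, next to the equivalent pandas `groupby`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows; every (PaymentMethod, Churn) pair appears twice.
df = pd.DataFrame({
    "PaymentMethod": ["Mailed check"] * 4 + ["Electronic check"] * 4,
    "Churn": ["No", "No", "Yes", "Yes"] * 2,
    "tenure": [20, 30, 8, 12, 30, 40, 10, 20],
})

fig, ax = plt.subplots()
# Each Payment Method is split into one bar per Churn value; height = mean tenure.
sns.barplot(data=df, x="PaymentMethod", y="tenure", hue="Churn", ax=ax)

# The same averages, grouped by both variables.
group_means = df.groupby(["PaymentMethod", "Churn"])["tenure"].mean()
```

Reading `group_means` side by side with the plot is a quick way to check which bar belongs to which Churn value.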

The essence of a Bar Plot:

  • A bar plot is effective when one axis has a categorical variable and the other has a numeric one.
  • By default, a bar plot shows the mean value of the numeric variable for each category.
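The mean is only the default. As a hedged sketch, `sns.barplot` accepts an `estimator` argument, so the aggregate can be swapped, for example for the median. The rows below are hypothetical and include one large value per group so that mean and median clearly differ.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical rows with one large value per group, so mean and median differ.
df = pd.DataFrame({
    "PaymentMethod": ["Mailed check"] * 3 + ["Credit card"] * 3,
    "tenure": [1, 2, 30, 10, 20, 60],
})

fig, ax = plt.subplots()
# estimator replaces the default mean; here each bar shows the median tenure.
sns.barplot(data=df, x="PaymentMethod", y="tenure", estimator=np.median, ax=ax)

# For comparison: the default aggregate (mean) and the one used above (median).
means = df.groupby("PaymentMethod")["tenure"].mean()
medians = df.groupby("PaymentMethod")["tenure"].median()
```

The median is less sensitive to a few very large values, which can make it a better summary for skewed variables.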

Hurray!! You have successfully gotten the hang of Visualization tips and tricks! 👏

Thank you!

I’d be obliged to receive any comments, suggestions or feedback 😃

Stay tuned for upcoming Visualization Techniques!

You can find the link to the Github code here.

Connect on LinkedIn: https://www.linkedin.com/in/rashmi-manwani-a13157184/

Connect on Github: https://github.com/Rashmiii-00
