How to draw several types of plots for data visualization using the categorical plot (catplot) in the Seaborn library?

Kanishk Barhanpurkar
Published in Analytics Vidhya
5 min read · Jun 18, 2020
Photo by Kevin Ku from Pexels

In ancient times, people drew cave paintings and rock art to mark something important. Archaeologists and scientists now use them to understand and analyze the art, culture, economics, and other vital aspects of that period. The amount of data back then, however, was tiny compared with the present. Today data comes in hundreds of petabytes and is challenging to understand, so many techniques are used to extract the essential information from such enormous datasets. Data visualization is one of those techniques: it helps us understand data through graphical representation.

The greatest value of a picture is when it forces us to notice what we never expected to see.

Quote by John W. Tukey

Python is one of the most widely used languages for data visualization, mainly through two libraries, Matplotlib and Seaborn. Together, these libraries play a significant role in the data science domain.

If Matplotlib lays the groundwork for visualization, then Seaborn beautifies and enriches the process of data visualization.

Seaborn offers a range of graphs and plots that can be used for different purposes. This blog uses a Test cricket dataset, which contains the batting statistics of all players who played Test cricket up to 15th June 2020. Catplot stands for categorical plot. It is a relatively recent addition to Seaborn that makes it easy to draw plots involving categorical variables. Seaborn v0.9.0, released in July 2018, renamed the older factorplot to catplot to add more functionality, and changed its default kind from a point plot to a strip plot.

(Step-1) Importing required libraries- Apart from Seaborn, the required libraries are NumPy, Pandas, Matplotlib, and time. Just out of curiosity, the execution time is measured for every plot.
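A minimal sketch of this step could look like the following. The `timed` helper is a hypothetical name introduced here for illustration; the article only states that the time module is used to measure each plot, not how.

```python
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Quick check of the helper on a plain Python call.
total, seconds = timed(sum, range(1_000))
print(f"sum={total}, took {seconds:.6f} s")
```

`time.perf_counter()` is chosen here because it is a monotonic, high-resolution clock suited to short measurements; `time.time()` would also work for rough numbers.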

(Step-2) Using pandas to load the dataset- We access the cricket dataset by reading it into a DataFrame.

Data-attributes.
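As a rough sketch of the loading step, the snippet below reads a tiny in-memory CSV in place of the real file. The two rows are invented placeholders, and the column headers are an assumption based on the attribute names described in this post; the actual file may spell them differently.

```python
import io

import pandas as pd

# Stand-in for the real CSV: two invented rows, assumed column headers.
toy_csv = io.StringIO(
    "Player Name,Career Span,Matches,Innings,Not out,Runs,Highscore,"
    "Average,Century,Half-century,Zeroes,Profile\n"
    "Player A,2001-2012,80,140,10,5200,201*,40.00,12,25,8,https://example.com/a\n"
    "Player B,2010-2020,35,60,5,1650,112,30.00,2,9,6,https://example.com/b\n"
)

# With the real dataset this would be pd.read_csv("<path to your file>.csv").
df = pd.read_csv(toy_csv)

print(df.head())   # first rows
print(df.shape)    # (rows, columns)
```

Note that a high score such as `201*` (not out) makes the Highscore column load as text rather than as a number.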

(Step-3) Description of the dataset- The dataset has the following attributes-

  1. Player Name
  2. Career Span (how many years each player played international cricket)
  3. Matches (number of matches played)
  4. Innings (number of innings played)
  5. Not out (number of times the player was not out)
  6. Runs (runs scored throughout the career)
  7. Highscore
  8. Average
  9. Century
  10. Half-century (number of half-centuries scored in the entire career)
  11. Zeroes (number of ducks scored in the entire career)
  12. Profile (player-profile link)

(Step-4) Dropping the attributes that are not required using the drop() method and checking the data types.
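The post does not say which attributes were dropped, so the sketch below assumes the Profile link is the one removed. The DataFrame is a made-up stand-in with assumed column names.

```python
import pandas as pd

# Invented two-row stand-in for the cricket DataFrame.
df = pd.DataFrame({
    "Player Name": ["Player A", "Player B"],
    "Runs": [5200, 1650],
    "Average": [40.0, 30.0],
    "Profile": ["https://example.com/a", "https://example.com/b"],
})

# drop() returns a new DataFrame; the column choice here is an assumption.
df = df.drop(columns=["Profile"])

# dtypes shows which columns are numeric and which are text (object).
print(df.dtypes)
```

Checking `dtypes` before plotting is useful because catplot needs a numeric column for the value axis.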

(Step-5) Now we come to the essential part of data visualization, the catplot. We also measure the time required to execute each plot. We start with the strip plot, which is the default kind that catplot draws.

Stripplot (default).
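A minimal sketch of a default catplot with timing, on invented data. The `Experience` grouping column is hypothetical (for instance, it could be derived by binning Matches), and the values are random placeholders rather than real cricket statistics.

```python
import time

import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Invented data: a hypothetical category column and a numeric column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Experience": rng.choice(["short", "medium", "long"], size=60),
    "Average": rng.normal(loc=30, scale=8, size=60),
})

start = time.perf_counter()
g = sns.catplot(data=df, x="Experience", y="Average")  # kind="strip" by default
elapsed = time.perf_counter() - start
print(f"strip plot drawn in {elapsed:.3f} s")
```

`catplot` returns a `FacetGrid`, so the underlying Matplotlib axes are available as `g.ax` for further tweaks.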

Next, we move to the swarm plot. A swarm plot is the right choice when you want to show all observations along with some representation of the underlying distribution. Comparing the total execution time in this run, the strip plot was 14 times faster than the swarm plot.

Swarmplot.
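To compare the two kinds yourself, something like the following sketch could be used. The data and the `Experience` column are invented, and the measured ratio will vary with data size, Seaborn version, and machine, so it should not be expected to reproduce the figure quoted above.

```python
import time

import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Experience": rng.choice(["short", "medium", "long"], size=60),
    "Average": rng.normal(loc=30, scale=8, size=60),
})

timings = {}
for kind in ("strip", "swarm"):
    start = time.perf_counter()
    sns.catplot(data=df, x="Experience", y="Average", kind=kind)
    timings[kind] = time.perf_counter() - start
    plt.close("all")  # free the figure before the next measurement

for kind, seconds in timings.items():
    print(f"{kind}: {seconds:.3f} s")
```

The swarm plot has to position every point so that none overlap, which is why it tends to be slower than the strip plot as the number of observations grows.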

Categorical distribution plot- Distribution plots summarize how the values within each category are distributed. Catplot can draw three distribution plots: the box plot, the violin plot, and the boxen plot. A violin plot is a method of plotting numeric data. It shows a rotated kernel density plot on each side. Violin plots show the probability density of the data at different values, computed by a kernel density estimator.

Violin-plot.
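A short sketch of a violin plot through catplot, again on invented data with a hypothetical `Experience` column:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "Experience": rng.choice(["short", "medium", "long"], size=90),
    "Average": rng.normal(loc=30, scale=8, size=90),
})

# kind="violin" draws one kernel density estimate per category.
g = sns.catplot(data=df, x="Experience", y="Average", kind="violin")
g.set_axis_labels("Experience group", "Batting average")
```

Switching to `kind="box"` on the same call gives the classic box plot for comparison.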

Boxenplot- This plot is also known as a “letter value” plot because it shows a large number of quantiles that are defined as “letter values”. It is a nonparametric representation of a distribution in which all features correspond to actual observations. The idea behind the boxen plot is simple: by plotting more quantiles, it provides more information about the shape of the distribution.

Boxenplot.

Categorical estimate plot- Estimation plots are widely used in estimation statistics, a data analysis framework that uses a combination of effect sizes, confidence intervals, precision planning, and meta-analysis to plan experiments. Catplot can draw three estimate plots: the point plot, the bar plot, and the count plot. For example, in a project on the textile industry, such plots help to analyze the data when it is unlabeled.

Point plot- A point plot represents an estimate of central tendency for a numeric variable by the position of scatter plot points and shows the uncertainty around that estimate using error bars.
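A minimal point plot sketch on invented data (the `Experience` column is hypothetical). By default Seaborn plots the mean of each category and a bootstrapped confidence interval as the error bar.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "Experience": rng.choice(["short", "medium", "long"], size=90),
    "Average": rng.normal(loc=30, scale=8, size=90),
})

# kind="point" shows one estimate (the mean) per category with error bars.
g = sns.catplot(data=df, x="Experience", y="Average", kind="point")

# The same estimates computed directly, for comparison with the plot.
print(df.groupby("Experience")["Average"].mean())
```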

Barplot- A bar chart represents categorical data with rectangular bars whose heights or lengths are proportional to the values they represent. It can be plotted vertically or horizontally.

Barplot.
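As a sketch, the bar plot below uses a small invented table so the bar heights are easy to verify by hand: each bar is the mean of its category. The `Experience` column is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Invented values: three observations per hypothetical category.
df = pd.DataFrame({
    "Experience": ["short"] * 3 + ["medium"] * 3 + ["long"] * 3,
    "Average": [18.0, 22.0, 26.0, 30.0, 34.0, 38.0, 42.0, 46.0, 50.0],
})

# kind="bar" draws the mean of each category, with an error bar on top.
g = sns.catplot(data=df, x="Experience", y="Average", kind="bar")

# Read the bar heights back from the Matplotlib axes.
bar_heights = sorted(bar.get_height() for bar in g.ax.patches)
print(bar_heights)
```

Swapping the `x` and `y` arguments draws the same bars horizontally.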

Countplot- A count plot shows the number of occurrences of each category of a variable in the given attribute of the dataset. In other words, the categorical count plot counts how many times each value appears. It can also be useful in classification and clustering because it shows which outcome occurs most often.
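A count plot needs only the categorical variable. The sketch below uses an invented column with known frequencies, so the result can be checked against `value_counts()`; the `Experience` column is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Invented categories with known frequencies: 10, 20 and 5.
df = pd.DataFrame({
    "Experience": ["short"] * 10 + ["medium"] * 20 + ["long"] * 5,
})

# kind="count" takes one variable and plots how often each value occurs.
g = sns.catplot(data=df, x="Experience", kind="count")

counts = df["Experience"].value_counts()
heights = sorted(int(round(bar.get_height())) for bar in g.ax.patches)
print(counts)
print(heights)
```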

Conclusion- Thus, catplot can be used to draw many different plots in the Seaborn library. Its usage ultimately depends on the user and the type of dataset being worked on. If you are a beginner, I suggest trying every plot on different datasets, as every plot tells a different story and covers different cases. It will increase your understanding, and in the end, you will be a Legend.

If you make yourself more than just a man, if you devote yourself to an ideal, then you become something else entirely... Legend.

Quote from one of my favourite movies, Batman Begins.

For the entire code, please follow this link.

Let’s connect on LinkedIn.
