How to Create Percentage Plots for Visualizing Your Data in Python

With Examples Using Seaborn and Plotly Express.

Vivienne DiFrancesco
The Startup


Photo by Benedikt Geyer on Unsplash

When I first started with data science, I was amazed at all the beautiful plots that could be made so easily with packages like Seaborn or Plotly Express. But there came a point where I was working on a project and realized the perfect EDA plot would show the percentage of entries in each target class, split out by a categorical feature. After scouring documentation, galleries, and Stack Overflow pages, I realized there was no canned plot that did what I wanted. In this article, I’m going to show you how to easily create a percentage plot in Python. You should be able to adapt this principle to any plotting package of your choice, but I will show examples in Seaborn and Plotly Express.

First, you need to create a new dataframe with the percentages for the feature you are interested in plotting. I’ll show the code first and then explain:

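# Proportion of each target class within each feature group
new_df = og_df.groupby(feature)[target].value_counts(normalize=True)
# Convert to percentages, name the column, and flatten the index into columns
new_df = new_df.mul(100).rename('Percent').reset_index()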

Call the groupby() method on your original dataframe with the feature you want to plot, select your target column, and apply value_counts() with normalize=True to get the proportion of each target class within each group. Save this…
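
To make the result concrete, here is a minimal sketch of the same two lines run on a tiny made-up dataframe. The data and the column names "plan" and "churned" are hypothetical stand-ins for your own feature and target; only the groupby and value_counts steps come from the code above.

```python
import pandas as pd

# Hypothetical toy data: "plan" is the categorical feature, "churned" is the target.
og_df = pd.DataFrame({
    "plan": ["basic", "basic", "basic", "basic", "premium", "premium"],
    "churned": ["yes", "no", "no", "no", "yes", "no"],
})
feature, target = "plan", "churned"

# Proportion of each target class within each feature group (sums to 1 per group).
new_df = og_df.groupby(feature)[target].value_counts(normalize=True)
# Scale to percentages, name the value column, and turn the index levels into columns.
new_df = new_df.mul(100).rename("Percent").reset_index()

print(new_df)
```

The resulting dataframe has one row per feature and target combination, with columns for the feature, the target, and Percent. In this toy data the "basic" group is 75 percent "no" and 25 percent "yes", and each group's percentages add up to 100. A long-format frame like this can be passed to a bar plot, for example with sns.barplot(data=new_df, x=feature, y="Percent", hue=target) in Seaborn, though that is only one possible way to draw it.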
