“Data Visualization”: A Beaten-Up Topic with a Twist, Ep 1 | Raising the Bar
Level: Beginner to Intermediate
Background and Context: This blog series is not about path-breaking topics like LLMs and Gen AI (we will get there, but let's lay the stepping stones first). Anyone who has worked in the data domain for a considerable amount of time has almost certainly used BI/data analysis tools, Excel, Python, pandas, and charting libraries to analyze data. Less experienced (intermediate) and entry-level professionals will also know about different charting libraries and how to build different visualizations. Remember, one size does not fit all, so choosing the right package for the right job can still be a process of trial and error.
Goal: There are plenty of blogs on how to build chart “A” with library “B” or “C”, but they rarely show the differences and tradeoffs between “B” and “C”. My goal through this series is to showcase the capabilities of different visualization packages, provide executable code as a quick reference, and offer a tradeoff scorecard naming the best package for each chart type. I intend to publish several episodes so the information stays digestible and the learning is progressive.
This is episode 1 of the series. In this episode we will cover bar charts, stacked bar charts, and grouped bar charts.
Visualization Packages: A few popular and widely used Python charting libraries are matplotlib, seaborn, and plotly. Let's dive in.
First Things First
- Data: We will use the Billionaires Statistics Dataset from Kaggle for the analysis here.
- Environment: Google Colab notebook. Learn more about Google Colab here
Import the necessary libraries and load the data
# Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
# Mount Google Drive and load the data
from google.colab import drive
drive.mount('/content/drive')
dataPath = "/content/drive/My Drive/Blog Codes/Data/Billionaires Statistics Dataset.csv"
df = pd.read_csv(dataPath)
Matplotlib/Pyplot
Chart Type: Bar
# Count billionaires by business category
df_data = df['category'].value_counts()
# Plot the counts as a bar chart; the figure size is set once, through figsize
df_data.plot(kind='bar', figsize=(10,4), color='blue')
plt.xlabel('Category')
plt.ylabel('# of Billionaires')
plt.title('Frequency Distribution of Billionaires by Business Category')
plt.show()
Observations:
- Charts can be generated with very little code
- Fewer customization options; even changing the chart size is not very straightforward
- Charts are static, not interactive
- Chart type is supported out of the box (OOTB); no custom code required
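One way around the sizing issue is to create the figure and axes yourself and hand the axes to pandas through its `ax` parameter. A minimal sketch, assuming a small made-up DataFrame (`df` with a `category` column) in place of the Kaggle data:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only; these are not values from the Kaggle dataset
df = pd.DataFrame({"category": ["Technology", "Finance", "Technology",
                                "Retail", "Technology", "Finance"]})

counts = df["category"].value_counts()

# Create the figure and axes first, at the size we want
fig, ax = plt.subplots(figsize=(10, 4))
# Ask pandas to draw into that axes rather than choosing one itself
counts.plot(kind="bar", ax=ax, color="blue")
ax.set_xlabel("Category")
ax.set_ylabel("# of Billionaires")
ax.set_title("Frequency Distribution of Billionaires by Business Category")
fig.tight_layout()  # keep the rotated tick labels inside the figure
```

Because `ax` is explicit, later tweaks (limits, gridlines, annotations) all go through the same object.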
Chart Type: Grouped Bar
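# Prepare data: keep the four countries in a separate DataFrame so df stays complete
countries = ['United States', 'China', 'India', 'Taiwan']
df_sub = df[df['country'].isin(countries)]
# Count per country in a fixed order; reindex aligns bars with tick labels and fills missing countries with 0
df_data1 = df_sub[df_sub['category'] == 'Technology']['country'].value_counts().reindex(countries, fill_value=0)
df_data2 = df_sub[df_sub['category'] == 'Automotive']['country'].value_counts().reindex(countries, fill_value=0)
# Build Chart
X = np.arange(len(countries))
width = 0.4
fig, ax = plt.subplots()
# Shift each series by half a bar width so the two bars sit side by side
ax.bar(X - width/2, df_data1, width, color='y', label='Technology')
ax.bar(X + width/2, df_data2, width, color='b', label='Automotive')
ax.set_xticks(X, countries)
ax.set_xlabel("Country")
ax.set_ylabel("# of Billionaires")
ax.set_title('Billionaire Population by Country and Industry')
ax.legend()
plt.show()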
Observations:
- Comparatively more coding required
- Charts are static, not interactive
- Chart type is not supported by a native method; the bar offsets have to be computed with custom code
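For comparison, the pandas plotting wrapper (which draws with matplotlib) produces grouped bars directly from a DataFrame: one bar per column within each row. A sketch, assuming hypothetical toy rows; `pd.crosstab` builds the country-by-industry count table:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy rows for illustration only; not values from the Kaggle dataset
df = pd.DataFrame({
    "country": ["United States", "United States", "China", "India",
                "China", "United States", "Taiwan"],
    "category": ["Technology", "Automotive", "Technology", "Technology",
                 "Automotive", "Technology", "Technology"],
})

# Rows = countries, columns = industries, cells = number of billionaires
counts = pd.crosstab(df["country"], df["category"])

# A DataFrame bar plot draws one bar per column inside each row group
ax = counts.plot(kind="bar", figsize=(8, 4), color=["b", "y"])
ax.set_xlabel("Country")
ax.set_ylabel("# of Billionaires")
ax.set_title("Billionaire Population by Country and Industry")
plt.tight_layout()
```

`crosstab` also fills missing country/industry pairs with 0, which removes the alignment bookkeeping of the hand-built version.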
Chart Type: Stacked Bar
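# Prepare data: keep the four countries in a separate DataFrame so df stays complete
countries = ['United States', 'China', 'India', 'Taiwan']
df_sub = df[df['country'].isin(countries)]
# Count per country in a fixed order; reindex aligns bars with tick labels and fills missing countries with 0
df_data1 = df_sub[df_sub['category'] == 'Technology']['country'].value_counts().reindex(countries, fill_value=0)
df_data2 = df_sub[df_sub['category'] == 'Automotive']['country'].value_counts().reindex(countries, fill_value=0)
# Build Chart
X = np.arange(len(countries))
width = 0.4
fig, ax = plt.subplots()
# Draw Automotive first, then stack Technology on top of it via bottom=
ax.bar(X, df_data2, width, color='b', label='Automotive')
ax.bar(X, df_data1, width, bottom=df_data2, color='y', label='Technology')
ax.set_xticks(X, countries)
ax.set_xlabel("Country")
ax.set_ylabel("# of Billionaires")
ax.set_title('Billionaire Population by Country and Industry')
ax.legend()
plt.show()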
Observations:
- Comparatively more coding required
- Charts are static, not interactive
- Chart type is not supported by a native method; the stacking (bottom offsets) has to be coded by hand
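The same pandas wrapper also stacks bars when given `stacked=True`, computing the bottom offsets itself. A sketch on the same kind of hypothetical toy rows:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy rows for illustration only; not values from the Kaggle dataset
df = pd.DataFrame({
    "country": ["United States", "United States", "China", "India",
                "China", "United States", "Taiwan"],
    "category": ["Technology", "Automotive", "Technology", "Technology",
                 "Automotive", "Technology", "Technology"],
})

# Country-by-industry count table (missing pairs become 0)
counts = pd.crosstab(df["country"], df["category"])

# stacked=True places each column's bar on top of the previous columns
ax = counts.plot(kind="bar", stacked=True, figsize=(8, 4), color=["b", "y"])
ax.set_xlabel("Country")
ax.set_ylabel("# of Billionaires")
ax.set_title("Billionaire Population by Country and Industry")
plt.tight_layout()
```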
Seaborn
Chart Type: Bar
plt.figure(figsize=(10,4))
#to plot bar chart
sns.countplot(x='category', data=df, color = 'blue')
# X-Axis ticks rotation
plt.xticks(rotation=90)
plt.xlabel('Category')
plt.ylabel('# of Billionaires')
plt.title('Billionaire Distribution by Business Category')
plt.show()
Observations:
- Relatively little code is required to build the chart.
- Charts are static
- Chart type is supported OOTB
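`countplot` also accepts an `order` list, which makes it easy to sort bars by frequency, and an `ax` argument for controlling the figure. A sketch, assuming made-up data in place of the Kaggle file:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only; not values from the Kaggle dataset
df = pd.DataFrame({"category": ["Technology", "Finance", "Technology",
                                "Retail", "Technology", "Finance"]})

# Category names sorted from most to least frequent
order = df["category"].value_counts().index.tolist()

fig, ax = plt.subplots(figsize=(10, 4))
sns.countplot(x="category", data=df, color="blue", order=order, ax=ax)
ax.tick_params(axis="x", labelrotation=90)
ax.set_xlabel("Category")
ax.set_ylabel("# of Billionaires")
ax.set_title("Billionaire Distribution by Business Category")
fig.tight_layout()
```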
Chart Type: Grouped Bar
# Prepare Data: keep four countries and two industries
df = df[df['country'].isin(['United States','China','India','Taiwan']) & df['category'].isin(['Technology','Automotive'])]
# Build Chart
sns.countplot(x='country', hue='category', data=df)
plt.xticks(rotation=90)
plt.xlabel("Country")
plt.ylabel("# of Billionaires")
plt.title('Billionaire Population by Country and Industry')
plt.show()
Observations:
- Relatively little code is required to build the chart.
- Charts are static
- Chart type is supported OOTB
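Since `countplot` draws each hue level as one matplotlib bar container, matplotlib's `ax.bar_label` can print the count on every bar. A sketch, assuming hypothetical toy rows and a fixed country/industry order:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy rows for illustration only; not values from the Kaggle dataset
df = pd.DataFrame({
    "country": ["United States", "United States", "China", "India",
                "China", "United States", "Taiwan"],
    "category": ["Technology", "Automotive", "Technology", "Technology",
                 "Automotive", "Technology", "Technology"],
})

fig, ax = plt.subplots(figsize=(8, 4))
sns.countplot(x="country", hue="category", data=df,
              order=["United States", "China", "India", "Taiwan"],
              hue_order=["Technology", "Automotive"], ax=ax)
# One container per hue level; label every bar in it with its height
for container in ax.containers:
    ax.bar_label(container)
ax.set_xlabel("Country")
ax.set_ylabel("# of Billionaires")
ax.set_title("Billionaire Population by Country and Industry")
```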
Chart Type: Stacked Bar
# Build Chart (displot is figure-level: it creates its own figure)
sns.displot(x='country', hue='category', data=df, multiple="stack")
plt.xticks(rotation=90)
plt.xlabel("Country")
plt.ylabel("# of Billionaires")
plt.title('Billionaire Population by Country and Industry')
plt.show()
Observations:
- Relatively little code is required to build the chart.
- Charts are static
- Chart type is supported OOTB
Plotly
Chart Type: Bar
# Reload the full dataset: df was filtered to four countries and two industries in the Seaborn grouped-bar step
df_all = pd.read_csv(dataPath)
fig = px.histogram(df_all, y='category', title='Frequency Distribution of Billionaires by Business Category', text_auto=True).update_yaxes(categoryorder='total ascending')
fig.update_layout(width = 900)
fig.show()
Observations:
- Charts are interactive
- Very few lines of code to build the chart
- Visually appealing chart
- Multiple native options for quick customization
- Chart type is supported OOTB; no custom code required
Chart Type: Grouped Bar
# Build Chart
fig = px.histogram(df, x='country',color="category",barmode='group',title="Billionaire Population by Country and Industry")
fig.update_layout(width = 900)
fig.show()
Observations:
- Charts are interactive
- Very few lines of code to build the chart
- Visually appealing chart
- Multiple native options for quick customization
- Chart type is supported OOTB; no custom code required
Chart Type: Stacked Bar
# Build Chart
fig = px.histogram(df, x='country',color="category",barmode='stack',title="Billionaire Population by Country and Industry")
fig.update_layout(width = 900)
fig.show()
Observations:
- Charts are interactive
- Very few lines of code to build the chart
- Visually appealing chart
- Multiple native options for quick customization
- Chart type is supported OOTB; no custom code required
Bringing It All Together: Comparison Matrix