“Data Visualization”: A Beaten-Up Topic with a Twist | Ep 1: Raising the Bar

Mainak Mitra
5 min read · Oct 29, 2023

Photo by Anne Nygård on Unsplash

Level: Beginner to Intermediate

Background and Context: This blog series is not about groundbreaking topics like LLMs and generative AI (we will get there, but let’s lay the stepping stones first). Anyone who has worked in the data domain for a while has almost certainly used BI and data analysis tools such as Excel, Python, pandas, and charting libraries to analyze data. Even entry-level and intermediate professionals will know a few charting libraries and how to build different visualizations. Remember, “one size does not fit all”: choosing the right package for a given job can still be a process of trial and error.

Goal: There are plenty of blogs on how to build chart “A” with library “B” or “C”, but few show the differences and trade-offs between “B” and “C”. My goal through this series is to showcase the capabilities of different visualization packages, with executable code as a quick reference and a trade-off scorecard of the best package for each chart type. I intend to spread this over several episodes so each one stays digestible and the learning stays progressive.

This is episode 1 of the series, covering bar charts, stacked bar charts, and grouped bar charts.

Visualization Packages: A few popular and widely used Python charting libraries are Matplotlib, Seaborn, and Plotly. Let’s dive in.

First Things First

  • Data: We will use the Billionaires Statistics Dataset from Kaggle for the analysis.

  • Environment: Google Colab notebook.

Import the necessary libraries and load the data

# Import libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

# Mount Google Drive (Colab only) and load the data into df, which every chart below uses
from google.colab import drive
drive.mount('/content/drive')
dataPath = "/content/drive/My Drive/Blog Codes/Data/Billionaires Statistics Dataset.csv"
df = pd.read_csv(dataPath)
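Before plotting, a quick check that the file loaded and has the columns every chart relies on can save debugging later. The sketch below assumes only that the CSV has `category` and `country` columns; `load_billionaires` is a hypothetical helper name, and the inline CSV is made-up sample data standing in for the Kaggle file.

```python
import io

import pandas as pd

# Columns used by every chart in this post (assumption based on the code above)
REQUIRED_COLUMNS = {"category", "country"}


def load_billionaires(source) -> pd.DataFrame:
    """Hypothetical helper: read the CSV and fail early if expected columns are missing."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    return df


# Made-up sample rows standing in for the Kaggle file
sample_csv = io.StringIO(
    "personName,category,country\n"
    "A,Technology,United States\n"
    "B,Automotive,China\n"
    "C,Technology,India\n"
)
df = load_billionaires(sample_csv)
print(df.shape)                       # (rows, columns)
print(df["category"].value_counts())  # the counts behind the first bar chart
```

In the Colab setup above, the equivalent call would be `df = load_billionaires(dataPath)`.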

Matplotlib/Pyplot

Chart Type: Bar


# Count billionaires per business category (value_counts sorts by frequency, highest first)
df_data = df['category'].value_counts()

# Plot the bar chart; set the figure size once, here, instead of also calling plt.figure()
df_data.plot(kind='bar', figsize=(10, 4), color='blue')
plt.xlabel('Category')
plt.ylabel('# of Billionaires')
plt.title('Frequency Distribution of Billionaires by Business Category')
plt.show()

Observations:

  • Charts can be generated with relatively few lines of code
  • Fewer customization options; even sizing the chart takes care (set figsize in one place rather than in both plt.figure() and plot())
  • Charts are static, not interactive
  • Chart type is supported out of the box (OOTB); no custom code required
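On the sizing point: pandas’ `plot()` accepts an existing Axes through `ax=`, so one way to keep size and styling in a single place is to create the figure with `plt.subplots(figsize=...)` and hand its Axes to pandas. A minimal sketch; the category labels are made-up sample data, not the Kaggle dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data standing in for df['category']
categories = pd.Series(["Technology", "Finance", "Technology", "Retail", "Finance", "Technology"])
counts = categories.value_counts()  # sorted by frequency, highest first

# Create the figure once, at the size you want, and pass its Axes to pandas
fig, ax = plt.subplots(figsize=(10, 4))
counts.plot(kind="bar", ax=ax, color="blue")
ax.set_xlabel("Category")
ax.set_ylabel("# of Billionaires")
ax.set_title("Frequency Distribution of Billionaires by Business Category")
fig.tight_layout()
# In a notebook, call plt.show() to display the figure
```

Because pandas draws into the Axes it is given, every later styling call goes through `ax` and the figure size is never set twice.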

Chart Type: Grouped Bar

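# Prepare data: keep the full df intact and work on a filtered copy
countries = ['United States', 'China', 'India', 'Taiwan']
df_sub = df[df['country'].isin(countries) & df['category'].isin(['Technology', 'Automotive'])]
# reindex puts both Series in the same country order and fills missing countries with 0
df_data1 = df_sub[df_sub['category'] == 'Technology']['country'].value_counts().reindex(countries, fill_value=0)
df_data2 = df_sub[df_sub['category'] == 'Automotive']['country'].value_counts().reindex(countries, fill_value=0)

# Build chart: shift each series half a bar width either side of the tick
X = np.arange(len(countries))
width = 0.4
fig, ax = plt.subplots()
ax.bar(X - width / 2, df_data1, width, color='y', label='Technology')
ax.bar(X + width / 2, df_data2, width, color='b', label='Automotive')
ax.set_xticks(X, labels=countries)
ax.set_xlabel("Country")
ax.set_ylabel("# of Billionaires")
ax.set_title('Billionaire Population by Country and Industry')
ax.legend()
plt.show()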

Observations:

  • Comparatively more code required
  • Charts are static, not interactive
  • Chart type is not supported by native methods; custom code is required
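Counting with one `value_counts()` per category has two pitfalls: each Series comes back sorted by frequency, so the country order can differ between the Series (and from `df.country.unique()`), and a country with no billionaires in a category is missing entirely. A sketch of one way around both, assuming `country` and `category` columns as above: count with `pd.crosstab` (one aligned row per country, zeros filled in) and compute the bar offsets from the number of categories. The sample rows are made up, and `grouped_bar` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up sample rows standing in for the filtered Kaggle data
df_sub = pd.DataFrame({
    "country": ["United States", "United States", "China", "China", "India", "Taiwan"],
    "category": ["Technology", "Automotive", "Technology", "Technology", "Technology", "Technology"],
})

# Rows = countries, columns = categories; missing combinations become 0
counts = pd.crosstab(df_sub["country"], df_sub["category"])


def grouped_bar(ax, table, total_width=0.8):
    """Hypothetical helper: one bar group per row of `table`, one bar per column."""
    x = np.arange(len(table.index))
    n = len(table.columns)
    width = total_width / n
    for i, col in enumerate(table.columns):
        # Offset each series so the whole group is centred on its tick
        offset = (i - (n - 1) / 2) * width
        ax.bar(x + offset, table[col].to_numpy(), width, label=col)
    ax.set_xticks(x, labels=list(table.index))
    ax.legend()


fig, ax = plt.subplots()
grouped_bar(ax, counts)
ax.set_xlabel("Country")
ax.set_ylabel("# of Billionaires")
ax.set_title("Billionaire Population by Country and Industry")
```

Because the offsets are derived from the column count, the same helper works unchanged for three or more categories.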

Chart Type: Stacked Bar

# Prepare data: same filtered copy and aligned counts as the grouped bar
countries = ['United States', 'China', 'India', 'Taiwan']
df_sub = df[df['country'].isin(countries) & df['category'].isin(['Technology', 'Automotive'])]
df_data1 = df_sub[df_sub['category'] == 'Technology']['country'].value_counts().reindex(countries, fill_value=0)
df_data2 = df_sub[df_sub['category'] == 'Automotive']['country'].value_counts().reindex(countries, fill_value=0)

# Build chart: draw Automotive first, then stack Technology on top via bottom=
X = np.arange(len(countries))
width = 0.4
fig, ax = plt.subplots()
ax.bar(X, df_data2, width, color='b', label='Automotive')
ax.bar(X, df_data1, width, bottom=df_data2, color='y', label='Technology')
ax.set_xticks(X, labels=countries)
ax.set_xlabel("Country")
ax.set_ylabel("# of Billionaires")
ax.set_title('Billionaire Population by Country and Industry')
ax.legend()
plt.show()

Observations:

  • Comparatively more code required
  • Charts are static, not interactive
  • Chart type is not supported by native methods; custom code is required
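A middle ground worth knowing: once the counts are in a wide table (one column per category, as `pd.crosstab` produces), pandas’ own `DataFrame.plot.bar` draws grouped bars, and `stacked=True` stacks them, with Matplotlib still doing the rendering. A short sketch; the sample rows are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows standing in for the filtered Kaggle data
df_sub = pd.DataFrame({
    "country": ["United States", "United States", "China", "China", "India"],
    "category": ["Technology", "Automotive", "Technology", "Automotive", "Technology"],
})
# Wide table: one row per country, one column per category, zeros filled in
counts = pd.crosstab(df_sub["country"], df_sub["category"])

fig, (ax_grouped, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4))
counts.plot.bar(ax=ax_grouped, rot=0, title="Grouped")
counts.plot.bar(ax=ax_stacked, rot=0, stacked=True, title="Stacked")
for ax in (ax_grouped, ax_stacked):
    ax.set_xlabel("Country")
    ax.set_ylabel("# of Billionaires")
```

The trade-off is that pandas decides bar widths and offsets for you, so fine-grained control still means dropping down to `ax.bar`.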

Seaborn

Chart Type: Bar

plt.figure(figsize=(10, 4))

# Plot the bar chart: countplot counts the rows per category itself
sns.countplot(x='category', data=df, color='blue')
# Rotate x-axis tick labels so long category names don't overlap
plt.xticks(rotation=90)
plt.xlabel('Category')
plt.ylabel('# of Billionaires')
plt.title('Billionaire Distribution by Business Category')
plt.show()

Observations:

  • Relatively little code required to build the chart
  • Charts are static
  • Chart type is supported OOTB
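If you want the Seaborn bars sorted by frequency (which the Matplotlib version gets for free from `value_counts()`), `countplot` takes an `order=` list, and passing `ax=` from `plt.subplots` keeps sizing in one place. A sketch with made-up sample rows in place of the Kaggle data:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample rows standing in for the Kaggle data
df = pd.DataFrame({"category": ["Technology", "Finance", "Technology", "Retail", "Finance", "Technology"]})

fig, ax = plt.subplots(figsize=(10, 4))
sns.countplot(
    data=df,
    x="category",
    order=df["category"].value_counts().index,  # most frequent category first
    color="blue",
    ax=ax,
)
ax.tick_params(axis="x", labelrotation=90)  # same effect as plt.xticks(rotation=90)
ax.set_xlabel("Category")
ax.set_ylabel("# of Billionaires")
ax.set_title("Billionaire Distribution by Business Category")
```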

Chart Type: Grouped Bar

# Prepare data: the four countries and two industries, as a separate filtered copy
countries = ['United States', 'China', 'India', 'Taiwan']
df_sub = df[df['country'].isin(countries) & df['category'].isin(['Technology', 'Automotive'])]

# Build chart: hue splits each country's bar by category, side by side
sns.countplot(x='country', hue='category', data=df_sub)
plt.xticks(rotation=90)
plt.xlabel("Country")
plt.ylabel("# of Billionaires")
plt.title('Billionaire Population by Country and Industry')
plt.show()

Observations:

  • Relatively little code required to build the chart
  • Charts are static
  • Chart type is supported OOTB

Chart Type: Stacked Bar

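# Prepare data: same filtered copy as the grouped bar
countries = ['United States', 'China', 'India', 'Taiwan']
df_sub = df[df['country'].isin(countries) & df['category'].isin(['Technology', 'Automotive'])]

# Build chart: countplot cannot stack, so use histplot (the axes-level form of displot) with multiple="stack"
sns.histplot(x='country', hue='category', data=df_sub, multiple='stack', shrink=0.8)
plt.xticks(rotation=90)
plt.xlabel("Country")
plt.ylabel("# of Billionaires")
plt.title('Billionaire Population by Country and Industry')
plt.show()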

Observations:

  • Relatively little code required to build the chart
  • Charts are static
  • Chart type is supported OOTB

Plotly

Chart Type: Bar

# Bar chart: px.histogram counts the rows per category; sort bars by total
fig = px.histogram(df, y='category', title='Frequency Distribution of Billionaires by Business Category', text_auto=True)
fig.update_yaxes(categoryorder='total ascending')
fig.update_layout(width=900)
fig.show()

Observations:

  • Charts are interactive
  • Very few lines of code to build the chart
  • Visually appealing chart
  • Multiple native options for quick customization
  • Chart type is supported OOTB; no custom code required

Chart Type: Grouped Bar

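# Prepare data: the four countries and two industries
countries = ['United States', 'China', 'India', 'Taiwan']
df_sub = df[df['country'].isin(countries) & df['category'].isin(['Technology', 'Automotive'])]
# Build chart
fig = px.histogram(df_sub, x='country', color='category', barmode='group', title='Billionaire Population by Country and Industry')
fig.update_layout(width=900)
fig.show()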

Observations:

  • Charts are interactive
  • Very few lines of code to build the chart
  • Visually appealing chart
  • Multiple native options for quick customization
  • Chart type is supported OOTB; no custom code required

Chart Type: Stacked Bar

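# Prepare data: the four countries and two industries
countries = ['United States', 'China', 'India', 'Taiwan']
df_sub = df[df['country'].isin(countries) & df['category'].isin(['Technology', 'Automotive'])]
# Build chart
fig = px.histogram(df_sub, x='country', color='category', barmode='stack', title='Billionaire Population by Country and Industry')
fig.update_layout(width=900)
fig.show()

Observations:

  • Charts are interactive
  • Very few lines of code to build the chart
  • Visually appealing chart
  • Multiple native options for quick customization
  • Chart type is supported OOTB; no custom code required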

Bringing It All Together: Comparison Matrix


Mainak Mitra

Technical leader | AI, Analytics, BI, Data Engineering (Ex Google, Deloitte, Cisco, IBM, Multiple Startups) | MIT, Berkeley, Stanford, PMP, CSPO certified