“Data Visualization”: A Beaten-Up Topic with a Twist, Ep. 2 | Combo Challenge

Mainak Mitra
4 min read · Nov 1, 2023

Photo by Roberto Sorin on Unsplash

Level: Beginner — Intermediate

Background and Context: This is episode two of the series. It focuses on combo charts and how well different charting packages handle them. In this episode, I omit the code for generic package imports. For more background and context, please refer to Episode 1.

Matplotlib/Pyplot

Line Bar Combo

Code

# Draw both series on the same Axes so the line overlays the bars
ax = df_data1.plot(kind='bar', color='b', alpha=0.5, label='Technology Billionaires')
df_data2.plot(kind='line', ax=ax, color='r', marker='o', label='Automotive Billionaires')
plt.xticks(rotation=90)
plt.legend(loc="upper left", ncol=2)
plt.xlabel("Country")
plt.ylabel("# of Billionaires")
plt.title('Line Bar Graph of Income Distribution of Billionaires by Industry')
plt.show()

Observations

  • Code complexity is low: easy to implement, with little code required
  • Static chart
  • No dedicated out-of-the-box (OOTB) method for combo charts; the chart is built by overlaying a bar plot and a line plot
  • Some customization options available
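The overlay approach can be made a little more explicit by creating the Axes up front and passing it to both plot calls. The following is a minimal sketch under stated assumptions: the counts are made-up numbers for illustration (not the article's dataset), and the names `tech`, `auto`, and `countries` are hypothetical. Both series are reindexed to the same country order so that each bar lines up with its point on the line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts, for illustration only (not the article's data)
tech = pd.Series({"China": 5, "India": 3, "United States": 9}, name="Technology")
auto = pd.Series({"India": 2, "United States": 1, "China": 4}, name="Automotive")

# Align both series on one country order so bar i and point i refer to the same country
countries = sorted(set(tech.index) | set(auto.index))
tech = tech.reindex(countries, fill_value=0)
auto = auto.reindex(countries, fill_value=0)

# One shared Axes: bars first, then the line on top
fig, ax = plt.subplots(figsize=(8, 5))
tech.plot(kind="bar", ax=ax, color="b", alpha=0.5, label="Technology Billionaires")
auto.plot(kind="line", ax=ax, color="r", marker="o", label="Automotive Billionaires")

ax.tick_params(axis="x", rotation=90)
ax.set_xlabel("Country")
ax.set_ylabel("# of Billionaires")
ax.legend(loc="upper left", ncol=2)
fig.tight_layout()
```

In an interactive session, `plt.show()` would display the figure; drawing the bars before the line keeps the bar plot's category ticks in place.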

100% Stacked Charts

Code

# Prepare Data: normalize="index" converts each country's counts to row proportions
cross_df = pd.crosstab(index=df['country'],
                       columns=df['category'],
                       normalize="index")

# Build Chart
ax = cross_df.plot(kind='bar',
                   stacked=True,
                   colormap='tab10',
                   figsize=(8, 6))

plt.legend(loc="lower left", ncol=2)
plt.xlabel("Country")
plt.ylabel("Share of Billionaires")
plt.title('100% Stacked Chart for Distribution of Billionaires by Industry and Country')

# Show the percentage of each stacked sub-bar.
# y_loc is the cumulative top of the segment; subtracting half the
# proportion places the label at the segment's vertical center.
for n, x in enumerate(cross_df.index):
    for proportion, y_loc in zip(cross_df.loc[x],
                                 cross_df.loc[x].cumsum()):
        plt.text(x=n - 0.17,
                 y=y_loc - proportion / 2,
                 s=f'{np.round(proportion * 100, 1)}%',
                 color="black",
                 fontsize=7,
                 fontweight="bold")
plt.show()

Observations

  1. Data preparation is essential: the counts must be normalized per country to produce a 100% stacked output
  2. Complex coding is required to show the data values
  3. Not highly customizable
  4. Not supported in OOTB chart methods
  5. Static chart
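The value labels are the most involved part of the chart above. One possible shortcut, assuming Matplotlib 3.4 or newer is available, is `Axes.bar_label`, which can center a label inside each bar segment. This is a sketch on made-up rows (the `df_demo` frame is hypothetical and not the article's dataset), not a replacement for the author's approach.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows, for illustration only
df_demo = pd.DataFrame({
    "country": ["China", "China", "China", "India", "India", "United States"],
    "category": ["Technology", "Technology", "Automotive",
                 "Technology", "Automotive", "Technology"],
})

# Row-normalized crosstab: each country's shares sum to 1
shares = pd.crosstab(df_demo["country"], df_demo["category"], normalize="index")

ax = shares.plot(kind="bar", stacked=True, colormap="tab10", figsize=(8, 6))

# One BarContainer per category; label every non-empty segment at its center
for container in ax.containers:
    labels = [f"{bar.get_height():.1%}" if bar.get_height() > 0 else ""
              for bar in container]
    ax.bar_label(container, labels=labels, label_type="center",
                 fontsize=7, fontweight="bold")

ax.set_xlabel("Country")
ax.set_ylabel("Share of Billionaires")
```

Empty segments get an empty label, which avoids a stray "0.0%" on countries that have no entries in a category.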

Seaborn

Line Bar Combo

Code

# Prepare Data
df = df[df['country'].isin(['United States','China','India','Taiwan'])]
df_data1 = df[df['category']=='Technology']['country'].value_counts()
df_data2 = df[df['category']=='Automotive']['country'].value_counts()
# rename_axis + reset_index(name=...) yields the same column names across pandas versions
df_data1 = df_data1.rename_axis("Country").reset_index(name="NoofBlns")
df_data2 = df_data2.rename_axis("Country").reset_index(name="NoofBlns")

# Build Chart
line1 = sns.lineplot(data=df_data1.sort_values(by='Country'), x='Country', y='NoofBlns', marker='s', color='b')
bar1 = sns.barplot(data=df_data2.sort_values(by='Country'), x='Country', y='NoofBlns', color='y')

# Add Legends (mpatches is matplotlib.patches)
line = mpatches.Patch(color='b', label='Technology')
bar = mpatches.Patch(color='y', label='Automotive')
plt.legend(handles=[line, bar])

plt.show()

Observations

  • OOTB support does not exist
  • Data preparation plays a major role
  • Quite a bit of custom coding is necessary
  • Static charts
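Since data preparation carries most of the weight here, it may help to isolate it in a small helper. The sketch below is pandas-only and uses a hypothetical function name, `count_by_country`, with made-up rows. It reindexes every category to the same list of countries, so a country with no billionaires in one industry still gets a row with a count of zero and the line and bars stay aligned.

```python
import pandas as pd

def count_by_country(df, category, countries):
    """Return a tidy frame with one row per country for the given category."""
    counts = df.loc[df["category"] == category, "country"].value_counts()
    return (counts.reindex(countries, fill_value=0)   # same countries, same order
                  .rename_axis("Country")
                  .reset_index(name="NoofBlns"))

# Made-up rows, for illustration only
df_demo = pd.DataFrame({
    "country": ["China", "India", "China", "Taiwan", "India"],
    "category": ["Technology", "Technology", "Automotive", "Technology", "Technology"],
})

countries = sorted(df_demo["country"].unique())
tech = count_by_country(df_demo, "Technology", countries)
auto = count_by_country(df_demo, "Automotive", countries)
print(tech)
print(auto)
```

The two frames can then be passed to `sns.lineplot` and `sns.barplot` with `x="Country"` and `y="NoofBlns"`, as in the chart code above.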

100% Stacked Charts

Code

# Prepare Data: keep only the two plotted industries so their shares sum to 1 per country
# (assumes df may contain other industries as well)
two_cat_df = df[df['category'].isin(['Technology', 'Automotive'])]
cross_df = pd.crosstab(index=two_cat_df['country'], columns=two_cat_df['category'], normalize="index").reset_index()
# Full-height background bar; the Automotive bar drawn over it leaves the Technology share visible
cross_df['Technology'] = 1

# Build chart
bar2 = sns.barplot(x="country", y="Technology", data=cross_df, color='orange')
bar1 = sns.barplot(x="country", y="Automotive", data=cross_df, color='darkblue')
plt.title('Billionaire Population by Country and Industry')

# Add legend
top_bar = mpatches.Patch(color='orange', label='Technology')
bottom_bar = mpatches.Patch(color='darkblue', label='Automotive')
plt.legend(handles=[top_bar, bottom_bar])
plt.show()

Observations

  • OOTB Support does not exist
  • Data Preparation plays a major role
  • Custom coding necessary
  • Static charts
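The overlay trick used above (a full-height bar with a shorter bar drawn over it) can be extended to more than two categories by plotting cumulative shares, tallest first. This is a rough sketch on made-up rows with a hypothetical third category, "Finance"; it assumes seaborn and Matplotlib are installed and is one possible generalization, not the author's method.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows, for illustration only
df_demo = pd.DataFrame({
    "country": ["China"] * 4 + ["India"] * 2,
    "category": ["Technology", "Technology", "Automotive", "Finance",
                 "Technology", "Finance"],
})

# Row-normalized shares, then a running total across categories
shares = pd.crosstab(df_demo["country"], df_demo["category"], normalize="index")
cumulative = shares.cumsum(axis=1).reset_index()

categories = list(shares.columns)
palette = sns.color_palette("tab10", n_colors=len(categories))

fig, ax = plt.subplots()
# Draw the tallest cumulative bar first so each shorter bar covers its lower part
for category, color in reversed(list(zip(categories, palette))):
    sns.barplot(data=cumulative, x="country", y=category, color=color, ax=ax)

# Build the legend by hand, as in the two-category version
handles = [mpatches.Patch(color=color, label=category)
           for category, color in zip(categories, palette)]
ax.legend(handles=handles)
ax.set_ylabel("Share of billionaires")
```

The visible slice of each bar then equals that category's share, because everything below it is covered by the bars drawn afterwards.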

Plotly

Line Bar Combo Chart

Code

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Prepare Data: combine both filters into a single boolean mask
countries = ['United States', 'China', 'India', 'Taiwan']
industries = ['Technology', 'Automotive']
df = df[df['country'].isin(countries) & df['category'].isin(industries)]
cross_df = pd.crosstab(index=df['country'],
                       columns=df['category'])

# Build Chart
trace1 = go.Bar(
    x=cross_df.index,
    y=cross_df['Automotive'],
    name='Automotive',
    marker=dict(
        color='rgb(34,163,192)'
    )
)
trace2 = go.Scatter(
    x=cross_df.index,
    y=cross_df['Technology'],
    name='Technology',
    marker=dict(
        symbol='star'
    )
)

# Bars go on the primary y-axis, the line on the secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(trace1, secondary_y=False)
fig.add_trace(trace2, secondary_y=True)
fig.update_layout(height=600, width=800,
                  title="Line Bar Graph for Billionaires' Distribution by Industry",
                  xaxis=dict(tickangle=-90))
fig.show()

Observations

  • OOTB support does not exist
  • Comparatively more coding required
  • Various customizations are easy to build
  • Interactive charts

100% Stacked Chart

Code

import plotly.express as px

# barnorm="percent" rescales each country's bars so they sum to 100%
fig = px.histogram(df, x='country', color="category", barnorm="percent",
                   text_auto=True,
                   title="100% Stacked Bar Chart of Billionaires' Distribution by Industry")
fig.show()

Observations

  • OOTB support exists
  • Minimal coding required
  • Various customizations are easy to build
  • Interactive charts

Bringing it all together: Comparison Matrix

Mainak Mitra

Technical leader | AI, Analytics, BI, Data Engineering (Ex Google, Deloitte, Cisco, IBM, Multiple Startups) MIT, Berkeley, Stanford, PMP, CSPO certified