The Basics of Data Visualization in Python

Moraneus
8 min read · Apr 2, 2024

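At the heart of Python’s data visualization capabilities are libraries like Matplotlib, Seaborn and Plotly. Matplotlib and Seaborn are geared mainly toward static plots, while Plotly specializes in interactive charts. Together they offer a solid foundation for visualizations that are both attractive and informative.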

Getting Started

Before we dive into plotting, ensure you have Python installed on your system. For data visualization, we’ll primarily use Matplotlib, Seaborn and Plotly, three powerful libraries that serve as the cornerstone for many data visualization tasks in Python.

Installation

Open your terminal or command prompt and install the necessary packages using pip:

pip install matplotlib seaborn plotly
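To confirm the packages installed correctly, a short check like the following prints each one's version using only the standard library's importlib.metadata. Versions matter later on: the dashboard code uses the 'ME' frequency alias, which needs pandas 2.2 or newer. The package list is just an example; adjust it to what you installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages used in this article; dash is installed later on
packages = ["matplotlib", "seaborn", "plotly", "pandas", "dash"]

versions = {}
for package in packages:
    try:
        versions[package] = version(package)
    except PackageNotFoundError:
        versions[package] = None  # not installed in this environment
    print(f"{package:>10}: {versions[package] or 'not installed'}")
```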

Creating Plots with Matplotlib

Basic Example: Line Chart

A line chart is a good starting point for understanding trends over time. Here’s how to plot a simple line chart using Matplotlib:

import matplotlib.pyplot as plt

months = range(1, 13)
sales = [10, 13, 8, 16, 17, 20, 22, 25, 23, 18, 15, 12]

plt.plot(months, sales)
plt.title('Monthly Sales Trend')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.show()

Running this script opens a window (or, in a Jupyter notebook, renders the chart inline) showing a line graph of monthly sales, a visual representation of sales trends throughout the year.
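The plt.* calls above use Matplotlib's state-based interface. Below is a sketch of the same chart written with the object-oriented interface (explicit Figure and Axes objects), which tends to scale better once a script has several plots. It also saves the chart to a PNG instead of opening a window; the Agg backend and the file name monthly_sales.png are choices made for this example.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt

months = list(range(1, 13))
sales = [10, 13, 8, 16, 17, 20, 22, 25, 23, 18, 15, 12]

# Create the Figure and its Axes explicitly, then call methods on the Axes
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, sales, marker="o")
ax.set_title("Monthly Sales Trend")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_xticks(months)  # one tick per month instead of automatic spacing

fig.tight_layout()
out_path = Path("monthly_sales.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)  # release the figure once it is saved
```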

Advanced Example: Subplots

Subplots allow you to display multiple plots in a single figure, providing a way to visualize different datasets side by side.

import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.linspace(0, 2, 100)
y1 = np.sin(x * np.pi)
y2 = np.cos(x * np.pi)

# Create two subplots vertically
fig, (ax1, ax2) = plt.subplots(2, 1)
fig.suptitle('Vertical Subplots')

ax1.plot(x, y1, 'r-')
ax1.set_ylabel('Sin')

ax2.plot(x, y2, 'g-')
ax2.set_ylabel('Cos')

plt.show()

This example demonstrates creating vertical subplots using Matplotlib, with the top plot displaying a sine wave and the bottom a cosine wave. It introduces concepts like creating multiple axes in a figure and customizing individual plots.
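plt.subplots also builds grids. With two rows and two columns it returns a 2x2 NumPy array of Axes, and sharex/sharey keep every panel on the same axis ranges. The squared curves below are illustrative additions, not part of the original example, and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # render to files only; no window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)
curves = {
    "sin": np.sin(x * np.pi),
    "cos": np.cos(x * np.pi),
    "sin^2": np.sin(x * np.pi) ** 2,
    "cos^2": np.cos(x * np.pi) ** 2,
}

# 2 rows x 2 columns; sharex/sharey give every panel the same axis ranges
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
fig.suptitle("A 2x2 Grid of Subplots")

# axes is a 2x2 NumPy array of Axes; .flat walks it row by row
for ax, (name, y) in zip(axes.flat, curves.items()):
    ax.plot(x, y)
    ax.set_title(name)

fig.tight_layout()
fig.savefig("subplot_grid.png")
plt.close(fig)
```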

Creating Plots with Seaborn

Basic Example: Histogram

Seaborn simplifies creating statistical plots. A histogram is useful for visualizing the distribution of a dataset.

import seaborn as sns
import matplotlib.pyplot as plt

# Set the style before plotting; it only affects axes created afterwards
sns.set_style('darkgrid')

# Sample data
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

sns.histplot(data)
plt.title('Basic Histogram')  # Seaborn draws on Matplotlib axes, so plt.title works
plt.show()

This example demonstrates how seamlessly Seaborn integrates with Matplotlib, allowing you to leverage the strengths of both libraries for creating and customizing visualizations.

Advanced Example: Pairplot

A pairplot displays pairwise relationships in a dataset as a grid of axes: each numeric variable is shared across the y-axes of one row and the x-axes of one column, and the diagonal shows each variable’s own distribution.

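import matplotlib.pyplot as plt
import seaborn as sns

# set_theme is the current name for the older sns.set alias
sns.set_theme(style="ticks", color_codes=True)
iris = sns.load_dataset("iris")  # fetched from seaborn's online data repository (needs internet on first use)

sns.pairplot(iris, hue='species')
plt.show()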

This example uses Seaborn to create a pairplot of the Iris dataset, automatically plotting pairwise relationships across all its numeric variables and coloring points by species. It highlights Seaborn’s capability for complex statistical visualizations with minimal code.

The Iris dataset is one of the most famous datasets used in the field of machine learning and statistics. It’s often used as a beginner’s dataset for teaching purposes because it’s relatively simple yet versatile enough to teach fundamental concepts in data analysis, classification, and machine learning algorithms.

Creating Plots with Plotly

Basic Example: Scatter Plot

Plotly excels at creating interactive plots. A scatter plot is a fundamental way to visualize the relationship between two variables.

import plotly.express as px

# Sample data
df = px.data.iris()

# Create a scatter plot
fig = px.scatter(df, x='sepal_width', y='sepal_length', color='species')
fig.show()

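This script generates an interactive scatter plot of the Iris dataset’s sepal width and length, colored by species. It shows how little code Plotly Express needs for an interactive plot: hovering over a point shows its values, you can zoom and pan, and clicking a species in the legend toggles it on or off.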

Advanced Example: 3D Scatter Plot

Plotly can also create sophisticated 3D visualizations, such as 3D scatter plots, to explore complex datasets.

import plotly.express as px

# Sample data
df = px.data.iris()

# Create a 3D scatter plot
fig = px.scatter_3d(df, x='sepal_length', y='sepal_width', z='petal_width', color='species')
fig.show()

This example demonstrates creating a 3D scatter plot with Plotly Express, offering an interactive way to explore relationships between three variables. Users can rotate and zoom the plot to view the data from different angles, showcasing Plotly’s capability for creating dynamic and complex visualizations.

Diving Deeper with Plotly and Dash

Plotly and Dash: A Powerful Combo for Web-Based Data Visualization

For those who want to go further, Plotly and its companion framework Dash (also by Plotly) offer powerful tools for creating interactive, web-based plots. Dash, in particular, lets you develop rich web applications entirely in Python.

Installation

To get started, you’ll need to install both Plotly and Dash:

pip install plotly dash

Creating an Interactive Web Application with Dash

Dash allows us to create a web application entirely in Python, without the need for HTML or JavaScript. Here’s how you can create a basic interactive web application displaying a dynamic line chart:

# dcc and html ship inside the dash package since Dash 2.0; the standalone
# dash_core_components / dash_html_components packages are deprecated
from dash import Dash, dcc, html
from pandas import DataFrame

# Sample data
df = DataFrame({
    'Month': range(1, 13),
    'Sales': [10, 13, 8, 16, 17, 20, 22, 25, 23, 18, 15, 12]
})

# Dash app initialization
app = Dash(__name__)

app.layout = html.Div(children=[
    html.H1(children='Interactive Sales Trend'),
    dcc.Graph(
        id='sales-graph',
        figure={
            'data': [
                # Plotly draws line charts as 'scatter' traces with mode 'lines'
                {'x': df['Month'], 'y': df['Sales'], 'type': 'scatter',
                 'mode': 'lines', 'name': 'Sales'},
            ],
            'layout': {
                'title': {'text': 'Monthly Sales Data'}
            }
        }
    )
])

if __name__ == '__main__':
    app.run(debug=True)  # app.run replaces the older app.run_server

By default, Dash serves the app at http://127.0.0.1:8050/.

This code creates a web application that displays an interactive chart. In the browser, users can hover over points to read exact values and zoom or pan through the sales data, which is more engaging than a static image.

Case Study: Building a Comprehensive Dashboard

Let’s delve into creating a comprehensive dashboard using Dash, with a focus on visualizing sales data for a multinational corporation.

Our dashboard will include:

  • Interactive line graphs for sales trends.
  • Bar charts for comparison between regions.
  • Pie charts for market share insights.
  • Forecasting models visualized over time.

Getting Started

First, ensure Dash and pandas are installed in your environment (installing Dash also pulls in Plotly):

pip install dash pandas

Step 1: Preparing the Data

First, we create a sample dataset of monthly sales for one year. Each month’s figure is attributed to a single region in rotation (North, South, East, West), so every region contributes three data points. We’ll also simulate a forecasting model’s predictions for the next six months.

import pandas as pd

# 'ME' (month end) requires pandas 2.2+; on older versions use 'M'
data = {
    'Date': pd.date_range(start="2023-01-01", periods=12, freq='ME'),
    'Region': ['North', 'South', 'East', 'West'] * 3,
    'Total Sales': [1000, 1500, 700, 1200, 1100, 1600, 800, 1300, 1200, 1700, 900, 1400],
    'Market Share': [25, 35, 15, 25, 26, 36, 14, 24, 27, 37, 13, 23],
}

df = pd.DataFrame(data)

# Six month-end dates starting right after the last historical date
forecast_dates = pd.date_range(start=df['Date'].max() + pd.Timedelta(days=1), periods=6, freq='ME')
forecast_sales = [1600, 1700, 1800, 1900, 2000, 2100]

forecast_df = pd.DataFrame({
    'Date': forecast_dates,
    'Predicted Sales': forecast_sales
})

# Aggregate per region: sum 'Total Sales' and average 'Market Share'.
# Averaging market share is an illustrative choice; pick whatever fits your data.
grouped = df.groupby('Region').agg({'Total Sales': 'sum', 'Market Share': 'mean'}).reset_index()
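In this sample data each month belongs to only one region, so each line in the trend chart has just three points. If you want every region to have a value for every month, one approach is to build the full month-by-region grid with pd.MultiIndex.from_product. The sales numbers below are random placeholders, not data from the article, and month-start dates ('MS') are used because that alias works on all pandas versions.

```python
import numpy as np
import pandas as pd

months = pd.date_range(start="2023-01-01", periods=12, freq="MS")  # month starts
regions = ["North", "South", "East", "West"]

# Every (month, region) combination: 12 x 4 = 48 rows
index = pd.MultiIndex.from_product([months, regions], names=["Date", "Region"])
full_df = index.to_frame(index=False)

# Placeholder sales numbers drawn at random, for illustration only
rng = np.random.default_rng(42)
full_df["Total Sales"] = rng.integers(700, 1800, size=len(full_df))

# Per-region totals, as used by the bar chart
per_region = full_df.groupby("Region")["Total Sales"].sum()
```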

Step 2: Setting Up the Dash Application

Next, we initialize the Dash app and define its layout to include placeholders for our visualizations.

from dash import Dash, html, dcc

app = Dash(__name__)

# Placeholder for the app's layout
app.layout = html.Div([
    html.H1('Sales Data Dashboard'),
    dcc.Graph(id='line-graph'),
    dcc.Graph(id='bar-chart'),
    dcc.Graph(id='pie-chart'),
    dcc.Graph(id='forecasting-model'),
])

Step 3: Creating Visualizations

We now create each visualization using Plotly Express, leveraging our sales data and the forecasting model’s output.

Interactive Line Graphs for Sales Trends

import plotly.express as px

fig_line = px.line(df, x='Date', y='Total Sales', color='Region', title='Sales Trends Over Time')

Bar Charts for Comparison Between Regions

fig_bar = px.bar(grouped, x='Region', y='Total Sales', title='Sales Comparison by Region')

Pie Charts for Market Share Insights

fig_pie = px.pie(grouped, names='Region', values='Market Share', title='Market Share by Region')

Forecasting Models Visualized Over Time
For the forecasting model, let’s assume you’ve already developed a model that predicts future sales based on historical data. We’ll visualize these predictions as a line graph.

fig_forecast = px.line(forecast_df, x='Date', y='Predicted Sales', title='Sales Forecast for the Next 6 Months')

Step 4: Integrating Visualizations into the Dashboard

Each figure is then passed to its dcc.Graph component through the figure argument. Assigning app.layout again replaces the placeholder layout from Step 2.

app.layout = html.Div([
    html.H1('Sales Data Dashboard'),
    dcc.Graph(id='line-graph', figure=fig_line),
    dcc.Graph(id='bar-chart', figure=fig_bar),
    dcc.Graph(id='pie-chart', figure=fig_pie),
    dcc.Graph(id='forecasting-model', figure=fig_forecast),
])

Step 5: Running the Dashboard

Finally, to make our dashboard accessible, we run the Dash app:

if __name__ == '__main__':
    app.run(debug=True)  # app.run replaces the older app.run_server

Here is the complete example in one script, with the sample data defined inline so there is no need to create a CSV file:


import pandas as pd
import plotly.express as px
from dash import Dash, html, dcc

# 'ME' (month end) requires pandas 2.2+; on older versions use 'M'
data = {
    'Date': pd.date_range(start="2023-01-01", periods=12, freq='ME'),
    'Region': ['North', 'South', 'East', 'West'] * 3,
    'Total Sales': [1000, 1500, 700, 1200, 1100, 1600, 800, 1300, 1200, 1700, 900, 1400],
    'Market Share': [25, 35, 15, 25, 26, 36, 14, 24, 27, 37, 13, 23],
}

df = pd.DataFrame(data)

# Six month-end dates starting right after the last historical date
forecast_dates = pd.date_range(start=df['Date'].max() + pd.Timedelta(days=1), periods=6, freq='ME')
forecast_sales = [1600, 1700, 1800, 1900, 2000, 2100]

forecast_df = pd.DataFrame({
    'Date': forecast_dates,
    'Predicted Sales': forecast_sales
})

# Aggregate per region: sum 'Total Sales' and average 'Market Share'.
# Averaging market share is an illustrative choice; pick whatever fits your data.
grouped = df.groupby('Region').agg({'Total Sales': 'sum', 'Market Share': 'mean'}).reset_index()

app = Dash(__name__)

fig_line = px.line(df, x='Date', y='Total Sales', color='Region', title='Sales Trends Over Time')
fig_bar = px.bar(grouped, x='Region', y='Total Sales', title='Sales Comparison by Region')
fig_pie = px.pie(grouped, names='Region', values='Market Share', title='Market Share by Region')
fig_forecast = px.line(forecast_df, x='Date', y='Predicted Sales', title='Sales Forecast for the Next 6 Months')

app.layout = html.Div([
    html.H1('Sales Data Dashboard'),
    dcc.Graph(id='line-graph', figure=fig_line),
    dcc.Graph(id='bar-chart', figure=fig_bar),
    dcc.Graph(id='pie-chart', figure=fig_pie),
    dcc.Graph(id='forecasting-model', figure=fig_forecast),
])

if __name__ == '__main__':
    app.run(debug=True)  # app.run replaces the older app.run_server

Conclusion

Data visualization in Python stands as a powerful gateway to extracting insights from data, transforming complex datasets into compelling visual stories. Whether for exploratory data analysis, statistical modeling, or presenting business intelligence insights, Python’s ecosystem offers well-suited tools for most visualization needs. Through interactive examples and practical applications, we’ve seen how Matplotlib, Seaborn, Plotly and Dash bring data to life, making it actionable and understandable for technical and non-technical audiences alike.

In summary, mastering data visualization in Python not only enhances our analytical skills but also elevates our ability to communicate data-driven insights effectively, making it an indispensable skill set in the data science and analytics fields.

Your Support Means a Lot! 🙌

If you enjoyed this article and found it valuable, please consider giving it a clap to show your support. Feel free to explore my other articles, where I cover a wide range of topics related to Python programming and more. By following me, you’ll stay updated on my latest content and insights. I look forward to sharing more knowledge and connecting with you through future articles. Until then, keep coding, keep learning, and most importantly, enjoy the journey!

Happy programming!
