Transform Your Data into Insightful Visuals with Python Data Visualisation Libraries

Deepesh Nishad
Published in CodeX
10 min read · Apr 4, 2023

Data Visualization: Making Data Come to Life

“A picture is worth a thousand words” — this phrase holds true, especially when we are dealing with vast and complex data sets. The ability to effectively represent data in a graphical format is known as data visualization. It is the art of displaying data in a visually appealing manner so that insights and patterns can be easily identified and interpreted.

In today’s fast-paced business environment, data visualization has become a vital tool for communicating insights to clients, stakeholders, and customers. It helps in summarizing findings and presenting them in a way that facilitates understanding and aids decision-making.

As a data scientist or analyst, one of the most critical skills you need is the ability to tell a compelling story using data. Visualization techniques help you achieve this by presenting data in an approachable and stimulating way. Moreover, visualization tools can help you extract valuable insights from the data, and hence make more informed decisions.

In this blog, we will focus on teaching you how to take complex data sets that seem meaningless at first glance, and present them in a format that makes sense to people. We will use pandas' built-in plotting together with Python data visualization libraries such as Matplotlib and Seaborn, along with the pywaffle and wordcloud packages.

Let’s answer some questions that pop into our heads right after reading the headline of this blog.

1. What is data visualization?

Data visualization is the process of representing data in a visual format that can be easily understood and interpreted. It is a way of communicating complex data sets through charts, graphs, and other visual aids. Data visualization allows us to quickly identify patterns, trends, and relationships that might not be apparent through other methods.

Data visualization techniques can be used to present data from various sources, including spreadsheets, databases, and statistical software. They can also help to provide a clear picture of the data to clients, stakeholders, and decision-makers.

2. Why is data visualization important?

Data visualization is an essential tool for decision-makers who need to make sense of large and complex data sets. By presenting data in a visual format, it becomes easier to understand and interpret, allowing decision-makers to extract valuable insights and make informed decisions. Data visualization can also help to identify trends and patterns that might otherwise go unnoticed.

Moreover, data visualization is an effective means of communicating complex information to a broader audience. By presenting data visually, it is easier to understand, making it more engaging and impactful. In summary, data visualization is important because it helps decision-makers to make informed decisions, communicate more effectively, and extract valuable insights from complex data.

3. Techniques for effective data visualization

Choosing the right chart type: One of the critical factors in effective data visualization is selecting the right chart type. Bar charts, line graphs, scatter plots, and pie charts are among the most commonly used chart types. The choice of chart type will depend on the data being presented and the story that needs to be told.

Customizing chart appearance: Customization is an essential aspect of data visualization. Charts can be customized by adding titles, labels, legends, and color schemes. Customization is crucial because it can help to highlight key points, make charts more visually appealing, and convey meaning effectively.

Incorporating interactivity: Interactive data visualization allows users to explore and analyze data in real time. Interactive charts and graphs let users drill down into the data, making it easier to identify trends, patterns, and outliers. This is an effective way of presenting complex data to a broader audience.
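
As a minimal sketch of the customization point above, the snippet below adds a title, axis labels, a legend, and explicit colors to a Matplotlib line chart. The second series, Returns, and all of the numbers are made up for illustration.

```python
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014, 2015, 2016]
sales = [10, 15, 13, 17, 20, 15, 18]
returns = [2, 3, 2, 4, 3, 2, 3]  # hypothetical second series

fig, ax = plt.subplots(figsize=(7, 4))

# Explicit colors and markers make the two series easy to tell apart
ax.plot(years, sales, color='tab:blue', marker='o', label='Sales')
ax.plot(years, returns, color='tab:orange', marker='s', label='Returns')

# Title and axis labels tell the reader what they are looking at
ax.set_title('Sales and Returns by Year')
ax.set_xlabel('Year')
ax.set_ylabel('Units')

# The legend maps each color back to its series
ax.legend(loc='upper left')

plt.show()
```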

4. Tools for data visualization

Matplotlib: Matplotlib is a powerful data visualization library in Python. It provides extensive functionality for creating a variety of charts and plots. It allows users to create line charts, scatter plots, bar charts, and more, with full control over the customization of the chart’s appearance.

Seaborn: Seaborn is a high-level library built on top of Matplotlib, which provides an easy-to-use interface for creating stunning visualizations. It comes with built-in themes and color palettes, making it easier to create beautiful charts.

Tableau: Tableau is a powerful data visualization tool that allows users to create interactive dashboards, charts, and graphs. It is a popular tool among business analysts and data scientists because of its ease of use and powerful features.

Power BI: Power BI is a business analytics service by Microsoft that provides interactive visualizations and business intelligence capabilities. It allows users to create interactive dashboards, reports, and charts that can be easily shared with others.

5. Visualization — Plots/Charts/Maps

• Basic Visualization Tools

— Area Plots

An area plot displays quantitative data in a graphical way by plotting the values in the Series or DataFrame as points and connecting them with lines, filling the area below the lines with color. To create an area plot in pandas, you can call the plot.area() function on a Series or DataFrame object.

import pandas as pd
import matplotlib.pyplot as plt

data = {'Year': [2010, 2011, 2012, 2013, 2014, 2015, 2016],
        'Sales': [10, 15, 13, 17, 20, 15, 18]}

df = pd.DataFrame(data)

df.plot.area(x='Year', y='Sales')
plt.show()

Output
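
When the DataFrame has several numeric columns, plot.area() stacks them by default, so the top edge of the chart shows the combined total. Here is a small sketch of that behavior; the Online and Store columns and their values are made up.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sales split across two channels
df = pd.DataFrame({
    'Year': [2010, 2011, 2012, 2013],
    'Online': [4, 6, 7, 9],
    'Store': [6, 9, 6, 8],
})

# stacked=True is the default; each band sits on top of the previous one
ax = df.plot.area(x='Year', y=['Online', 'Store'], stacked=True)
ax.set_ylabel('Sales')

# The top of the stack equals the row-wise total of the two columns
totals = df[['Online', 'Store']].sum(axis=1)
print(totals.tolist())  # [10, 15, 13, 17]

plt.show()
```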

— Histograms

A histogram is a graphical representation of the distribution of numerical data. In pandas, you can create a histogram by calling the plot.hist() function on a Series or DataFrame object.

import pandas as pd
import matplotlib.pyplot as plt

data = [10, 20, 30, 20, 10, 5, 25, 15, 20]

s = pd.Series(data)

s.plot.hist()
plt.show()

Output
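
To see what a histogram actually counts, it can help to compute the bins by hand. The sketch below uses NumPy on the same data and assumes 5 bins (pandas uses 10 by default, and you can pass bins=5 to plot.hist() to match).

```python
import numpy as np

data = [10, 20, 30, 20, 10, 5, 25, 15, 20]

# Split the range 5..30 into 5 equal-width bins and count the values in each
counts, edges = np.histogram(data, bins=5)

print(edges.tolist())   # [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
print(counts.tolist())  # [1, 2, 1, 3, 2]
```

Each bar in the histogram is one of these counts. The last bin includes its right edge, which is why the value 30 lands in the 25 to 30 bin.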

— Bar Charts

A bar chart is a graphical representation of categorical data in which the categories are represented by bars. In pandas, you can create a bar chart by calling the plot.bar() or plot.barh() function on a Series or DataFrame object.

import pandas as pd
import matplotlib.pyplot as plt

data = {'Year': [2010, 2011, 2012, 2013, 2014, 2015, 2016],
        'Sales': [10, 15, 13, 17, 20, 15, 18]}

df = pd.DataFrame(data)

df.plot.bar(x='Year', y='Sales')
plt.show()

Output
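
The horizontal variant, plot.barh(), is handy when you want to rank categories. A minimal sketch using the same data, sorted so the largest value ends up at the top:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Year': [2010, 2011, 2012, 2013, 2014, 2015, 2016],
    'Sales': [10, 15, 13, 17, 20, 15, 18],
})

# barh draws the first row at the bottom, so sort ascending
# to place the longest bar at the top
ranked = df.sort_values('Sales')

ax = ranked.plot.barh(x='Year', y='Sales', legend=False)
ax.set_xlabel('Sales')

plt.show()
```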

• Specialized Visualization Tools

— Pie Charts

A pie chart shows each category's share of a whole as a slice of a circle. To create a pie chart in pandas, you can use the plot.pie() method.

import pandas as pd
import matplotlib.pyplot as plt

data = {'apples': 20, 'bananas': 10, 'oranges': 15, 'pears': 5}
df = pd.DataFrame.from_dict(data, orient='index', columns=['count'])

df.plot.pie(y='count', figsize=(5, 5), legend=False)

plt.show()

Output
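
The slices are easier to read when each one shows its percentage. The sketch below computes the shares explicitly and then lets Matplotlib's autopct option print them on the chart. It uses a Series rather than the DataFrame above, which is just a simplification for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

counts = pd.Series({'apples': 20, 'bananas': 10, 'oranges': 15, 'pears': 5},
                   name='count')

# Each slice's angle is proportional to its share of the total
shares = (counts / counts.sum() * 100).round(1)
print(shares.to_dict())

# autopct prints that percentage on each slice
ax = counts.plot.pie(autopct='%1.0f%%', figsize=(5, 5))
ax.set_ylabel('')  # drop the default axis label for a cleaner look

plt.show()
```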

— Box Plots

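A box plot summarizes the distribution of each numeric column by drawing its median, quartiles, and outliers. To create a box plot in pandas, you can use the plot.box() method.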

import pandas as pd
import matplotlib.pyplot as plt

data = {'apples': [1, 2, 3, 4, 5], 'bananas': [2, 4, 6, 8, 10], 'oranges': [3, 6, 9, 12, 15]}
df = pd.DataFrame(data)

df.plot.box(figsize=(5, 5))

plt.show()

Output
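
To read a box plot, it helps to know the numbers behind it. This sketch computes the quartiles for the same DataFrame with pandas. The whisker rule in the comment is Matplotlib's default, which pandas uses unless you change it.

```python
import pandas as pd

df = pd.DataFrame({
    'apples': [1, 2, 3, 4, 5],
    'bananas': [2, 4, 6, 8, 10],
    'oranges': [3, 6, 9, 12, 15],
})

# The box spans the first to third quartile; the line inside is the median
summary = df.quantile([0.25, 0.5, 0.75])
print(summary)

# By default, whiskers reach the most extreme points
# within 1.5 * IQR of the box edges
iqr = summary.loc[0.75] - summary.loc[0.25]
print(iqr.to_dict())
```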

— Scatter Plots

A scatter plot displays the values of two variables as points on a 2D plane. It is useful for exploring the relationship between two continuous variables. To create a scatter plot in Seaborn, you can use the scatterplot() function.

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

sns.scatterplot(x="total_bill", y="tip", data=tips)

plt.show()

Output

— Bubble Plots

A bubble plot is a variation of a scatter plot that displays three variables instead of two. The third variable is represented by the size of the markers, which can be circles, squares, or any other shape. To create a bubble plot in Seaborn, you can use the scatterplot() function and pass a third variable as the size argument.

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

sns.scatterplot(x="total_bill", y="tip", data=tips, size="size", sizes=(20, 200))

plt.show()

Output
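
The same idea can be sketched in plain Matplotlib, where the s argument of scatter() sets each marker's area. The snippet rescales a made-up party-size column linearly into the 20 to 200 range. This approximates what sizes=(20, 200) asks for, though Seaborn's internal mapping may differ in detail.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: bill, tip, and party size for six tables
total_bill = np.array([10.0, 21.0, 16.5, 35.0, 24.0, 48.0])
tip = np.array([1.5, 3.5, 2.0, 5.0, 3.6, 7.0])
party = np.array([1, 2, 2, 4, 3, 6])

# Rescale party size linearly into marker areas between 20 and 200
lo, hi = 20, 200
areas = lo + (party - party.min()) / (party.max() - party.min()) * (hi - lo)

fig, ax = plt.subplots()
ax.scatter(total_bill, tip, s=areas, alpha=0.6)
ax.set_xlabel('total_bill')
ax.set_ylabel('tip')

plt.show()
```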

• Advanced Visualization Tools

— Waffle Charts

A waffle chart is a type of visualization that displays parts of a whole using a grid of equal squares. Each category fills a number of squares proportional to its share of the total. To create a waffle chart in Python, you can use the pywaffle package.

from pywaffle import Waffle
import matplotlib.pyplot as plt

data = {'Category A': 23,
        'Category B': 15,
        'Category C': 10,
        'Category D': 7,
        'Category E': 5}

fig = plt.figure(
    FigureClass=Waffle,
    rows=5,
    values=data,
    legend={'loc': 'upper left', 'bbox_to_anchor': (1.1, 1)}
)
plt.show()

Output
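
The proportional idea behind a waffle chart can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, assuming a 10 x 10 grid, and is not necessarily how pywaffle allocates squares internally.

```python
data = {'Category A': 23, 'Category B': 15, 'Category C': 10,
        'Category D': 7, 'Category E': 5}

total_squares = 100  # assumed 10 x 10 grid
total = sum(data.values())

# Give each category a number of squares proportional to its share
squares = {name: round(value / total * total_squares)
           for name, value in data.items()}

print(squares)
print(sum(squares.values()))
```

With this data the rounded counts happen to add up to exactly 100. In general, simple rounding can leave the grid one square short or over, so a real implementation needs a rule for handing out the remainder.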

— Word Clouds

A word cloud is a visual representation of text data, in which the size of each word is proportional to its frequency in the text. To create a word cloud in Python, you can use the wordcloud package.

from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = 'this is an example text for creating a word cloud in Python'

wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

plt.figure(figsize=(12, 6))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()

Output
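
Word size is driven by word frequency, so a text in which every word appears once gives a cloud with little size variation. The sketch below counts frequencies with the standard library for a made-up sentence that repeats a few words. Note that the wordcloud package also removes common stopwords by default, which this simple count does not do.

```python
from collections import Counter

text = 'data tells a story and good data tells a clear story about data'

# Count how often each word appears (lowercased, split on whitespace)
freq = Counter(text.lower().split())

# The most frequent word would be drawn largest in the cloud
print(freq.most_common(1))  # [('data', 3)]
```

If you already have counts like these, WordCloud also offers a generate_from_frequencies() method that accepts a dictionary of word frequencies.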

— Seaborn and Regression Plots

Seaborn provides several functions for visualizing relationships between two numerical variables using regression analysis. These functions include regplot(), lmplot(), and jointplot().

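import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# 'example_data.csv', 'x_column' and 'y_column' are placeholders;
# substitute your own file and column names
data = pd.read_csv('example_data.csv')
sns.regplot(x='x_column', y='y_column', data=data)

plt.show()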
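
By default, the straight line drawn by regplot() is a linear least-squares fit. As a rough sketch of what that means, the snippet below fits the same kind of line with NumPy on a handful of made-up points.

```python
import numpy as np

# Made-up points that lie roughly on a straight line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit of y = slope * x + intercept (a degree-1 polynomial)
slope, intercept = np.polyfit(x, y, deg=1)

print(f'slope={slope:.2f}, intercept={intercept:.2f}')
```

The slope tells you how much y changes, on average, for each one-unit increase in x.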

6. Best practices for data visualization

Choosing the right colors: Choosing the right colors is essential in data visualization. Colors can be used to highlight key points, distinguish between different data sets, and convey meaning. Color schemes should be carefully chosen to ensure that they are visually appealing and easy to understand.

Keeping it simple: Simplicity is key in data visualization. Charts and graphs should be easy to read and understand. Complex charts and graphs can be overwhelming and difficult to interpret. Simple and straightforward charts are more effective in conveying information.

Providing context: Providing context is important in data visualization. It helps to clarify what the data represents and why it is important. Labels, titles, and annotations can provide context and help users to better understand the data.

Avoiding clutter: Clutter can be a distraction in data visualization. It can make charts and graphs difficult to read and understand. Data should be presented in a clear and concise manner, without any unnecessary elements.

Testing and iterating: Testing and iterating is an important aspect of data visualization. Charts and graphs should be tested on a small audience before being presented to a broader audience. Feedback from the audience can be used to improve the charts and make them more effective.
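
Two of these practices, avoiding clutter and providing context, translate directly into code. The sketch below is one possible way to apply them in Matplotlib. The data reuses the sales figures from earlier, and the choice of what to annotate is just an example.

```python
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014, 2015, 2016]
sales = [10, 15, 13, 17, 20, 15, 18]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(years, sales, color='tab:blue')

# Avoid clutter: hide the top and right borders, which carry no information
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Provide context: label the chart and point out the peak year
ax.set_title('Yearly Sales')
ax.set_xlabel('Year')
ax.set_ylabel('Sales')
peak = sales.index(max(sales))
ax.annotate('Peak', xy=(years[peak], sales[peak]),
            xytext=(years[peak] - 2, sales[peak]),
            arrowprops={'arrowstyle': '->'})

plt.show()
```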

7. Examples of Effective Data Visualization

Data visualization is an essential tool for data scientists, analysts, and decision-makers who need to communicate complex information to stakeholders. When done right, data visualization can help you identify trends, make predictions, and communicate insights with clarity and impact. In this section, we’ll look at some examples of effective data visualization, and what makes them successful.

1. Interactive Dashboards

Interactive dashboards are a popular data visualization tool used by businesses to track key performance indicators (KPIs) and metrics. These dashboards allow users to interact with the data, filtering and drilling down to uncover insights. One excellent example of an interactive dashboard is the one created by the City of New York. This dashboard shows various datasets related to the city’s health and well-being, such as life expectancy, air quality, and lead poisoning rates. Users can interact with the dashboard, filter by borough, and explore the data in more detail. The dashboard makes it easy to see the impact of policies and initiatives, and identify areas for improvement.

2. Infographics

Infographics are another powerful data visualization tool that combines data with design to create compelling visual stories. Infographics are particularly useful when you need to communicate complex data to a non-technical audience. The New York Times is known for creating engaging and informative infographics that bring data to life. For example, their interactive “How 7.2 Billion Humans Live” infographic uses data from the United Nations to explore global population trends. The infographic presents the data in a visually appealing and easy-to-understand format, helping readers grasp the scale of the world’s population and the challenges that come with it.

3. Heatmaps

Heatmaps are a way of visualizing data on a 2D plane, where the intensity of color represents the magnitude of a variable. Heatmaps are useful for identifying patterns and outliers in large datasets. One example of an effective heatmap is the one used by the Weather Channel. The heatmap shows the current weather conditions across the United States, with each state shaded based on the temperature. The heatmap makes it easy to see where the hot and cold spots are, and where severe weather conditions are present.

4. Network Graphs

Network graphs are a way of visualizing relationships between different entities. They are useful for exploring social networks, supply chains, and other complex systems. One example of a network graph is the one created by the New York Times to explore the connections between donors and political candidates. The network graph shows the relationships between donors, candidates, and political action committees, allowing readers to explore the flow of money in politics.
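
Of the examples above, a heatmap is the easiest to reproduce in a few lines of Python. The sketch below uses Matplotlib's imshow() on a small grid of made-up temperatures. The city names and values are placeholders, not real weather data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up average temperatures (°C): rows are cities, columns are seasons
cities = ['City A', 'City B', 'City C']
seasons = ['Winter', 'Spring', 'Summer', 'Autumn']
temps = np.array([
    [2, 12, 25, 11],
    [10, 18, 30, 19],
    [-5, 6, 20, 7],
])

fig, ax = plt.subplots()

# imshow maps each value to a color; with 'coolwarm', red means hotter
image = ax.imshow(temps, cmap='coolwarm')
ax.set_xticks(range(len(seasons)))
ax.set_xticklabels(seasons)
ax.set_yticks(range(len(cities)))
ax.set_yticklabels(cities)

# The colorbar is the legend that ties colors back to values
fig.colorbar(image, ax=ax, label='Temperature (°C)')

plt.show()
```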

CONCLUSION

At its core, data visualization is about communicating complex information in a way that is easy to understand and interpret. It helps individuals and organizations see patterns and trends that might not be visible in raw data. By visualizing data, we can identify correlations, outliers, and other insights that can inform decision-making.

Data visualization is a powerful tool for conveying insights to clients, customers, and stakeholders in a visually appealing and interactive way. Through this blog, we have learned how to create interesting graphics and charts using popular data visualization libraries in Python, such as Matplotlib and Seaborn, along with pandas' plotting methods and the pywaffle and wordcloud packages. By mastering these techniques, we can effectively tell a compelling story and extract information from large data sets, leading to better understanding and more effective decision-making. With the ability to present data in a way that makes sense to people, we can unlock the full potential of our findings and make a real impact.

“Data are just summaries of thousands of stories”

Thank you!
