Data Visualization 101: Python for Data Analysis

Kaan ÇUKUR
4 min read · Dec 7, 2022

An introduction to data visualization with Python.

Data visualization is the process of creating visual representations of data or information. This can be done using various tools and techniques, such as charts, graphs, plots and maps. Data visualization is an important part of data analysis and is often used to communicate findings or insights from data to others (ChatGPT).

Two libraries are commonly used for data visualization in Python: matplotlib and seaborn.

Comparing the two, matplotlib is convenient for quick, simple visualizations, while seaborn, which is built on top of matplotlib, includes a number of statistical plotting functions that matplotlib does not offer. Both are useful libraries for creating data visualizations, but they are intended for different purposes and have different strengths and weaknesses.

What will we cover?

  • Install libraries and data set
  • Is a column numerical or categorical?
  • Visualization of categorical variables
  • Visualization of numerical variables

Before you start data visualization, you need to master the pandas library. You can check out my article about pandas.

We will explore basic data visualizations using the Titanic data set. Let’s start.

Install libraries and data set

In this post I will also show you data visualization with the pandas library.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = sns.load_dataset("titanic")  # seaborn's sample Titanic data (needs internet on first use)
df.head()

Let’s take a quick look at our data set.

Titanic data set

Is a column numerical or categorical?

This is our main topic, because to choose the correct chart we need to know the type of each column.

As a rule of thumb, if a variable is categorical we can use a bar chart or a pie chart. If a variable is numerical we can use a histogram or a box plot.

Let’s check our data set.

df.info()
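The dtypes reported by df.info() can also be used to sort columns into the two groups automatically. The following is only a minimal sketch: the small `sample` frame and the `split_columns` helper are hypothetical stand-ins (not part of the original code), used so the snippet runs without downloading anything.

```python
import pandas as pd

# Made-up stand-in for the Titanic frame so the snippet runs offline.
sample = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "class": pd.Categorical(["Third", "First", "Third", "Second"]),
    "alone": [True, False, False, True],
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 8.05],
})


def split_columns(frame):
    """Return (categorical, numerical) column names, judged by dtype only."""
    # "number" matches integer and float columns, but not booleans.
    numerical = frame.select_dtypes(include="number").columns.tolist()
    categorical = [col for col in frame.columns if col not in numerical]
    return categorical, numerical


cat_cols, num_cols = split_columns(sample)
print("categorical:", cat_cols)
print("numerical:", num_cols)
```

The dtype is only a first guess. In the real data set, integer-coded columns such as survived or pclass would land in the numerical group even though they behave like categories, so it is still worth checking each column by eye.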

Visualization of categorical variables

As I said, for categorical variables we can use a bar chart or a pie chart. Let’s try.

df["sex"].value_counts().plot(kind='bar')
plt.show()

The pandas library can draw bar charts directly. Here we count the values in the “sex” column and draw them as a bar chart.

We can also check the “class” column.

df["class"].value_counts().plot(kind='bar')
plt.show()

Now we can start to read the data set. For example, there are more male passengers than female passengers, and most passengers have a third-class ticket.
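The same kind of chart can be drawn with seaborn. As a rough sketch, countplot does the counting itself, so value_counts() is only needed here to order the bars. The `sample` frame is made up for illustration and is not real passenger data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the Titanic frame (not real passenger data).
sample = pd.DataFrame({
    "class": ["Third", "First", "Third", "Second", "Third", "First"],
})

# Order the bars from most to least frequent, as value_counts() does.
order = sample["class"].value_counts().index.tolist()

fig, ax = plt.subplots()
sns.countplot(data=sample, x="class", order=order, ax=ax)
ax.set_xlabel("Ticket class")
ax.set_ylabel("Number of passengers")
```

With the real data, passing data=df instead of data=sample should give the same picture as the pandas bar chart, with axis labels added.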

We can also visualize the “class” column with a pie chart.

class_counts = df['class'].value_counts()  # count passengers per class
labels = class_counts.index  # class names, used as wedge labels
sizes = class_counts.values  # counts, used as wedge sizes
colors = ["red", "green", "blue"]  # one color per class
plt.pie(sizes, labels=labels, colors=colors)
plt.show()
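A pie chart is easier to read when each wedge shows its share. One possible way, sketched below, is the autopct argument of pie. The `classes` series is a made-up sample chosen so the percentages come out round, and it does not reflect the real Titanic counts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample: 5 third-class, 3 first-class, 2 second-class tickets.
classes = pd.Series(["Third"] * 5 + ["First"] * 3 + ["Second"] * 2)
class_counts = classes.value_counts()

fig, ax = plt.subplots()
# autopct formats each wedge's percentage of the total.
wedges, texts, autotexts = ax.pie(
    class_counts.values,
    labels=class_counts.index,
    autopct="%1.1f%%",
    startangle=90,
)
ax.set_title("Share of passengers by class (made-up sample)")

percent_labels = [t.get_text() for t in autotexts]
print(percent_labels)
```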

Visualization of numerical variables

For numerical variables we can use a box plot or a histogram. Let’s look at the “fare” column.

plt.boxplot(df["fare"])
plt.show()

This gives us new insights. A box plot summarizes the data statistically: it shows the median, the quartiles, and the outliers. Reading the plot, I would say the maximum value is near 500 and the median is near 25. We can check this with the describe function.

df.describe().T

Almost got it :)
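The numbers a box plot draws can also be computed by hand. The sketch below assumes matplotlib's default whisker rule (points more than 1.5 times the interquartile range beyond the quartiles are drawn as outliers). The `fares` values and the `box_stats` helper are hypothetical and only illustrate the calculation.

```python
import pandas as pd

# Made-up fares for illustration (not the real column).
fares = pd.Series([7.25, 7.9, 8.05, 13.0, 14.5, 26.0, 31.0, 71.3, 512.3])


def box_stats(values, whis=1.5):
    """Compute the quartiles, fences, and outliers a box plot is based on."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Anything outside the fences is drawn as an individual outlier point.
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers.tolist(),
    }


stats = box_stats(fares)
print(stats)
```

Calling box_stats(df["fare"]) on the real column would give values to compare against the 25%, 50%, and 75% rows of describe().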

We can also look at a histogram. There is a cluster between 0 and 50.

plt.hist(df["fare"])
plt.show()
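By default, hist splits the data range into 10 equal-width bins. To check a claim like "most values fall between 0 and 50", it can help to set the bin edges explicitly. This is a minimal sketch with made-up fares, and the 50-unit bin width is an assumption chosen to match the range mentioned above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up fares for illustration (not the real column).
fares = np.array([7.25, 7.9, 8.05, 13.0, 14.5, 26.0, 31.0, 71.3, 512.3])

# Bin edges at 0, 50, 100, ..., 550, so the first bar covers 0 to 50.
edges = np.arange(0, 551, 50)

fig, ax = plt.subplots()
counts, _, _ = ax.hist(fares, bins=edges, edgecolor="black")
ax.set_xlabel("Fare")
ax.set_ylabel("Number of passengers")

print(counts)  # number of values that fall in each bin
```

The returned counts make the cluster measurable: the first entry is the number of fares in the 0 to 50 bin.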

Last words

Thanks for reading this blog. Your comments and likes will help me grow.

You can find all the source code on my GitHub profile. You can get in touch with me through my LinkedIn profile.

If I have made any mistakes, please feel free to comment.
