How can you make up your data? (Part-1)

Ahmet Talha Bektaş
9 min read · Oct 29, 2022


Suppose that you are in a business meeting. Would you like to see something like this:

Or something like this:

Of course, everyone would choose the second one. That is why we should learn how to make up our data.

Photo by Laura Chouette on Unsplash


How will we make graphs in Python?

matplotlib

matplotlib is a library that we use to make graphs.

According to a recent Kaggle survey, many people use the matplotlib and seaborn libraries for data visualization. In this article, I will show you how to use matplotlib to make up your data; in the next article, I will show you how to do the same with seaborn.

As you can see, matplotlib is one of the most popular libraries for data visualization.

Source of this survey

Before we start: you can find the notebook for this article on my GitHub or on my Kaggle.

Let’s start by importing matplotlib!

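import matplotlib.pyplot as plt
%matplotlib inline
# %matplotlib inline is a Jupyter command that shows figures inside the
# notebook; in a plain .py script, remove it and call plt.show() instead.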

And, of course, NumPy:

import numpy as np

Let’s create some random numbers to use in our graphs!

x = np.arange(20)
# Create the 20 integers from 0 to 19.

y = np.random.normal(10, 1, 20)
# Draw 20 random numbers from a normal distribution with mean 10 and
# standard deviation 1.

z = np.random.normal(10, 2, 20)
# Draw 20 random numbers from a normal distribution with mean 10 and
# standard deviation 2.

When you run this code on your computer, don’t worry if your graphs look different from mine: y and z are random, so every run produces different values.
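If you want the same numbers on every run (for example, to match a screenshot), you can seed the random number generator. A minimal sketch, assuming NumPy 1.17 or newer (where `np.random.default_rng` exists); the seed 42 is an arbitrary choice, and the values will not match the ones in this article:

```python
import numpy as np

# A seeded generator; 42 is an arbitrary seed chosen for illustration.
rng = np.random.default_rng(42)

x = np.arange(20)             # 0, 1, ..., 19
y = rng.normal(10, 1, 20)     # mean 10, standard deviation 1
z = rng.normal(10, 2, 20)     # mean 10, standard deviation 2

# A new generator with the same seed reproduces exactly the same numbers.
y_again = np.random.default_rng(42).normal(10, 1, 20)
print(np.array_equal(y, y_again))  # True
```

Seeding only matters when you need repeatable pictures; for exploring data, the unseeded calls above are fine.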

Line chart

A line chart is a type of graph we use to show how values change over an ordered range, such as time, and to compare small or large changes across one or more categories over the same period.

Let’s make an example!

fig = plt.figure()
# Create a new, empty figure (the canvas that holds one or more graphs).

ax = fig.add_subplot(111)
# 111 means: 1 row (first digit), 1 column (second digit), and this is
# graph number 1 (third digit). Don't worry if subplot() is unclear yet;
# the examples with several graphs below will make it clearer.

ax.plot(x, y)
# Draw a line chart with the "x" values on the x-axis and the "y" values
# on the y-axis.

ax.set_title("Line chart")
# Give the graph a title.

ax.set_xlabel("x")
# Label the x-axis.

ax.set_ylabel("y")
# Label the y-axis.

Output:
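The same graph can also be built with `plt.subplots()`, which returns the figure and its single Axes in one call, so you don’t need `add_subplot(111)`. A minimal sketch that regenerates x and y so it runs on its own:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(20)
y = np.random.normal(10, 1, 20)

# plt.subplots() creates the figure and one Axes together; this is
# equivalent to plt.figure() followed by fig.add_subplot(111).
fig, ax = plt.subplots()

ax.plot(x, y)
ax.set_title("Line chart")
ax.set_xlabel("x")
ax.set_ylabel("y")
```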

Or plot more than one category against the same “x” values!

Let’s make an example!

fig = plt.figure()
# Create a new, empty figure.

ax = fig.add_subplot(111)
# 1 row (first digit), 1 column (second digit), graph number 1 (third digit).

ax.plot(x, y, label="Thing # 1")
# Draw the "y" values against the "x" values and name this line "Thing # 1".

ax.plot(x, z, label="Thing # 2")
# Draw the "z" values against the same "x" values and name this line "Thing # 2".

ax.legend()
# Show a legend that uses the label of each line.

Output:

Using markers!

Markers are the symbols drawn at each data point.

You can use these markers:

  • “.” point marker
  • “,” pixel marker
  • “o” circle marker
  • “v” triangle_down marker
  • “^” triangle_up marker
  • “<” triangle_left marker
  • “>” triangle_right marker
  • “1” tri_down marker
  • “2” tri_up marker
  • “3” tri_left marker
  • “4” tri_right marker
  • “8” octagon marker
  • “s” square marker
  • “p” pentagon marker
  • “P” plus (filled) marker
  • “*” star marker
  • “h” hexagon1 marker
  • “H” hexagon2 marker
  • “+” plus marker
  • “x” x marker
  • “X” x (filled) marker
  • “D” diamond marker
  • “d” thin_diamond marker
  • “|” vline marker
  • “_” hline marker

Using colors!

You can use these colors:

  • “b” blue
  • “g” green
  • “r” red
  • “c” cyan
  • “m” magenta
  • “y” yellow
  • “k” black
  • “w” white
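Markers and colors can be combined with a line style in one short format string passed to `plot`. A minimal sketch (the data is regenerated so the block runs on its own):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(20)
y = np.random.normal(10, 1, 20)

fig, ax = plt.subplots()

# "o--r" is a format string: "o" = circle marker, "--" = dashed line, "r" = red.
(line,) = ax.plot(x, y, "o--r")

# The same idea with explicit keyword arguments: square markers,
# a dotted line, and green color, shifted up by 2 so both lines are visible.
(line2,) = ax.plot(x, y + 2, marker="s", linestyle=":", color="g")
```

The format string is handy for quick plots; the keyword form is easier to read when you set many options at once.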

Saving the figure!

We will use fig.savefig(“filename.png”, dpi=integer), where dpi (dots per inch) sets the resolution of the saved image.

Let’s make an example!

fig = plt.figure()
# Create a new, empty figure.

ax = fig.add_subplot(111)
# 1 row (first digit), 1 column (second digit), graph number 1 (third digit).

ax.plot(x, y, marker=".", c="b")
# Draw a line chart with "x" on the x-axis and "y" on the y-axis.
# marker="." draws a point at each value; c="b" sets the color to blue.

ax.set_title("Line chart")
# Give the graph a title.

ax.set_xlabel("x", fontsize=30)
# Label the x-axis; fontsize sets the size of the label text.

ax.set_ylabel("y", fontsize=20)
# Label the y-axis; fontsize sets the size of the label text.

fig.savefig("ilkgrafim.png", dpi=300)
# Save the figure ("ilkgrafim" is Turkish for "my first graph").
# dpi sets the resolution of the saved picture.

Output:
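To see what `dpi` actually does: the saved image’s size in pixels is the figure size in inches multiplied by the dpi. A small sketch that checks this; the file name and the temporary folder are arbitrary choices for illustration:

```python
import os
import tempfile

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))   # 4 x 3 inches
ax.plot([0, 1, 2], [0, 1, 4])

# Save into a temporary folder so nothing is left in your working directory.
path = os.path.join(tempfile.mkdtemp(), "example.png")
fig.savefig(path, dpi=100)               # 100 dots (pixels) per inch

img = mpimg.imread(path)
print(img.shape)  # (300, 400, 4): height x width x RGBA channels
```

So `dpi=300` on the same 4 x 3 inch figure would give a 1200 x 900 pixel image: sharper, but a larger file.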

Scatter plot

We use a scatter plot to see whether one variable depends on another, or how strongly two variables are correlated.
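To put a number on what a scatter plot shows, you can compute the Pearson correlation coefficient with `np.corrcoef`. A minimal sketch on made-up data; the variable names and the seed are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)             # seeded so the result is repeatable
x = np.arange(20)
y_related = 2 * x + rng.normal(0, 1, 20)   # depends on x, plus a little noise
y_random = rng.normal(10, 1, 20)           # does not depend on x

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
left.scatter(x, y_related)
left.set_title("Related")
right.scatter(x, y_random)
right.set_title("Unrelated")

# Pearson correlation: near +1 or -1 means a strong linear relationship,
# near 0 means no linear relationship.
r_related = np.corrcoef(x, y_related)[0, 1]
r_random = np.corrcoef(x, y_random)[0, 1]
print(round(r_related, 3), round(r_random, 3))
```

The left panel shows points lining up along a rising line, and its coefficient is close to 1; the right panel is a shapeless cloud.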

Histogram

We use a histogram to show how the values of a variable are distributed.
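Behind the picture, a histogram splits the range of values into bins and counts how many values fall into each one. Matplotlib’s `hist` returns those counts and the bin edges, so you can inspect them directly. A minimal sketch on made-up data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(10, 1, 200)

fig, ax = plt.subplots()
# hist() returns the bar heights (counts), the bin edges, and the drawn bars.
counts, edges, bars = ax.hist(data, bins=10)

print(len(edges))     # 11: ten bins need eleven edges
print(counts.sum())   # 200.0: every value falls into exactly one bin
```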

Box plot

A box plot summarizes a variable’s median and quartiles, and we generally use it to spot outliers.
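By default, Matplotlib’s box plot treats a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. This sketch computes that rule by hand and compares it with the outliers Matplotlib actually draws; the data is made up:

```python
import matplotlib.pyplot as plt
import numpy as np

data = np.array([10, 11, 12, 11, 10, 12, 11, 30])  # 30 is an obvious outlier

# The 1.5 * IQR rule (Matplotlib's default whis=1.5).
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
manual_outliers = data[(data < low) | (data > high)]

fig, ax = plt.subplots()
result = ax.boxplot(data)
# boxplot() returns a dict of the drawn parts; "fliers" holds the outlier points.
plotted_outliers = result["fliers"][0].get_ydata()

print(manual_outliers)  # [30]
print(plotted_outliers)
```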

Let’s make an example!

fig2 = plt.figure(figsize=(15, 5))
# Create a new figure; figsize=(width, height) is given in inches.

a = fig2.add_subplot(131)
# 1 row (first digit), 3 columns (second digit), graph number 1 (third digit).

a.scatter(x, y)
# Draw a scatter plot with "x" on the x-axis and "y" on the y-axis.

a.set_title('Scatter plot')
# Give the graph a title.

a.set_xlabel("x")
# Label the x-axis.

b = fig2.add_subplot(132)
# 1 row, 3 columns, and this is graph number 2.

b.hist(y)
# Draw a histogram of the "y" values (values on the x-axis, counts on the y-axis).

b.set_title('Histogram')
# Give the graph a title.

c = fig2.add_subplot(133)
# 1 row, 3 columns, and this is graph number 3.

c.boxplot([y, z])
# Draw two box plots side by side: one for the "y" values and one for
# the "z" values.

c.set_title('Box plot')
# Give the graph a title.

fig2.suptitle('Visualization')
# suptitle ("super title") puts one title above all the graphs in the figure.

Output:
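If the three-digit codes still feel cryptic, this small sketch asks Matplotlib where each subplot landed. It uses `Axes.get_subplotspec()`, which is available in recent Matplotlib 3.x releases:

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(15, 5))

# The three-digit code 132 means: 1 row, 3 columns, 2nd position.
a = fig.add_subplot(132)
# A position can also be written as three separate integers: 3rd position here.
b = fig.add_subplot(1, 3, 3)

# get_geometry() reports (rows, columns, first index, last index),
# with the indices counted from 0.
print(a.get_subplotspec().get_geometry())  # (1, 3, 1, 1)
print(b.get_subplotspec().get_geometry())  # (1, 3, 2, 2)
```

The three-integer form is the one to use once you have more than nine subplots, since the digits of a single number can no longer describe the grid.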

Let’s make some examples with the Titanic data!

Importing pandas

import pandas as pd

Data reading

If you don’t know how to read data, you should read this article.

df=pd.read_csv('titanic.csv')

EDA

If you are not familiar with EDA (exploratory data analysis), I highly recommend reading this article; for filtering data, you should read this article.

Let’s start!

df.head()

Output:

df.tail()

Output:

df.info()

Output:

df.isnull().sum()

Output:

As you can see, there are 177 null values in the “Age” column. We haven’t learned how to fill in missing values yet, so I will leave these rows out of the visualizations.

Let’s create a new data frame that keeps only the rows where “Age” is not empty!

notMissing = df[df['Age'].notnull()]
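Here is the same idea on a tiny made-up table with the Titanic column names (the values are invented, not taken from the real data), so you can check it without the CSV file. It also shows `dropna(subset=...)`, an equivalent way to drop the rows with a missing “Age”:

```python
import numpy as np
import pandas as pd

# Made-up rows that reuse the Titanic column names.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
    "Survived": [0, 1, 1, 1, 0],
    "Sex": ["male", "female", "female", "female", "male"],
})

print(toy.isnull().sum()["Age"])  # 2 missing ages

# Two equivalent ways to keep only the rows whose "Age" is known:
not_missing = toy[toy["Age"].notnull()]
also_not_missing = toy.dropna(subset=["Age"])

print(len(not_missing), not_missing.equals(also_not_missing))  # 3 True
```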

Let’s start making up our data!

ship = plt.figure(figsize=(10, 6))
# Create a new figure; figsize=(width, height) is given in inches.

hist = ship.add_subplot(221)
# 2 rows (first digit), 2 columns (second digit), graph number 1 (third digit).

hist.hist(notMissing['Age'])
# Draw a histogram of the "Age" column.

scatter = ship.add_subplot(222)
# 2 rows, 2 columns, and this is graph number 2.

scatter.scatter(notMissing['Age'], notMissing['Fare'])
# Draw a scatter plot with "Age" on the x-axis and "Fare" on the y-axis.

box1 = ship.add_subplot(223)
# 2 rows, 2 columns, and this is graph number 3.

box1.boxplot([notMissing[notMissing['Survived'] == 0]['Age'],
              notMissing[notMissing['Survived'] == 1]['Age']],
             labels=['Deceased', 'Survived'])
# Draw two box plots side by side: the ages of the passengers who died
# and the ages of the passengers who survived. The labels appear under
# each box on the x-axis; the ages are on the y-axis.
# (In Matplotlib 3.9 and later this parameter is named tick_labels.)

box2 = ship.add_subplot(224)
# 2 rows, 2 columns, and this is graph number 4.

box2.boxplot([notMissing[notMissing['Sex'] == 'male']['Age'],
              notMissing[notMissing['Sex'] == 'female']['Age']],
             labels=['Male', 'Female'])
# Draw two box plots side by side: the ages of male passengers and the
# ages of female passengers, labeled "Male" and "Female" on the x-axis.

Output:
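The two filtered columns passed to `boxplot` can also be built with `groupby`, which keeps working when a column has more than two values. A sketch on a few made-up rows (not the real Titanic data); it sets the tick labels with `set_xticks` and `set_xticklabels` instead of the `labels=` argument:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows with the Titanic column names (values invented for illustration).
notMissing = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# groupby() splits the ages by the value of "Survived" (0 first, then 1),
# giving the same two groups as filtering with == 0 and == 1.
groups = [group["Age"] for _, group in notMissing.groupby("Survived")]

fig, ax = plt.subplots()
ax.boxplot(groups)
ax.set_xticks([1, 2])                 # box plots sit at x = 1, 2, ...
ax.set_xticklabels(["Deceased", "Survived"])
```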

A few more examples…

df.plot.scatter(x="Age", y="Fare", title="Scatter Plot")
# pandas' own plotting interface (built on matplotlib): draw a scatter plot
# with the "Age" column on the x-axis and the "Fare" column on the y-axis,
# and give it the title "Scatter Plot".

Output:

df.Age.plot.hist()
# Draw a histogram of the "Age" column.
# If you end the line with ";", Jupyter will not print the returned object,
# such as "<AxesSubplot:ylabel='Frequency'>".

Output:

df.Age.plot.hist();

Output:

df.Pclass.value_counts().sort_index().plot.bar();
# Count how many passengers are in each "Pclass", sort the result by class,
# and draw the counts as vertical bars (one bar per class).

Output:

df.Pclass.value_counts().sort_index().plot.barh();
# The same counts, drawn as horizontal bars (the classes are on the y-axis).

Output:
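`value_counts()` orders its result by frequency, largest first, which is why the code above calls `sort_index()` before plotting: it puts the classes back in 1, 2, 3 order. A small sketch on made-up values:

```python
import pandas as pd

# A made-up "Pclass" column (ticket classes 1, 2 and 3).
pclass = pd.Series([3, 1, 3, 2, 3, 1], name="Pclass")

counts = pclass.value_counts()   # sorted by count: class 3 first
ordered = counts.sort_index()    # sorted by class: 1, 2, 3

print(ordered.tolist())  # [2, 1, 3]

ax = ordered.plot.bar()
```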

The makeup isn’t over yet!

You have learned the basics of matplotlib. If you want to become a matplotlib expert, you should explore this site.

Next, read “How can you make up your data? (Part-2)”!

Author:

Ahmet Talha Bektaş

If you want to ask me anything, feel free to contact me!

📧My email

🔗My LinkedIn

💻My GitHub

👨‍💻My Kaggle

📋 My Medium
