How can you make up your data? (Part-1)
Suppose that you are in a business meeting. Would you like to see something like this:
Or something like this:
Of course, everyone will choose the second one. That is why we should learn how to make up our data.
Table of Contents
- Introduction to matplotlib
- Line chart
- Markers, colors, and saving figure
- Scatter plot
- Histogram
- Box plot
- Examples using “Titanic data”
How will we make graphs in Python?
matplotlib
matplotlib is a library that we use to make graphs.
According to a recent survey by Kaggle, many people use the matplotlib and seaborn libraries for data visualization. In this article, I will show you how to use the matplotlib library to make up your data. In the next article, I will show you how to do the same with the seaborn library.
Before we start, you can find my notebook for this article on my GitHub or on my Kaggle.
Let’s start with importing matplotlib!
import matplotlib.pyplot as plt
%matplotlib inline
And of course NumPy:
import numpy as np
Let’s create random numbers for use in graphs!
x = np.arange(20)
#Create 20 whole numbers from 0 to 19.
y = np.random.normal(10, 1, 20)
#Draw 20 random numbers from a normal distribution whose mean is 10
#and whose standard deviation is 1.
z = np.random.normal(10, 2, 20)
#Draw 20 random numbers from a normal distribution whose mean is 10
#and whose standard deviation is 2.
When you run this code on your computer, do not worry if your charts look different from mine! y and z are random numbers, so we will get different results.
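If you would like to get the same random numbers on every run, one option is to fix the seed before drawing them. The sketch below is not part of the original notebook; the seed value 42 and the names `y_first` and `y_second` are arbitrary choices for illustration.

```python
import numpy as np

# Fixing the seed makes the "random" numbers repeatable.
# The seed value 42 is arbitrary; any integer works.
np.random.seed(42)
y_first = np.random.normal(10, 1, 20)

# Resetting the same seed reproduces exactly the same draw.
np.random.seed(42)
y_second = np.random.normal(10, 1, 20)

print(np.array_equal(y_first, y_second))  # True
```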
Line chart
The line chart is a type of graph that we use to see how values change, whether a little or a lot, over the same time period for one or more categories.
Let’s make an example!
fig = plt.figure()
#Create an empty figure: the container that holds the charts.
ax=fig.add_subplot(111)
#111 means a grid with 1 row (first number) and 1 column (second
#number), and this chart takes the 1st position (third number). It is
#normal if subplot() is not clear yet; the next examples will help.
ax.plot(x,y)
#Create a line chart and then put the "x" values on the x-axis, and
#put the "y" values on the y-axis.
ax.set_title("Line chart")
#Give a title to this chart.
ax.set_xlabel("x")
#Give a label to the x-axis.
ax.set_ylabel("y")
#Give a label to the y-axis.
Output:
Or we can draw more than one category against the same “x” values!
Let’s make an example!
fig = plt.figure()
#It is necessary for making a figure, a chart, or a graph.
ax=fig.add_subplot(111)
#It says that there is 1 row (first number), there is 1
#column(second number), and from these the first graph(third number) is ...
ax.plot(x,y,label="Thing # 1")
#Create a line chart and then put the "x" values on the x-axis, and put
#the "y" values on the y-axis. The label of this line is "Thing # 1".
ax.plot(x,z,label="Thing # 2")
#Draw a second line on the same chart: the "x" values on the x-axis, and
#the "z" values on the y-axis. The label of this line is "Thing # 2".
ax.legend()
#Putting a legend on the graph.
Output:
Using markers!
Markers are symbols that show where each value is on the chart.
You can use these markers:
- “.” point marker
- “,” pixel marker
- “o” circle marker
- “v” triangle_down marker
- “^” triangle_up marker
- “<” triangle_left marker
- “>” triangle_right marker
- “1” tri_down marker
- “2” tri_up marker
- “3” tri_left marker
- “4” tri_right marker
- “8” octagon marker
- “s” square marker
- “p” pentagon marker
- “P” plus (filled) marker
- “*” star marker
- “h” hexagon1 marker
- “H” hexagon2 marker
- “+” plus marker
- “x” x marker
- “X” x (filled) marker
- “D” diamond marker
- “d” thin_diamond marker
- “|” vline marker
- “_” hline marker
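To see how a few of these markers look side by side, here is a small sketch. It is not from the original notebook: the names `x_demo`, `fig_m`, and `ax_m` are made up for this example, and the four markers are just a sample from the list above.

```python
import matplotlib.pyplot as plt
import numpy as np

x_demo = np.arange(5)
fig_m = plt.figure()
ax_m = fig_m.add_subplot(111)

# Draw the same line four times, each one shifted upward by 1,
# with a different marker each time.
for offset, marker in enumerate(["o", "s", "^", "D"]):
    ax_m.plot(x_demo, x_demo + offset, marker=marker, label=f"marker {marker!r}")

# The legend shows which marker belongs to which line.
ax_m.legend()
```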
Using colors!
You can use these colors:
- “b” blue
- “g” green
- “r” red
- “c” cyan
- “m” magenta
- “y” yellow
- “k” black
- “w” white
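If you are curious what these one-letter codes stand for, matplotlib can translate them into red, green, blue, and alpha (opacity) values. This is only a quick check, not something the charts in this article need; the variable names are made up for the example.

```python
from matplotlib import colors

# to_rgba converts a color code into four numbers between 0 and 1:
# (red, green, blue, alpha).
red = colors.to_rgba("r")
black = colors.to_rgba("k")
white = colors.to_rgba("w")

print(red)    # full red channel, no green or blue
print(black)  # all three color channels are zero
print(white)  # all three color channels are at their maximum
```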
Saving the figure!
We will use .savefig("filename.png", dpi=integer), where dpi sets the resolution of the saved picture.
Let’s make an example!
fig = plt.figure()
#It is necessary for making a figure, a chart, or a graph.
ax=fig.add_subplot(111)
#It says that there is 1 row (first number), there is 1
#column (second number), and from these the first graph(third number) is ...
ax.plot(x,y,marker=".",c="b")
#Create a line chart and then put the "x" values on the x-axis, and
#put the "y" values on the y-axis.
#marker is the icon for values.
#c = color and "b" is blue.
ax.set_title("Line chart")
# Setting title to the graph.
ax.set_xlabel("x",fontsize=30)
# label to the x-axis.
# fontsize= size of the label.
ax.set_ylabel("y",fontsize=20)
# label to the y-axis.
# fontsize= size of the label.
fig.savefig("ilkgrafim.png",dpi=300)
#Saving figure.
#dpi is for setting picture resolution.
Output:
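If you want to check that the file was really written, or you do not want to clutter your working folder, something like the following sketch may help. The temporary folder, the file name `example.png`, and the names `fig_s`, `ax_s`, and `out_path` are assumptions for this example only; `bbox_inches="tight"` is an optional setting that trims the empty margin around the chart.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig_s = plt.figure()
ax_s = fig_s.add_subplot(111)
ax_s.plot([0, 1, 2], [10, 11, 9], marker="o", c="r")

# Hypothetical output location: a temporary folder, so that nothing
# in your own project is overwritten.
out_path = os.path.join(tempfile.mkdtemp(), "example.png")

# dpi sets the resolution; bbox_inches="tight" trims the outer margin.
fig_s.savefig(out_path, dpi=150, bbox_inches="tight")

print(os.path.exists(out_path))  # True
```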
Scatter plot
To see whether two variables are related, or to judge the correlation between them, we use a type of graph called a scatter plot.
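A scatter plot shows the relationship visually; if you also want a number for it, NumPy can compute the Pearson correlation. The data below is made up for illustration (`a_vals`, `b_vals`, and the seed are assumptions, not part of the original notebook).

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the sketch is repeatable
a_vals = np.arange(50)
# b_vals grows roughly in a straight line with a_vals, plus some noise.
b_vals = 2 * a_vals + np.random.normal(0, 5, 50)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# Pearson correlation between the two variables.
r = np.corrcoef(a_vals, b_vals)[0, 1]
print(round(r, 2))
```

A value close to 1 means the points lie close to a rising straight line, which is what the scatter plot of these two variables would show.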
Histogram
We use a particular sort of graph called a histogram to show how a variable is distributed.
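Behind every histogram there is a simple counting step: the range of values is split into bins, and the values in each bin are counted. The sketch below shows that step with `np.histogram` on made-up data (the name `sample` and the seed are assumptions for this example).

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the sketch is repeatable
sample = np.random.normal(10, 1, 200)

# np.histogram does the counting that a histogram chart displays:
# it splits the range into bins and counts the values in each one.
counts, bin_edges = np.histogram(sample, bins=10)

print(counts.sum())    # 200: every value lands in exactly one bin
print(len(bin_edges))  # 11: ten bins need eleven edges
```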
Box plot
The box plot is a type of graph that summarizes a variable by its median and quartiles; we generally use it to find outliers.
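By default, matplotlib's box plot draws a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) away from the box. As a rough sketch of that rule, using made-up numbers (the array `values` is an assumption for this example):

```python
import numpy as np

# Made-up data; 30 is far away from the other values.
values = np.array([9, 10, 10, 11, 11, 12, 12, 13, 30])

# The box spans from the first quartile (q1) to the third quartile (q3).
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Anything beyond these fences counts as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
print(outliers)  # [30]
```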
Let’s make an example!
fig2 = plt.figure(figsize = (15,5))
#It is necessary for making a figure, a chart, or a graph.
#figsize gives the size of the figure.
a = fig2.add_subplot(131)
#It says that there is 1 row (first number), and there are 3
#columns (second number), and from these the first graph(third number) is ...
a.scatter(x,y)
#Create a scatter plot and then put the "x" values to the x-axis,
#and put the "y" values on the y-axis.
a.set_title('Scatter plot')
#Setting title to the graph.
a.set_xlabel("x")
#Give a label to the x-axis.
b = fig2.add_subplot(132)
#It says that there is 1 row (first number), and there are 3
#columns (second number), and from these the second graph(third number) is ...
b.hist(y)
#Create a histogram and put "y" values on the x-axis.
b.set_title('Histogram')
#Setting title to the graph.
c = fig2.add_subplot(133)
#It says that there is 1 row (first number), and there are 3
#columns (second number), and from these the third graph(third number) is ...
c.boxplot([y,z])
#Create a box plot with two boxes side by side: the first box shows
#the "y" values and the second box shows the "z" values.
c.set_title('Box plot')
#Setting title to the graph.
fig2.suptitle('Visualization')
#suptitle = super title: one title for the whole figure, above all the charts.
Output:
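The three-digit number passed to add_subplot can also be written as three separate arguments, and `plt.subplots` can build the whole grid in one call. The sketch below shows both forms as an alternative, not as the way this article's notebook does it; the names `fig_a`, `middle`, `fig_b`, and `axes` are made up for this example.

```python
import matplotlib.pyplot as plt

# add_subplot(1, 3, 2) means the same as add_subplot(132):
# 1 row, 3 columns, 2nd position.
fig_a = plt.figure(figsize=(15, 5))
middle = fig_a.add_subplot(1, 3, 2)

# plt.subplots creates the figure and every chart of the grid at once.
fig_b, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].set_title("first")
axes[1].set_title("second")
axes[2].set_title("third")

print(len(axes))  # 3
```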
Let’s make examples from Titanic data!
Importing pandas
import pandas as pd
Data reading
If you don’t know how to read data you should read this article.
df=pd.read_csv('titanic.csv')
EDA
If you don’t know EDA I highly recommend you read this article and for filtering data, you should read this article.
Let’s start!
df.head()
Output:
df.tail()
Output:
df.info()
Output:
df.isnull().sum()
Output:
As you can see, there are 177 null values in the “Age” column of our data. We haven’t learned how to fill empty values yet, so I will leave these rows out of my data visualization.
Let’s create a new data frame that does not include empty values of the “Age” column!
notMissing = df[df['Age'].notnull()]
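To see what this filter does without loading the Titanic file, here is a sketch on a tiny made-up table. The data frame `toy` and its values are invented for illustration; `dropna(subset=...)` is shown only as another way to get the same rows.

```python
import numpy as np
import pandas as pd

# A tiny made-up table standing in for the Titanic data.
toy = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [22.0, np.nan, 35.0]})

print(toy["Age"].isnull().sum())  # 1

# Keep only the rows where "Age" is not empty.
kept = toy[toy["Age"].notnull()]

# dropna with subset=["Age"] selects the same rows.
also_kept = toy.dropna(subset=["Age"])

print(len(kept))               # 2
print(kept.equals(also_kept))  # True
```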
Let’s start making up our data!
ship = plt.figure(figsize = (10,6))
#It is necessary for making a figure, a chart, or a graph.
#figsize gives the size of the figure.
hist = ship.add_subplot(221)
#It says that there are 2 rows (first number), and there are 2
#columns (second number), and from these the first graph(third number) is ...
hist.hist(notMissing['Age'])
#Create a histogram and then put the values of the "Age" column on the x-axis.
scatter = ship.add_subplot(222)
#It says that there are 2 rows (first number), and there are 2
#columns (second number), and from these the second graph(third number) is ...
scatter.scatter(notMissing['Age'], notMissing['Fare'])
#Create a scatter plot and put values of the "Age" column on the
#x-axis, and put values of the "Fare" column on the y-axis.
box1 = ship.add_subplot(223)
#It says that there are 2 rows (first number), and there are 2
#columns (second number), and from these the third graph(third number) is ...
box1.boxplot([notMissing[notMissing['Survived'] == 0]['Age'],
notMissing[notMissing['Survived'] == 1]['Age']],
labels = ['Deceased', 'Survived'])
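#Create a box plot with two boxes side by side: the first box shows the
#ages of the deceased people and the second box shows the ages of the
#survivors. The ages are on the y-axis; "Deceased" and "Survived" are
#the labels of the two boxes on the x-axis.
#(In Matplotlib 3.9 and later, the labels parameter is named tick_labels.)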
box2 = ship.add_subplot(224)
#It says that there are 2 rows (first number), and there are 2
#columns (second number), and from these the fourth graph(third number) is ...
box2.boxplot([notMissing[notMissing['Sex'] == 'male']['Age'],
notMissing[notMissing['Sex'] == 'female']['Age']],
labels = ['Male', 'Female'])
#Create a box plot with two boxes side by side: the first box shows the
#ages of the male passengers and the second box shows the ages of the
#female passengers. The ages are on the y-axis; "Male" and "Female" are
#the labels of the two boxes on the x-axis.
Output:
Some other examples…
df.plot.scatter(x="Age",y="Fare",title="Scatter Plot")
#Create a scatter plot, then put values of the "Age" column from df
#on the x-axis and values of the "Fare" column from df on the y-axis.
#Set the title "Scatter Plot".
Output:
df.Age.plot.hist()
#Create a histogram and put values of "Age" from df on the x-axis.
#If you put ";" at the end of your code, you will not see a line like:
# "<AxesSubplot:ylabel='Frequency'>".
Output:
df.Age.plot.hist();
Output:
df.Pclass.value_counts().sort_index().plot.bar();
#Count the values of the "Pclass" column from df and then sort the index.
#After that, create a vertical bar chart: the classes go on the x-axis
#and the counts are the heights of the bars.
Output:
df.Pclass.value_counts().sort_index().plot.barh();
#Count the values of the "Pclass" column from df and then sort the index.
#After that, create a horizontal bar chart: the classes go on the y-axis
#and the counts are the lengths of the bars.
Output:
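The reason for sort_index() in the two bar charts is that value_counts() orders its result by how often each value appears, not by the value itself. A small sketch with made-up class values (the series `classes` is an assumption for this example):

```python
import pandas as pd

classes = pd.Series([3, 1, 3, 2, 3, 1])  # made-up class values

# value_counts puts the most frequent value first.
counts_by_freq = classes.value_counts()

# sort_index reorders the result by class: 1, 2, 3.
counts_by_class = classes.value_counts().sort_index()

print(counts_by_freq.index.tolist())   # [3, 1, 2]
print(counts_by_class.index.tolist())  # [1, 2, 3]
print(counts_by_class.tolist())        # [2, 1, 3]
```

Without sort_index(), the bars would appear in the order 3, 1, 2 for this data, which is harder to read.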
The makeup isn’t over yet!
You have learned how to use matplotlib. If you want to become an expert on matplotlib, you should explore this site.