Our new friend is “EDA”

Sep 30, 2022

In this article, you will meet a new friend named “EDA”. Let me introduce you. Dear reader, this is EDA, short for exploratory data analysis, and it is the best friend of data scientists. EDA, this is my reader, who wants to be a good data scientist 😁.

Why is “EDA” important for us?

After reading in the data, we can’t do anything useful with it until we draw some conclusions from it. That is why we need EDA: to see the data, understand it, and make it clear what we are working with.

Let’s find out what EDA is!

We will use plenty of pandas functions, and I will apply them to the “Titanic” dataset. You can find my notebook for this story on GitHub. Let’s start!

.head()

The head() function shows the first 5 rows of the data by default. If you pass a number inside the parentheses, you get that many rows from the top instead. For example, .head(15) gives you the first 15 rows.
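To see the default and the numeric argument side by side without needing the Titanic file, here is a minimal sketch. The 20-row DataFrame and its column names (`id`, `value`) are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy data with 20 rows (not the Titanic dataset)
toy = pd.DataFrame({"id": range(20), "value": [i * 10 for i in range(20)]})

first_five = toy.head()    # no argument: the first 5 rows
first_three = toy.head(3)  # the first 3 rows

print(first_five)
print(first_three)
```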

If you don’t know how to read data, you should read this article.

Let’s make an example!

import pandas as pd

df = pd.read_csv("titanic.csv")
df.head()

Output:

df.head(15)

Output:

.tail()

The tail() function works just like head(), but it shows rows from the end of the data: the last 5 by default, or as many as the number you pass.
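The same kind of quick check works for tail(). This is a small sketch on a made-up 20-row DataFrame (the `id` column is an assumption for illustration, not part of the Titanic data).

```python
import pandas as pd

# Hypothetical toy data with 20 rows (not the Titanic dataset)
toy = pd.DataFrame({"id": range(20)})

last_five = toy.tail()   # no argument: the last 5 rows
last_two = toy.tail(2)   # the last 2 rows

print(last_five)
print(last_two)
```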

Let’s make an example!

df.tail()

Output:

df.tail(10)

Output:

.sample()

The sample() function returns one random row of the data. If you pass a number inside the parentheses, it returns that many random rows.
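Because sample() is random, the rows change on every run. If you want the same rows each time (for example, to share a notebook), pandas accepts a `random_state` seed. Below is a minimal sketch on a made-up DataFrame; the seed value 42 and the `id` column are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical toy data with 20 rows (not the Titanic dataset)
toy = pd.DataFrame({"id": range(20)})

one_row = toy.sample()                      # one random row, different on each run
four_rows = toy.sample(4, random_state=42)  # 4 random rows, same rows on each run

print(one_row)
print(four_rows)
```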

Let’s make an example!

df.sample()

Output:

df.sample(15)

Output:

.describe()

The describe() function calculates summary statistics (count, mean, standard deviation, min, quartiles, and max) for the numeric columns and returns them as a table.
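Here is a small sketch of what describe() returns, using a made-up table with one numeric and one text column (`age` and `city` are hypothetical names). By default only the numeric column is summarized.

```python
import pandas as pd

# Hypothetical toy data (not the Titanic dataset)
toy = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "city": ["A", "B", "A", "C"],
})

summary = toy.describe()  # summarizes numeric columns only by default
print(summary)

# Individual statistics can be looked up by row label and column name
print(summary.loc["mean", "age"])
```

If you also want a summary of text columns, `toy.describe(include="object")` reports count, unique, top, and freq for them instead.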

Let’s make an example!

df.describe()

Output:

.corr()

Columns in a dataset are often related to each other. The corr() function calculates the pairwise correlation between the numeric columns and returns the result as a table. Each value lies between -1 and 1.
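To get a feel for the numbers corr() produces, here is a minimal sketch with made-up columns: one that rises together with `x` and one that falls as `x` rises. All column names are hypothetical, and the `numeric_only=True` argument assumes pandas 1.5 or newer.

```python
import pandas as pd

# Hypothetical toy data (not the Titanic dataset)
toy = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "double_x": [2, 4, 6, 8],  # rises exactly with x
    "minus_x": [4, 3, 2, 1],   # falls exactly as x rises
    "label": ["a", "b", "c", "d"],
})

# numeric_only=True leaves the text column out of the calculation
corr_matrix = toy.corr(numeric_only=True)
print(corr_matrix)
```

A value near 1 means two columns move together, a value near -1 means they move in opposite directions, and a value near 0 means there is no linear relationship.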

Let’s make an example!

df.corr(numeric_only=True)  # skip text columns; newer pandas versions raise an error without this

Output:

.info()

The info() function gives us general information about the data. What is the data type of each column: object, integer, or float? How many values in each column are non-null? info() answers these questions.
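As a quick sketch, here is info() on a tiny made-up table with a few missing values (the column names `name`, `age`, and `fare` are assumptions for illustration). The `count()` call afterwards is only there to show the same non-null counts as plain numbers.

```python
import pandas as pd

# Hypothetical toy data with some missing values (not the Titanic dataset)
toy = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "age": [31, 45, 27],
    "fare": [7.25, None, 8.05],
})

# Prints the number of rows, each column's non-null count and dtype, and memory usage
toy.info()

# The non-null counts are also available as a Series
non_null = toy.count()
print(non_null)
```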

Let’s make an example!

df.info()

Output:

.isnull().sum()

isnull().sum() is actually two functions chained together: isnull() and sum().
isnull() returns True or False for each value in the data: True if the value is NaN (missing), False if not.
sum() then adds up the True values in each column, so we can easily see how many missing values each column has.
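The two steps can be sketched separately to show what each one contributes. The table below is made up for illustration (hypothetical columns `name`, `age`, and `fare`). On the DataFrame from this article the call would simply be df.isnull().sum().

```python
import pandas as pd

# Hypothetical toy data with some missing values (not the Titanic dataset)
toy = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "age": [31, 45, 27],
    "fare": [None, None, 8.05],
})

mask = toy.isnull()              # step 1: True where a value is missing
missing_per_column = mask.sum()  # step 2: True counts as 1, so this counts missing values per column

print(mask)
print(missing_per_column)
```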