Data Science

Step by Step — Run Exploratory Data Analysis

How can you draw relevant conclusions without knowing anything about the underlying data?

Aurélie Giraud
Feb 12 · 7 min read

Get Started

import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
import seaborn as sns
%matplotlib inline
# Read the data
df = pd.read_csv(Path.cwd()/'notes.csv')
df.head()
Dataset with the dimensions of the banknotes

Univariate Analysis

Basic Descriptive Statistics of the data
# How many individuals do we have for each category?
df.groupby('is_genuine').count().iloc[:, 0]
Count of unique individuals by Category
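Beyond the class counts, it usually helps to look at missing values and a few summary statistics per class. The sketch below runs on a tiny made-up frame so it works on its own; the column names `length` and `margin_low` and all the values are assumptions for illustration, not the real content of notes.csv.

```python
import pandas as pd

# Hypothetical miniature stand-in for notes.csv (all values are made up).
toy = pd.DataFrame({
    "is_genuine": [True, True, True, False, False],
    "length": [112.8, 113.0, 113.2, 111.5, 111.9],
    "margin_low": [4.0, 4.2, 4.1, 5.2, 5.4],
})

# Number of rows in each class.
class_counts = toy["is_genuine"].value_counts()

# Number of missing values in each column.
missing = toy.isna().sum()

# Mean, standard deviation and median of every feature, by class.
summary = toy.groupby("is_genuine").agg(["mean", "std", "median"])

print(class_counts)
print(missing)
print(summary.round(2))
```

On the real data, replacing `toy` with `df` should give the same kind of overview, provided the remaining columns are numeric.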
# Remove the boolean column
tmp = df.iloc[:, 1:]
# Check the distribution of each column, split by class
tmp1 = df[df['is_genuine'] == True]   # genuine notes
tmp2 = df[df['is_genuine'] == False]  # fake notes
for i in tmp.columns:
    plt.figure(figsize=(8, 6))
    plt.hist(tmp1[i], bins=50, alpha=0.5, label="genuine")
    plt.hist(tmp2[i], bins=50, alpha=0.5, label="fake")
    plt.title(i)
    plt.legend(loc='upper right')
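One detail to keep in mind with the overlaid histograms above: with `bins=50` passed to each call, the two classes get their own bin edges, so the bars are not strictly comparable. A possible refinement, sketched here on made-up random values (the means and spreads are assumptions, not taken from the dataset), is to compute one set of edges from the pooled values and reuse it for both classes.

```python
import numpy as np

# Made-up values standing in for one feature of each class.
rng = np.random.default_rng(0)
genuine = rng.normal(loc=4.1, scale=0.3, size=100)
fake = rng.normal(loc=5.2, scale=0.5, size=70)

# One set of bin edges computed from the pooled values...
edges = np.histogram_bin_edges(np.concatenate([genuine, fake]), bins=20)

# ...reused for both classes, so the counts line up bin by bin.
genuine_counts, _ = np.histogram(genuine, bins=edges)
fake_counts, _ = np.histogram(fake, bins=edges)

print(edges.round(2))
print(genuine_counts)
print(fake_counts)
```

The same `edges` array can be passed as `bins=edges` to both `plt.hist` calls in the loop.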
# Create boxplots to visualize the potential outliers
fig, ax_new = plt.subplots(3,2, sharey=False,figsize=(20,17))
df.boxplot(by="is_genuine", ax=ax_new)
Visualization of the outliers for each variable, by category

Outliers

# Create a function to identify the outliers for each feature and each category
def get_outliers(df):
    Q1 = df.quantile(0.25)
    Q3 = df.quantile(0.75)
    IQR = Q3 - Q1
    # Keep the rows where at least one value lies outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    df_out = df[((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).any(axis=1)]

    return df_out
# Apply the function at the label level
df.groupby('is_genuine').apply(get_outliers).reset_index(drop=True)
List of outliers identified with the IQR method
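To make the IQR rule concrete, here is a small worked example on made-up numbers. It is a sketch of the same idea, not the exact code above: the helper `iqr_outliers`, the `k` parameter and the toy values are assumptions introduced for illustration, and the fences are computed on the numeric features only, one class at a time.

```python
import pandas as pd

def iqr_outliers(frame, k=1.5):
    """Return the rows of `frame` with at least one value outside
    [Q1 - k*IQR, Q3 + k*IQR], computed column by column."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    mask = (frame < q1 - k * iqr) | (frame > q3 + k * iqr)
    return frame[mask.any(axis=1)]

# Made-up data: the genuine class contains one suspicious length (115.0).
toy = pd.DataFrame({
    "is_genuine": [True] * 5 + [False] * 5,
    "length": [112.0, 112.1, 112.2, 112.3, 115.0,
               111.0, 111.1, 111.2, 111.3, 111.4],
})

features = toy.columns.drop("is_genuine")

# Flag outliers separately within each class.
is_outlier = pd.Series(False, index=toy.index)
for _, group in toy.groupby("is_genuine"):
    is_outlier.loc[iqr_outliers(group[features]).index] = True

outliers = toy[is_outlier]
print(outliers)
```

For the genuine class, Q1 is 112.1 and Q3 is 112.3, so the IQR is 0.2 and the fences sit at 111.8 and 112.6; only the 115.0 note falls outside them. None of the fake notes is flagged.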

Bivariate Analysis

# Visualize the correlation & distribution of the variables 
sns.pairplot(df)
Which variables seem to be correlated with each other?
# Heatmap of correlation matrix
sns.heatmap(df.corr(), annot=True)
HeatMap to visualize the Correlation Matrix
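A heatmap is easy to scan by eye, but with many variables a ranked list of pairs can be quicker to read. The following is a minimal sketch on made-up columns (`length`, `height` and `margin` with invented values, not the real dataset): it keeps each pair once and sorts the pairs by absolute correlation.

```python
import numpy as np
import pandas as pd

# Made-up features: height is exactly twice length, margin moves the other way.
toy = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "height": [2.0, 4.0, 6.0, 8.0, 10.0],
    "margin": [5.0, 3.0, 4.0, 1.0, 2.0],
})

corr = toy.corr()

# Keep only the strict upper triangle so that each pair appears once
# and the diagonal (always 1) is dropped.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Strongest relationships first, regardless of sign.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

Sorting on the absolute value matters here: a strong negative correlation is as informative as a strong positive one.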

Aurélie Giraud

Written by

Analytic Translator | AI/ML & Statistics Player | Unlock Business Opportunities ✅𝗵𝘁𝘁𝗽𝘀://𝘄𝘄𝘄.𝗹𝗶𝗻𝗸𝗲𝗱𝗶𝗻.𝗰𝗼𝗺/𝗶𝗻/𝗮𝘂𝗿𝗲𝗹𝗶𝗲𝗴𝗶𝗿𝗮𝘂𝗱

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
