# Step by Step — Run Exploratory Data Analysis

## How can you draw relevant conclusions without knowing anything about the underlying data?

Feb 12

## Get Started

```python
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
import seaborn as sns

# Jupyter magic: render plots inline (remove this line outside a notebook)
%matplotlib inline

# Read the data
df = pd.read_csv(Path.cwd() / 'notes.csv')
df.head()
```
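Before plotting anything, it can help to check column types, missing values and summary statistics. The sketch below uses a small hypothetical DataFrame in place of `notes.csv`; the column names `length` and `margin_low` and all values are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for notes.csv: a boolean label plus two numeric features
df = pd.DataFrame({
    "is_genuine": [True, True, False, False, True],
    "length": [113.1, 112.9, 111.4, np.nan, 113.3],
    "margin_low": [4.1, 3.9, 5.2, 5.4, 4.0],
})

# Column dtypes and non-null counts
df.info()

# Number of missing values per column
missing = df.isna().sum()
print(missing)

# Count, mean, std, min, quartiles and max of the numeric columns
summary = df.describe()
print(summary)
```

Missing values matter later on: `count()` ignores them and histogram functions may need them dropped first.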

# Univariate Analysis

```python
# How many individuals do we have for each category?
df.groupby('is_genuine').count().iloc[:, 0]
```
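`count()` counts non-missing values per column, so `.iloc[:, 0]` reflects any missing values in whichever column comes first. `size()` and `value_counts()` count rows instead, and `normalize=True` gives the share of each class, which shows at a glance whether the target is imbalanced. A minimal sketch on hypothetical labels:

```python
import pandas as pd

# Hypothetical labels standing in for df['is_genuine']
df = pd.DataFrame({"is_genuine": [True, True, True, False]})

# Rows per class, regardless of missing values in other columns
sizes = df.groupby("is_genuine").size()

# Absolute counts and relative shares of each class
counts = df["is_genuine"].value_counts()
shares = df["is_genuine"].value_counts(normalize=True)
print(sizes)
print(counts)
print(shares)
```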
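```python
# Drop the boolean label column to keep only the numeric features
features = df.drop(columns='is_genuine')

# Split the rows by class once, outside the loop
genuine = df[df['is_genuine']]
fake = df[~df['is_genuine']]

# Compare the distribution of each feature for genuine vs. fake notes
for col in features.columns:
    plt.figure(figsize=(8, 6))
    # dropna() keeps missing values from breaking the histogram
    plt.hist(genuine[col].dropna(), bins=50, alpha=0.5, label='genuine')
    plt.hist(fake[col].dropna(), bins=50, alpha=0.5, label='fake')
    plt.title(col)
    plt.legend(loc='upper right')
    plt.show()
```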
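The histograms can be complemented with a numeric summary per class. The sketch below, on hypothetical data, computes the median and mean of each feature within each class and the difference between class medians. Raw differences are in each feature's own units, so they are not directly comparable across features.

```python
import pandas as pd

# Hypothetical data: one label column and two numeric features
df = pd.DataFrame({
    "is_genuine": [True, True, True, False, False, False],
    "length": [113.0, 113.2, 112.8, 111.5, 111.9, 111.7],
    "margin_low": [4.0, 4.2, 3.8, 5.1, 5.3, 5.2],
})

# Median and mean of every feature within each class
per_class = df.groupby("is_genuine").agg(["median", "mean"])
print(per_class)

# Difference between the genuine and fake medians, in each feature's own units
medians = df.groupby("is_genuine").median()
diff = medians.loc[True] - medians.loc[False]
print(diff)
```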
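```python
# Create boxplots to visualize the potential outliers, one panel per feature split by class
# (pandas expects one axis per feature column, so this 3 x 2 grid fits six features)
fig, ax_new = plt.subplots(3, 2, sharey=False, figsize=(20, 17))
df.boxplot(by='is_genuine', ax=ax_new)
```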
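`boxplot_stats` is imported at the start but never used. It returns the numbers behind a box plot, including the points drawn as outliers ("fliers"), so the outliers seen in the plots can be listed explicitly. A minimal sketch on a hypothetical array:

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# Hypothetical feature values; 100.0 sits far above the others
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

# One dict per dataset; whis=1.5 gives the usual 1.5 * IQR whiskers
stats = boxplot_stats(values, whis=1.5)[0]

print("Q1:", stats["q1"], "Q3:", stats["q3"], "IQR:", stats["iqr"])
print("Whiskers:", stats["whislo"], stats["whishi"])
print("Fliers (points drawn as outliers):", stats["fliers"])
```

Here Q1 = 3.25 and Q3 = 7.75, so the upper fence is 7.75 + 1.5 × 4.5 = 14.5 and only 100 falls outside it.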

# Outliers

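```python
# Flag the rows holding at least one outlier (1.5 * IQR rule), for each feature and each category
def get_outliers(group):
    # Compute the quartiles on the numeric features only, not on the label
    features = group.drop(columns='is_genuine', errors='ignore')
    Q1 = features.quantile(0.25)
    Q3 = features.quantile(0.75)
    IQR = Q3 - Q1
    mask = ((features < (Q1 - 1.5 * IQR)) | (features > (Q3 + 1.5 * IQR))).any(axis=1)
    return group[mask]

# Apply the function at label level
df.groupby('is_genuine').apply(get_outliers).reset_index(drop=True)
```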
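Instead of the outlier rows themselves, it can be useful to see the fences and the number of outliers per feature and per class. This is a minimal sketch on hypothetical data; `iqr_fences` is an illustrative helper, not part of pandas.

```python
import pandas as pd

def iqr_fences(group: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Q1, Q3, lower/upper fences and outlier count for each numeric column (hypothetical helper)."""
    q1 = group.quantile(0.25)
    q3 = group.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    n_outliers = ((group < lower) | (group > upper)).sum()
    return pd.DataFrame({"q1": q1, "q3": q3, "lower": lower,
                         "upper": upper, "n_outliers": n_outliers})

# Hypothetical data: the genuine class contains one suspiciously short note
df = pd.DataFrame({
    "is_genuine": [True] * 5 + [False] * 5,
    "length": [113.0, 113.1, 112.9, 113.2, 109.0,
               111.5, 111.6, 111.4, 111.7, 111.5],
})

# One fence table per class, computed on the features only
fences = {label: iqr_fences(group.drop(columns="is_genuine"))
          for label, group in df.groupby("is_genuine")}
print(fences[True])
print(fences[False])
```

Computing the fences per class matters: a value that is ordinary for fake notes can be an outlier among genuine ones.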

# Bivariate Analysis

```python
# Visualize the correlation & distribution of the variables
sns.pairplot(df)
```
```python
# Heatmap of the correlation matrix
sns.heatmap(df.corr(), annot=True)
```
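With many features, an annotated heatmap becomes hard to read. One option is to list each pair of variables once, ranked by absolute correlation. A minimal sketch on hypothetical columns `a`, `b` and `c`:

```python
from itertools import combinations

import pandas as pd

# Hypothetical data: b is exactly 2 * a, c moves against a
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})

corr = df.corr()

# Each unordered pair of columns appears exactly once
pairs = pd.Series({(x, y): corr.loc[x, y] for x, y in combinations(corr.columns, 2)})

# Strongest relationships first, whatever their sign
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

Here `a` and `b` come first with a correlation of 1, followed by the two pairs involving `c` at -0.8.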

## Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…


Written by

## Aurélie Giraud

Analytic Translator | AI/ML & Statistics Player | Unlock Business Opportunities ✅ https://www.linkedin.com/in/aureliegiraud
