

Exploratory Data Analysis in Python, Using Just 9 Functions

Exploratory Data Analysis (EDA) can be an essential part of your data science process. I want to emphasize the word “can”. I’ve seen many people expand their EDA process to the point of overkill. Of course, there are always more patterns to be found, but you need to build a sense of when your EDA has gone on long enough and you have a good feel for the data. The goal (in most cases) is not to explore the data; it is to analyze the data in some way, often through a model.

In an effort to make your EDA processes more efficient, here are 9 functions I use for quick EDA!

Note!!!! These functions require pandas and NumPy, and the box plots also need Matplotlib installed.

For any data frame, the .info() function tells you how many entries you have, the name and data type of each column, and how many non-null values each column holds. Comparing the number of non-null values with the total number of entries shows which columns contain null values.
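A minimal sketch of .info() on a small, hypothetical data frame. The last lines do the non-null comparison in code, using pandas' `DataFrame.count()` (which counts non-null values per column):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in "price"
df = pd.DataFrame({
    "product": ["apple", "banana", "cherry", "apple"],
    "price": [1.2, np.nan, 3.5, 1.2],
    "quantity": [10, 5, 7, 10],
})

# Prints the number of entries, column names, dtypes and non-null counts
df.info()

# Columns whose non-null count is below the number of entries contain nulls
non_null = df.count()
cols_with_nulls = non_null[non_null < len(df)].index.tolist()
print(cols_with_nulls)  # ['price']
```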

Find Duplicates


There are multiple ways to find duplicate rows in your dataset. The function above is the easiest, as it finds all the duplicate entries and prints how many there are. If it prints “0”, there are no duplicates and you are good to go!
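A minimal sketch of one way to do this, assuming the count comes from pandas' `DataFrame.duplicated()`; the sample data is hypothetical:

```python
import pandas as pd

# Hypothetical sample data: the last row repeats the first one
df = pd.DataFrame({
    "product": ["apple", "banana", "cherry", "apple"],
    "price": [1.2, 2.0, 3.5, 1.2],
})

# duplicated() flags each row that repeats an earlier row (keep="first" by default),
# so summing the boolean Series gives the number of duplicates
n_duplicates = df.duplicated().sum()
print(n_duplicates)  # 1

# keep=False flags every copy, which is handy for inspecting the duplicated rows
print(df[df.duplicated(keep=False)])
```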

Find Unique Values in a Column


In much of your EDA, you are focused on a few key columns. This function quickly prints all the unique values of a column, so you can understand the breadth and range of its values. Below is what the output looks like:
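A minimal sketch of such a function, assuming it wraps pandas' `Series.unique()`. The helper name `print_unique` and the sample data are hypothetical:

```python
import pandas as pd

def print_unique(df: pd.DataFrame, column: str):
    """Hypothetical helper: print and return the distinct values of one column."""
    values = df[column].unique()  # order of first appearance, missing values included
    print(values)
    return values

# Hypothetical sample data
df = pd.DataFrame({"city": ["Paris", "Lyon", "Paris", "Nice", None]})

values = print_unique(df, "city")
n_unique = df["city"].nunique()  # counts distinct values, ignoring missing ones by default
```

Note the difference between the two calls: `unique()` lists the missing value as one of the distinct entries, while `nunique()` leaves it out unless you pass `dropna=False`.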

Find the Counts of Unique Values in a Column


This function builds on the previous one by showing the unique values in that column with the largest and smallest frequencies. This is a great way to look for outliers, since rare values often point to unusual or mistyped entries.
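A minimal sketch, assuming the frequencies come from pandas' `Series.value_counts()`, which sorts from most to least frequent. The helper `value_extremes` and the sample data are hypothetical:

```python
import pandas as pd

def value_extremes(df: pd.DataFrame, column: str, n: int = 3):
    """Hypothetical helper: the n most and n least frequent values of a column."""
    counts = df[column].value_counts()  # sorted from most to least frequent
    return counts.head(n), counts.tail(n)

# Hypothetical sample data
df = pd.DataFrame({"city": ["Paris", "Paris", "Paris", "Lyon", "Lyon", "Nice"]})

most_common, least_common = value_extremes(df, "city", n=1)
print(most_common)   # Paris appears 3 times
print(least_common)  # Nice appears once: a candidate for a closer look
```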

Find all the Null Values in a Dataframe


This function combines .isnull() and .sum() and returns every column in the data frame together with the number of null values it contains. Finding null values is an important part of EDA and data cleaning. Here is the output of the function call:
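A minimal sketch of the .isnull().sum() combination on hypothetical data. The extra `.mean()` line is an addition, not part of the original function: it gives the share of missing values per column instead of the count.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in two columns
df = pd.DataFrame({
    "price": [1.2, np.nan, 3.5, np.nan],
    "quantity": [10, 5, None, 7],
    "product": ["apple", "banana", "cherry", "date"],
})

# isnull() marks each missing cell as True; sum() counts the Trues per column
null_counts = df.isnull().sum()
print(null_counts)

# The mean of the same boolean frame is the fraction of missing values per column
null_share = df.isnull().mean()
print(null_share)
```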

Fill Null Values with Zeros (or any filler)

import numpy as np
df.replace(np.nan, 0, inplace=True)

This function takes your entire data frame and replaces the null values with zeros, or with whatever value you pass as the second argument. It is the fastest way to get rid of null values and avoid errors and dead-ends later in your analysis. Keep in mind that a filler such as 0 also changes summary statistics like the mean, so choose it deliberately. If you are not sure whether null values will affect your analysis, I advise you to either fill them or delete the rows that contain them.
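A minimal sketch of the fill-or-delete options using pandas' `fillna()`, `replace()` and `dropna()` on hypothetical data. It also shows why the filler should be the number 0 rather than the string "0": the string turns a numeric column into text.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in both columns
df = pd.DataFrame({"price": [1.2, np.nan, 3.5], "product": ["apple", None, "cherry"]})

# Filling with the number 0 keeps "price" numeric (float64)
filled = df.fillna(0)

# Filling with the string "0" turns "price" into an object (text) column instead
price_as_text = df["price"].replace(np.nan, "0")

# A per-column filler is often more sensible than one value for everything
custom = df.fillna({"price": df["price"].median(), "product": "unknown"})

# Or delete the rows that contain any null value
dropped = df.dropna()
```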

Filter Rows in your Dataframe

df2 = df[df["column_name"] > 100]

The line of code above creates a new data frame holding only the rows where “column_name” is greater than 100. You can, of course, filter with other comparisons such as less than (<) or equal to (==), and build more complex filters by combining multiple conditions with & (and) and | (or).
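A minimal sketch of single and combined conditions on hypothetical data. Each condition needs its own parentheses because & and | bind more tightly than the comparison operators. `DataFrame.query()` is shown as an alternative way to write the same filter:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "product": ["apple", "banana", "cherry", "date"],
    "price": [150, 80, 120, 300],
    "quantity": [5, 20, 0, 3],
})

# Single condition: rows where price is greater than 100
df2 = df[df["price"] > 100]

# Both conditions must hold (&)
df3 = df[(df["price"] > 100) & (df["quantity"] > 0)]

# Either condition may hold (|)
df4 = df[(df["price"] < 100) | (df["product"] == "date")]

# The same filter as df3, written as a query string
df5 = df.query("price > 100 and quantity > 0")
```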

Create a Box Plot for Any Column


The function above returns box plots for all the numerical columns in the dataset.

To create the box plot for only a certain column, use this function:
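A minimal sketch of both box-plot calls, assuming they use pandas' `DataFrame.boxplot()`, which draws with Matplotlib. The sample data is hypothetical, and the `Agg` backend line is an assumption for running headless:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook to see plots inline
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data with two numeric columns and one text column
df = pd.DataFrame({
    "price": [1.2, 2.0, 3.5, 2.2, 9.9],
    "quantity": [10, 5, 7, 8, 6],
    "product": ["a", "b", "c", "d", "e"],
})

# One box per numeric column; the text column "product" is left out
ax_all = df.boxplot()

# Start a new figure so the single-column plot is not drawn on top of the first one
plt.figure()
ax_price = df.boxplot(column=["price"])
```

Passing a list to `column` also lets you pick several columns at once, for example `column=["price", "quantity"]`.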


Create a Correlation Matrix


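This pandas function only computes correlations between pairs of numeric columns. In pandas 2.0 and later, a data frame that also contains non-numeric columns raises an error unless you pass numeric_only=True or select the numeric columns first.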
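A minimal sketch using pandas' `DataFrame.corr()` with `numeric_only=True` on hypothetical data. Here "quantity" falls in a straight line as "price" rises, so their Pearson correlation is -1:

```python
import pandas as pd

# Hypothetical sample data: quantity = 10 - 2 * price, plus a text column
df = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0],
    "quantity": [8.0, 6.0, 4.0, 2.0],
    "discount": [0.1, 0.3, 0.2, 0.4],
    "product": ["a", "b", "c", "d"],
})

# numeric_only=True skips "product"; Pearson correlation is the default method
corr = df.corr(numeric_only=True)
print(corr.round(2))
```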

To see all 9 of these functions in action, here is a quick tutorial video:

Thanks :)




Jake from Mito


Exploring the future of Python and Spreadsheets
