EDA in Pandas

Roland Jeannier
1 min read · Sep 6, 2017

Over the next few blog posts I'll be covering some helpful tips and tricks for using Pandas.

When working with a new dataset in Pandas there are always three things I check first:

  • Which columns have null values, and how many?
  • How many unique values does each column have (and is that what I would expect)?
  • What are the datatypes in a column? If there is more than one, is that going to be a problem? (This is often overlooked and can lead to unexpected errors.)
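To make the third check concrete, here is a small sketch on a made-up frame (the column names and values are illustrative assumptions, not taken from the Titanic data). A column that mixes integers and strings is reported with the single dtype `object`, so the mix only shows up when you look at the Python type of each value.

```python
import pandas as pd

# Toy data, purely illustrative: 'age' mixes ints with a string.
toy = pd.DataFrame({"age": [22, "38", 26], "fare": [7.25, 71.28, 7.92]})

# dtypes reports one dtype per column, which hides the mix inside 'age'.
column_dtypes = toy.dtypes

# Counting the Python type of each value exposes the mix.
age_types = toy["age"].map(type).value_counts()

print(column_dtypes)
print(age_types)
```

Arithmetic or sorting on a column like this is where the unexpected errors tend to appear, which is why it pays to check before doing anything else.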

Well, all three questions can be answered in no time at all with a small helper function, eda_helper, used in the example below!
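The body of the helper does not appear in this text, so what follows is only a minimal sketch of what a function like `eda_helper` might look like, not the author's original. The output column names (`null_count`, `null_pct`, `unique_count`, `dtype`, `value_types`) and the small `example` frame are assumptions made for illustration.

```python
import pandas as pd


def eda_helper(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise nulls, unique values and value types for each column.

    Hypothetical reconstruction: the original helper is not shown in the post.
    """
    fields = ["column", "null_count", "null_pct",
              "unique_count", "dtype", "value_types"]
    rows = []
    for col in df.columns:
        series = df[col]
        non_null = series.dropna()
        rows.append({
            "column": col,
            "null_count": int(series.isnull().sum()),
            "null_pct": round(100 * series.isnull().mean(), 2),
            "unique_count": series.nunique(),  # NaN is not counted
            "dtype": str(series.dtype),
            # Python types actually present among the non-null values
            "value_types": sorted({type(v).__name__ for v in non_null}),
        })
    return pd.DataFrame(rows, columns=fields).set_index("column")


# Illustrative usage on a tiny made-up frame.
example = pd.DataFrame({"a": [1, None, 3], "b": ["x", "x", 2]})
print(eda_helper(example))
```

Returning a DataFrame rather than printing means the summary renders as a table in a notebook and can be filtered, for example to show only the columns with more than one entry in `value_types`.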

Let’s quickly see it in action. I’ll use the famous Titanic dataset, readily available on Kaggle.

import pandas as pd

# loading our data
titanic_train = pd.read_csv('~/titantic/train.csv')
titanic_test = pd.read_csv('~/titantic/test.csv')
# stack the train and test rows into a single DataFrame
titanic_df = pd.concat([titanic_train, titanic_test], axis=0)
eda_helper(titanic_df)

Stay tuned for more helpful tips and tricks with Pandas!
