Data Exploration in Pandas

Published in quaintitative

2 min read · Aug 10, 2018

One of Pandas’ magic powers is that it has a huge range of functions for analysing data. Whenever we start with a numerical data set, the first thing we want to do is take a look at its basic properties.

After we import the usual libraries, we need to get some data.

import pandas as pd

JPY = pd.read_csv('JPY.csv')
EUR = pd.read_csv('EUR.csv')
CNY = pd.read_csv('CNY.csv')

Then, for each of them, we drop an extra column, convert the DATE column to datetime, and set it as the index. The steps are shown for JPY; EUR and CNY get the same treatment.

# 'Unnamed: 0' is the old row index that was saved into the CSV
JPY = JPY.drop(['Unnamed: 0'], axis=1)
JPY['DATE'] = pd.to_datetime(JPY['DATE'])
JPY = JPY.set_index('DATE')
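
Since the same three steps apply to every currency, they can be wrapped in a small helper. The sketch below is only an illustration: `prepare` is a hypothetical name, and the in-memory CSV with its `RATE` column is made up to stand in for the real files, whose value column is not shown here.

```python
import io
import pandas as pd

# Made-up CSV text standing in for JPY.csv. The 'RATE' column name and the
# values are assumptions. The empty first header field is what pandas turns
# into the 'Unnamed: 0' column.
SAMPLE_CSV = """,DATE,RATE
0,2018-01-02,112.3
1,2018-01-03,112.5
2,2018-01-04,112.8
"""

def prepare(df):
    """Drop the leftover index column, parse DATE, and index the frame by it."""
    df = df.drop(columns=['Unnamed: 0'], errors='ignore')
    df['DATE'] = pd.to_datetime(df['DATE'])
    return df.set_index('DATE')

raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
JPY = prepare(raw)
print(JPY.index)
```

With the real files, the same helper could be called once each on the JPY, EUR and CNY frames.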

Now we can plot the data with a single method call.

JPY.plot()

We can also combine all three data sets into one larger Pandas DataFrame, with one column per currency.

CURR = pd.concat([EUR, JPY, CNY], axis=1)
CURR.columns = ['EUR', 'JPY', 'CNY']
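
One thing worth knowing about `pd.concat` with `axis=1` is that it aligns rows on the index, so a date that is missing from one series shows up as NaN rather than shifting the rows. Here is a small sketch with made-up numbers (not the actual rates), assuming each frame has a single value column:

```python
import pandas as pd

# Made-up rates purely for illustration; the two frames share only one date.
eur = pd.DataFrame({'RATE': [0.83, 0.84]},
                   index=pd.to_datetime(['2018-01-02', '2018-01-03']))
jpy = pd.DataFrame({'RATE': [112.3, 112.8]},
                   index=pd.to_datetime(['2018-01-02', '2018-01-04']))

# Side-by-side concatenation keeps the union of the dates by default.
combined = pd.concat([eur, jpy], axis=1)
combined.columns = ['EUR', 'JPY']

print(combined)
print(combined.isna().sum())  # number of missing values per currency
```

Renaming the columns by position like this only works when every input frame contributes exactly one column, in the same order as the list of names.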

Now when we plot, we will see all three data series.

And now we can get to the good stuff in the accompanying notebook, where we show how to use these functions:

describe — to show all the key stats, such as count, mean, min, max, percentiles
count, min, max, median, which are pretty much self-explanatory
var and std for the variance and standard deviation
skew and kurtosis for the shape measures built on the 3rd and 4th moments (Pandas reports excess kurtosis, so a normal distribution scores about 0)
corr and cov for the correlation and covariance between the currencies.
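
To give a feel for the functions listed above, here is a minimal sketch that runs them on a synthetic stand-in for `CURR`. The random-walk data is an assumption made only so the example runs on its own; it is not the actual currency data.

```python
import numpy as np
import pandas as pd

# Synthetic random walks standing in for the three currency series (assumption).
rng = np.random.default_rng(0)
dates = pd.date_range('2018-01-01', periods=250, freq='B')
CURR = pd.DataFrame(
    rng.normal(size=(250, 3)).cumsum(axis=0) + 100,
    index=dates,
    columns=['EUR', 'JPY', 'CNY'],
)

summary = CURR.describe()   # count, mean, std, min, quartiles, max per column
median = CURR.median()
variance = CURR.var()       # sample variance (ddof=1 by default)
std_dev = CURR.std()        # sample standard deviation (ddof=1 by default)
skewness = CURR.skew()
kurt = CURR.kurtosis()      # excess kurtosis
corr = CURR.corr()          # 3x3 pairwise correlation matrix
cov = CURR.cov()            # 3x3 pairwise covariance matrix

print(summary)
print(corr)
```

Each call works column by column, so the single-column results come back as a Series indexed by currency, while `corr` and `cov` return square DataFrames with the currencies on both axes.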

Written by playgrdstar