playgrdstar
quaintitative
2 min read · Aug 10, 2018

Data Exploration in Pandas

One of Pandas’ magic powers is its huge collection of built-in functions for analysing data. Whenever we start with a numerical data set, the first thing we want is a look at its basic properties.

After we import the usual libraries, we need to get some data.

import pandas as pd

# Each CSV holds the series for one currency
JPY = pd.read_csv('JPY.csv')
EUR = pd.read_csv('EUR.csv')
CNY = pd.read_csv('CNY.csv')

Then, for each of them, we drop an extra column, convert the DATE column to datetime, and set it as the index. The steps are shown for JPY; EUR and CNY get the same treatment.

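# Drop the leftover index column that came in with the CSV
JPY = JPY.drop(columns=['Unnamed: 0'])
# Parse DATE as datetime, then use it as the index
JPY['DATE'] = pd.to_datetime(JPY['DATE'])
JPY = JPY.set_index('DATE')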
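Since the same three steps apply to every file, they can be wrapped in a small helper. This is only a sketch, not the notebook's code: `load_rates` is a hypothetical name, and the inline CSV text (including the `RATE` column name and its values) is made up to stand in for the real files, whose contents are not shown here.

```python
import io
import pandas as pd

# Made-up sample standing in for JPY.csv (assumption: one value column).
# A header that starts with a comma is what produces the 'Unnamed: 0' column.
SAMPLE_CSV = """,DATE,RATE
0,2018-01-02,112.3
1,2018-01-03,112.5
2,2018-01-04,112.8
"""

def load_rates(source):
    """Hypothetical helper: read a CSV, drop the extra column, index by DATE."""
    df = pd.read_csv(source)
    df = df.drop(columns=['Unnamed: 0'])
    df['DATE'] = pd.to_datetime(df['DATE'])
    return df.set_index('DATE')

jpy_sample = load_rates(io.StringIO(SAMPLE_CSV))
```

With the real files, the call would presumably look like `JPY = load_rates('JPY.csv')`, repeated for EUR and CNY.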

Now we can plot the data with a single function call.

JPY.plot()

We can also combine all three of these data sets into one large Pandas data frame.

CURR = pd.concat([EUR, JPY, CNY], axis=1)
CURR.columns = ['EUR', 'JPY', 'CNY']
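It helps to know what `pd.concat` with `axis=1` does when the dates do not match exactly: it places the frames side by side and aligns rows on the index, filling gaps with NaN. Here is a minimal sketch with made-up numbers (the `RATE` column name and all values are assumptions for illustration). Renaming the columns afterwards only works if each input frame has exactly one column.

```python
import pandas as pd

dates = pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-04'])

# Made-up values; the third date is deliberately missing from cny
eur = pd.DataFrame({'RATE': [1.20, 1.21, 1.19]}, index=dates)
jpy = pd.DataFrame({'RATE': [112.3, 112.5, 112.8]}, index=dates)
cny = pd.DataFrame({'RATE': [6.50, 6.49]}, index=dates[:2])

# Side-by-side concatenation aligns on the date index (outer join by default)
curr = pd.concat([eur, jpy, cny], axis=1)
curr.columns = ['EUR', 'JPY', 'CNY']

# Number of missing values per column after alignment
missing_per_column = curr.isna().sum()
```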

Now when we plot, we will see all three data series.

And now we can get to the good stuff in the notebook here, where we show how to use the following functions:

  • describe — to show all the key stats, such as count, mean, min, max, percentiles
  • count, min, max, median, which are pretty much self-explanatory
  • var and std for the variance and standard deviation
  • skew and kurtosis for the skewness and excess kurtosis (based on the 3rd and 4th order moments)
  • corr and cov for the correlation and covariance between the currencies.
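The functions in the list above can be tried out on a small frame. The values here are made up for illustration and are not the real exchange rates; only the column names follow the post.

```python
import pandas as pd

# Made-up numbers standing in for the combined currency frame
curr = pd.DataFrame(
    {
        'EUR': [1.20, 1.21, 1.19, 1.22, 1.23],
        'JPY': [112.3, 112.5, 112.8, 112.1, 111.9],
        'CNY': [6.50, 6.49, 6.51, 6.48, 6.47],
    },
    index=pd.date_range('2018-01-02', periods=5, freq='D'),
)

summary = curr.describe()       # count, mean, std, min, quartiles, max per column
medians = curr.median()
variance = curr.var()           # sample variance (ddof=1 by default)
std_dev = curr.std()            # sample standard deviation
skewness = curr.skew()
excess_kurt = curr.kurtosis()   # same as kurt(); a normal distribution scores 0
correlation = curr.corr()       # pairwise Pearson correlation matrix
covariance = curr.cov()         # pairwise covariance matrix

print(summary)
print(correlation)
```

Each of these works column by column, so one call covers all three currencies at once. `corr` and `cov` return a 3×3 table with one row and one column per currency.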

ming // gary ang // illustration, coding, writing // portfolio >> playgrd.com | writings >> quaintitative.com