Hello Everyone,

It’s been a while, I’m sorry, I’ve been lazy! I’m currently traveling in China & Southeast Asia. The internet at my grandma’s is not the best…and all they do is play mahjong.

Today, I am going to talk about Pandas! No, not the cuddly bears (I haven’t even seen any in China yet), but the awesome Python library that helps you manipulate data! I’m going to go into a bit of depth on its two main concepts, Series & DataFrames. Then I’ll cover what I have found most helpful in practice: becoming fluent in indexing DataFrames. Feel free to download the Jupyter notebook exercises from my GitHub; I’ll go through them later here. If you don’t know how to use Jupyter notebooks, here is their guide on how to get started.

Pandas

So what is Pandas? Pandas is yet another Python library, and it is built largely on top of NumPy. You keep NumPy’s fast array operations and gain labeled rows and columns on top. If you don’t remember what NumPy is, you can read my earlier blog post on it. Pandas is commonly imported under the alias pd, like this:

import pandas as pd

And really, the two things you need to know about the Pandas library are the DataFrame object & the Series object.

Series

series_example = pd.Series([1, 2, 3, 4], index=['d', 'b', 'a', 'c'])
output:
d    1
b    2
a    3
c    4
dtype: int64
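As a small sketch of what the custom index is for, a value in a Series can be looked up either by its label or by its position. The variable names below are my own, picked just for illustration.

```python
import pandas as pd

series_example = pd.Series([1, 2, 3, 4], index=["d", "b", "a", "c"])

# Look up a value by its label
by_label = series_example["a"]

# Look up a value by its 0-based position
by_position = series_example.iloc[0]

# The labels and the underlying values can be pulled out separately
labels = list(series_example.index)
values = series_example.to_numpy()  # a plain NumPy array

print(by_label, by_position, labels, values)
```

Here the label "a" points at the value 3, while position 0 holds the value 1, so the order of the labels does not have to match any sort order.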

Dataframes

DataFrames can be created in a couple of ways. However, the three primary arguments you should at least remember are “data”, “columns”, and “index”. Data is typically passed in as a dictionary of Series or lists. Columns are the column names, or column “indexes”. Index holds the row labels, or row “indexes”. By default, just as when creating a Series, the row index will be incrementing integers starting at 0.

dataset = pd.DataFrame(data={'age': [10, 28, 30], 'weight': [120, 133, 155], 'height': [160, 165, 175], 'color': ['blue', 'green', 'pink']}, columns=["age", "weight", "height", "color"])
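To make the “index” argument a bit more concrete, here is a minimal sketch that builds the same data twice, once with the default row index and once with explicit row labels. The labels "amy", "bob", and "cat" are made up for this example.

```python
import pandas as pd

data = {"age": [10, 28, 30], "weight": [120, 133, 155]}

# No index given: rows are labeled with integers starting at 0
default_rows = pd.DataFrame(data=data, columns=["age", "weight"])

# Explicit index: rows get the (hypothetical) labels below
named_rows = pd.DataFrame(
    data=data,
    columns=["age", "weight"],
    index=["amy", "bob", "cat"],
)

print(list(default_rows.index))
print(list(named_rows.index))
```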

Manipulating Dataframes: Indexing

Original Standard Indexing

Through dict like notation

dataset["age"]

Through attribute notation

dataset.age

To retrieve certain rows, you can slice by row position

dataset[0:2]

To be fancy and pull only certain columns & certain rows

dataset[0:2]["age"]

You can see here that with standard indexing I can only select columns by their label and rows by a slice of their position.
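Putting the standard indexing forms side by side, here is a rough sketch of what each one hands back. The DataFrame is rebuilt so the snippet runs on its own, and passing a list of column names to get several columns at once is something I am adding here for illustration.

```python
import pandas as pd

dataset = pd.DataFrame({
    "age": [10, 28, 30],
    "weight": [120, 133, 155],
    "height": [160, 165, 175],
    "color": ["blue", "green", "pink"],
})

one_column = dataset["age"]               # a single column comes back as a Series
two_columns = dataset[["age", "weight"]]  # a list of names comes back as a DataFrame
first_two_rows = dataset[0:2]             # a slice picks rows at positions 0 and 1
ages_of_first_two = dataset[0:2]["age"]   # slice the rows, then pick the column

print(type(one_column).__name__, two_columns.shape, first_two_rows.shape)
```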

To add a condition on the rows you want you can do something like this

dataset[dataset["age"] > 15]

You can read this as “dataset["age"] > 15” returning a boolean Series of Trues and Falses, one per row. The outer indexing operation then returns the entire rows where the value is True.
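Splitting the condition into two steps makes the mechanics easier to see. This is only a sketch; the names mask, older, and older_and_lighter are mine, and the second condition on weight is an extra I am assuming you might want.

```python
import pandas as pd

dataset = pd.DataFrame({
    "age": [10, 28, 30],
    "weight": [120, 133, 155],
    "height": [160, 165, 175],
    "color": ["blue", "green", "pink"],
})

# Step 1: the comparison alone gives one True/False per row
mask = dataset["age"] > 15
print(mask.tolist())

# Step 2: indexing with the mask keeps only the rows marked True
older = dataset[mask]

# Conditions can be combined with & (and) or | (or); each needs its own parentheses
older_and_lighter = dataset[(dataset["age"] > 15) & (dataset["weight"] < 150)]
print(older_and_lighter)
```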

Position Indexing: .iloc

dataset.iloc[0:3,0:2]

This gives me the first 3 rows and the first 2 columns of my DataFrame. This is pretty useful when you don’t have the energy to spell out all your column names, and it is convenient for looping.
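Here is one way the looping idea might look: walking over the columns by position and grabbing the first value of each. Treat it as a sketch; first_values is just a name I picked.

```python
import pandas as pd

dataset = pd.DataFrame({
    "age": [10, 28, 30],
    "weight": [120, 133, 155],
    "height": [160, 165, 175],
    "color": ["blue", "green", "pink"],
})

# First 3 rows and first 2 columns, purely by position (end positions are excluded)
subset = dataset.iloc[0:3, 0:2]

# Loop over columns by position without typing any column names
first_values = {}
for position in range(dataset.shape[1]):
    column = dataset.iloc[:, position]       # every row of the column at this position
    first_values[column.name] = column.iloc[0]

print(subset)
print(first_values)
```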

Label Indexing: .loc

dataset.loc[0:2,["age","weight"]]

Beware: this actually grabs the first three rows and the first two columns of my DataFrame, because .loc slices by label and includes the end label. In this case, my row index happens to be numerical, so the “position” and “label” of each row are the same. But if my row labels were something like “a, b, c”, calling “0:2” would throw an error.
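To see the difference with non-numeric labels, here is a small sketch with a hypothetical DataFrame called labeled whose rows are named "a", "b", "c". In the pandas versions I am aware of, the integer slice fails with a TypeError; the code also catches KeyError in case that differs in your version.

```python
import pandas as pd

labeled = pd.DataFrame(
    {"age": [10, 28, 30], "weight": [120, 133, 155]},
    index=["a", "b", "c"],
)

# .loc slices by label and INCLUDES the end label "b"
by_label = labeled.loc["a":"b", ["age", "weight"]]

# .iloc slices by position and EXCLUDES the end position 2
by_position = labeled.iloc[0:2, 0:2]

# An integer slice does not match the string labels, so .loc refuses it
try:
    labeled.loc[0:2]
    slice_failed = False
except (TypeError, KeyError):
    slice_failed = True

print(by_label.equals(by_position), slice_failed)
```

Both selections return the rows "a" and "b", even though the slices look different, because only .loc counts the end of the slice as part of the result.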

Conclusion

Data Analyst by day, artist by night. Striving towards creativity and happiness everyday.
