# Pandas 101: Indexing

Hello Everyone,

It’s been a while, and I’m sorry, I’ve been lazy! I’m currently traveling in China and Southeast Asia. The internet at my grandma’s is not the best…and all they do is play mahjong.

Today, I am going to talk about pandas! No, not the cuddly bears (I haven’t even seen any in China yet), but the awesome Python library that helps you manipulate data! First, I’ll go a bit in depth on its two main concepts, Series and DataFrames. Then I’ll cover what I’ve found most helpful in practice: becoming fluent in indexing DataFrames. Feel free to download the Jupyter notebook exercises from my GitHub, which I’ll go through later in this post. If you don’t know how to use Jupyter notebooks, the Jupyter project has a guide on how to get started.

# Pandas

So what is pandas? Pandas is yet another Python library, built on top of NumPy. It stores its data in NumPy arrays under the hood and adds labeled, table-like data structures on top of them. If you don’t remember what NumPy is, you can read my earlier blog post on it. By convention, pandas is imported and aliased as `pd`:

`import pandas as pd`

Really, the two things you need to know about the pandas library are the `DataFrame` object and the `Series` object.

# Series

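A Series is a one-dimensional array of values paired with an index of labels. If you don’t pass `index`, the labels default to the integers 0, 1, 2, and so on.

```python
series_example = pd.Series([1, 2, 3, 4], index=["d", "b", "a", "c"])
print(series_example)
# d    1
# b    2
# a    3
# c    4
# dtype: int64
```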
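Because a Series carries both labels and positions, you can reach the same value two ways. A minimal sketch using the Series from above: square brackets with a label look the value up by its index label, while `.iloc` (covered in more detail below for DataFrames) looks it up by integer position.

```python
import pandas as pd

# The same Series as above: values 1-4 labeled d, b, a, c
series_example = pd.Series([1, 2, 3, 4], index=["d", "b", "a", "c"])

# Square brackets with a label look up by index label
by_label = series_example["a"]        # 3

# .iloc looks up by integer position, ignoring the labels
by_position = series_example.iloc[0]  # 1 (the value labeled "d")

# The labels themselves live in the .index attribute
labels = list(series_example.index)   # ['d', 'b', 'a', 'c']

print(by_label, by_position, labels)
```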

# Dataframes

DataFrames can be created in a couple of ways, but the three main arguments you should at least remember are `data`, `columns`, and `index`. `data` is typically passed in as a dictionary that maps each column name to a list or Series of values. `columns` holds the column names, or column “indexes”; when `data` is a dictionary, it also picks which keys to use and in what order. `index` holds the row labels, or row “indexes”. By default, just as with a Series, the row index is the incrementing integers 0, 1, 2, and so on.

`dataset = pd.DataFrame(data = {'age': [10,28,30], 'weight': [120,133,155],'height': [160,165,175], 'color':['blue','green','pink']}, columns = ["age","weight", "height","color"])`
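To see what each of the three arguments does, here is a small sketch built from the same data. The row labels `"a"`, `"b"`, `"c"` are illustrative choices, not part of the original example.

```python
import pandas as pd

data = {
    "age": [10, 28, 30],
    "weight": [120, 133, 155],
    "height": [160, 165, 175],
    "color": ["blue", "green", "pink"],
}

# Without index=..., rows get the default labels 0, 1, 2
dataset = pd.DataFrame(data=data, columns=["age", "weight", "height", "color"])
default_labels = list(dataset.index)    # [0, 1, 2]
column_labels = list(dataset.columns)   # ['age', 'weight', 'height', 'color']

# Passing index=... replaces the default row labels (illustrative labels)
labeled = pd.DataFrame(
    data=data, columns=["age", "weight", "height", "color"], index=["a", "b", "c"]
)

# columns=... also selects and orders the dictionary keys that are used
subset = pd.DataFrame(data=data, columns=["color", "age"])

print(default_labels, column_labels, list(labeled.index), list(subset.columns))
```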

# Manipulating Dataframes: Indexing

## Standard Indexing with []

Select a single column through dict-like notation:

`dataset["age"]`

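Or through attribute notation, which only works when the column name is a valid Python identifier that doesn’t clash with an existing DataFrame attribute or method: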

`dataset.age`
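The attribute caveat is easy to trip over, so here is a small sketch. The columns `"body mass"` and `"count"` are hypothetical, chosen only to show a name with a space and a name that collides with the existing `DataFrame.count` method.

```python
import pandas as pd

# "body mass" and "count" are hypothetical columns added for illustration
dataset = pd.DataFrame(
    {"age": [10, 28, 30], "body mass": [1.0, 2.0, 3.0], "count": [5, 6, 7]}
)

# Both notations return the same Series for an ordinary column name
same = dataset["age"].equals(dataset.age)
print(same)                           # True

# A name with a space can only be reached with brackets
print(dataset["body mass"].tolist())  # [1.0, 2.0, 3.0]

# dataset.count is the DataFrame.count method, not the "count" column
print(callable(dataset.count))        # True
print(dataset["count"].tolist())      # [5, 6, 7]
```

Bracket notation always works, so it is the safer habit in scripts; attribute notation is mostly a typing shortcut for interactive work.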

To retrieve certain rows, slice them by their integer positions. As with Python list slicing, the end is excluded, so this returns the first two rows:

`dataset[0:2]`
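A short sketch of how a slice inside plain `[]` behaves, using a trimmed-down version of `dataset`. Note the asymmetry: a slice selects rows, but a single integer is treated as a column label.

```python
import pandas as pd

dataset = pd.DataFrame({"age": [10, 28, 30], "color": ["blue", "green", "pink"]})

# Positions 0 and 1; the end position 2 is excluded
first_two = dataset[0:2]
print(first_two["age"].tolist())  # [10, 28]

# Leaving out the end slices through the last row
rest = dataset[1:]
print(rest["color"].tolist())     # ['green', 'pink']

# A plain integer is looked up as a column label, not a row position
try:
    dataset[0]
except KeyError:
    print("dataset[0] looks for a column named 0")
```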

To get fancy and select only certain rows of a certain column, chain the two:

`dataset[0:2]["age"]`

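Notice that plain `[]` indexing selects columns by their names and rows by slicing their positions; a single `[]` call can’t take a row selection and a column selection at the same time. That is what `.iloc` and `.loc` are for.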
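Chaining works in either order, as this sketch shows. It is fine for reading data. For assigning values, pandas recommends a single `.loc` call instead, because assigning through a chain such as `dataset[0:2]["age"] = 0` may end up modifying a temporary copy rather than `dataset` itself.

```python
import pandas as pd

dataset = pd.DataFrame({"age": [10, 28, 30], "weight": [120, 133, 155]})

# Rows first, then the column: a DataFrame slice, then one of its columns
rows_then_col = dataset[0:2]["age"]

# Column first, then rows: a Series, then a positional slice of it
col_then_rows = dataset["age"][0:2]

# Both orders return the same two ages
print(rows_then_col.tolist())  # [10, 28]
print(col_then_rows.tolist())  # [10, 28]
```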

To filter rows by a condition, you can do something like this:

`dataset[dataset["age"] > 15]`

Here, `dataset["age"] > 15` evaluates to a boolean Series with one True or False per row. Passing that Series inside the brackets returns the entire rows where it is True.
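Splitting the filter into its two steps makes the mechanics visible. This sketch also shows combining conditions with `&`; each comparison needs its own parentheses because `&` binds more tightly than `>` and `==` in Python.

```python
import pandas as pd

dataset = pd.DataFrame({"age": [10, 28, 30], "color": ["blue", "green", "pink"]})

# The comparison produces a boolean Series aligned with the rows
mask = dataset["age"] > 15
print(mask.tolist())              # [False, True, True]

# Passing the mask keeps only the rows where it is True
older = dataset[mask]
print(older["color"].tolist())    # ['green', 'pink']

# The surviving rows keep their original labels
print(list(older.index))          # [1, 2]

# Combine conditions with & (and) or | (or), parenthesizing each one
older_and_pink = dataset[(dataset["age"] > 15) & (dataset["color"] == "pink")]
print(older_and_pink["age"].tolist())  # [30]
```

The fact that filtered rows keep their labels (1 and 2, not 0 and 1) is one reason the difference between positions and labels matters in the next two sections.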

## Position Indexing: .iloc

`dataset.iloc[0:3,0:2]`

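This gives me the first 3 rows and the first 2 columns of my DataFrame. `.iloc` selects purely by integer position and, as with Python slicing, the end of each slice is excluded. It’s pretty useful when you don’t have the energy to spell out all your column names, and it’s convenient for looping.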
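A few more `.iloc` patterns, sketched on the full `dataset`: a rectangular block, a single cell, a negative position, and a loop over columns by position.

```python
import pandas as pd

dataset = pd.DataFrame(
    {"age": [10, 28, 30], "weight": [120, 133, 155],
     "height": [160, 165, 175], "color": ["blue", "green", "pink"]}
)

# Rows at positions 0-2 and columns at positions 0-1 (ends exclusive)
block = dataset.iloc[0:3, 0:2]
print(block.shape)           # (3, 2)

# A single cell: row position 1, column position 2 ("height")
cell = dataset.iloc[1, 2]    # 165

# Negative positions count from the end, as with Python lists
last_row = dataset.iloc[-1]  # the row with age 30

# Looping over columns by position, without spelling out their names
for position in range(dataset.shape[1]):
    print(position, dataset.columns[position], dataset.iloc[0, position])
```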

## Label Indexing: .loc

`dataset.loc[0:2,["age","weight"]]`

Beware: this actually grabs the first three rows (labels 0, 1, and 2) and the columns named “age” and “weight”, because label slices in `.loc` include the end label, unlike `.iloc`. In this case, my row index happens to be numerical, so each row’s “position” and “label” are the same. But if my row labels were something like “a”, “b”, “c”, calling `0:2` would throw a `TypeError`, and I would need to slice by those labels instead.
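Here is a sketch contrasting the two indexers on a default integer index and on a hypothetical `"a"`, `"b"`, `"c"` index, including the error described above.

```python
import pandas as pd

data = {"age": [10, 28, 30], "weight": [120, 133, 155], "color": ["blue", "green", "pink"]}
numbered = pd.DataFrame(data)                         # labels 0, 1, 2
lettered = pd.DataFrame(data, index=["a", "b", "c"])  # hypothetical labels a, b, c

# On integer labels, .loc includes the end label ...
inclusive = numbered.loc[0:2, ["age", "weight"]]  # 3 rows: labels 0, 1, 2
# ... while .iloc excludes the end position
exclusive = numbered.iloc[0:2, 0:2]               # 2 rows: positions 0, 1

# With string labels, .loc slices by those labels (end included)
by_letter = lettered.loc["a":"b", ["age", "weight"]]  # rows a and b

# An integer slice on string labels raises a TypeError
raised = False
try:
    lettered.loc[0:2]
except TypeError as err:
    raised = True
    print("TypeError:", err)

# .iloc still works by position regardless of the labels
by_position = lettered.iloc[0:2]  # rows a and b
```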

# Conclusion

Data Analyst by day, artist by night. Striving towards creativity and happiness everyday.

## More from Shirley Liu
