Pandas DataFrame Advanced

A Complete Notebook Review #PurePythonSeries — Episode #04

Published in

Jungletronics

7 min readSep 16, 2021

INDEX

(1) Series

(2) DataFrame

(3) Conditional Selection

(4) Set and Reset Index

(5) Missing Data

(6) Group By

(7) Cross Section

(8) Pivot Table

(9) Merging, Joining and Concatenation

1 — PANDAS Series — How to Create From…

import pandas as pd

Let’s create four artifacts to see how to create a Series:

First, a List:

labels = ['a','b', 'c']

Second, another List:

my_data = [10,20,30]

Third, a Numpy array:

import numpy as nparr = np.array(my_data)

And Finally, forth, a Dictionary:

dict = {'a':10, 'b':20, 'c':30}

Create Series FROM LISTS

pd.Series(data=my_data, index=labels)a    10
b    20
c    30
dtype: int64

Or, ordering, first data, then labels:

pd.Series(my_data, labels)a    10
b    20
c    30
dtype: int64

Create Series FROM ARRAY

pd.Series(arr)0    10
1    20
2    30
dtype: int32

Or, setting labels:

pd.Series(arr, labels)a    10
b    20
c    30
dtype: int32

Create Series FROM DICTIONARY

This is the easiest and most intuitive way…

pd.Series(dict)a    10
b    20
c    30
dtype: int64

Isn’t it?

2 — PANDAS DATAFRAME — Structure

import matplotlib.pyplot as plt%matplotlib inlineimport pandas as pddf = pd.DataFrame(data, index, columns)df = pd.DataFrame(d   , i    , c)

Create a Pandas’ Dataframe by passing a dictionary:

# Ice Cream Sale - Last 12 Sunny Days
# visit this guide for more info: table styled = {'X': [215,325,185,332,406,522,412,614,544,421,445,408], 'Y': [14.2,16.4,11.9,15.2,18.5,22.1,19.4,25.1,23.4,18.1,22.6,17.2]}df = pd.DataFrame(data=d)df.index = ['1°_dia','2°_dia', '3°_dia','4°_dia','5°_dia','6°_dia','7°_dia','8°_dia','9°_dia','10°_dia','11°_dia','12°_dia']df.columns = [  'Ice_Cream_Sales',  'Temperature_°C']df

3 — Conditional Selection w/ DataFrame

Simple Conditional

df[df>20]

df[df['Temperature_°C']> 20]

Return a subset of DataFrame

There are 2 general formats for GRABBING ELEMENTS from a DataFrame:DOUBLE BRACKET FORMAT
df[line][column]SINGLE BRACKET FORMAT
df[line,column](w/ comma, this is least prone to error!)

Multiple Lines Solution

boolser = df['Temperature_°C']>23 
result = df[boolser]
mycols = ['Temperature_°C', 'Ice_Cream_Sales']
result[mycols]

Single Line (All-in-One)

One_line Conditional Solution:

# Slicing w/ Double Bracket Format# df[][]# df[inequality][[pass_a_list_of_columns]]# df[df[]>0][['','']]df[df['Temperature_°C']>23][['Temperature_°C', 'Ice_Cream_Sales']]

Multiple Conditional Solution:

Python Use AND OR In Multiple Single Conditional

True and TrueTrue

Again:

True and FalseFalse

Once more:

True or TrueTrue

Or:

True or FalseTrue

But here, this Operation will return an ERROR :/

df[(df['Temperature_°C']>23) and (df['Temperature_°C']<26)]

Instead, use AMPERSAND (&) or PIPE( | ) as below:

df[(df['Temperature_°C']>23)  &  (df['Temperature_°C']<26)]

df[(df['Temperature_°C']<14)  |  (df['Temperature_°C']>22)]

4 — SET AND RESET INDEX

Transform the old index into a column:

resetted_df = df.reset_index()resetted_df

Return index column as real index:

resetted_df.set_index('index')

5 — MISSING DATA

Creating DataFrame From Dictionary:

dict = {'A':[1,2,np.nan],'B':[5, np.nan, np.nan],'C':[1,2,3]}

Creating a DataFrame from dictionary:

df = pd.DataFrame(dict)
df

Note that the First Line (Zero) and the Last Column (C) are Complete!

axis=0 is the default; Returning the First Line ( Zero ):

df.dropna()

Returning the Last Column ( C ):

df.dropna(axis=1)

Returning if the Column has at least 2 elements:

df.dropna(axis=1, thresh=2)

Filling the blank values:

df.fillna(value ='FILL_VALUE')

Filling the blank cells with the MEAN OF COLUMN ‘A’:

df.fillna(value=df['A'].mean())

Letting Pandas sort out how best to fill the table (ffill — fluid fill:)

df.ffill()

6 — GROUP BY

company = ['GOOG', 'GOOG', 'MSTF', 'MSTF', 'FB', 'FB']

More list:

person = ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah']

Once more list:

sales = [200,120,340,124,245,350]

Save an empty DataFrame:

df = pd.DataFrame()

Join it all in the newly created DataFrame:

df['company'] = company
df['person'] = person
df['sales'] = salesdf

Run groupby():

bycompany = df.groupby('company')

Here is the aggregate functions:

bycompany.sum()

bycompany.std()

bycompany.count()

bycompany.max()

bycompany.min()

bycompany.describe()

bycompany.describe().transpose()

7 — Multilevel DataFrame — CROSS SECTION

Let’s create a multilevel DataFrame:

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside, inside))hier_index = pd.MultiIndex.from_tuples(hier_index)

Create a Multilevel DataFrame:

df = pd.DataFrame(np.random.randn(6,2), hier_index, ['A', 'B'])df

Separating Group 1:

df.loc['G1']

Or Group 2:

df.loc['G2']

In Group 2, return row one:

df.loc['G2'].loc[1]A   -1.000487
B   -2.259334
Name: 1, dtype: float64

Or specific value:

df.loc['G2'].loc[1][0]-1.0004870600502171

Now the Index names:

df.index.names

It’s blank…

FrozenList([None, None])

This is just a Pandas index names object types. All it’s saying here is that these indices don’t have any names but you can go ahead and do say equals and then pass in a list of names.

Let’s name each index:

df.index.names = ['Groups', 'Num']df

Cross Section (xs):

df.xs(1, level='Num')

df.xs(3, level='Num')

8 — PIVOT TABLE

A = ["G2", "G2", "G2", "G2", "G2", "G1", "G1", "G1", "G1"]
B = [1, 1, 1, 2, 2, 1, 1, 2, 2]
C = [2, 1, 1, 2, 2, 1, 2, 2, 1]
D = [1, 2, 2, 1, 2, 1, 2, 1, 1]
E = [2, 1, 2, 2, 1, 1, 2, 2, 1]

Create an empty DataFrame:

df = pd.DataFrame()

Populating it:

df['A'] = A
df['B'] = B
df['C'] = C
df['D'] = D
df['E'] = Edf

Here is the pivot table:

df.pivot_table(values='D', index=['A', 'B'], columns=['C'], fill_value=0 )

This example aggregates values by taking the sum:

table = pd.pivot_table(df, values='D', index=['A', 'B'],columns=['C'], aggfunc=np.sum, fill_value=0

Calling the table object:

table

9 — MERGING, JOINING, and CONCATENATION

2 DataFrames: left and right:

left = pd.DataFrame({'key':['K0','K1','K2','K3'], 'A':['A0','A1','A2','A3'], 'B':['B0','B1','B2','B3'], })left

right = pd.DataFrame({'key':['K0','K1','K2','K3'], 'A':['A4','A5','A6','A7'], 'B':['B4','B5','B6','B7'], })right

Merging

pd.merge(left,right,how='inner',on='key')

Joining

left.join(right, how='inner', lsuffix='_left', rsuffix='_right')

Concatenation

pd.concat([left, right])

pd.concat([left, right], axis=1)

print("I hope you enjoyed that lecture o/\nIf you find this post helpful, please click the applause button and subscribe to the page for more articles like this one.\nHave a Nice Day!")

I hope you enjoyed that lecture o/
If you find this post helpful, please click the applause button and subscribe to the page for more articles like this one.
Have a Nice Day!

Until next time!

👉Jupiter notebook link :)

👉or collab link

👉git

Credits & References

Jose Portilla — Python for Data Science and Machine Learning Bootcamp — Learn how to use NumPy, Pandas, Seaborn, Matplotlib, Plotly, Scikit-Learn, Machine Learning, Tensorflow, and more!

00#Episode#PurePythonSeries — Lambda in Python — Python Lambda Desmistification

01#Episode#PurePythonSeries — Send Email in Python — Using Jupyter Notebook — How To Send Gmail In Python

02#Episode#PurePythonSeries — Automate Your Email With Python & Outlook — How To Create An Email Trigger System in Python

03#Episode#PurePythonSeries — Manipulating Files With Python — Manage Your Lovely Photos With Python!

04#Episode#PurePythonSeries — Pandas DataFrame Advanced — A Complete Notebook Review (this one)

05#Episode#PurePythonSeries — Is This Leap Year? Python Calendar — How To Calculate If The Year Is Leap Year and How Many Days Are In The Month

06#Episode#PurePythonSeries — List Comprehension In Python — Locked-in Secrets About List Comprehension

07#Episode#PurePythonSeries — Graphs — In Python — Extremely Simple Algorithms in Python

08#Episode#PurePythonSeries — Decorator in Python — How To Simplifying Your Code And Boost Your Function

Pandas DataFrame Advanced

A Complete Notebook Review #PurePythonSeries — Episode #04

INDEX

1 — PANDAS Series — How to Create From…

Create Series FROM LISTS

Create Series FROM ARRAY

Create Series FROM DICTIONARY

2 — PANDAS DATAFRAME — Structure

3 — Conditional Selection w/ DataFrame

Simple Conditional

Return a subset of DataFrame

Multiple Lines Solution

Single Line (All-in-One)

One_line Conditional Solution:

Multiple Conditional Solution:

4 — SET AND RESET INDEX

5 — MISSING DATA

Note that the First Line (Zero) and the Last Column (C) are Complete!

6 — GROUP BY

7 — Multilevel DataFrame — CROSS SECTION

8 — PIVOT TABLE

9 — MERGING, JOINING, and CONCATENATION

Merging

Joining

Concatenation

I hope you enjoyed that lecture o/
If you find this post helpful, please click the applause button and subscribe to the page for more articles like this one.
Have a Nice Day!

Credits & References

Related Posts

Written by J3

Pandas DataFrame Advanced

A Complete Notebook Review #PurePythonSeries — Episode #04

INDEX

1 — PANDAS Series — How to Create From…

Create Series FROM LISTS

Create Series FROM ARRAY

Create Series FROM DICTIONARY

2 — PANDAS DATAFRAME — Structure

3 — Conditional Selection w/ DataFrame

Simple Conditional

Return a subset of DataFrame

Multiple Lines Solution

Single Line (All-in-One)

One_line Conditional Solution:

Multiple Conditional Solution:

4 — SET AND RESET INDEX

5 — MISSING DATA

Note that the First Line (Zero) and the Last Column (C) are Complete!

6 — GROUP BY

7 — Multilevel DataFrame — CROSS SECTION

8 — PIVOT TABLE

9 — MERGING, JOINING, and CONCATENATION

Merging

Joining

Concatenation

I hope you enjoyed that lecture o/If you find this post helpful, please click the applause button and subscribe to the page for more articles like this one.Have a Nice Day!

Credits & References

Related Posts

Written by J3

I hope you enjoyed that lecture o/
If you find this post helpful, please click the applause button and subscribe to the page for more articles like this one.
Have a Nice Day!