Pandas — DataFrames

The Primary Pandas Data Structure! It Is a Dict-Like Container for Series Objects— #PySeries#Episode 08

Published in

Jungletronics

6 min readSep 9, 2020

Hello, let’s see Pandas AGAIN!

This time, DataFrame!

Fig 1. **Pandas** is a fast, powerful, flexible, and easy to use open-source data analysis and manipulation tool,
built on top of the Python programming language.

Here are the topics for our study about Pandas Series:

.Series
.DataFrames (this one:)
.Missing Data
.GroupBy
.Merging, Joinning, and Cocarenating
.Operations
.Data Input and Output

The second topic will be this one: DataFrames!

DATAFRAMES

The primary Pandas data structure!

Can be thought of as a dict-like container for Series objects.

import numpy as np
import pandas as pd

And for our database creation:

from numpy.random import randn

Let's seed it, so our data is the same (in case you want to follow me:)

np.random.seed(101)

How To Create a DataFrame

For the purpose of our studying, here is how:

DataFrame(Data, xLabel, yLabel):

df=pd.DataFrame(randn(5,4), ['A','B','C','D','E'], ['W','X','Y','Z' ])

Note: to work on your code you may need to retype the single quotes (´), compatible with your system;)

Now call the object:

df

Fig 3. Here is the table that can be better viewed, right?

Each of these columns and row is Series themselves!

INDEXING & SELECTION IN PANDAS

Using Brackets Notation:

Just pass in the column name, ie ‘W’:

df[‘W’]A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64

See what type of object df is:

type(df['W'])pandas.core.series.Series

See ‘W’ is just a Series!

And The DataFrame itself?

type(df)pandas.core.frame.DataFrame

The df itself is the DataFrame!

Using SQL Notation:

Note: not recommended, because we can confuse with the real method of df object!

So, always use the bracket Notation when it comes to rescuing series from df :)

Anyway, here you have it!

# This is SQL Notation: Not recommended :/df.WA    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64# Use Bracket Notation [] instead :)

Getting Multiple Columns back!

Pass in a List, please!

# WHICH IS GOING TO RESULT IN ANOTHER SET OF BRACKETS HERE!df[[‘W’,’Z’]]

Fig 4. Running **df[[‘W’,’Z’]]** — Getting multiples columns back!

Creating a New Column

Just make some arithmetic on the right side with the series you want to create your column:

df[‘new’] = df[‘W’] + df[‘Y’]df

Fig 5. Running **df[‘new’] = df[‘W’] + df[‘Y’]** — Creating a new row!

Dropping Columns

Pandas requires that you specify that you really want to modify your data in place (affect the original DB);

It is like so you do not accidentally lose information;

In case you’ve done a bunch of adjustments to your data, you don’t want to accidentally lose it, right?

This is like ‘commit’ in DB!

df.drop(‘new’, axis=1, inplace=True)df

Fig 6. Running **df.drop(‘new’, axis=1, inplace=True)** — Dropping Columns!

Dropping Rows

This time I am not doing this in place!

Note: axis=0 is the default, so you don’t need to specify it here:)

dropped_df = df.drop(‘E’, axis=0)

Fig 7. Running **df.drop(‘E’, axis=0)** — Dropping without ’commit’ :) Now you can work w/ dropped_df object. If you specify **inplace=True** it will return no object :/

See that our DataFrame has not been affected yet by the last drop! We didn’t make it in place, remember?

# Shape returns a tuple dimension (row, column)
df.shape(5, 4)

See, df isn’t affected yet!

df

Fig 8. Running df, rescuing the DataFrame again!

Selecting Rows

There are two methods:

LOC -> label-BASE index
ILOC -> numerical-BASE index

IT’S A LITTLE WEIRD HOW THE METHODS ARE CALLED IN PANDAS:

IT USES A SQUARED BRACKET!

But that’s the way it works for Pandas!

# This returns a series of that ‘A’ row!
df.loc[‘A’]W    2.706850
X    0.628133
Y    0.907969
Z    0.503826
Name: A, dtype: float64

Or alternatively, type the index of the row required!

# This is a numerical-BASE index locator = iloc
df.iloc[0]W    2.706850
X    0.628133
Y    0.907969
Z    0.503826
Name: A, dtype: float64

Returning a Single Value

# INDEXING
df.loc[‘B’, ‘Y’]-0.8480769834036315

Returning the same as previous, just locating it.

# Grab the element on the second row (‘B’) 
# and in the third column (‘Y’), right?df.iloc[1,2]-0.8480769834036315

Returning a SUB-SET of the DataFrame

Just pass two lists of the rows and columns you want!

# Please, get used to the SQUARED BRACKET :/df.loc[[‘A’, ‘B’],[‘W’, ‘Y’]]WYA2.7068500.907969B0.651118–0.848077

Fig 9. Running **df.loc[[‘A’, ‘B’],[‘W’, ‘Y’]]** — Creating a data sub-set!

And that’s it!

print(“Ok, we’re going to stop here for now and continue the discussion in the next PySeries Episode!” )

Ok, we’re going to stop here for now and continue the discussion in the next PySeries Episode!

# https://medium.com/jungletronics/pandas-dataframes-7ba872dcbc30
print(‘Thank You for reading This post!. Bye!’)

Thank You for reading this post! Bye!

We’re gonna be alright. Live From home!

The code bundle for this episode is available at:

GitHub Repo link

Colab Link

Credits & References:

pandas.DataFrame - pandas 0.23.4 documentation

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)…

pandas.pydata.org

Jose Portilla — Python for Data Science and Machine Learning Bootcamp — Learn how to use NumPy, Pandas, Seaborn , Matplotlib , Plotly , Scikit-Learn , Machine Learning, Tensorflow , and more!

01Episode#PySeries — Python — Python 4 Engineers — Exercises! An overview of the Opportunities Offered by Python in Engineering!

02Episode#PySeries — Python — Geogebra Plus Linear Programming- We’ll Create a Geogebra program to help us with our linear programming

03Episode#PySeries — Python — Python 4 Engineers — More Exercises! — Another Round to Make Sure that Python is Really Amazing!

04Episode#PySeries — Python — Linear Regressions — The Basics — How to Understand Linear Regression Once and For All!

05Episode#PySeries — Python — NumPy Init & Python Review — A Crash Python Review & Initialization at Numpy lib.

06Episode#PySeries — Python — NumPy Arrays & Jupyter Notebook — Arithmetic Operations, Indexing & Selection, and Conditional Selection

07Episode#PySeries — Python — Pandas — Intro & Series — What it is? How to use it?

08Episode#PySeries — Python —Pandas DataFrames — The primary Pandas data structure! It is a dict-like container for Series objects (this one)

09Episode#PySeries — Python — Python 4 Engineers — Even More Exercises! — More Practicing Coding Questions in Python!

10Episode#PySeries — Python — Pandas — Hierarchical Index & Cross-section — Open your Colab notebook and here are the follow-up exercises!

11Episode#PySeries — Python — Pandas — Missing Data — Let’s Continue the Python Exercises — Filling & Dropping Missing Data

12Episode#PySeries — Python — Pandas — Group By — Grouping large amounts of data and compute operations on these groups

13Episode#PySeries — Python — Pandas — Merging, Joining & Concatenations — Facilities For Easily Combining Together Series or DataFrame

14Episode#PySeries — Python — Pandas — Pandas Dataframe Examples: Column Operations

15Episode#PySeries — Python — Python 4 Engineers — Keeping It In The Short-Term Memory — Test Yourself! Coding in Python, Again!

16Episode#PySeries — NumPy — NumPy Review, Again;) — Python Review Free Exercises

17Episode#PySeries — Generators in Python — Python Review Free Hints

18Episode#PySeries — Pandas Review…Again;) — Python Review Free Exercise

19Episode#PySeries — MatlibPlot & Seaborn Python Libs — Reviewing theses Plotting & Statistics Packs

20Episode#PySeries — Seaborn Python Review — Reviewing theses Plotting & Statistics Packs

31 Episode#PySeries — Pandas — DATAFRAMES — When should I use pandas DataFrame?#PySeries#Episode 31