Published in Jungletronics

Pandas — DataFrames

The Primary Pandas Data Structure! It Is a Dict-Like Container for Series Objects— #PySeries#Episode 08

Hello, let’s see Pandas AGAIN!

This time, DataFrame!

Fig 1. Pandas is a fast, powerful, flexible, and easy to use open-source data analysis and manipulation tool,
built on top of the Python programming language.

Here are the topics in our study of Pandas:

.Series
.DataFrames (this one:)
.Missing Data
.GroupBy
.Merging, Joining, and Concatenating
.Operations
.Data Input and Output

Fig 2. Numpy & Pandas Together!

The second topic will be this one: DataFrames!

DATAFRAMES

The primary Pandas data structure!

Can be thought of as a dict-like container for Series objects.
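
To make the "dict-like container" idea concrete, here is a minimal sketch. The two Series and their values are made up for illustration (they are not the data used in the rest of this post):

```python
import pandas as pd

# Two illustrative Series that share the same index labels (assumed data).
w = pd.Series([1.0, 2.0, 3.0], index=['A', 'B', 'C'])
x = pd.Series([10.0, 20.0, 30.0], index=['A', 'B', 'C'])

# A dict of Series becomes a DataFrame: the dict keys turn into column names.
frame = pd.DataFrame({'W': w, 'X': x})

# Like a dict, the frame exposes its keys and supports membership tests.
print(list(frame.keys()))        # the column labels
print('W' in frame)              # membership is checked against column labels
print((frame['W'] == w).all())   # the column we get back holds the values we put in
```

So asking a DataFrame for a key hands you back a Series, much like asking a dict for a key hands you back its value.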

import numpy as np
import pandas as pd

And for generating our sample data:

from numpy.random import randn

Let's seed the random number generator, so that your data matches mine (in case you want to follow along :)

np.random.seed(101)
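
If you are wondering what seeding actually buys us, here is a small sketch: resetting the same seed before each call should give identical "random" numbers. The variable names first and second are just for this illustration:

```python
import numpy as np
from numpy.random import randn

np.random.seed(101)
first = randn(5, 4)    # 5 rows x 4 columns of standard-normal samples

np.random.seed(101)    # reset the generator to the same starting point
second = randn(5, 4)

# Both draws started from the same seed, so the arrays match element by element.
print(np.array_equal(first, second))
```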

How To Create a DataFrame

For the purposes of our study, here is how:

DataFrame(data, index, columns):

df = pd.DataFrame(randn(5,4), index=['A','B','C','D','E'], columns=['W','X','Y','Z'])

Note: if you copy code from this post, you may need to retype the quotes as plain straight quotes ('), because typographic quotes (‘ ’) are not valid in Python code ;)

Now call the object:

df
Fig 3. Here is the table, which is much easier to read this way, right?

Each of these columns and rows is itself a Series!
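
Here is a quick sketch of that claim. To keep it deterministic, it uses a stand-in table filled with np.arange instead of random numbers (an assumption for this example only), and it borrows loc, the label-based row selector:

```python
import numpy as np
import pandas as pd

# Stand-in table with the same labels as our df, but predictable values.
table = pd.DataFrame(np.arange(20).reshape(5, 4),
                     index=['A', 'B', 'C', 'D', 'E'],
                     columns=['W', 'X', 'Y', 'Z'])

column = table['W']      # one column
row = table.loc['A']     # one row, selected by its label

# Both are Series: a column is indexed by row labels, a row by column labels.
print(type(column).__name__, list(column.index))
print(type(row).__name__, list(row.index))
```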

INDEXING & SELECTION IN PANDAS

Using Brackets Notation:

Just pass in the column name, e.g. 'W':

df['W']
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64

See what type of object df['W'] is:

type(df['W'])
pandas.core.series.Series

See, 'W' is just a Series!

And the DataFrame itself?

type(df)
pandas.core.frame.DataFrame

The df itself is the DataFrame!

Using Dot Notation (the "SQL" style):

Note: not recommended, because a column name can be confused with a real attribute or method of the df object!

So, always use bracket notation when retrieving a Series from df :)

Anyway, here you have it!

# This is dot ("SQL") notation: not recommended :/
df.W
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
# Use bracket notation [] instead :)
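
To see why the dot style can bite, here is a small sketch with a hypothetical frame (sales is made up for this example) whose column is named count, which is also the name of a built-in DataFrame method:

```python
import pandas as pd

# Hypothetical data: the column name 'count' collides with DataFrame.count().
sales = pd.DataFrame({'count': [3, 5, 2], 'price': [9.5, 4.0, 7.25]})

by_bracket = sales['count']   # bracket notation always returns the column
by_dot = sales.count          # dot notation finds the method first

print(type(by_bracket).__name__)   # a Series
print(callable(by_dot))            # a method, not our data
print(type(sales.price).__name__)  # works only because 'price' clashes with nothing
```

Brackets never have this ambiguity, and they also work for column names with spaces in them.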

Getting Multiple Columns Back!

Pass in a list of column names, please!

# The list goes inside the indexing brackets, which results in a double set of brackets here!
df[['W','Z']]
Fig 4. Running df[['W','Z']]: getting multiple columns back!
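
A short sketch of what the double brackets change, again on a stand-in table with predictable values (an assumption for this example): a single label gives back a Series, while a list of labels gives back a DataFrame, even when the list has one item.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(20).reshape(5, 4),
                     index=['A', 'B', 'C', 'D', 'E'],
                     columns=['W', 'X', 'Y', 'Z'])

single = table['W']          # a label      -> Series
one_col = table[['W']]       # a list of 1  -> DataFrame with one column
pair = table[['Z', 'W']]     # a list of 2  -> DataFrame, columns in the order given

print(type(single).__name__)
print(type(one_col).__name__, one_col.shape)
print(list(pair.columns))
```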

Creating a New Column

Just do some arithmetic with existing Series on the right-hand side and assign the result to the new column name:

df['new'] = df['W'] + df['Y']
df
Fig 5. Running df['new'] = df['W'] + df['Y']: creating a new column!
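
Here is the same idea as a sketch you can check by hand, using predictable integers instead of random numbers (the table and its values are assumptions for this example). The addition is done row by row, matching on the index labels:

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(20).reshape(5, 4),
                     index=['A', 'B', 'C', 'D', 'E'],
                     columns=['W', 'X', 'Y', 'Z'])

# W holds 0, 4, 8, 12, 16 and Y holds 2, 6, 10, 14, 18,
# so the new column is their element-wise sum.
table['new'] = table['W'] + table['Y']

print(list(table.columns))
print(list(table['new']))
```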

Dropping Columns

By default, drop() returns a new DataFrame and leaves the original untouched; Pandas requires you to say explicitly (inplace=True) that you really want to modify your data in place;

This way you do not accidentally lose information;

In case you've done a bunch of adjustments to your data, you don't want to accidentally lose it, right?

This is like 'commit' in a database!

df.drop('new', axis=1, inplace=True)
df
Fig 6. Running df.drop('new', axis=1, inplace=True): dropping columns!
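
A small sketch of the difference between the two behaviours, on a stand-in table (assumed values, not the random data above): without inplace=True you get a new frame back, and with it you get None back while the original changes.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(20).reshape(5, 4),
                     index=['A', 'B', 'C', 'D', 'E'],
                     columns=['W', 'X', 'Y', 'Z'])
table['new'] = table['W'] + table['Y']

# Not in place: a new DataFrame comes back, the original keeps the column.
without_new = table.drop('new', axis=1)
print('new' in table.columns, 'new' in without_new.columns)

# In place: the original is modified and the call itself returns None.
returned = table.drop('new', axis=1, inplace=True)
print(returned)
print(list(table.columns))
```

If you would rather not use inplace at all, reassigning the result (table = table.drop('new', axis=1)) has the same effect on the name table.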

Dropping Rows

This time I am not doing this in place!

Note: axis=0 is the default, so you don't need to specify it here :)

dropped_df = df.drop('E', axis=0)
Fig 7. Running df.drop('E', axis=0): dropping without 'commit' :) Now you can work with the dropped_df object. If you specify inplace=True, drop() returns None instead of a new object :/

See that our DataFrame has not been affected by the last drop! We didn't do it in place, remember?

# shape is a tuple of dimensions (rows, columns)
df.shape
(5, 4)

See, df still has all five rows!

df
Fig 8. Running df: displaying the DataFrame again!
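
The row version can be sketched the same way. This example uses a stand-in table with predictable values (an assumption) and drops two rows at once by passing a list of labels:

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(20).reshape(5, 4),
                     index=['A', 'B', 'C', 'D', 'E'],
                     columns=['W', 'X', 'Y', 'Z'])

# axis=0 (rows) is the default, so the labels alone are enough.
dropped = table.drop(['D', 'E'])
print(table.shape, dropped.shape)   # the original still has all five rows

# To keep the change without inplace=True, reassign the result.
table = table.drop('E')
print(list(table.index))
```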

Selecting Rows

There are two ways:

  1. loc -> label-based index
  2. iloc -> integer position-based index

It's a little weird how these are called in Pandas: loc and iloc are indexers rather than regular methods, so they take square brackets instead of parentheses!

But that's the way it works in Pandas!

# This returns the 'A' row as a Series!
df.loc['A']
W    2.706850
X    0.628133
Y    0.907969
Z    0.503826
Name: A, dtype: float64

Or alternatively, pass the integer position of the row you want!

# iloc is the integer position-based locator
df.iloc[0]
W    2.706850
X    0.628133
Y    0.907969
Z    0.503826
Name: A, dtype: float64
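
In our df the two calls return the same row, so the difference is easy to miss. Here is a sketch with a hypothetical frame (scores, made up for this example) whose index labels are integers in a shuffled order, where loc and iloc disagree:

```python
import pandas as pd

# Hypothetical frame: the index labels are 2, 0, 1 (not 0, 1, 2).
scores = pd.DataFrame({'score': [10, 20, 30]}, index=[2, 0, 1])

by_label = scores.loc[0]       # the row whose LABEL is 0 (the second row)
by_position = scores.iloc[0]   # the row at POSITION 0 (the first row, labelled 2)

print(by_label['score'])
print(by_position['score'], by_position.name)
```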

Returning a Single Value

# Indexing by labels: row 'B', column 'Y'
df.loc['B', 'Y']
-0.8480769834036315

This returns the same value as before, this time located by integer position.

# Grab the element in the second row ('B')
# and in the third column ('Y'), right?
df.iloc[1,2]
-0.8480769834036315
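
As a side note that goes a bit beyond this episode: pandas also has at and iat, which are scalar-only counterparts of loc and iloc. A minimal sketch on a stand-in table with predictable values (an assumption for this example) shows all four agreeing:

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(20).reshape(5, 4),
                     index=['A', 'B', 'C', 'D', 'E'],
                     columns=['W', 'X', 'Y', 'Z'])

# Row 'B' is the second row (4, 5, 6, 7) and 'Y' is the third column.
via_loc = table.loc['B', 'Y']     # by labels
via_iloc = table.iloc[1, 2]       # by integer positions
via_at = table.at['B', 'Y']       # scalar-only, by labels
via_iat = table.iat[1, 2]         # scalar-only, by integer positions

print(via_loc, via_iloc, via_at, via_iat)
```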

Returning a SUB-SET of the DataFrame

Just pass two lists: the rows and the columns you want!

# Please, get used to the square brackets :/
df.loc[['A', 'B'],['W', 'Y']]
          W         Y
A  2.706850  0.907969
B  0.651118 -0.848077
Fig 9. Running df.loc[['A', 'B'],['W', 'Y']]: creating a data subset!
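
Besides lists, both locators accept slices. This sketch (on a stand-in table, so the values are assumptions) highlights one detail worth knowing: a label slice with loc includes its end label, while a position slice with iloc stops before its end position.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(np.arange(20).reshape(5, 4),
                     index=['A', 'B', 'C', 'D', 'E'],
                     columns=['W', 'X', 'Y', 'Z'])

by_label = table.loc['A':'B', 'W':'Y']   # rows A..B and columns W..Y, ends included
by_position = table.iloc[0:2, 0:3]       # rows 0..1 and columns 0..2, ends excluded

print(by_label.shape, by_position.shape)
print((by_label.values == by_position.values).all())
```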

And that’s it!

print("Ok, we're going to stop here for now and continue the discussion in the next PySeries Episode!")

Ok, we're going to stop here for now and continue the discussion in the next PySeries Episode!

# https://medium.com/jungletronics/pandas-dataframes-7ba872dcbc30
print('Thank you for reading this post! Bye!')

Thank you for reading this post! Bye!

We’re gonna be alright. Live From home!

The code bundle for this episode is available at:

GitHub Repo link

Colab Link

Credits & References:

Jose Portilla, Python for Data Science and Machine Learning Bootcamp: Learn how to use NumPy, Pandas, Seaborn, Matplotlib, Plotly, Scikit-Learn, Machine Learning, Tensorflow, and more!

Posts Related:

00Episode#PySeries — Python — Jupiter Notebook Quick Start with VSCode — How to Set your Win10 Environment to use Jupiter Notebook

01Episode#PySeries — Python — Python 4 Engineers — Exercises! An overview of the Opportunities Offered by Python in Engineering!

02Episode#PySeries — Python — Geogebra Plus Linear Programming- We’ll Create a Geogebra program to help us with our linear programming

03Episode#PySeries — Python — Python 4 Engineers — More Exercises! — Another Round to Make Sure that Python is Really Amazing!

04Episode#PySeries — Python — Linear Regressions — The Basics — How to Understand Linear Regression Once and For All!

05Episode#PySeries — Python — NumPy Init & Python Review — A Crash Python Review & Initialization at Numpy lib.

06Episode#PySeries — Python — NumPy Arrays & Jupyter Notebook — Arithmetic Operations, Indexing & Selection, and Conditional Selection

07Episode#PySeries — Python — Pandas — Intro & Series — What it is? How to use it?

08Episode#PySeries — Python —Pandas DataFrames — The primary Pandas data structure! It is a dict-like container for Series objects (this one)

09Episode#PySeries — Python — Python 4 Engineers — Even More Exercises! — More Practicing Coding Questions in Python!

10Episode#PySeries — Python — Pandas — Hierarchical Index & Cross-section — Open your Colab notebook and here are the follow-up exercises!

11Episode#PySeries — Python — Pandas — Missing Data — Let’s Continue the Python Exercises — Filling & Dropping Missing Data

12Episode#PySeries — Python — Pandas — Group By — Grouping large amounts of data and compute operations on these groups

13Episode#PySeries — Python — Pandas — Merging, Joining & Concatenations — Facilities For Easily Combining Together Series or DataFrame

14Episode#PySeries — Python — Pandas — Pandas Dataframe Examples: Column Operations

15Episode#PySeries — Python — Python 4 Engineers — Keeping It In The Short-Term Memory — Test Yourself! Coding in Python, Again!

16Episode#PySeries — NumPy — NumPy Review, Again;) — Python Review Free Exercises

17Episode#PySeriesGenerators in Python — Python Review Free Hints

18Episode#PySeries — Pandas Review…Again;) — Python Review Free Exercise

19Episode#PySeriesMatlibPlot & Seaborn Python Libs — Reviewing theses Plotting & Statistics Packs

20Episode#PySeriesSeaborn Python Review — Reviewing theses Plotting & Statistics Packs

31 Episode#PySeries — Pandas — DATAFRAMES — When should I use pandas DataFrame?#PySeries#Episode 31

J of Jungle + 3 Plats Arduino/RPi/Pic = J3

Hi, Guys o/ I am J3! I am just a hobby-dev, playing around with Python, Django, Lego, Arduino, Raspy, PIC, AI… Welcome! Join us!
