Pandas — DATAFRAMES
When should I use a pandas DataFrame? #PySeries #Episode 31
Let’s see pandas DataFrames again! Google Colab notebook link :)
Pandas DATAFRAMES: The Primary Pandas Data Structure!
When should I use pandas DataFrame?
The Pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels.
DataFrames are widely used in data science, machine learning, scientific computing, and many other data-intensive fields.
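To make "two-dimensional data and its corresponding labels" concrete, here is a minimal sketch. The name `toy` and the labels `day_1`, `sales`, and `temp_c` are made up for illustration and are not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data: two labelled columns, three labelled rows
toy = pd.DataFrame(
    {"sales": [215, 325, 185], "temp_c": [14.2, 16.4, 11.9]},
    index=["day_1", "day_2", "day_3"],
)

print(toy.shape)          # (number of rows, number of columns)
print(list(toy.index))    # the row labels
print(list(toy.columns))  # the column labels
print(toy.dtypes)         # each column keeps its own dtype
```

The values form the 2D grid, while `index` and `columns` are the labels that let you address rows and columns by name.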
What follows is an example of how to use it.
Please open your Colab notebook and follow along:
01# First things first. Importing the libraries:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
# This style was called 'seaborn-whitegrid' before Matplotlib 3.6
plt.style.use('seaborn-v0_8-whitegrid')
02# Let’s create a simple plot now:
Getting acquainted with PANDAS DATAFRAME:
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, 'o', color='black');
3# Now here is a Real Problem:
Suppose a local ice cream shop keeps track of how many ice creams it sells versus the noon temperature on that day. Here are the records for 12 days in a row:
4# Let’s Plot the Graph & Make a Linear Regression:
Creating The Graph’s Axis From Numpy Arrays(x & y):
from scipy import stats
# Defining the x & y axes as NumPy arrays
x = np.array([215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408])
y = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2])
# Linear regression
a, b, correlation, p, error = stats.linregress(x, y)
print('Regression line: y=%.2fx+%.2f' % (a, b))
print('Correlation Coefficient: r=%.2f' % correlation)
# Plotting the graph
plt.plot(x, y, 'o', label='Original data')
f = a * x + b
plt.plot(x, f, 'r', label='Regression Line')
plt.ylim(10, 30)
plt.legend()
plt.title("Ice Cream Sales for The Last 12 Days")
plt.xlabel('Sales')
plt.ylabel('Temp °C')
plt.show()
🍦As you can see, the temperature 🍧 boosts the ice cream sales 🍨
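If you are curious what the regression step computes, here is a minimal sketch of the ordinary least-squares arithmetic using NumPy only. The cross-check with `np.polyfit` and `np.corrcoef` is my own addition, and names such as `x_dev` and `slope` are illustrative.

```python
import numpy as np

x = np.array([215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408])
y = np.array([14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2])

# Deviations from the mean of each variable
x_dev = x - x.mean()
y_dev = y - y.mean()

# Closed-form least-squares slope and intercept of the line y = slope * x + intercept
slope = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
intercept = y.mean() - slope * x.mean()

# Pearson correlation coefficient
r = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

# Cross-check against NumPy's own routines
slope_np, intercept_np = np.polyfit(x, y, deg=1)
r_np = np.corrcoef(x, y)[0, 1]

print('Regression line: y=%.2fx+%.2f' % (slope, intercept))
print('Correlation Coefficient: r=%.2f' % r)
```

The slope and correlation here should agree with what `stats.linregress(x, y)` reports, since both fit the same straight line to the same points.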
5# Now Pandas DATAFRAMES Operations:
DATAFRAMES Can be thought of as a dict-like container for Series objects.
Creating Pandas DATAFRAMES From Dictionary (X & Y):
# Creating a pandas DataFrame by passing a dictionary to the pd.DataFrame constructor:
d = {'X': [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408], 'Y': [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]}
df = pd.DataFrame(data=d)
df.index = ['1°_dia', '2°_dia', '3°_dia', '4°_dia', '5°_dia', '6°_dia', '7°_dia', '8°_dia', '9°_dia', '10°_dia', '11°_dia', '12°_dia']
df.columns = ['Ice_Cream_Sales', 'Temperature_°C']
df
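The "dict-like container for Series objects" idea can be sketched as follows. The short index labels `d1` to `d3` and the variable names are made up for this example; only the first three values of each column are used.

```python
import pandas as pd

# Two Series that share the same row labels
sales = pd.Series([215, 325, 185], index=["d1", "d2", "d3"])
temp = pd.Series([14.2, 16.4, 11.9], index=["d1", "d2", "d3"])

# A DataFrame built from a dict of Series: the keys become the column names
frame = pd.DataFrame({"Ice_Cream_Sales": sales, "Temperature_°C": temp})

# Dict-style operations work on the columns
print(list(frame.keys()))           # the column names
print("Ice_Cream_Sales" in frame)   # membership tests the column names
for name, column in frame.items():  # iterate over (name, Series) pairs
    print(name, type(column).__name__, column.dtype)
```

Each column you get back is a Series with its own dtype, which is why one DataFrame can hold integers and floats side by side.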
6# Pandas DATAFRAMES — Using Brackets Notation:
DATAFRAMES — The primary Pandas data dict-like container structure!
Getting a specific column like this: df['Specific_column']…
# A DataFrame can be thought of as a dict-like container for Series objects
# Here I am passing the column name as a string:
df['Temperature_°C']
1°_dia     14.2
2°_dia     16.4
3°_dia     11.9
4°_dia     15.2
5°_dia     18.5
6°_dia     22.1
7°_dia     19.4
8°_dia     25.1
9°_dia     23.4
10°_dia    18.1
11°_dia    22.6
12°_dia    17.2
Name: Temperature_°C, dtype: float64
What type of object is it?
type(df['Temperature_°C'])
pandas.core.series.Series
…or like this: df[['List_of_Columns']]:
# Here I am passing a list of columns:
df[['Temperature_°C', 'Ice_Cream_Sales']]
# We’ve got a DataFrame object:
type(df[['Temperature_°C', 'Ice_Cream_Sales']])
pandas.core.frame.DataFrame
Returning a SERIES object:
# Returning a Series object
df.Ice_Cream_Sales
1°_dia     215
2°_dia     325
3°_dia     185
4°_dia     332
5°_dia     406
6°_dia     522
7°_dia     412
8°_dia     614
9°_dia     544
10°_dia    421
11°_dia    445
12°_dia    408
Name: Ice_Cream_Sales, dtype: int64
Returning a DATAFRAME object:
df[['Ice_Cream_Sales']]
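The difference between single and double brackets is easy to check side by side. This is a small sketch on a three-row frame; `as_series` and `as_frame` are illustrative names.

```python
import pandas as pd

frame = pd.DataFrame(
    {"Ice_Cream_Sales": [215, 325, 185], "Temperature_°C": [14.2, 16.4, 11.9]}
)

as_series = frame["Ice_Cream_Sales"]   # a string selects one column as a Series
as_frame = frame[["Ice_Cream_Sales"]]  # a list selects columns as a DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```

Attribute access such as `frame.Ice_Cream_Sales` is a shortcut for the single-bracket form, but it only works when the column name is a valid Python identifier. A name like `'Temperature_°C'` contains the degree sign, so it should need brackets.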
7# Creating a New Column (X.Y):
Performing a multiplication with DataFrame columns:
df['X.Y'] = df['Temperature_°C'] * df['Ice_Cream_Sales']
df
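Column arithmetic is element-wise: each row of one column is combined with the same row of the other. Here is a minimal sketch on three rows. The `assign` variant and the column name `sales_per_degree` are my own additions to show a version that does not modify the original.

```python
import pandas as pd

frame = pd.DataFrame(
    {"Ice_Cream_Sales": [215, 325, 185], "Temperature_°C": [14.2, 16.4, 11.9]},
    index=["d1", "d2", "d3"],
)

# Row-by-row product of two columns, stored under a new column label
frame["X.Y"] = frame["Ice_Cream_Sales"] * frame["Temperature_°C"]

# assign() returns a new DataFrame with the extra column and leaves `frame` as it is
with_ratio = frame.assign(
    sales_per_degree=frame["Ice_Cream_Sales"] / frame["Temperature_°C"]
)

print(frame.columns.tolist())
print(with_ratio.columns.tolist())
```

Bracket assignment changes the DataFrame you already have, while `assign` is handy when you want to keep the original untouched.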
8# Dropping Columns:
When inplace=True, the data is modified in place: the method returns None and the DataFrame itself is updated.
When inplace=False (the default), the operation returns a modified copy and leaves the original untouched, so you need to assign the result to a variable to keep it.
df.drop('X.Y', axis=1, inplace=False)
df
df.drop('X.Y', axis=1, inplace=True)
Now:
df
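The two `inplace` behaviours can be verified directly. This sketch uses a tiny made-up frame with columns `A`, `B`, and `X.Y`.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [3, 4], "X.Y": [3, 8]})

# Default (inplace=False): a new DataFrame comes back, the original keeps its column
returned = frame.drop("X.Y", axis=1)
print(list(returned.columns))  # ['A', 'B']
print(list(frame.columns))     # ['A', 'B', 'X.Y']

# inplace=True: nothing useful comes back, the original itself loses the column
result = frame.drop("X.Y", axis=1, inplace=True)
print(result)                  # None
print(list(frame.columns))     # ['A', 'B']
```

A common slip is writing `frame = frame.drop(..., inplace=True)`, which replaces the DataFrame with `None`. Use either reassignment or `inplace=True`, not both.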
9# Dropping Rows:
df.drop('12°_dia', axis=0)
df.shape
(12, 2)
Now with the inplace argument:
df.drop('12°_dia', axis=0, inplace=True)
df
df.shape
(11, 2)
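Row dropping also accepts a list of labels, and it complains when a label does not exist. Here is a small sketch on a made-up four-row frame. The `index=` keyword is an alternative spelling of `axis=0`, and `errors="ignore"` is an optional argument of `drop`.

```python
import pandas as pd

frame = pd.DataFrame({"sales": [215, 325, 185, 332]}, index=["d1", "d2", "d3", "d4"])

# Drop several rows at once; the original is untouched because inplace defaults to False
trimmed = frame.drop(index=["d3", "d4"])
print(trimmed.shape)  # (2, 1)

# Dropping a label that does not exist raises KeyError by default
missing_label_raised = False
try:
    frame.drop(index="d9")
except KeyError:
    missing_label_raised = True
print(missing_label_raised)  # True

# errors="ignore" skips labels that are not found
same = frame.drop(index="d9", errors="ignore")
print(same.shape)  # (4, 1)
```

Note that `shape` is an attribute, not a method, so it is written without parentheses.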
10# Selecting Rows — There are two methods:
LOC -> label-based index
ILOC -> integer position-based index
# Calling the 11th day (loc): label-based
df.loc['11°_dia']
Ice_Cream_Sales    445.0
Temperature_°C      22.6
Name: 11°_dia, dtype: float64
Now:
# Calling the 11th day (iloc): position-based
df.iloc[10]
Ice_Cream_Sales    445.0
Temperature_°C      22.6
Name: 11°_dia, dtype: float64
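Two details about `loc` and `iloc` are worth seeing in code: how slices behave, and what happens to positions after a row is dropped. This is a sketch on a made-up four-row frame.

```python
import pandas as pd

frame = pd.DataFrame({"sales": [215, 325, 185, 332]}, index=["d1", "d2", "d3", "d4"])

# The same row, reached by label and by position
by_label = frame.loc["d3"]
by_position = frame.iloc[2]
print(by_label.equals(by_position))  # True

# Label slices include both endpoints; position slices stop before the end
label_slice = frame.loc["d2":"d3"]
position_slice = frame.iloc[1:3]
print(label_slice.equals(position_slice))  # True

# After dropping a row, labels stay the same but positions shift
shorter = frame.drop(index="d1")
print(shorter.iloc[0].name)  # d2
```

This is why position 10 still points at `'11°_dia'` in the example above: the dropped row was the last one, so no earlier positions moved.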
11# Returning a Single Value:
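# Calling the 9th day (loc): label-based. Passing lists returns a 1x1 DataFrame;
# df.loc['9°_dia', 'Ice_Cream_Sales'] would return the bare value instead
df.loc[['9°_dia'], ['Ice_Cream_Sales']]
# Calling row 9 x column 2 (iloc): position-based. Column position 1 is 'Temperature_°C' in the column order set above
df.iloc[8, 1]
23.4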
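Whether you get a single value or a small DataFrame depends on whether you pass scalars or lists. Here is a sketch on three made-up rows. The `at` and `iat` accessors are pandas shortcuts for single-cell lookups that the notebook itself does not use.

```python
import pandas as pd

frame = pd.DataFrame(
    {"Ice_Cream_Sales": [215, 325, 185], "Temperature_°C": [14.2, 16.4, 11.9]},
    index=["d1", "d2", "d3"],
)

# Lists on both axes: the result is still a DataFrame, just 1 x 1
one_cell_frame = frame.loc[["d2"], ["Ice_Cream_Sales"]]
print(type(one_cell_frame).__name__, one_cell_frame.shape)  # DataFrame (1, 1)

# Scalars on both axes: the result is the value itself
scalar = frame.loc["d2", "Ice_Cream_Sales"]
print(scalar)  # 325

# at / iat do the same single-cell lookup by label / by position
print(frame.at["d2", "Ice_Cream_Sales"])  # 325
print(frame.iat[1, 0])                    # 325
print(frame.iloc[1, 1])                   # 16.4
```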
12 # Returning a SUB-SET of the DataFrame:
df.loc[['7°_dia','8°_dia', '9°_dia'],['Ice_Cream_Sales']]
# Saving into a variable:
df2 = df.loc[['7°_dia', '9°_dia'], ['Ice_Cream_Sales']]
df2
# Checking the variable's type:
type(df2)
pandas.core.frame.DataFrame
13 # J3 signing-off ;):
I WISH YOU ALL THE BEST!
print("That's it! This is another example for PySeries#Episode 31")
OK! That’s all!
I hope you enjoyed that lecture.
If you find this post helpful, please click the applause button and subscribe to the page for more articles like this one.
Until next time!
I wish you an excellent day!
31_pandas_dataframe_practice.ipynb
Credits & References
Based on: Support Vector Machines: A Visual Explanation with Sample Python Code by Alice Zhao
Related Post:
08 # PySeries#Episode 08 — Pandas — DataFrames : The Primary Pandas Data Structure! It Is a Dict-Like Container for Series Object