Pandas — Intro & Series
What it is? How to use it? — #PySeries#Episode 07
What is PANDAS?
- Pandas is an open-source library built on top of NumPy;
- Pandas supports data analysis, cleaning, and preparation;
- Pandas excels in performance and productivity;
- Pandas also has built-in visualization features;
- Pandas can work with data from a wide variety of sources.
Data Science topics with Pandas:
Here are the topics for our study about Pandas:
.Series (this one:)
.DataFrames
.Missing Data
.GroupBy
.Merging, Joining, and Concatenating
.Operations
.Data Input and Output
The first topic will be the Series:
SERIES
We will use these four Python objects to create a Series in Pandas:
LIST
DATA
ARRAY
DICTIONARY
Get your Jupyter Notebook (or Google Colab) and type:
# Pandas is built on top of NumPy
import numpy as np
Pandas depends on NumPy because it is built on top of it; we import NumPy explicitly here because we will call np.array below.
Now import Pandas itself like this:
# Import the pandas package
import pandas as pd
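As a small illustration of the "built on top of NumPy" point (a sketch only, not something the original notebook runs), the values of a Series can be pulled out as a plain NumPy array. The variable names below are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical example Series, not part of the original notebook
s = pd.Series([10, 20, 30])

# to_numpy() hands back the Series values as a NumPy array
values = s.to_numpy()
is_ndarray = isinstance(values, np.ndarray)

print(type(values))
print(is_ndarray)
```

The point is simply that the data inside a Series is array data; the index is what Pandas adds around it.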
Let’s get down to code. Here is our list of labels:
labels=['a','b','c']
And our other three Python objects: a data list, a NumPy array, and a dictionary:
my_data=[10,20,30]
arr=np.array(my_data)
d={'a':10,'b':20, 'c':30}
HOW TO CREATE A SERIES
First, we’ll need these four Python objects:
- LIST
- DATA
- ARRAY and
- DICTIONARY
Then we pass our DATA to the Series constructor:
# A Series looks a lot like a NumPy array, except that it
# clearly separates the INDEX from the actual DATA;
# that is, it is an indexed array :)
pd.Series(data=my_data)
0    10
1    20
2    30
dtype: int64
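To make the INDEX/DATA split concrete, here is a minimal sketch that looks at the two parts separately. It reuses my_data from above; the names index_labels and data_values are introduced only for this illustration.

```python
import pandas as pd

my_data = [10, 20, 30]
s = pd.Series(data=my_data)

# With no index given, pandas generates integer labels 0..n-1
index_labels = list(s.index)

# The data points themselves, converted back to a plain list
data_values = s.tolist()

print(index_labels)
print(data_values)
```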
Or pass our DATA together with the LIST (labels):
# Now I specify that the INDEX is equal to the LIST labels
# Now I have a label-INDEXED Series
pd.Series(data=my_data, index=labels)
a    10
b    20
c    30
dtype: int64
The positional order is DATA first, then INDEX: Series(DATA, INDEX)
# You don’t need the keyword names
# as long as you pass the arguments in the correct order
pd.Series(my_data, labels)
a    10
b    20
c    30
dtype: int64
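One detail worth knowing when pairing DATA with an INDEX: both must have the same length. The sketch below deliberately passes a label list that is one item short (too_few_labels is a hypothetical name, not from the original notebook) to show that Pandas raises a ValueError rather than guessing.

```python
import pandas as pd

my_data = [10, 20, 30]
too_few_labels = ['a', 'b']  # hypothetical: one label short on purpose

try:
    pd.Series(my_data, too_few_labels)
    error_raised = False
except ValueError as exc:
    # pandas refuses to build a Series when the lengths differ
    error_raised = True
    print("ValueError:", exc)
```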
Or finally, what’s really cool: pass in the DICTIONARY:
# Pandas takes the dictionary keys as the INDEX and the
# dictionary values as our DATA.
# So that’s a nice way to quickly create a SERIES!
pd.Series(d)
a    10
b    20
c    30
dtype: int64
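A dictionary can also be combined with an explicit index. In that case, as far as I know, Pandas uses the index to select and order the dictionary keys, and any label that is not a key ends up as NaN. The label 'z' below is an assumption added purely to show that effect.

```python
import math
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# 'c' and 'a' are picked from the dictionary in the order requested;
# 'z' is not a key in d, so its value is missing (NaN)
s = pd.Series(d, index=['c', 'a', 'z'])
print(s)

z_is_missing = math.isnan(s['z'])
```

Because NaN is a floating-point value, the resulting Series is stored as floats even though the dictionary held integers.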
A Series can hold almost any type of Python object as its data points.
More interesting still, it can even hold references to built-in functions such as sum(), print(), and len() :)
pd.Series(data=[sum, print, len])
We will probably never actually use this, but it demonstrates how flexible a Pandas Series is when it comes to holding different object types!
(Jose Portilla — Python For Data Science course)
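To see what "holding function references" means in practice, the sketch below checks the dtype of such a Series and calls one of the stored functions. The names funcs, length_fn, and result are illustrative only.

```python
import pandas as pd

funcs = pd.Series(data=[sum, print, len])

# Arbitrary Python objects are stored under the generic "object" dtype
print(funcs.dtype)

# Pull a function out by position and call it like any other function
length_fn = funcs.iloc[2]
result = length_fn([1, 2, 3])
print(result)
```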
Arithmetic with Series
Series (DATA, INDEX)
ser1 = pd.Series([1,2,3,4], ['USA', 'USSR', 'GERMANY', 'JAPAN'])
ser1
USA        1
USSR       2
GERMANY    3
JAPAN      4
dtype: int64
Another Series follows:
ser2 = pd.Series([1,2,5,4], ['USA','USSR', 'ITALY', 'JAPAN'])
ser2
USA      1
USSR     2
ITALY    5
JAPAN    4
dtype: int64
How to retrieve values from a Series:
# To retrieve a value, pass in its INDEX label
ser1['USA']
1
Or access the INDEX label as an attribute:
ser1.USA
1
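Attribute access is handy, but it only works when the label is a valid Python name and does not clash with an existing Series method. As a hedged sketch of some more explicit alternatives (the variable names are introduced here for illustration), the same Series can be queried like this:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['USA', 'USSR', 'GERMANY', 'JAPAN'])

by_label = ser1.loc['USA']        # explicit label-based lookup
by_position = ser1.iloc[0]        # explicit position-based lookup
subset = ser1[['USA', 'JAPAN']]   # a list of labels returns a Series
has_italy = 'ITALY' in ser1       # membership test checks the index labels

print(by_label, by_position, has_italy)
print(subset)
```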
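Arithmetic
# We can do arithmetic operations with Series too:
ser1 + ser2
GERMANY    NaN
ITALY      NaN
JAPAN      8.0
USA        2.0
USSR       4.0
dtype: float64
Pandas matches the values by INDEX label, not by position. GERMANY and ITALY each appear in only one of the two Series, so their sums are NaN, and because NaN is a float the whole result becomes float64.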
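If NaN for unmatched labels is not what you want, one option is the add method with a fill_value, which treats a label missing from one Series as that value before adding. This is a minimal sketch of that alternative, not something the original notebook does; total and plain_sum are names made up for the example.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['USA', 'USSR', 'GERMANY', 'JAPAN'])
ser2 = pd.Series([1, 2, 5, 4], ['USA', 'USSR', 'ITALY', 'JAPAN'])

# The + operator leaves NaN where a label exists in only one Series
plain_sum = ser1 + ser2
nan_count = int(plain_sum.isna().sum())

# add() with fill_value=0 treats the missing side as 0 instead
total = ser1.add(ser2, fill_value=0)

print(nan_count)
print(total)
```

With fill_value=0, GERMANY keeps its value of 3 and ITALY its value of 5, while the shared labels are summed as before.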
That’s it for Pandas Series!
In the next episode let’s discuss DataFrame!
Stay tuned!
Bye, for now, o/
GitHub Repo link
Credits & References:
Jose Portilla — Python for Data Science and Machine Learning Bootcamp — Learn how to use NumPy, Pandas, Seaborn, Matplotlib, Plotly, Scikit-Learn, Machine Learning, TensorFlow, and more!
Posts Related:
00Episode#PySeries — Python — Jupiter Notebook Quick Start with VSCode — How to Set your Win10 Environment to use Jupiter Notebook
01Episode#PySeries — Python — Python 4 Engineers — Exercises! An overview of the Opportunities Offered by Python in Engineering!
02Episode#PySeries — Python — Geogebra Plus Linear Programming- We’ll Create a Geogebra program to help us with our linear programming
03Episode#PySeries — Python — Python 4 Engineers — More Exercises! — Another Round to Make Sure that Python is Really Amazing!
04Episode#PySeries — Python — Linear Regressions — The Basics — How to Understand Linear Regression Once and For All!
05Episode#PySeries — Python — NumPy Init & Python Review — A Crash Python Review & Initialization at Numpy lib.
06Episode#PySeries — Python — NumPy Arrays & Jupyter Notebook — Arithmetic Operations, Indexing & Selection, and Conditional Selection
07Episode#PySeries — Python — Pandas — Intro & Series — What it is? How to use it? (this one)
08Episode#PySeries — Python — Pandas DataFrames — The primary Pandas data structure! It is a dict-like container for Series objects
09Episode#PySeries — Python — Python 4 Engineers — Even More Exercises! — More Practicing Coding Questions in Python!
10Episode#PySeries — Python — Pandas — Hierarchical Index & Cross-section — Open your Colab notebook and here are the follow-up exercises!
11Episode#PySeries — Python — Pandas — Missing Data — Let’s Continue the Python Exercises — Filling & Dropping Missing Data
12Episode#PySeries — Python — Pandas — Group By — Grouping large amounts of data and compute operations on these groups
13Episode#PySeries — Python — Pandas — Merging, Joining & Concatenations — Facilities For Easily Combining Together Series or DataFrame
14Episode#PySeries — Python — Pandas — Pandas Dataframe Examples: Column Operations
15Episode#PySeries — Python — Python 4 Engineers — Keeping It In The Short-Term Memory — Test Yourself! Coding in Python, Again!
16Episode#PySeries — NumPy — NumPy Review, Again;) — Python Review Free Exercises
17Episode#PySeries — Generators in Python — Python Review Free Hints
18Episode#PySeries — Pandas Review…Again;) — Python Review Free Exercise
19Episode#PySeries — MatlibPlot & Seaborn Python Libs — Reviewing theses Plotting & Statistics Packs
20Episode#PySeries — Seaborn Python Review — Reviewing theses Plotting & Statistics Packs