Pandas Series — A Beginner’s Guide!
Hello! Welcome to Pandas Series — A Beginner’s Guide. In this session, we will be covering the following topics along with examples for better understanding:
Topics to be discussed:
- Introduction to Pandas: definition of the Pandas library, its data structures, and the import convention.
- Pandas Series: introduction, different ways to create one, operations, indexing & slicing.
- Exercises to Practice
Before diving into the topic, I would encourage readers to visit my previous articles on NumPy, as they cover fundamental concepts that will serve as a solid foundation for the content in this tutorial. You can access those articles here:
Additionally, please “Follow” me on Medium as it would encourage me to write more useful content on Data Science and Machine Learning.
Let’s dive into the topic!
Pandas is a popular open-source library built on top of NumPy. It is mainly used for fast data analysis, data cleaning, and data manipulation. It also has some built-in visualization features.
To learn more about Pandas, you can visit here.
The two main data structures provided by Pandas are Series & DataFrame:
A Series is a one-dimensional labeled array; you can think of it as a more powerful version of a Python list.
A DataFrame is a two-dimensional labeled data structure with rows and columns, much like a spreadsheet or an SQL table.
In this first part (Part 1), let’s focus on the Series.
Series:
A Series is a one-dimensional labeled array that can hold data of any type (integer, float, string, etc.). It is similar to a Python list or a NumPy array, but with one important additional feature: every element has an index label, which allows more convenient and powerful data manipulation.
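As a quick preview of the two properties described above (labels on every element, and any data type), here is a minimal sketch. The labels 'x', 'y', 'z' and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

# Every element gets an index label; 'x', 'y', 'z' are our own (arbitrary) labels.
words = pd.Series(["apple", "banana", "cherry"], index=["x", "y", "z"])

# Mixing types is allowed; pandas then falls back to the general 'object' dtype.
mixed = pd.Series([1, 2.5, "three"])

print(words.index.tolist())   # ['x', 'y', 'z']
print(words["y"])             # banana (lookup by label)
print(mixed.dtype)            # object
print(mixed.tolist())         # [1, 2.5, 'three']
```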
Creating a Series:
To create a Series in Pandas, use the ‘pd.Series()’ constructor. It accepts data such as a list, a dictionary, or a NumPy array, plus an optional ‘index’.
1. Creating a Series with a “default index”:
import pandas as pd
# Create a Series with Default Index:
ser1 = pd.Series([1, 2, 3, 4, 5, 6])
print(ser1)
Output:
0 1
1 2
2 3
3 4
4 5
5 6
dtype: int64
Here we created a Series containing integers from 1 to 6. By default, the indices are auto-generated as integers starting from 0 & increasing by one for each element.
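To see the auto-generated default index directly, you can inspect the ‘index’ attribute. A minimal sketch, reusing the same data as ‘ser1’:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5, 6])

# The default index is a RangeIndex: 0, 1, 2, ..., len(ser1) - 1.
print(ser1.index)          # RangeIndex(start=0, stop=6, step=1)
print(list(ser1.index))    # [0, 1, 2, 3, 4, 5]
print(len(ser1))           # 6
```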
2. Creating a Series with a “custom index”:
# Create a Series with Custom Index:
ser2 = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])
print(ser2)
Output:
1 10
2 20
3 30
4 40
5 50
dtype: int64
The second Series ‘ser2’ contains the integers 10 to 50 with a custom index (1 to 5).
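Two practical points about custom indices: the index must provide exactly one label per value, and labels do not have to be integers. A minimal sketch; the variable names ‘raised’ and ‘ser_labels’ and the labels "low", "mid", "high" are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Three values but only two labels: pandas refuses and raises ValueError.
raised = False
try:
    pd.Series([10, 20, 30], index=[1, 2])
except ValueError as err:
    raised = True
    print("ValueError:", err)

# String labels work just as well as integers.
ser_labels = pd.Series([10, 20, 30], index=["low", "mid", "high"])
print(ser_labels["mid"])  # 20
```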
3. Creating a Series from a “dictionary”:
ser3 = pd.Series({'a': 10, 'b': 15, 'c': 20, 'd': 25, 'e': 30})
print(ser3)
Output:
a 10
b 15
c 20
d 25
e 30
dtype: int64
The third Series ‘ser3’ is created from a dictionary: the dictionary keys become the index and the values become the data.
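When you build a Series from a dictionary, you can also pass ‘index’ to choose which keys to use and in what order. A minimal sketch (the dictionary ‘scores’ is made up for this example); a label with no matching key becomes NaN, which is why the result switches to float64.

```python
import pandas as pd

scores = {'a': 10, 'b': 15, 'c': 20}

# 'index' selects and orders the keys; 'z' is not a key, so it becomes NaN.
ser = pd.Series(scores, index=['c', 'a', 'z'])
print(ser)  # c -> 20.0, a -> 10.0, z -> NaN, dtype float64
```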
4. Creating a Series from a NumPy array:
import numpy as np
import pandas as pd
arr = np.array([11, 12, 15, 21, 25, 68])
ser4 = pd.Series(arr)
print(ser4)
Output:
0 11
1 12
2 15
3 21
4 25
5 68
dtype: int64
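A NumPy array can be combined with a custom index in the same way as a list, and ‘.to_numpy()’ converts a Series back into a NumPy array. A minimal sketch reusing the array above; the letter labels are an arbitrary choice.

```python
import numpy as np
import pandas as pd

arr = np.array([11, 12, 15, 21, 25, 68])

# Custom labels can be supplied alongside the array.
ser = pd.Series(arr, index=['a', 'b', 'c', 'd', 'e', 'f'])
print(ser['d'])          # 21

# .to_numpy() returns the values as a NumPy array again.
back = ser.to_numpy()
print(type(back))        # <class 'numpy.ndarray'>
print(back.tolist())     # [11, 12, 15, 21, 25, 68]
```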
Accessing elements of a Series:
We can access elements with square brackets, using either the default integer index or a custom index label.
# Accessing element by Default index:
print(ser1[0]) # Output: 1
# Accessing element by Custom Index (2 is a label here, not a position):
print(ser2[2]) # Output: 20
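Because ‘ser2’ uses integer labels, plain square brackets can be ambiguous: ‘ser2[2]’ returns the element labeled 2, not the third element. A minimal sketch using the ‘.loc’ (label-based) and ‘.iloc’ (position-based) accessors makes the intent explicit, and also shows slicing from the topic list. The two Series are rebuilt so the snippet runs on its own.

```python
import pandas as pd

ser2 = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])
ser3 = pd.Series({'a': 10, 'b': 15, 'c': 20, 'd': 25, 'e': 30})

# .loc looks up by label, .iloc by 0-based position.
print(ser2.loc[2])    # 20 (the element labeled 2)
print(ser2.iloc[2])   # 30 (the third element)

# Positional slices exclude the end position, like Python lists.
print(ser2.iloc[1:3].tolist())      # [20, 30]

# Label slices with .loc include BOTH endpoints.
print(ser3.loc['b':'d'].tolist())   # [15, 20, 25]
```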
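Much of the power of a Pandas Series comes from vectorized operations: arithmetic and mathematical functions are applied to every element at once, without writing a loop, and the index labels are kept in the result. Let’s have a look at some examples: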
Operations on Series:
Example 1:
ser5 = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])
res = ser5 + 5  # adds 5 to every element; the labels are preserved
print(res)
Output:
A 15
B 25
C 35
D 45
E 55
dtype: int64
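Operations between two Series work slightly differently from operations with a single number: pandas aligns the two Series by label, not by position. A minimal sketch with two made-up Series ‘s1’ and ‘s2’; labels that appear in only one of them produce NaN unless you supply ‘fill_value’.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched by label; 'a' and 'd' exist in only one Series -> NaN.
total = s1 + s2
print(total)   # a -> NaN, b -> 12.0, c -> 23.0, d -> NaN

# .add() with fill_value treats a label missing on one side as 0.
filled = s1.add(s2, fill_value=0)
print(filled)  # a -> 1.0, b -> 12.0, c -> 23.0, d -> 30.0
```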
Example 2:
ser5 = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])
res = np.sqrt(ser5)  # NumPy's element-wise square root (needs: import numpy as np)
print(res)
Output:
A 3.162278
B 4.472136
C 5.477226
D 6.324555
E 7.071068
dtype: float64
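Besides element-wise operations, a Series has aggregation methods that reduce all values to a single result. A minimal sketch on the same ‘ser5’:

```python
import pandas as pd

ser5 = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

# Each method summarizes the whole Series as one value.
print(ser5.sum())      # 150
print(ser5.mean())     # 30.0
print(ser5.max())      # 50
print(ser5.idxmax())   # E (the label of the largest value)
```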
Filtering in Series:
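You can also filter a Series with a boolean condition: the comparison produces a Series of True/False values, and indexing with it keeps only the elements where the condition is True.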
ser5 = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])
res = ser5[ser5 > 32]
print(res)
Output:
D 40
E 50
dtype: int64
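Several conditions can be combined with ‘&’ (and) and ‘|’ (or). A minimal sketch; note that each condition needs its own parentheses, because ‘&’ and ‘|’ bind more tightly than comparisons in Python, and the keywords ‘and’/‘or’ do not work element-wise on a Series.

```python
import pandas as pd

ser5 = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

# Keep values strictly between 15 and 45.
between = ser5[(ser5 > 15) & (ser5 < 45)]
print(between.tolist())   # [20, 30, 40]

# Keep values below 15 or above 45.
outside = ser5[(ser5 < 15) | (ser5 > 45)]
print(outside.tolist())   # [10, 50]
```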
Handling Missing Data:
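Pandas provides excellent features for handling missing data. Missing values are represented as ‘NaN’ (Not a Number). Notice in the output below that None is converted to NaN and the integers are stored as float64, because NaN is a floating-point value.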
ser6 = pd.Series([21, 22, None, 100, 150])
print(ser6)
Output:
0 21.0
1 22.0
2 NaN
3 100.0
4 150.0
dtype: float64
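If you want to keep whole numbers as integers despite a missing value, pandas offers a nullable integer dtype, written "Int64" with a capital I. A minimal sketch; the name ‘ser6_int’ is just for this example.

```python
import pandas as pd

ser6 = pd.Series([21, 22, None, 100, 150])
print(ser6.dtype)              # float64, because NaN is a float

# The nullable "Int64" dtype keeps integers and marks the gap as <NA>.
ser6_int = pd.Series([21, 22, None, 100, 150], dtype="Int64")
print(ser6_int.dtype)          # Int64
print(ser6_int.isna().sum())   # 1
```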
Checking for Missing values:
ser6 = pd.Series([21, 22, None, 100, 150])
print(pd.isnull(ser6))
Output:
0 False
1 False
2 True
3 False
4 False
dtype: bool
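The same check is available as a method: ‘ser6.isna()’ gives the same result as ‘pd.isnull(ser6)’, and ‘notna()’ is its inverse. Since True counts as 1, summing the mask is a quick way to count missing values. A minimal sketch:

```python
import pandas as pd

ser6 = pd.Series([21, 22, None, 100, 150])

print(ser6.isna().tolist())    # [False, False, True, False, False]
print(ser6.notna().tolist())   # [True, True, False, True, True]

# Counting missing values: True is treated as 1 when summing.
print(ser6.isna().sum())       # 1
```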
Dropping Missing Values:
ser6 = pd.Series([21, 22, None, 100, 150])
res = ser6.dropna()  # returns a new Series without the NaN entry; ser6 is unchanged
print(res)
Output:
0 21.0
1 22.0
3 100.0
4 150.0
dtype: float64
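Note in the output above that ‘dropna()’ keeps the original labels (0, 1, 3, 4). A minimal sketch of two common follow-ups: ‘reset_index(drop=True)’ renumbers the labels from 0, and ‘fillna()’ replaces missing values instead of removing them. Filling with the mean is only one possible choice, shown here for illustration.

```python
import pandas as pd

ser6 = pd.Series([21, 22, None, 100, 150])

# Drop the NaN, then renumber the index and discard the old labels.
compact = ser6.dropna().reset_index(drop=True)
print(compact.index.tolist())             # [0, 1, 2, 3]

# Replace NaN with a constant, or with the mean of the non-missing values.
print(ser6.fillna(0).tolist())            # [21.0, 22.0, 0.0, 100.0, 150.0]
print(ser6.fillna(ser6.mean()).tolist())  # [21.0, 22.0, 73.25, 100.0, 150.0]
```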