Pandas Series — A Beginner’s Guide!

Data Science Delight
4 min read · Jul 26, 2023

Hello! Welcome to this beginner's guide to the Pandas Series. We will cover the following topics, with examples along the way:

  • Introduction to Pandas: what the library is, its main data structures, and the import convention.
  • Pandas Series: introduction, different ways to create one, operations, indexing & slicing.
  • Exercises to Practice

Before diving into the topic, I would like to ask readers to visit my previous articles on NumPy, as they cover the fundamental concepts that serve as a solid foundation for this tutorial. You can access those articles here:

Additionally, please “Follow” me on Medium as it would encourage me to write more useful content on Data Science and Machine Learning.

Let’s dive into the topic!

Pandas is a popular open-source library built on top of NumPy. It is mainly used for fast data analysis, data cleaning, and data manipulation. It also has some built-in visualization features.

To learn more about Pandas, you can visit here.

The two main data structures provided by Pandas are Series & DataFrame:

A Series is a one-dimensional labeled array. You can think of it as a more powerful version of a Python list.

A DataFrame is a two-dimensional labeled data structure with rows & columns, much like a spreadsheet or SQL table.
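To make the difference concrete, here is a minimal sketch that builds one of each. The names and values (ages, people, the city column) are made up purely for illustration.

```python
import pandas as pd

# A Series: one-dimensional, one label per value
ages = pd.Series([25, 32, 47], index=["ann", "bob", "cy"])

# A DataFrame: two-dimensional, with labeled rows and columns
# (column names and values are illustrative only)
people = pd.DataFrame(
    {"age": [25, 32, 47], "city": ["Pune", "Oslo", "Lima"]},
    index=["ann", "bob", "cy"],
)

print(ages.ndim)    # 1
print(people.ndim)  # 2

# Selecting a single column of a DataFrame gives back a Series
print(type(people["age"]))
```

One way to read this: a DataFrame behaves like a collection of Series that share the same row labels.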

In this article (Part 1), let’s focus on the Series.

Series:

A Series is a one-dimensional labeled array that can hold data of any type (integer, float, string, etc.). It is similar to a Python list or NumPy array, but each element also carries an index label, which allows for more convenient & powerful data manipulation.

Creating a Series:

To create a Series in Pandas, you can use the ‘pd.Series()’ constructor.

1. Creating a Series with “default index”:

import pandas as pd

# Create a Series with Default Index:
ser1 = pd.Series([1, 2, 3, 4, 5, 6])
print(ser1)

Output:

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64

Here we created a Series containing integers from 1 to 6. By default, the indices are auto-generated as integers starting from 0 & increasing by one for each element.
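If you want to look at the labels and the data separately, the sketch below shows one way to do it with the `index` attribute and the `to_numpy()` method. The variable name `ser` is just a stand-in for the Series created above.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5, 6])

# The auto-generated index is a RangeIndex starting at 0
print(ser.index)       # RangeIndex(start=0, stop=6, step=1)

# The underlying values, without the labels
print(ser.to_numpy())  # [1 2 3 4 5 6]
```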

2. Creating a Series with a “custom index”:

# Create a Series with Custom Index:
ser2 = pd.Series([10, 20, 30, 40, 50], index = [1, 2, 3, 4, 5])
print(ser2)

Output:

1    10
2    20
3    30
4    40
5    50
dtype: int64

The second Series ‘ser2’ contains the multiples of 10 from 10 to 50, with a custom index (1 to 5).

3. Creating a Series from a “dictionary”:

ser3 = pd.Series({'a': 10, 'b': 15, 'c': 20, 'd': 25, 'e': 30})
print(ser3)

Output:

a    10
b    15
c    20
d    25
e    30
dtype: int64

The third Series ‘ser3’ is created from a dictionary with ‘keys’ as ‘index’ & ‘values’ as ‘data’.
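A dictionary can also be combined with the `index` argument. The sketch below (with a made-up `prices` dictionary) suggests how that behaves: the labels you pass pick and order the keys, and a label with no matching key ends up as a missing value.

```python
import pandas as pd

# Hypothetical data, for illustration only
prices = {"a": 10, "b": 15, "c": 20}

# Passing `index` alongside a dictionary selects and orders the keys;
# a label that is not a key ("z" here) gets a missing value (NaN)
picked = pd.Series(prices, index=["c", "a", "z"])
print(picked)
```

Because NaN is a floating-point value, the resulting Series is stored as float64 rather than int64.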

4. Creating a Series from a NumPy array:

import numpy as np
import pandas as pd

arr = np.array([11, 12, 15, 21, 25, 68])

ser4 = pd.Series(arr)
print(ser4)

Output:

0    11
1    12
2    15
3    21
4    25
5    68
dtype: int64

Accessing elements of a Series:

We can access elements using custom indices or default integer indices.

# Accessing element by Default index:
print(ser1[0]) # Output: 1

# Accessing element by Custom Index:
print(ser2[2]) # Output: 20
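With an integer index like the one on ‘ser2’, it is easy to confuse a label with a position. A minimal sketch of the two explicit accessors, `.loc` (by label) and `.iloc` (by position), along with slicing, could look like this:

```python
import pandas as pd

ser2 = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

# .loc looks up by label: the element whose index label is 2
print(ser2.loc[2])     # 20

# .iloc looks up by position: the third element (positions start at 0)
print(ser2.iloc[2])    # 30

# Slicing by label includes the end label
print(ser2.loc[2:4].tolist())   # [20, 30, 40]

# Slicing by position excludes the end position, like a Python list
print(ser2.iloc[1:3].tolist())  # [20, 30]
```

Using `.loc` and `.iloc` makes the intent explicit, which helps when the index labels are themselves integers.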

The real power of Pandas Series lies in its ability to perform various mathematical operations. Let’s have a look at those:

Operations on Series:

Example 1:

ser5 = pd.Series([10, 20, 30, 40, 50], index = ['A', 'B', 'C', 'D', 'E'])

res = ser5 + 5

print(res)

Output:

A    15
B    25
C    35
D    45
E    55
dtype: int64
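Arithmetic also works between two Series, and this is where the index labels matter. The sketch below uses two small made-up Series (`left` and `right`) to show that values are matched by label rather than by position:

```python
import pandas as pd

# Two illustrative Series whose labels only partly overlap
left = pd.Series([10, 20, 30], index=["A", "B", "C"])
right = pd.Series([1, 2, 3], index=["B", "C", "D"])

# Values are added label by label; labels present in only one
# Series ("A" and "D") produce NaN in the result
total = left + right
print(total)
```

Only "B" and "C" appear in both Series, so only those two rows get a numeric result (21.0 and 32.0).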

Example 2:

ser5 = pd.Series([10, 20, 30, 40, 50], index = ['A', 'B', 'C', 'D', 'E'])

res = np.sqrt(ser5)
print(res)

Output:

A    3.162278
B    4.472136
C    5.477226
D    6.324555
E    7.071068
dtype: float64
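Besides element-wise operations like the two above, a Series also has methods that reduce it to a single value. A short sketch using the same ‘ser5’ data:

```python
import pandas as pd

ser5 = pd.Series([10, 20, 30, 40, 50], index=["A", "B", "C", "D", "E"])

print(ser5.sum())     # 150
print(ser5.mean())    # 30.0
print(ser5.max())     # 50
print(ser5.idxmax())  # E  (the label of the largest value)
```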

Filtering in Series:

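You can also filter the elements of a Series with a condition. The comparison produces a Series of True/False values, and only the elements marked True are kept.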

ser5 = pd.Series([10, 20, 30, 40, 50], index = ['A', 'B', 'C', 'D', 'E'])

res = ser5[ser5 > 32]
print(res)

Output:

D    40
E    50
dtype: int64
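Conditions can be combined as well. Here is a minimal sketch, assuming the same ‘ser5’ data; note that Pandas uses `&` and `|` (not `and` / `or`), and each condition needs its own parentheses:

```python
import pandas as pd

ser5 = pd.Series([10, 20, 30, 40, 50], index=["A", "B", "C", "D", "E"])

# The comparison itself yields a boolean mask
mask = (ser5 > 15) & (ser5 < 45)
print(mask.tolist())     # [False, True, True, True, False]

# Keep values strictly between 15 and 45
between = ser5[mask]
print(between.tolist())  # [20, 30, 40]

# Keep values below 15 OR above 45
either = ser5[(ser5 < 15) | (ser5 > 45)]
print(either.tolist())   # [10, 50]
```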

Handling Missing Data:

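Pandas provides excellent features to handle missing data. Missing data is represented as ‘NaN’ (Not a Number). Because NaN is a floating-point value, a Series of integers that contains a missing value is stored as float64, as the output below shows.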

ser6 = pd.Series([21, 22, None, 100, 150])

print(ser6)

Output:

0     21.0
1     22.0
2      NaN
3    100.0
4    150.0
dtype: float64

Checking for Missing values:

ser6 = pd.Series([21, 22, None, 100, 150])
print(pd.isnull(ser6))

Output:

0    False
1    False
2     True
3    False
4    False
dtype: bool
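The boolean result is handy for counting or locating the gaps. The sketch below uses the `isna()` method, which does the same job as `pd.isnull()` here; the variable name `missing` is only for illustration.

```python
import pandas as pd

ser6 = pd.Series([21, 22, None, 100, 150])

# True where a value is missing
missing = ser6.isna()

# True counts as 1, so the sum is the number of missing values
print(missing.sum())                 # 1

# Index labels of the missing entries
print(ser6[missing].index.tolist())  # [2]

# notna() is the opposite check
print(ser6.notna().sum())            # 4
```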

Dropping Missing Values:

ser6 = pd.Series([21, 22, None, 100, 150])

res = ser6.dropna()

print(res)

Output:

0     21.0
1     22.0
3    100.0
4    150.0
dtype: float64
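Dropping is not the only option. If you would rather keep every row, one alternative is to fill the gaps with `fillna()`. The sketch below shows two possible choices (a constant, and the mean of the existing values); which one makes sense depends on your data.

```python
import pandas as pd

ser6 = pd.Series([21, 22, None, 100, 150])

# Replace missing values with a constant
filled_zero = ser6.fillna(0)
print(filled_zero.tolist())  # [21.0, 22.0, 0.0, 100.0, 150.0]

# Replace missing values with the mean of the existing values;
# mean() skips NaN by default: (21 + 22 + 100 + 150) / 4 = 73.25
filled_mean = ser6.fillna(ser6.mean())
print(filled_mean.iloc[2])   # 73.25

# Both dropna() and fillna() return a new Series; ser6 itself is unchanged
print(ser6.isna().sum())     # 1
```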

Data Science Delight

Content Creator | Sharing insights & tips on data science | Instagram: @datasciencedelight | YouTube: https://www.youtube.com/channel/UCpz2054mp5xfcBKUIctnhlw