Slicing and Dicing Pandas DataFrames

Punyakeerthi BL
3 min read · Jun 27, 2024

This article continues an earlier one, so please read that first:

Replace values in Pandas DataFrame

When working with data in pandas DataFrames, you often need to select specific rows, columns, or subsets for further analysis or manipulation. This is where loc and iloc come in as powerful tools for data selection.

Why Use loc and iloc?

Imagine you have a large dataset of customer information in a DataFrame. You might want to:

  • Filter rows based on specific criteria (e.g., customers from a particular region)
  • Select columns containing relevant data (e.g., purchase history)
  • Grab specific data points by their row and column labels

loc and iloc make these tasks straightforward: loc targets data by label, while iloc targets it by integer position within the DataFrame.
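As a rough sketch of the three tasks listed above, the snippet below uses a tiny, made-up customer table. The column names, customer IDs, and values are all hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical customer data, invented purely for illustration.
customers = pd.DataFrame(
    {
        "region": ["West", "East", "West"],
        "purchases": [3, 7, 1],
    },
    index=["c001", "c002", "c003"],  # row labels (customer IDs)
)

# Filter rows by a criterion: customers from the West region.
west = customers.loc[customers["region"] == "West"]

# Select a column of interest for every row.
history = customers.loc[:, "purchases"]

# Grab one data point by row label and column label.
c002_purchases = customers.loc["c002", "purchases"]

# The same data point by integer position (second row, second column).
same_value = customers.iloc[1, 1]

print(west)
print(history)
print(c002_purchases, same_value)
```

Both lookups at the end reach the same cell; the only difference is whether it is addressed by label or by position.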

Understanding loc

  • Purpose: Selects rows and/or columns by label.
  • Syntax: df.loc[row_labels, column_labels]
  • Parameters:
      • row_labels: Can be a single label, a list of labels, a slice, or a boolean array for filtering.
          • Single label: Selects the row with that label.
          • List of labels: Selects the rows whose labels are in the list.
          • Slice: Selects rows within a range of labels. Unlike ordinary Python slicing, both the start and the stop label are included.
          • Boolean array: Selects rows where the corresponding element in the array is True.
      • column_labels (optional): Takes the same kinds of values as row_labels, but selects columns. If not provided, all columns are returned for the chosen rows.

Example:

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 38],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Miami']}
# Use the names as row labels so that loc can look rows up by name.
# (With the default 0..3 index, df.loc['Bob'] would raise a KeyError.)
df = pd.DataFrame(data, index=data['Name'])

# Select the row labelled 'Bob' (single label; returns a Series)
print(df.loc['Bob'])
# Select rows with labels 'Alice' and 'Charlie' (using list of labels)
print(df.loc[['Alice', 'Charlie']])
# Select rows where Age is greater than 25 (using boolean array)
print(df.loc[df['Age'] > 25])
# Select 'Name' and 'City' columns (using column labels)
print(df.loc[:, ['Name', 'City']])

Output:

Name            Bob
Age              30
City    Los Angeles
Name: Bob, dtype: object

            Name  Age      City
Alice      Alice   25  New York
Charlie  Charlie   22   Chicago

        Name  Age         City
Bob      Bob   30  Los Angeles
David  David   38        Miami

            Name         City
Alice      Alice     New York
Bob          Bob  Los Angeles
Charlie  Charlie      Chicago
David      David        Miami
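
The selections above pick rows or columns one at a time. As a minimal sketch of how they combine, the snippet below builds its own small frame (the variable names are illustrative, not part of the original example) and shows a label slice, which includes both endpoints, alongside row and column selectors used in a single call.

```python
import pandas as pd

# Small illustrative frame with string row labels.
people = pd.DataFrame(
    {
        "Age": [25, 30, 22, 38],
        "City": ["New York", "Los Angeles", "Chicago", "Miami"],
    },
    index=["Alice", "Bob", "Charlie", "David"],
)

# Label slice: both 'Bob' and 'David' are included in the result.
bob_to_david = people.loc["Bob":"David"]

# Rows and columns in one call: a list of row labels plus one column label.
ages = people.loc[["Alice", "David"], "Age"]

# Boolean array for the rows combined with a list of column labels.
over_25_cities = people.loc[people["Age"] > 25, ["City"]]

print(bob_to_david)
print(ages)
print(over_25_cities)
```

Passing a single column label returns a Series, while passing a list of labels (even a one-element list) keeps the result as a DataFrame.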

Understanding iloc

  • Purpose: Selects rows and/or columns by integer position.
  • Syntax: df.iloc[row_positions, column_positions]
  • Parameters:
      • row_positions: Can be an integer, a list of integers, or a slice for positional selection.
          • Integer: Selects the row at that position (0-based indexing, starting from the first row).
          • List of integers: Selects the rows at the positions in the list.
          • Slice: Selects rows within a range of positions. As in ordinary Python slicing, the stop position is excluded.
      • column_positions (optional): Takes the same kinds of values as row_positions, but selects columns by position. If not provided, all columns are returned for the chosen rows.

Example:

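# df is the DataFrame from the loc example, with the names as row labels

# Select second row (using integer position; returns a Series)
print(df.iloc[1])
# Select first two rows (using list of positions)
print(df.iloc[[0, 1]])
# Select rows from position 1 (inclusive) to 3 (exclusive)
print(df.iloc[1:3])
# Select first column (using integer position for column)
print(df.iloc[:, 0])

Output (as printed by pandas 2.x):

Name            Bob
Age              30
City    Los Angeles
Name: Bob, dtype: object

        Name  Age         City
Alice  Alice   25     New York
Bob      Bob   30  Los Angeles

            Name  Age         City
Bob          Bob   30  Los Angeles
Charlie  Charlie   22      Chicago

Alice        Alice
Bob            Bob
Charlie    Charlie
David        David
Name: Name, dtype: object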
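
The difference between labels and positions is easiest to see when the row labels are themselves integers. The following is a minimal sketch under that assumption; the frame and its index values (10, 20, 30, 40) are hypothetical and chosen only so that labels and positions do not coincide.

```python
import pandas as pd

# Hypothetical frame whose integer row labels do not match row positions.
df_int = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"], "Age": [25, 30, 22, 38]},
    index=[10, 20, 30, 40],
)

# Rows and columns by position: first two rows, second column.
first_two_ages = df_int.iloc[0:2, 1]

# Negative positions count from the end, as with Python lists.
last_row = df_int.iloc[-1]

# iloc[1] is the second row; loc[1] looks for a row labelled 1, which does not exist.
by_position = df_int.iloc[1]
try:
    df_int.loc[1]
    label_1_found = True
except KeyError:
    label_1_found = False

# Slices differ too: iloc excludes the stop position, loc includes the stop label.
iloc_slice = df_int.iloc[0:2]   # positions 0 and 1
loc_slice = df_int.loc[10:30]   # labels 10, 20 and 30

print(first_two_ages)
print(last_row)
print(by_position["Name"], label_1_found)
print(iloc_slice)
print(loc_slice)
```

With the default 0, 1, 2, ... index the two accessors often return the same rows for a single integer, which is why the distinction is easy to overlook until the index changes.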

If you like this post, please follow me on LinkedIn: Punyakeerthi BL
