Pandas: Indexing and Slicing

Ethan Guyant
Inquisitive Nature
3 min read · Jun 17, 2022

Pandas offers many ways to select, index, or slice Series and DataFrame objects. That flexibility is useful, but the number of options can also be confusing. This article focuses on selecting, indexing, and slicing DataFrame objects using the [] operator and the .loc and .iloc attributes.

For more details and other options, see the pandas documentation.

The Basics

The most basic way to select data from a DataFrame is the [] operator. The [] operator is a hybrid: depending on what is passed in, it selects columns by label or rows by location. Using .loc and .iloc is often preferable because they make indexing and slicing more explicit; see the sections below for details.

Indexing Columns

With a DataFrame, the [] operator is used for selecting columns, or more precisely, for selecting from the column index. This means that when using DataFrame[], column names should be passed in. For more information on DataFrame object attributes, see Pandas: Introduction to the Library.
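As a minimal sketch of column selection with [], the example below uses a small DataFrame whose column names and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev"],
        "age": [34, 28, 45, 23],
        "city": ["Oslo", "Lima", "Pune", "Rome"],
    }
)

# Passing a single column label returns that column as a Series.
ages = df["age"]

# Passing a list of column labels returns a DataFrame with those columns.
subset = df[["name", "city"]]

print(ages)
print(subset)
```

Note the double brackets in the second case: the outer pair is the [] operator and the inner pair is an ordinary Python list of column names.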

Slicing Rows

Slice & Slicing

A slice is an object that describes a portion of a sequence. It is typically written inside [], with colons separating the start, stop, and step values (e.g. [1:10:2]). Slicing is the selection of a range of items from a sequence object.

As mentioned above, a slice has three components: start, stop, and step. When specifying the start and stop values, it is important to note that the start value is inclusive while the stop value is exclusive, so [1:10:2] selects the items at positions 1, 3, 5, 7, and 9.
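The start, stop, and step behavior described above can be sketched with a plain Python list. The variable names here are illustrative only.

```python
# A simple sequence of the integers 0 through 9.
numbers = list(range(10))

# start=1 (inclusive), stop=10 (exclusive), step=2
part = numbers[1:10:2]

# Omitting start and step falls back to the defaults (start=0, step=1).
first_three = numbers[:3]

# The same slice can be built explicitly with the built-in slice() object.
same_part = numbers[slice(1, 10, 2)]

print(part)         # [1, 3, 5, 7, 9]
print(first_three)  # [0, 1, 2]
```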

DataFrame Slicing

With a DataFrame object, slicing with the [] operator slices the rows of the DataFrame. This can cause confusion with column indexing, which also uses []. When the [] operator is passed a single label or a list of labels, it selects columns (label selection); when it is passed a slice, it slices the rows (location selection). As with indexing, slicing with .loc and .iloc may be preferred; see below for additional details.
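To illustrate the difference, here is a small sketch that passes integer slices to [] on a DataFrame with the default integer index. The data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data with the default integer index (0, 1, 2, 3).
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev"],
        "age": [34, 28, 45, 23],
    }
)

# A slice inside [] selects rows, not columns.
# Rows at positions 0 and 1 are returned; the stop position 2 is excluded.
first_two = df[0:2]

# A step works as it does for any Python sequence: every other row.
every_other = df[::2]

print(first_two)
print(every_other)
```

Compare this with df["age"], which uses the same operator but selects a column because a label, not a slice, is passed in.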

A common method for selecting certain rows is to filter them with a logical condition, known as Boolean indexing. The logical condition returns a True or False value for each row of the DataFrame. Placing the condition inside square brackets [] subsets the DataFrame, returning only the rows where the condition evaluates to True.
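A short sketch of Boolean indexing follows, again with hypothetical data. The threshold of 30 and the city name are arbitrary values picked for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev"],
        "age": [34, 28, 45, 23],
        "city": ["Oslo", "Lima", "Pune", "Rome"],
    }
)

# The condition produces a Boolean Series with one value per row.
mask = df["age"] > 30

# Passing the mask to [] keeps only the rows where it is True.
older = df[mask]

# Conditions can be combined with & (and) and | (or);
# each condition needs its own parentheses.
older_in_oslo = df[(df["age"] > 30) & (df["city"] == "Oslo")]

print(older)
print(older_in_oslo)
```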

Indexing and Slicing DataFrames

A preferable method for indexing and slicing a DataFrame is to use its .loc and .iloc attributes. Subsetting a DataFrame with these attributes makes the selection more explicit. Use .loc when selecting by label and .iloc when selecting by integer location.
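The sketch below contrasts the two attributes on a DataFrame with a string index. The index labels and data are hypothetical. One detail worth noting is that label slices with .loc include both endpoints, while integer slices with .iloc exclude the stop position, as regular Python slices do.

```python
import pandas as pd

# Hypothetical sample data with string row labels.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dev"],
        "age": [34, 28, 45, 23],
        "city": ["Oslo", "Lima", "Pune", "Rome"],
    },
    index=["a", "b", "c", "d"],
)

# .loc selects by label: rows "b" through "c" (both included),
# and the columns named "name" and "age".
by_label = df.loc["b":"c", ["name", "age"]]

# .iloc selects by integer location: row positions 1 and 2
# (stop position 3 is excluded) and column positions 0 and 1.
by_position = df.iloc[1:3, 0:2]

# Both selections return the same rows and columns.
print(by_label.equals(by_position))  # True

# A single value can be picked out with a row and a column key.
age_by_label = df.loc["a", "age"]
age_by_position = df.iloc[0, 1]
```

Both attributes take a row selector first and an optional column selector second, which keeps row and column selection unambiguous in a way that [] alone does not.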

Summary

Pandas offers a multitude of options for selecting, indexing, and slicing DataFrame objects. This article provided an introduction to .loc for label-based selection, .iloc for integer-location-based selection, and the indexing operator [], whose behavior depends on the index and the arguments provided. Because that dependence can cause confusion, .loc and .iloc are often preferable.

Originally published at https://ethanguyant.com on June 17, 2022.
