Selecting Subsets of Data in Pandas: Part 2

Ted Petrou
Dunder Data
20 min read · Dec 8, 2017

Part Two: Boolean Indexing

This is part two of a four-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers so many options for subset selection that covering them takes multiple articles. The series is broken down into the following four topics.

  1. Selection with [], .loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Learn More

Master Data Analysis with Python is an extremely comprehensive text with over 80 chapters, 500 exercises, and video lessons to help you become an expert.

Part 1 vs Part 2 subset selection

Part 1 of this series covered subset selection with [], .loc and .iloc. All three of these indexers use either the row/column labels or their integer location to make selections. The actual data of the Series/DataFrame is not used at all during the selection.
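As a quick reminder of what label-based and position-based selection look like, here is a minimal sketch. The DataFrame, its column names, and its index labels are made up for illustration and do not come from the series' dataset.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration
df = pd.DataFrame(
    {"age": [30, 25, 41], "city": ["Houston", "Boston", "Denver"]},
    index=["ann", "bob", "cat"],
)

col = df["age"]               # [] selects a column by its label
row_by_label = df.loc["bob"]  # .loc selects a row by its index label
row_by_pos = df.iloc[1]       # .iloc selects a row by its integer location

# Both row selections return the same row; neither looked at the values
print(row_by_label["age"], row_by_pos["city"])
```

In each case the selection is decided entirely by the label or the position. The values stored in the row play no part in choosing it.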

In Part 2 of this series, on boolean indexing, we will select subsets of data based on the actual values of the data in the Series/DataFrame and NOT on their row/column labels or integer locations.
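To make the contrast concrete, the sketch below selects rows by their values. It again uses a made-up DataFrame (the names, ages, and the threshold of 28 are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration
df = pd.DataFrame(
    {"age": [30, 25, 41], "city": ["Houston", "Boston", "Denver"]},
    index=["ann", "bob", "cat"],
)

# Comparing a column to a value yields a boolean Series, one entry per row
mask = df["age"] > 28

# Passing that boolean Series to [] keeps only the rows where it is True
older = df[mask]

print(mask.tolist())         # [True, False, True]
print(older.index.tolist())  # ['ann', 'cat']
```

Here the rows are chosen because of what they contain, not because of where they sit or what they are labeled.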

Documentation on boolean selection

I always recommend reading the official documentation in addition to this tutorial when learning about boolean…

Ted Petrou
Dunder Data

Author of Master Data Analysis with Python and Founder of Dunder Data