Minimally Sufficient Pandas

Ted Petrou
Published in Dunder Data
28 min read · Jan 30, 2019

In this article, I offer an opinionated perspective on how best to use the Pandas library for data analysis. My objective is to argue that a small subset of the library is sufficient to complete nearly all of the data analysis tasks you will encounter. This minimally sufficient subset will benefit both beginners and professionals. Not everyone will agree with the suggestions I put forward, but they reflect how I teach and how I use the library myself. If you disagree or have suggestions of your own, please leave them in the comments below.

By the end of this article you will:

  • Know why limiting Pandas to a small subset will keep your focus on the actual data analysis and not on the syntax
  • Have specific guidelines for taking a single approach to completing a variety of common data analysis tasks with Pandas

Learn More

Master Data Analysis with Python is an extremely comprehensive text with over 80 chapters, 500 exercises, and video lessons to help you become an expert.

Pandas is Powerful but Difficult to Use

Pandas is the most popular Python library for doing data analysis. While it does offer quite a lot of functionality, it is also regarded as a fairly difficult library to learn well. Some reasons for this include:

  • There are often multiple ways to complete common tasks
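As a small illustration of the point about multiple ways to complete a common task, the sketch below selects one column from a DataFrame in four different ways. The data and the column names (`city`, `population`) are made up for this example and do not come from the article; the four spellings shown are only some of the options Pandas offers.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for illustration.
df = pd.DataFrame({
    "city": ["Houston", "Austin", "Dallas"],
    "population": [2.3, 0.96, 1.3],
})

# Four spellings that all select the same column as a Series.
by_brackets = df["population"]      # bracket indexing
by_attribute = df.population        # attribute (dot) access
by_loc = df.loc[:, "population"]    # label-based indexer
by_iloc = df.iloc[:, 1]             # position-based indexer

# Check that every alternative holds the same data as the bracket form.
same = all(
    by_brackets.equals(other)
    for other in (by_attribute, by_loc, by_iloc)
)
print(same)
```

All four produce the same Series, which is exactly why a newcomer can be unsure which one to reach for.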


Ted Petrou
Dunder Data

Author of Master Data Analysis with Python and Founder of Dunder Data