What’s in a DataFrame?

Part 1 of a gentle, thorough, and accessible introduction to the Pandas module in Python

Murtaza Ali
Age of Awareness


Photo by Mika Baumeister on Unsplash

This is the first article in a new series I hope to write. The goal of the series is to present a detailed, step-by-step introduction to Pandas, Python’s increasingly popular data science library. My tutorials are meant to be accessible to beginners and assume no prior knowledge of Pandas. I do assume a working knowledge of Python, but I will link to additional resources where a topic might be confusing.

Let’s get right into it.

The building block of Pandas

Before we get into Pandas specifically, let’s consider data at a high level. What’s the most common way to store data, regardless of the tool or technology? Usually, it’s a straightforward table: a set of rows, each holding values for the same columns.

With that basic building block in mind, we can move into Pandas specifically. The primary data structure in Pandas is called a DataFrame, which is effectively just a fancy name for a table. There are two important conventions to note, because they come up often in various DataFrame operations:

  1. The row labels of a DataFrame are collectively known as the…
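To make the “fancy name for a table” idea concrete, here is a minimal sketch of building a small DataFrame. It assumes Pandas is installed and imported under the conventional alias pd; the name and age columns and their values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data: each dictionary key becomes a column label,
# and each list holds that column's values, one entry per row.
data = {
    "name": ["Ana", "Ben", "Cai"],
    "age": [31, 25, 42],
}

df = pd.DataFrame(data)

print(df)                # the table itself: three rows, two columns
print(df.shape)          # (3, 2): number of rows, number of columns
print(list(df.columns))  # ['name', 'age']
```

Passing a dictionary of lists is only one way to construct a DataFrame. Because each list supplies one column’s values, all the lists must have the same length; otherwise Pandas raises an error.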


PhD student at the University of Washington. Interested in human-computer interaction, data visualization, and computer science education.