What’s in a DataFrame?
Part 1 of a gentle, thorough, and accessible introduction to the Pandas module in Python
This is the first article in a new series I hope to write. The goal of the series is to present a detailed, step-by-step introduction to Pandas, Python's increasingly popular data science library. My tutorials are meant to be accessible to beginners and assume no prior knowledge of Pandas. I do assume a working knowledge of Python, but where a topic might be confusing I will link to additional resources.
Let’s get right into it.
The building block of Pandas
Before we get into Pandas specifically, let’s consider data at a high level. What’s the most common way data is stored, regardless of what tool or technology one uses? Usually a straightforward table with a set of rows and associated columns.
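To make the idea of a table concrete, here is a minimal sketch of one built with Pandas. The column names and values are invented purely for illustration, and the `pd.DataFrame` constructor it uses creates the structure introduced in the next paragraph.

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column,
# and each position in the lists becomes one row of the table.
data = {
    "name": ["Ada", "Grace", "Linus"],
    "language": ["Python", "COBOL", "C"],
    "years_experience": [5, 12, 8],
}

table = pd.DataFrame(data)

print(table)        # displays the rows and columns as a table
print(table.shape)  # (number of rows, number of columns)
```

Printing the object shows three rows under three named columns, which is the same rows-and-columns layout described above.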
With that basic building block in mind, we can turn to Pandas itself. The primary data structure used in Pandas is called a DataFrame, which is effectively just a fancy name for a table. There are two important conventions to note, because they come up often in various DataFrame operations:
- The row labels of a DataFrame are collectively known as the…