The Name of the Game
How to add, update and remove index names on pandas DataFrames
There are several subtleties to DataFrame indexes in pandas. They can contain multiple levels, hold different data types and be transformed in dozens of ways. One attribute of these indexes is often ignored: their name!
Indexes have labels for each row or column that indicate the meaning of the associated data values. A DataFrame of users would likely have rows indexed by user_id and index labels of 1, 2, 3 and so forth. The column index would have labels like last_login and user_type.
Index names specify, within the DataFrame itself, the type of data the row or column labels represent. For example, the same row index on the users DataFrame could also represent the row number, a foreign key or almost anything else. Without an index name specifying what the labels mean, it’s impossible to say. This can lead to errors down the line if the users DataFrame is joined to another DataFrame on the mistaken assumption that both sets of row labels have the same meaning.
Set Up
We’re going to work through how to add, update and remove names from indexes. We’ll be looking at a variety of DataFrames in this story. To begin, we’ll set up a function to display the contents as well as the names of the indexes. The remaining snippets of code are continuations and require previous snippets to execute correctly. The complete code for this story is available on GitHub.
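A minimal sketch of such a display function is shown here. The name show_df is hypothetical, and the label text is assumed from the output shown throughout this story.

```python
import pandas as pd


def show_df(df: pd.DataFrame) -> None:
    """Print a DataFrame followed by the names of its row and column indexes."""
    print(df)
    # .name is None when an index has not been given a name
    print(f"Row Index Name: {df.index.name}")
    print(f"Column Index Name: {df.columns.name}")


# Quick check on a tiny DataFrame with unnamed indexes
show_df(pd.DataFrame({"a": [0, 1]}))
```

Reading the name attribute directly keeps the helper independent of how pandas chooses to render the DataFrame.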
Unnamed
We start by creating a simple pandas.DataFrame. Its indexes don’t have names because none were set.
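One way to build such a DataFrame is sketched here, assuming two columns a and b that each count from 0 to 4 (values inferred from the output that follows). The snippet is self-contained, so it prints the index names directly rather than relying on a helper.

```python
import pandas as pd

# Two columns, default RangeIndex for the rows; no index names are set
df = pd.DataFrame({"a": range(5), "b": range(5)})

print(df)
print(f"Row Index Name: {df.index.name}")
print(f"Column Index Name: {df.columns.name}")
```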
   a  b
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4
Row Index Name: None
Column Index Name: None
Named At Creation
One way to set the index names is during DataFrame creation. Notice the name parameters for the row and column indexes.
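A sketch of naming both indexes at creation time follows. It assumes the names are supplied by wrapping the labels in pd.Index objects, which accept a name argument; the data values are inferred from the output that follows.

```python
import pandas as pd

# Five rows of [a, b] values
rows = [[i, i] for i in range(5)]

df_named = pd.DataFrame(
    rows,
    # name= on each pd.Index sets the name of that index
    index=pd.Index(range(5), name="row_name"),
    columns=pd.Index(["a", "b"], name="col_name"),
)

print(df_named)
print(f"Row Index Name: {df_named.index.name}")
print(f"Column Index Name: {df_named.columns.name}")
```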
col_name  a  b
row_name
0         0  0
1         1  1
2         2  2
3         3  3
4         4  4
Row Index Name: row_name
Column Index Name: col_name
This changes how the DataFrame is displayed. Both the row and column indexes now have a name. The name of the column index lines up with the column labels, and the name of the row index sits on top of the row labels.
Named by Pivot
Some operations, such as pivot, set index names as a side effect. To see this, we first need a DataFrame that we can pivot.
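A self-contained sketch of that DataFrame, with values taken from the output that follows; the variable name orders is a hypothetical choice.

```python
import pandas as pd

# One row per purchase: which customer bought which product, and for how much
orders = pd.DataFrame(
    {
        "customer": [10, 11, 10, 11, 12, 12],
        "product": ["phone", "phone", "tv", "laptop", "tv", "phone"],
        "amount": [5.0, 10.0, 7.0, 12.0, 3.0, 9.0],
    }
)

print(orders)
print(f"Row Index Name: {orders.index.name}")
print(f"Column Index Name: {orders.columns.name}")
```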
   customer product  amount
0        10   phone     5.0
1        11   phone    10.0
2        10      tv     7.0
3        11  laptop    12.0
4        12      tv     3.0
5        12   phone     9.0
Row Index Name: None
Column Index Name: None
Notice that so far the row and column indexes don’t have names. Now we’ll call pivot on the DataFrame and see what happens.
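The pivot call might look like the sketch here. The fillna(0) step is an assumption, added because the output that follows shows 0.0 where a customer has no purchase for a product; the original code may have filled the gaps another way.

```python
import pandas as pd

orders = pd.DataFrame(
    {
        "customer": [10, 11, 10, 11, 12, 12],
        "product": ["phone", "phone", "tv", "laptop", "tv", "phone"],
        "amount": [5.0, 10.0, 7.0, 12.0, 3.0, 9.0],
    }
)

# Rows become customers, columns become products, cells hold the amount.
# Missing customer/product combinations are NaN after pivot, so fill with 0.
pivoted = orders.pivot(index="customer", columns="product", values="amount").fillna(0)

print(pivoted)
print(f"Row Index Name: {pivoted.index.name}")
print(f"Column Index Name: {pivoted.columns.name}")
```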
product   laptop  phone   tv
customer
10           0.0    5.0  7.0
11          12.0   10.0  0.0
12           0.0    9.0  3.0
Row Index Name: customer
Column Index Name: product
After the pivot operation, the former column labels product and customer have become the column index name and the row index name. This can be handy, but we’d like the option to update these names in case they no longer make sense.
Named by Update
The rename_axis method allows us to update the name of either the row or column index. The index parameter updates the name of the row index and the columns parameter updates the name of the column index.
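A sketch of the rename follows, rebuilding the pivoted DataFrame so the snippet runs on its own. The variable names and the fillna(0) step are assumptions; the new index names match the output that follows.

```python
import pandas as pd

orders = pd.DataFrame(
    {
        "customer": [10, 11, 10, 11, 12, 12],
        "product": ["phone", "phone", "tv", "laptop", "tv", "phone"],
        "amount": [5.0, 10.0, 7.0, 12.0, 3.0, 9.0],
    }
)
pivoted = orders.pivot(index="customer", columns="product", values="amount").fillna(0)

# index= renames the row index, columns= renames the column index
renamed = pivoted.rename_axis(index="account", columns="device")

print(renamed)
print(f"Row Index Name: {renamed.index.name}")
print(f"Column Index Name: {renamed.columns.name}")
```

By default rename_axis returns a new DataFrame, so pivoted keeps its original index names.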
device   laptop  phone   tv
account
10          0.0    5.0  7.0
11         12.0   10.0  0.0
12          0.0    9.0  3.0
Row Index Name: account
Column Index Name: device
The index names are updated to ones that make more sense in the context of the pivoted DataFrame.
Name Removed
The final operation we can perform on index names is to remove them. This also uses the rename_axis method, but sets the new names to None.
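A sketch of the removal follows, again rebuilding the pivoted DataFrame so the snippet is self-contained. The variable names and the fillna(0) step are assumptions.

```python
import pandas as pd

orders = pd.DataFrame(
    {
        "customer": [10, 11, 10, 11, 12, 12],
        "product": ["phone", "phone", "tv", "laptop", "tv", "phone"],
        "amount": [5.0, 10.0, 7.0, 12.0, 3.0, 9.0],
    }
)
pivoted = orders.pivot(index="customer", columns="product", values="amount").fillna(0)

# Passing None as the new name clears the name of each index
unnamed = pivoted.rename_axis(index=None, columns=None)

print(unnamed)
print(f"Row Index Name: {unnamed.index.name}")
print(f"Column Index Name: {unnamed.columns.name}")
```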
    laptop  phone   tv
10     0.0    5.0  7.0
11    12.0   10.0  0.0
12     0.0    9.0  3.0
Row Index Name: None
Column Index Name: None
Conclusion
Once you know how to work with index names, they are easy to apply or remove. The meaning of the data in DataFrames with index names is clearer, which makes mistakes such as joining on mismatched row labels easier to spot. This can lead to better data quality and more accurate insights from your data.