Two reasons your column renaming does not work in Pandas

Danferno
Data Oriented Programming Tips
2 min read · Aug 21, 2023

Maybe this only applies to those coming to Python from statistical software, but I’ve often tried to rename columns in Pandas only to find that nothing changed and no error message appeared either. In my case, the problem was usually that Pandas has defaults that are counter-intuitive to data scientists.

import pandas as pd
df = pd.DataFrame(data=[0,1,2], columns=['A'])

# Doesn't work
df = df.rename({'A': 'B'})
print(df.columns)

# Index(['A'], dtype='object') ???

The rename command is used to rename columns in both R and Stata, so you’d think this would be a valid way to rename columns in Pandas too. And it is. It’s just that for some reason, they decided that the default axis to rename on is the index (the row labels), not the columns (see the documentation for DataFrame.rename). Since no row is labelled 'A', the mapping matches nothing and is silently ignored.
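To make the default visible, here is a minimal sketch (the label 'first' and the variable names are only illustrative). It shows the mapping being applied to the row labels, and uses the errors='raise' option of rename, which by default is 'ignore', to turn the silent no-op into a KeyError.

```python
import pandas as pd

df = pd.DataFrame(data=[0, 1, 2], columns=['A'])

# With no axis given, the mapping is matched against the row labels 0, 1, 2.
renamed_rows = df.rename({0: 'first'})
print(list(renamed_rows.index))    # ['first', 1, 2]
print(list(renamed_rows.columns))  # ['A']

# errors='raise' makes Pandas complain when a label is not found.
# 'A' is a column name, not a row label, so this raises a KeyError.
try:
    df.rename({'A': 'B'}, errors='raise')
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

Passing errors='raise' while you are still getting used to the defaults is one way to get the error message that statistical software would have given you.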

That means you need to explicitly specify that you want to rename the columns.

import pandas as pd
df = pd.DataFrame(data=[0,1,2], columns=['A'])

# Works
df = df.rename(columns={'A':'B'})
print(df.columns)

# Index(['B'], dtype='object') Yay

You could also use the axis='columns' option, but I find the first option easier to read/skim afterwards because it’s closer to natural language (‘rename columns’) than the alternative.
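For comparison, a short sketch of the equivalent spellings (the variable names are just for illustration). All three should give the same result, so the choice is purely about readability.

```python
import pandas as pd

df = pd.DataFrame(data=[0, 1, 2], columns=['A'])

# Three equivalent ways to rename column 'A' to 'B'.
by_keyword = df.rename(columns={'A': 'B'})
by_axis_name = df.rename({'A': 'B'}, axis='columns')
by_axis_number = df.rename({'A': 'B'}, axis=1)

print(list(by_keyword.columns))           # ['B']
print(by_keyword.equals(by_axis_name))    # True
print(by_keyword.equals(by_axis_number))  # True
```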

Second, a beginner pitfall that stumped me a few times while learning Pandas is the whole in-place versus not in-place option. Say what? Answers on Stack Overflow or elsewhere will often neglect to show the full code, telling you only to use df.rename(columns={'A':'B'}).

That’s not wrong, but if you are used to statistical software, you might not realize that rename returns a new DataFrame and leaves df untouched. Nothing changes unless you assign the output back to df or use the inplace=True option.

# Doesn't work
df = pd.DataFrame(data=[0,1,2], columns=['A'])
df.rename(columns={'A':'B'})
print(df.columns)

# Index(['A'], dtype='object') ???


# Works (but not recommended)
df.rename(columns={'A':'B'}, inplace=True)
print(df.columns)

# Index(['B'], dtype='object') Yay

I do not recommend ever using the inplace option. It will get you into trouble if you ever swap to a parallel version of Pandas (e.g. Dask).
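What I would do instead is assign the result back, which also lets you chain several steps together. A minimal sketch, where the extra column C is a made-up example and not part of the code above:

```python
import pandas as pd

df = pd.DataFrame(data=[0, 1, 2], columns=['A'])

# Each method returns a new DataFrame, so the steps can be chained
# and the final result assigned back to df in one go.
df = (
    df
    .rename(columns={'A': 'B'})
    .assign(C=lambda d: d['B'] * 2)  # hypothetical follow-up step
)
print(list(df.columns))  # ['B', 'C']
```

Because every step is an expression that returns a DataFrame, the same code has a better chance of carrying over to libraries that mimic the Pandas API.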
