Pandas Tips & Tricks for Beginners Part-2: Rename columns

Jay Savani
5 min read · Jan 28, 2023

Photo by James Harrison on Unsplash

Welcome to the second post in my ongoing Pandas Tips & Tricks series for beginners. Today’s topic is renaming columns in pandas. As a data scientist, one of the most important steps in data preprocessing is making sure that your data is well-structured and easy to work with. One way to accomplish this is to give the columns of your dataframe more meaningful and descriptive names. Pandas provides several ways to do this, such as the rename() method and the .columns attribute. In this post, we will explore these methods in detail, with examples of how to use them effectively and some common pitfalls to avoid. Whether you're a beginner or an experienced data scientist, this post should give you what you need to rename your columns and improve the structure of your data.

Table of Contents:

  1. Introduction
  2. Rename columns: Using Dictionary
  3. Rename columns: Using replace() function and Special Case
  4. Rename columns: Using add_prefix() and add_suffix() functions
  5. Common pitfalls

Introduction

There are several ways to rename columns, and we are going to apply some of the most commonly used ones. Let’s dive in. For the examples, we assume a dataframe df whose columns are named 'col one', 'col two', 'col three', and 'col four'.
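To follow along, a small dataframe with these column names can be built as below. This is only a sketch: the values are made up for illustration, and only the column names (taken from the renaming examples in this post) matter.

```python
import pandas as pd

# Hypothetical sample data; the values are placeholders,
# only the column names are used in the examples that follow.
df = pd.DataFrame({
    "col one": [1, 2],
    "col two": [3, 4],
    "col three": [5, 6],
    "col four": [7, 8],
})

print(df.columns.tolist())
```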

1. Using Dictionary

Write your old column names and the corresponding new column names in a dictionary as key-value pairs, then pass the dictionary to the dataframe's .rename() method. The parameter axis=1 indicates that you want to make the changes on the columns.

df = df.rename({
    'col one': 'col_one',
    'col two': 'col-two',
    'col three': 'col=three',
    'col four': 'col@four'
}, axis=1)  # you can also use: axis="columns"
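As an alternative spelling, rename() also accepts the mapping through its columns keyword, which avoids the axis argument altogether, and it accepts a function in place of a dictionary. The sketch below uses a made-up two-column dataframe to show both forms.

```python
import pandas as pd

# Hypothetical two-column dataframe for illustration.
df = pd.DataFrame({"col one": [1], "col two": [2]})

# Same idea as the dictionary + axis=1 form, using the columns keyword.
by_dict = df.rename(columns={"col one": "col_one", "col two": "col_two"})

# A callable is applied to every column name.
by_func = df.rename(columns=lambda name: name.upper())

print(by_dict.columns.tolist())
print(by_func.columns.tolist())
```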

2. Using replace() function

Put simply, this method is useful when you want to add, remove, or swap a specific character or substring in many column names at once. It’s a three-step process.

  • First, select the columns of your choice or all the columns (using dataframe.columns).
  • Next, add .str to the selection (replace() is a string method, and the .str accessor applies it to every column name).
  • Finally, add the replace("A", "B") function, where "A" is the character or string that you want to replace and "B" is the replacement character or string for "A".

df.columns = df.columns.str.replace(" ", "_")
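When only some of the columns should be changed, one possible approach is to build the mapping for just those columns and pass it to rename(). The sketch below assumes a small example dataframe and a hypothetical list named selected.

```python
import pandas as pd

# Hypothetical dataframe for illustration.
df = pd.DataFrame({"col one": [1], "col two": [2], "col three": [3]})

# Only these columns get their spaces replaced (the list is an assumption).
selected = ["col one", "col two"]

# Plain Python str.replace builds the old -> new mapping.
mapping = {name: name.replace(" ", "_") for name in selected}
df = df.rename(columns=mapping)

print(df.columns.tolist())
```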

Special Case: In pandas, you can remove any special characters that are present in column names by using the str.replace() method. Here is an example of how you can remove all special characters from column names in a dataframe df:

df.columns = df.columns.str.replace(r'[^a-zA-Z0-9_]', '', regex=True)

The regular expression pattern '[^a-zA-Z0-9_]' used in the code matches any character that is not a letter, a digit, or an underscore. This covers special characters such as /, \, ., @, !, and ?, and also spaces. Note the regex=True argument: since pandas 2.0, str.replace() treats the pattern as a literal string by default, so without it nothing would be removed.
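Here is a small sketch of the special-character cleanup on made-up column names. It passes regex=True explicitly, because in pandas 2.0 and later str.replace() treats the pattern as plain text unless told otherwise.

```python
import pandas as pd

# Hypothetical column names containing special characters.
df = pd.DataFrame({"col@one": [1], "col.two!": [2], "col_three": [3]})

# Drop every character that is not a letter, digit, or underscore.
df.columns = df.columns.str.replace(r"[^a-zA-Z0-9_]", "", regex=True)

print(df.columns.tolist())  # ['colone', 'coltwo', 'col_three']
```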

3. Using add_suffix() and add_prefix() methods

Pandas provides two methods, add_prefix() and add_suffix(), which can be used to easily add a prefix or suffix to the column names of a dataframe.

The add_prefix() method allows you to add a prefix to the column names of a dataframe. Let’s say you have a dataframe and you want to add the prefix 'new_' to each column name. You can do this by using the following code:

df = df.add_prefix('new_')

The add_suffix() method works similarly, but adds a suffix to the column names of a dataframe. Let’s say you have a dataframe and you want to add the suffix '_new' to each column name. You can do this by using the following code:

df = df.add_suffix('_new')
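A quick sketch of both methods on a made-up dataframe. Each call returns a new dataframe, so the original column names stay as they were unless you reassign.

```python
import pandas as pd

# Hypothetical dataframe for illustration.
df = pd.DataFrame({"a": [1], "b": [2]})

prefixed = df.add_prefix("new_")
suffixed = df.add_suffix("_new")

print(prefixed.columns.tolist())  # ['new_a', 'new_b']
print(suffixed.columns.tolist())  # ['a_new', 'b_new']
```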

Common pitfalls

There are a few common pitfalls to be aware of when renaming columns in pandas:

Not using the inplace parameter:

  • By default, the rename() function returns a new dataframe with the renamed columns, rather than modifying the original dataframe. If you want to keep the change, either assign the result back (df = df.rename(...)) or set the inplace parameter to True.
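The sketch below, on a made-up one-column dataframe, shows what happens when the result of rename() is discarded, and the two ways of keeping the change.

```python
import pandas as pd

# Hypothetical dataframe for illustration.
df = pd.DataFrame({"col one": [1]})

# The returned dataframe is thrown away, so df is unchanged.
df.rename(columns={"col one": "col_one"})
after_discard = df.columns.tolist()

# Option 1: assign the result back.
df = df.rename(columns={"col one": "col_one"})
after_assign = df.columns.tolist()

# Option 2: inplace=True modifies df and returns None.
result = df.rename(columns={"col_one": "first"}, inplace=True)
after_inplace = df.columns.tolist()

print(after_discard, after_assign, after_inplace, result)
```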

Using the wrong axis

  • The rename() function has an axis parameter that specifies whether you want to rename the columns (axis=1) or the rows (axis=0). Make sure to set the correct axis to avoid renaming the wrong elements of your dataframe.
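To see the difference, the sketch below uses a made-up dataframe whose row label and column label are both "a", so the same mapping can hit either one depending on axis.

```python
import pandas as pd

# Hypothetical dataframe: "a" is both a row label and a column label.
df = pd.DataFrame({"a": [1, 2]}, index=["a", "b"])

rows = df.rename({"a": "A"}, axis=0)  # renames the row label
cols = df.rename({"a": "A"}, axis=1)  # renames the column label

print(rows.index.tolist(), rows.columns.tolist())
print(cols.index.tolist(), cols.columns.tolist())
```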

Overwriting existing column names

  • When renaming columns, make sure that the new column names do not already exist in the dataframe. This can cause confusion and lead to errors when working with your data.
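Pandas does not stop you from creating duplicate column names, so it can help to check after renaming. A minimal sketch on a made-up dataframe:

```python
import pandas as pd

# Hypothetical dataframe for illustration.
df = pd.DataFrame({"a": [1], "b": [2]})

# Renaming "b" to "a" produces two columns with the same name.
renamed = df.rename(columns={"b": "a"})

print(renamed.columns.tolist())    # ['a', 'a']
print(renamed.columns.is_unique)   # False

# Selecting "a" now returns a two-column DataFrame, not a Series.
print(type(renamed["a"]))
```

A simple guard is to check renamed.columns.is_unique before continuing.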

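Not validating the new column names

  • Pandas itself accepts characters like /, ',', '.', and spaces in column names, but such names cannot be used with attribute access (df.col_name) and are awkward in methods like query(). Prefer names made of letters, digits, and underscores, and check the new names before applying them.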
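One possible way to check the names is Python's built-in str.isidentifier(), which tells you whether a name could be used with dot access. This is only a sketch on made-up names: pandas stores all of them without complaint, and the check does not catch clashes with Python keywords or existing dataframe attributes.

```python
import pandas as pd

# Hypothetical column names; pandas accepts all of them.
df = pd.DataFrame({"col one": [1], "col.two": [2], "col_three": [3]})

# Names that are not valid Python identifiers need bracket access.
awkward = [name for name in df.columns if not str(name).isidentifier()]

print(awkward)            # ['col one', 'col.two']
print(df["col one"][0])   # bracket access still works
print(df.col_three[0])    # dot access works for identifier-like names
```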

Not handling missing columns:

  • If some of the columns you want to rename do not exist in the dataframe, the rename() function silently ignores them by default (errors='ignore'), so a typo in an old column name goes unnoticed. To catch this, pass errors='raise', which makes rename() raise a KeyError for any name that is not found.
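As a quick check of how rename() treats names that are not in the dataframe, the sketch below tries a made-up name with the default setting and again with errors='raise'.

```python
import pandas as pd

# Hypothetical dataframe for illustration.
df = pd.DataFrame({"col one": [1]})

# Default behaviour: the unknown name is skipped without any error.
ignored = df.rename(columns={"col_one_typo": "first"})

# With errors="raise", the unknown name triggers a KeyError.
raised = False
try:
    df.rename(columns={"col_one_typo": "first"}, errors="raise")
except KeyError:
    raised = True

print(ignored.columns.tolist(), raised)
```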

By keeping these pitfalls in mind and taking the necessary precautions, you can easily and effectively rename the columns of your dataframe in pandas.

Author’s Note

Thank you for reading my blog post on renaming columns in pandas. I hope that you found the information and examples provided to be helpful and informative. If you’re looking to learn more about working with data in pandas, I am going to create more tips and tricks for intermediate and advanced level as well in several upcoming blog posts that you may find interesting.

Also check out part 1, where I explain different ways of creating a dataframe.

As always, thanks for reading! I would love to read your responses :)

You can also connect with me via LinkedIn.