Three Ways to Format Column Headers

Brent Smart
3 min readJun 28, 2022

--

Photo by Mika Baumeister on Unsplash

In this blog we investigate how to adjust a data frame’s headers. This is a crucial initial step, as it is important that headers are consistent when accessing data from those columns.

The data I used is stored in a Google Sheet. I downloaded that workbook as an excel file (.xlsx). I did so instead of exporting each individual sheet as a .csv file. Downloading and reading in the entire workbook would give easier access to all the workbook’s sheets. After reading in the workbook, I previewed the sheet names. Once I confirmed their accuracy, I then created a data frame by concatenating (combining) sheets 3–12 using splicing. I verified the the amount of columns and rows using .shape and then previewed the data frame using the .info() method.

Using the .info() method highlights a couple of noticings:

* The headers are inconsistently formatted and need to be cleaned.
* The data in some numerical columns were not read correct and need their values converted.
* Since each columns’ totals for each non-null values are inconsistent, further investigation into missing data is needed.

Let’s hone in on the ill-formatted headers — you may have noticed column 7 is aligned with columns 0–3. This indicates that the other columns have extra leading spaces.

We can manually create a column with the correct header, assign it the values of the incorrect column, and remove then remove the entire column with the ill-formatted title.

Nice! 1 column down, many to go! You probably noticed that the column created was tacked on the end of the data frame. To avoid this, another way to create columns with correct headings is using the .insert( ) method must have the desired position, name of new column, and the values for that column.

All this sounds pretty good, but as a former elementary teacher, I always look for the most efficient ways to complete tasks. To get column names, attach .columns to the data frame. We can remove these extra spaces all at once using the built-in python method .strip(). For the underscores, the .replace() method will address the inconsistent underscores. Do not forget to make this change applicable moving forward by making the data frame’s columns = to the necessary changes.

There you have it — consistent columns!

Don’t forget to clap 👏 below if you found this blog to be helpful. You can reach out with questions by emailing chatconsmartbre@gmail.com .

--

--