2 Easy Steps to Extract Categorical Columns in Python

Shashanka Shekhar
Published in Stackademic
3 min read · Mar 1, 2024


Python is a high-level, general-purpose and interpreted programming language. It is known for its ease of use, powerful standard library and dynamic semantics. Python is widely used in various sectors including machine learning, artificial intelligence, data analysis, web development and many more. Its simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance.

Photo by Sergi Kabrera on Unsplash

The problem we will be solving

A snapshot of our data

This is the Big Mart Sales dataset, which contains the sales of each product at a particular BigMart outlet. It defines a number of product attributes that affect the sales each product generates. The shape of the data is (8523, 12).

There are a total of 8523 rows and 12 columns

BMS is the DataFrame that stores our data. It has 12 columns, as shown by BMS.info()

Here we want to extract all the categorical columns in our DataFrame. In this dataset, the categorical columns are the ones whose Dtype is object.
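Since the screenshots are not reproduced here, the following is a minimal sketch of how shape and Dtype can be inspected. It uses a tiny stand-in DataFrame rather than the real Big Mart file: the values and the Item_MRP column name are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for BMS; values and Item_MRP are illustrative only.
# dtype=object is set explicitly so the text columns are stored as object
# regardless of how a given pandas version infers strings.
BMS = pd.DataFrame({
    "Item_Fat_Content": pd.Series(["Low Fat", "Regular", "Low Fat"], dtype=object),
    "Item_Type": pd.Series(["Dairy", "Soft Drinks", "Meat"], dtype=object),
    "Item_MRP": [249.8, 48.3, 141.6],
})

print(BMS.shape)   # (number of rows, number of columns)
print(BMS.dtypes)  # one dtype per column, the same information BMS.info() lists under Dtype
```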

1. Using a list comprehension to extract categorical columns:

cat_cols = [cols for cols in BMS.columns if BMS[cols].dtype == 'object']

We will be using the above list comprehension approach to extract categorical columns. We could have used loops to do the same but list comprehensions are concise, easy to use and get the job done in a single, readable line of code.
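For comparison, here is a sketch of the loop version mentioned above next to the list comprehension. The DataFrame is a small stand-in with made-up values (Item_MRP is a hypothetical numeric column), not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for BMS; values and Item_MRP are illustrative only.
BMS = pd.DataFrame({
    "Item_Fat_Content": pd.Series(["Low Fat", "Regular"], dtype=object),
    "Item_MRP": [249.8, 48.3],
    "Outlet_Size": pd.Series(["Medium", "Small"], dtype=object),
})

# Loop version: append each column name whose dtype is object.
cat_cols_loop = []
for cols in BMS.columns:
    if BMS[cols].dtype == 'object':
        cat_cols_loop.append(cols)

# List comprehension version: same logic in a single line.
cat_cols = [cols for cols in BMS.columns if BMS[cols].dtype == 'object']

print(cat_cols_loop == cat_cols)  # True
```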

Let’s focus on different portions of the code one by one:

  1. for cols in BMS.columns: here BMS.columns gives us all the column names of the BMS DataFrame. A for loop needs an iterable to loop over, which here is BMS.columns, and a loop variable, which here is cols, so cols is assigned each column name of the BMS DataFrame one by one.
  2. if BMS[cols].dtype == 'object': for each column name assigned to cols, BMS[cols].dtype gives the datatype of that column. If it is equal to 'object', the condition is True and the column name held in cols is added to the list cat_cols.
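The two portions above can be traced column by column. The sketch below prints, for each column of a small stand-in DataFrame, its dtype and the result of the comparison. The values and the Item_MRP column are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for BMS; values and Item_MRP are illustrative only.
BMS = pd.DataFrame({
    "Item_Type": pd.Series(["Dairy", "Meat"], dtype=object),
    "Item_MRP": [249.8, 141.6],
})

checks = {}
for cols in BMS.columns:                       # cols takes each column name in turn
    is_object = BMS[cols].dtype == 'object'    # the condition used in the comprehension
    checks[cols] = bool(is_object)
    print(cols, BMS[cols].dtype, is_object)

# Only the names for which the condition held end up in cat_cols.
cat_cols = [cols for cols, keep in checks.items() if keep]
print(cat_cols)  # ['Item_Type']
```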

2. Checking the contents of the cat_cols list:

We use print(cat_cols) to display the output. As you can see in the image above, the columns in cat_cols are 'Item_Fat_Content', 'Item_Type', 'Outlet_Size', 'Outlet_Location_Type', and 'Outlet_Type'. These are all categorical columns of object datatype, which can be confirmed against the output of BMS.info()
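Besides reading BMS.info(), the result can be cross-checked in code. The sketch below compares the list comprehension with pandas' built-in DataFrame.select_dtypes, which is a different technique from the one used in this article. The DataFrame is a stand-in with made-up values and a hypothetical Item_MRP column.

```python
import pandas as pd

# Hypothetical stand-in for BMS; values and Item_MRP are illustrative only.
BMS = pd.DataFrame({
    "Item_Fat_Content": pd.Series(["Low Fat", "Regular"], dtype=object),
    "Item_MRP": [249.8, 48.3],
    "Outlet_Type": pd.Series(["Grocery Store", "Supermarket Type1"], dtype=object),
})

cat_cols = [cols for cols in BMS.columns if BMS[cols].dtype == 'object']
print(cat_cols)

# Cross-check: select_dtypes keeps only the columns of the requested dtype.
via_select = BMS.select_dtypes(include="object").columns.tolist()
print(via_select == cat_cols)  # True
```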

Similarly, we can extract the numeric columns from the DataFrame too. I have already written an article on it; check it out with the link immediately below.

To extract numeric columns from a DataFrame in Python, refer to this link.
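As a rough idea of what the numeric counterpart can look like, here is a minimal sketch that reuses the same list comprehension pattern with pandas.api.types.is_numeric_dtype. This is an assumption for illustration and not necessarily the approach taken in the linked article. The DataFrame values and the numeric column names are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical stand-in for BMS; values and the numeric column names are illustrative only.
BMS = pd.DataFrame({
    "Item_Type": pd.Series(["Dairy", "Meat"], dtype=object),
    "Item_MRP": [249.8, 141.6],
    "Outlet_Establishment_Year": [1999, 2009],
})

# Same pattern as cat_cols, but keeping columns whose dtype is numeric.
num_cols = [cols for cols in BMS.columns if is_numeric_dtype(BMS[cols])]
print(num_cols)  # ['Item_MRP', 'Outlet_Establishment_Year']
```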

To learn how to use OneHotEncoder in Python, refer to this link.

To learn how to use PCA in Python, refer to this link.

To learn how to use SMOTE in Python, refer to this link.

To aggregate categorical columns in Python, refer to this link.

To read more stories like this you can follow me with this link.

References:

  1. https://www.geeksforgeeks.org/what-is-python/
  2. https://www.python.org/doc/essays/blurb/
  3. https://www.britannica.com/technology/Python-computer-language


Contributor for Microsoft Power BI. I like Data Analysis and Data Science. Also I enjoy sports, videogames and Japanese Anime in my free time.