How to Do Categorical Feature Encoding in Machine Learning

Ravi Srivastava · Published in Artifical Mind · Jul 14, 2020 · 4 min read

Machine learning models generally work best with numeric data, but real-world data comes as a mix of numeric and discrete/categorical values. In that case, the first job of a data scientist is to convert these discrete/categorical values into numeric values; this process is part of feature engineering. Feature engineering basically involves two tasks:

1- Deriving new features from existing features

2- Categorical feature encoding

We are going to discuss categorical feature encoding with the help of the Titanic data, one of the best data sets for learning ML. You can download the Titanic data set from Kaggle. When you start working with the Titanic data set, you will find that some features are not numeric, so we have to convert those features to numeric values before feeding them to an ML algorithm.

Read the Titanic data set:

import os
import pandas as pd

train_df = pd.read_csv(os.path.join(dataset_path, 'train.csv'))
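
Right after loading, you can check the column types to see which features still need encoding. The snippet below is a minimal sketch assuming the standard Kaggle train.csv columns, where fields such as Name, Sex, Ticket, Cabin, and Embarked are loaded as strings:

# Object (string) columns are the categorical candidates that need encoding
categorical_cols = train_df.select_dtypes(include='object').columns
print(categorical_cols)   # typically: Name, Sex, Ticket, Cabin, Embarked
print(train_df.dtypes)    # full overview of numeric vs. non-numeric columns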

Categorical feature encoding gives us several techniques for converting discrete values to numeric values. The following are the main feature encoding techniques:

1- Binary Encoding

2- Label Encoding

3- One-Hot Encoding

Binary Encoding:

Binary encoding converts a feature's labels to 0 or 1. You can use binary encoding when a feature has only two labels…
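
As a minimal sketch of this idea on the Titanic data: the Sex column has exactly two labels ('male' and 'female' in the Kaggle file), so it can be mapped directly to 0 and 1. Which label gets 0 and which gets 1 is an arbitrary choice made here for illustration:

# Binary encoding: map the two labels of 'Sex' to 0 and 1
train_df['Sex_encoded'] = train_df['Sex'].map({'male': 0, 'female': 1})
print(train_df[['Sex', 'Sex_encoded']].head())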
