Feature Engineering v/s Feature Selection
Feature Engineering and Feature Selection for Beginners: A Python and Machine Learning Tutorial
This is the continuation of the first part of the Feature Engineering v/s Feature Selection article. In the first part, I explained feature engineering, feature selection, and techniques for handling missing data and continuous data.
In this part, I'll cover techniques for handling categorical data, followed by feature selection techniques.
How to handle categorical data?
Categorical features represent types of data that can be divided into groups, for example, gender or level of education.
Most machine learning libraries require all non-numeric values to be converted to integers or floats. Common methods for handling categorical features are:
Label Encoding
Label encoding simply converts each categorical value in a column into a number.
Label encoding is recommended for binary variables (features with only two categories), since the resulting 0/1 integers carry no misleading order.
In the following example, you will learn how to use Scikit-learn's LabelEncoder to transform a binary categorical variable into 0s and 1s:
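Here is a minimal sketch of such an example. The DataFrame, the column name gender, and its values are illustrative assumptions, not data from the original article:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative toy data: a binary categorical column (assumed, not from the article)
df = pd.DataFrame({"gender": ["female", "male", "male", "female", "male"]})

encoder = LabelEncoder()

# fit_transform learns the unique categories and maps each one to an integer;
# classes are sorted alphabetically, so "female" -> 0 and "male" -> 1 here
df["gender_encoded"] = encoder.fit_transform(df["gender"])

print(df)
print(encoder.classes_)  # ['female' 'male'] -> encoded as 0 and 1

Note that encoder.classes_ keeps the mapping, so you can recover the original labels later with encoder.inverse_transform if needed.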