What is Feature Engineering?

Hema Kalyan Murapaka
4 min read · Feb 18, 2023


‘Is feature engineering necessary?’

To train an accurate machine learning model, we need clean, well-prepared data. In particular, we need data with informative features that help predict the output accurately. This is where Feature Engineering comes into the picture. This blog will give you a brief explanation of “Feature Engineering”.

Feature Engineering is a preprocessing step in machine learning that converts raw data into a set of features which can be used to build a machine-learning model. Done well, it enhances the performance and accuracy of the model.

First, let’s define what a feature is.
Every machine learning algorithm needs data as input to predict an output. A feature is a measurable attribute of the data that can be used as input to a machine learning algorithm. The raw input can come in many forms, such as text, images, video, and audio.

Feature Engineering consists of various processes, such as:

  1. Feature Creation: Feature Creation involves identifying or constructing variables in the data that are useful for predicting the output. New features are created by combining existing features, for example by adding or removing some of them, and these new features offer great flexibility.
  2. Feature Transformation: Feature Transformation involves applying mathematical functions to convert features from one representation into another. This step can further improve the model’s performance and accuracy.
  3. Feature Extraction: Feature Extraction involves deriving useful information from existing features using techniques such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and cluster analysis. This step reduces the volume of the data so it can be used more efficiently for modelling.
  4. Feature Selection: Feature Selection involves choosing the features required to predict the output. In practice, only a subset of the features is usually informative, so we need to select the useful ones. There are various feature selection methods, such as correlation analysis, regularization, and many more.

Feature Engineering Techniques:

  1. Imputation: Imputation deals with missing values, which can severely degrade the performance and accuracy of a machine-learning model. There are typically two types of imputation:
    Ⅰ. Numerical Imputation: the process of replacing missing numerical values with estimated values. Common methods include mean, median, and mode imputation.
    Ⅱ. Categorical Imputation: the process of replacing missing categorical values with estimated values. Common methods include mode imputation, hot-deck imputation, and cold-deck imputation.
  2. Handling Outliers: Outliers are data points that are unusually distant from the other observations in the dataset, and they can distort model performance. There are several methods to detect and handle outliers, such as the Inter-Quartile Range (IQR), Z-score, Mahalanobis distance, and many more.
    Let’s consider a small dataset, sample = [17, 101, 11, 3, 19, 16, 11, 31, 2, 15, 190, 1]. Just by looking at it, one can say ‘101’ and ‘190’ are outliers, as they are much larger than the other values.
[Image: Computation of the mean and median with and without outliers]

From these calculations, we can clearly observe that the mean is far more affected by outliers than the median.
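The calculation can be reproduced in a few lines of standard-library Python, here also flagging the outliers with the 1.5×IQR rule mentioned above (a sketch; `statistics.quantiles` requires Python 3.8+):

```python
import statistics

sample = [17, 101, 11, 3, 19, 16, 11, 31, 2, 15, 190, 1]

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, _, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in sample if x < lower or x > upper]
print(outliers)  # [101, 190]

# The mean shifts far more than the median once outliers are removed
clean = [x for x in sample if x not in outliers]
print(statistics.mean(sample), statistics.median(sample))  # 34.75 15.5
print(statistics.mean(clean), statistics.median(clean))    # 12.6 13.0
```

Removing the two outliers moves the mean from 34.75 down to 12.6, while the median only moves from 15.5 to 13.0, which is why the median is the more robust statistic here.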

  3. Log Transform: The log transform is a mathematical operation used to convert values from one scale to another. It is commonly used to turn skewed data into approximately normally distributed data, and it reduces the impact of outliers.
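A minimal sketch of a log transform, using `log1p` (i.e. log(1 + x)) so that zero values are handled safely; the values below are made up to illustrate a long right tail:

```python
import math

# Hypothetical right-skewed values spanning several orders of magnitude
values = [1, 10, 100, 1000, 10000]

# log1p compresses the long right tail while preserving the ordering
logged = [math.log1p(v) for v in values]
print([round(v, 2) for v in logged])  # [0.69, 2.4, 4.62, 6.91, 9.21]
```

Note that a plain `math.log` would fail on zeros, and neither variant accepts negative values, so the transform only suits non-negative features.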

  4. One-Hot Encoding: It is one of the most popular encoding techniques, used to convert categorical data into numerical data without losing any information.

[Image: Example of one-hot encoding]
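A minimal one-hot encoder can be sketched in plain Python (the colour feature is a made-up example; libraries such as pandas `get_dummies` or scikit-learn’s `OneHotEncoder` provide this out of the box):

```python
# Hypothetical categorical feature
colors = ["red", "green", "blue", "green"]

# One column per distinct category, sorted for a stable order
categories = sorted(set(colors))  # ['blue', 'green', 'red']

# Each value becomes a vector with a single 1 in its category's column
encoded = [[1 if c == cat else 0 for cat in categories] for c in colors]
print(encoded)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row carries exactly one 1, so no information is lost and no artificial ordering is imposed on the categories.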

  5. Binning: Binning groups continuous values into a smaller number of discrete intervals (bins). It is primarily used to make the model more robust and to avoid overfitting, although it comes at a cost in granularity: it is a trade-off between performance and overfitting.
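As a small sketch of binning, the age groups below use hypothetical cut-offs chosen purely for illustration:

```python
# Hypothetical bins: [low, high) intervals mapped to coarse labels
bins = [(0, 18, "child"), (18, 65, "adult"), (65, 200, "senior")]

def bin_age(age):
    """Map a numeric age to its bin label, or 'unknown' if out of range."""
    for low, high, label in bins:
        if low <= age < high:
            return label
    return "unknown"

ages = [3, 17, 18, 42, 70]
print([bin_age(a) for a in ages])  # ['child', 'child', 'adult', 'adult', 'senior']
```

The model now sees three coarse categories instead of raw ages, which smooths over noise in the exact values at the cost of discarding fine-grained information.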

Conclusion:

The creation of new data features from raw, unprocessed data is known as feature engineering. By analysing the raw data and the information it potentially carries, engineers can extract a new or more valuable set of features. I hope you gained some knowledge about feature engineering and its main approaches. You can leave a comment if the article raises any questions for you.

Follow me on:

Email: kalyanmurapaka274@gmail.com

LinkedIn: https://www.linkedin.com/in/hema-kalyan-murapaka-3048b422b

Instagram: https://www.instagram.com/im_kalyan_274

Twitter: https://twitter.com/HemaKalyan26
