What is Feature Engineering?

Dr. Roi Yehoshua
Published in AI Made Simple
3 min read · Mar 6, 2023

Feature engineering is a process in which we create new features from the existing features in our data set. The new features are often more relevant to the prediction task than the original set of features, and thus can help the machine learning model achieve better results.

Sometimes the new features are created by applying simple arithmetic operations, such as calculating ratios or sums of the original features. In other cases, specific domain knowledge about the data set is required to come up with informative features.
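Here is a minimal sketch of the arithmetic case. It derives one sum feature and one ratio feature from a small made-up table. The column names (`living_area`, `basement_area`, `num_rooms`, `num_bedrooms`) and their values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data: one row per house (columns are illustrative only)
df = pd.DataFrame({
    "living_area": [1200, 1800, 950],
    "basement_area": [300, 0, 400],
    "num_rooms": [6, 8, 4],
    "num_bedrooms": [3, 4, 1],
})

# Sum feature: combine two related measurements into a single total
df["total_area"] = df["living_area"] + df["basement_area"]

# Ratio feature: normalize one count by another so houses of different sizes are comparable
df["bedrooms_per_room"] = df["num_bedrooms"] / df["num_rooms"]

print(df[["total_area", "bedrooms_per_room"]])
```

Ratios like `bedrooms_per_room` are often more useful than the raw counts because they describe the *composition* of a house independently of its size.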

Feature Engineering Example

To demonstrate feature engineering, we will use the California housing dataset available in Scikit-Learn. The objective in this data set is to predict the median house value of a district in California, given various features of that district, such as the median income or the average number of rooms per household.
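The Scikit-Learn version of this dataset also records each district's `Latitude` and `Longitude`. Those raw coordinates are hard for many models to use directly, but domain knowledge suggests that distance to a major city may relate to house prices. A minimal sketch of such a domain-driven feature is the great-circle (haversine) distance from each district to a reference point. The choice of San Francisco, its approximate coordinates, and the helper names below are assumptions made for illustration. They are not part of the dataset or of this article's method.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees.

    Works element-wise on scalars, NumPy arrays, or pandas Series.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Hypothetical reference point: approximate coordinates of downtown San Francisco
SF_LAT, SF_LON = 37.7749, -122.4194

def distance_to_sf(latitude, longitude):
    """Hypothetical helper: distance (km) from each district to San Francisco."""
    return haversine_km(latitude, longitude, SF_LAT, SF_LON)
```

Once the features are in a DataFrame `df`, this could be applied as `df['DistToSF'] = distance_to_sf(df['Latitude'], df['Longitude'])`, where `DistToSF` is a hypothetical column name.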

First, we fetch the data set:

from sklearn.datasets import fetch_california_housing

data = fetch_california_housing()
X, y = data.data, data.target
feature_names = data.feature_names

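To explore the data set, let's merge the features and the labels into one DataFrame:

import numpy as np
import pandas as pd

# Stack the feature matrix and the target vector side by side into one table
mat = np.column_stack((X, y))
df = pd.DataFrame(mat, columns=np.append(feature_names, 'MedianValue'))
df.head()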
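With the data in a single DataFrame, one quick way to screen a candidate feature is to add it and compare its correlation with the target against the original columns. The sketch below does this for a ratio of two existing columns. The helper names `add_ratio_feature` and `target_correlations` and the `BedroomRatio` feature are hypothetical, and they are not necessarily what the article goes on to build. Pearson correlation only captures linear relationships, so it is a rough screening signal rather than proof that a feature will help a model.

```python
import pandas as pd

def add_ratio_feature(df, numerator, denominator, name):
    """Return a copy of df with a new column equal to numerator / denominator."""
    out = df.copy()
    out[name] = out[numerator] / out[denominator]
    return out

def target_correlations(df, target="MedianValue"):
    """Pearson correlation of every other column with the target, strongest (by |r|) first."""
    corr = df.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

For example, `df = add_ratio_feature(df, 'AveBedrms', 'AveRooms', 'BedroomRatio')` followed by `target_correlations(df)` ranks the new column alongside the original eight features. The helper returns a copy, so the original DataFrame is left unchanged unless you reassign it.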

Written by Dr. Roi Yehoshua

Teaching Professor for Data Science and ML at Northeastern University | Top Writer in AI | 200K+ Views on Medium | https://www.linkedin.com/in/roi-yehoshua/