Published in CodeX

Feature Selection Algorithms for Machine Learning

Choosing the right ones

Feature Selection

Correlation between features

Data set with redundant features
Reduced data set with only important features
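
One simple way to spot redundant features like those in the figure above is to look at pairwise correlations. The sketch below is only an illustration: the toy columns (`height_cm`, `height_in`, `age`), the helper name `drop_correlated`, and the 0.9 threshold are all assumptions made for this example, not something taken from the article's dataset.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later column of every pair whose |Pearson correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle (diagonal excluded) so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Toy data (hypothetical): height in inches repeats the information in height_cm
rng = np.random.default_rng(0)
height_cm = rng.normal(170, 10, size=200)
toy = pd.DataFrame({
    "height_cm": height_cm,
    "height_in": height_cm / 2.54,
    "age": rng.integers(18, 80, size=200),
})

reduced = drop_correlated(toy)
print(list(reduced.columns))  # ['height_cm', 'age']
```

A correlation filter like this only catches linear, pairwise redundancy and never looks at the target, which is one reason to consider the dedicated algorithms covered next.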

Algorithms

Boruta Feature Selection

# install the package
!pip install boruta
# import important libraries
import pandas as pd
from boruta import BorutaPy
from sklearn.ensemble import RandomForestRegressor
import numpy as np
# load data
heart_data = pd.read_csv("healthcare-dataset-stroke-data.csv")
# convert categorical columns to numeric codes
heart_data["gender"] = pd.factorize(heart_data["gender"])[0]
heart_data["ever_married"] = pd.factorize(heart_data["ever_married"])[0]
heart_data["work_type"] = pd.factorize(heart_data["work_type"])[0]
heart_data["Residence_type"] = pd.factorize(heart_data["Residence_type"])[0]
heart_data["smoking_status"] = pd.factorize(heart_data["smoking_status"])[0]
# additional cleaning
heart_data.dropna(inplace=True)
heart_data.drop("id", axis=1, inplace=True)
heart_data.head()
Dataset after cleaning
Heart Stroke dataset
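
As a quick illustration of what the `pd.factorize` calls above do, here is a small sketch on a made-up column (the values are hypothetical, not rows from the stroke dataset). Each distinct value gets an integer code in order of first appearance, and missing values are encoded as -1 by default.

```python
import pandas as pd

# Hypothetical categorical column with one missing entry
status = pd.Series(["smokes", "never smoked", None, "smokes"])

codes, uniques = pd.factorize(status)
print(codes.tolist())  # [0, 1, -1, 0]
print(list(uniques))   # ['smokes', 'never smoked']
```

Because a missing category becomes -1 rather than NaN, a later `dropna` will not remove that row. Whether that matters depends on whether the categorical columns actually contain missing values.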
X = heart_data.drop("stroke", axis = 1)
y = heart_data["stroke"]
# use a random forest as the base estimator
forest = RandomForestRegressor(n_jobs=-1, max_depth=10)
# initialize Boruta
boruta = BorutaPy(estimator=forest, n_estimators='auto', max_iter=50)
# BorutaPy expects NumPy arrays rather than DataFrames
boruta.fit(np.array(X), np.array(y))
# get results: support_ marks confirmed features, support_weak_ marks tentative ones
green_area = X.columns[boruta.support_].to_list()
blue_area = X.columns[boruta.support_weak_].to_list()
print('Selected Features:', green_area)
print('Blue area features:', blue_area)
Result of the Boruta algorithm
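
To give a feel for what Boruta does under the hood, the sketch below illustrates its core idea: each real feature competes against "shadow" features, which are shuffled copies that cannot carry any real signal. This is a deliberately simplified version and not the BorutaPy implementation. It only counts how often a feature beats the best shadow, whereas the full algorithm also applies a statistical test and removes rejected features between iterations. The function name `shadow_hits`, the toy data, and all parameter values are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def shadow_hits(X, y, n_rounds=10, seed=0):
    """Count, per feature, how often its importance beats the best shadow feature."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for _ in range(n_rounds):
        # Shadow features: every column shuffled independently, breaking any link to y
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        forest = RandomForestClassifier(
            n_estimators=50, max_depth=5, random_state=int(rng.integers(1_000_000))
        )
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        real, shadow = importances[:n_features], importances[n_features:]
        # A "hit" means the real feature was more important than every shadow
        hits += (real > shadow.max())
    return hits


# Toy data (hypothetical): only the first two columns influence the target
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(400, 5))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)

hits = shadow_hits(X_toy, y_toy)
print(hits)  # hits per feature out of 10 rounds
```

Features that collect hits in nearly every round correspond roughly to Boruta's confirmed set, while features that rarely beat the shadows are the ones it would reject.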

mRMR Feature Selection

!pip install mrmr_selection
from mrmr import mrmr_classif
selected_features = mrmr_classif(X=X, y=y, K=2)
print(selected_features)
Features returned by MRMR with K=2
# top 4 features
top_4 = mrmr_classif(X=X, y=y, K=4)
# top 6 features
top_6 = mrmr_classif(X=X, y=y, K=6)
print("Best 4 features:", top_4)
print("Best 6 features:", top_6)
Features returned by MRMR for k = 4 and k = 6
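
The name mRMR stands for "minimum redundancy, maximum relevance": features are picked one at a time, favouring those that are strongly related to the target but weakly related to the features already chosen. The sketch below shows one possible greedy version of that idea, using the ANOVA F-statistic as relevance and the mean absolute Pearson correlation with the selected features as redundancy. It is a simplified illustration and should not be read as the internals of `mrmr_classif`. The function name `greedy_mrmr` and the toy columns are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_classif


def greedy_mrmr(X: pd.DataFrame, y: pd.Series, k: int) -> list:
    """Greedily select up to k features by relevance divided by redundancy."""
    relevance = pd.Series(f_classif(X, y)[0], index=X.columns)
    corr = X.corr().abs()
    selected, remaining = [], list(X.columns)
    for _ in range(min(k, len(remaining))):
        if selected:
            # Redundancy: mean |correlation| with the features picked so far
            redundancy = corr.loc[remaining, selected].mean(axis=1).clip(lower=1e-6)
        else:
            # Nothing selected yet, so the first pick is based on relevance alone
            redundancy = pd.Series(1.0, index=remaining)
        best = (relevance[remaining] / redundancy).idxmax()
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): x1_copy is a near-duplicate of x1, noise is unrelated
rng = np.random.default_rng(0)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
toy = pd.DataFrame({
    "x1": x1,
    "x1_copy": x1 + 0.05 * rng.normal(size=n),
    "x2": x2,
    "noise": rng.normal(size=n),
})
target = pd.Series((x1 + x2 > 0).astype(int))

top2 = greedy_mrmr(toy, target, k=2)
print(top2)
```

Ranking by relevance alone would place `x1` and `x1_copy` side by side because both are strongly tied to the target. The redundancy term is what pushes the second pick away from the near-duplicate.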

Conclusion

Moosa Ali

Blogger | Data Scientist | Machine Learning Engineer. For more content, visit: www.writersbyte.com. Support me on: ko-fi.com/moosaali9906