Feature Selection by Lasso and Ridge Regression: Python Code Examples

Sabarirajan Kumarappan
5 min read · Aug 16, 2020


Machine Learning is not only about algorithms. If you review competition blogs and case studies, you will notice that the winners often used the same algorithms as many other participants; the difference usually lies in how they prepared the data and selected the features.

Machine Learning is the process of building a model that learns from data and makes decisions based on it.

As you probably know, supervised ML methods work with labelled data: the model learns the mapping between the input features and the target feature, and then makes predictions or classifications based on that mapping.

An unsupervised ML method works with unlabelled data. The model infers patterns from the data set without any reference output. This is suitable when we do not know what the outcomes should be, or in other words when we have no data on the desired outcomes.

One cannot feed raw data directly into ML algorithms; the data has to be pre-processed first. Feature selection and data pre-processing are therefore essential steps. Data preparation is not just about meeting the expectations of the modelling algorithm; it is about exposing the underlying structure of the problem as clearly as possible.

I am not going deeper into ML methods and algorithms here, but whatever output we expect (classification, prediction, pattern recognition), the accuracy of that output depends heavily on the features you use and on the range and units of the observations.

In this article I will explain one of the feature selection techniques that I have used during my practice sessions.

Read on!

What is Lasso Regression?

Lasso stands for Least Absolute Shrinkage and Selection Operator. It is a type of linear regression that uses shrinkage: the coefficient estimates are shrunk towards a central point, zero.

Why regularization?

Regularization is intended to tackle the problem of overfitting. Overfitting becomes a clear menace when there is a large data set with thousands of features and records.

Why Lasso or Ridge ?

Ridge regression and Lasso regression are two popular techniques that use regularization when building predictive models.

Both techniques work by penalizing the magnitude of the feature coefficients while minimizing the error between predictions and actual values.

The key difference between Ridge and Lasso regression is that Lasso can nullify the impact of an irrelevant feature: it can shrink the coefficient of a feature all the way to zero, completely eliminating it. This makes Lasso better at reducing variance when the data contains many insignificant features. Ridge regression, in contrast, shrinks coefficients but can never reduce them to exactly zero.

Ridge regression performs better when most of the features in the data are genuinely relevant and useful.
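To make the difference concrete, here is a minimal sketch on synthetic data (generated with scikit-learn's make_regression purely for illustration; this is not the data set used later in the article) that fits both models and counts how many coefficients each drives exactly to zero.

# Minimal sketch: how Lasso and Ridge treat irrelevant features (synthetic data)
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 100 features, only 10 of which actually influence the target
X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       noise=10, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print('Lasso coefficients set to zero: {}'.format(np.sum(lasso.coef_ == 0)))
print('Ridge coefficients set to zero: {}'.format(np.sum(ridge.coef_ == 0)))
# Lasso typically zeroes out most of the irrelevant features;
# Ridge keeps them all and only shrinks their magnitudes.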

Mathematically, the Lasso cost is: Residual Sum of Squares + λ * (sum of the absolute values of the coefficients).

Where,

  • λ denotes the amount of shrinkage
  • λ = 0 means every feature is kept; this is equivalent to ordinary linear regression, where only the residual sum of squares is minimised to build the predictive model
  • As λ approaches infinity more and more features are eliminated, until no feature is left
  • Bias increases as λ increases
  • Variance increases as λ decreases (a small sketch after this list illustrates the effect of λ)
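As a rough illustration of the role of λ, the sketch below fits scikit-learn's Lasso (where the shrinkage parameter is called alpha) with increasing values on synthetic data and counts how many features survive; the exact counts will vary with the data.

# Sketch: larger alpha (λ) eliminates more and more features
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=300, n_features=50, n_informative=5,
                       noise=5, random_state=42)

for alpha in [0.01, 0.1, 1, 10, 100]:
    coef = Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_
    print('alpha = {:>6}: {} features kept'.format(alpha, int(np.sum(coef != 0))))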

Let's see how it works in Python!

The data used in this example is the German credit data set. You can download the data from the link below.

click here

Importing libraries

#import required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.feature_selection import SelectFromModel

Importing the data set and selecting the numerical attributes

# Define the headers since the data does not have any
headers = ["over_draft", "credit_usage", "credit_history", "purpose",
           "current_balance", "Average_Credit_Balance", "employment", "location",
           "personal_status", "other_parties", "residence_since", "property_magnitude",
           "cc_age", "other_payment_plans", "housing", "existing_credits", "job",
           "num_dependents", "own_telephone", "foreign_worker", "target"]
#import dataset into the directory
data = pd.read_csv('germandata.csv', header=None, names=headers, na_values="?")

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
numerical_vars = list(data.select_dtypes(include=numerics).columns)
data = data[numerical_vars]
data.shape

x = pd.DataFrame(data.drop(labels=['target'], axis=1))
y = pd.DataFrame(data['target'])

Scaling and Splitting the data set

from sklearn.preprocessing import MinMaxScaler
Min_Max = MinMaxScaler()
X = Min_Max.fit_transform(x)
Y= Min_Max.fit_transform(y)

# Split the data into 40% test and 60% training
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4, random_state=0)

X_train.shape, X_test.shape

Selecting features with L1 (Lasso) regularisation using SelectFromModel

sel_ = SelectFromModel(LogisticRegression(C=1, penalty='l1', solver='liblinear'))
sel_.fit(X_train, np.ravel(Y_train, order='C'))
sel_.get_support()
X_train = pd.DataFrame(X_train)

We do the model fitting and the feature selection together, in a single step: we penalise the logistic regression with an L1 (Lasso-style) penalty and use sklearn's SelectFromModel to keep only the features with non-zero coefficients.
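The estimator above is an L1-penalised LogisticRegression because the German credit target is a class label. If you were selecting features for a continuous target instead, the same SelectFromModel pattern works with the Lasso estimator directly; here is a hedged sketch on synthetic regression data (the alpha value is purely illustrative):

# Sketch: the same SelectFromModel pattern with Lasso for a regression target
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X_reg, y_reg = make_regression(n_samples=200, n_features=20, n_informative=5,
                               noise=5, random_state=0)
sel_lasso = SelectFromModel(Lasso(alpha=1.0))
sel_lasso.fit(X_reg, y_reg)
print('features kept: {}'.format(sel_lasso.get_support().sum()))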

To see the selected set of features

selected_feat = X_train.columns[(sel_.get_support())]
print('total features: {}'.format((X_train.shape[1])))
print('selected features: {}'.format(len(selected_feat)))
print('features with coefficients shrunk to zero: {}'.format(
    np.sum(sel_.estimator_.coef_ == 0)))

Make a list of the removed features

removed_feats = X_train.columns[(sel_.estimator_.coef_ == 0).ravel().tolist()]
removed_feats

X_train_selected = sel_.transform(X_train)
X_test_selected = sel_.transform(X_test)
X_train_selected.shape, X_test_selected.shape

To check the accuracy of the model, we use a Random Forest classifier to predict the results from the selected features

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Create a random forest classifier
clf = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=-1)
# Train the classifier
clf.fit(X_train_selected, np.ravel(Y_train, order='C'))
# Apply The Full Featured Classifier To The Test Data
y_pred = clf.predict(X_test_selected)
# View The Accuracy Of Our Selected Feature Model
accuracy_score(Y_test, y_pred)
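As a sanity check, you could also train the same classifier on the full (unselected) feature set and compare the two accuracy scores; a small sketch reusing the variables defined above:

# Optional comparison: the same classifier trained on all features
clf_full = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=-1)
clf_full.fit(X_train, np.ravel(Y_train, order='C'))
y_pred_full = clf_full.predict(X_test)
print('accuracy with all features     : {}'.format(accuracy_score(Y_test, y_pred_full)))
print('accuracy with selected features: {}'.format(accuracy_score(Y_test, y_pred)))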

Summary

We have used Lasso (L1) regularisation to remove unimportant features from the dataset. This method gives the greatest benefit when you have a large number of input features. The regularisation strength is the key factor that decides how many features are eliminated; in scikit-learn's LogisticRegression it is controlled by the parameter C (smaller C means stronger regularisation, hence more eliminations). If you want Ridge-style regularisation, pick penalty='l2'.
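One caveat when you switch to penalty='l2': Ridge-style regularisation does not drive coefficients to exactly zero, so SelectFromModel falls back to a threshold on coefficient magnitude (by default, the mean of the absolute coefficients) rather than selecting on zero versus non-zero coefficients. A minimal sketch reusing the training data from above:

# Sketch: L2 (Ridge-style) selection; nothing is zeroed, so selection
# relies on SelectFromModel's magnitude threshold (mean of |coef_| by default)
sel_l2 = SelectFromModel(LogisticRegression(C=1, penalty='l2', solver='liblinear'))
sel_l2.fit(X_train, np.ravel(Y_train, order='C'))
print('features kept by the L2 model: {}'.format(sel_l2.get_support().sum()))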

Thanks !!
