Essential guide to handling Outliers for your Logistic Regression Model
Improve the model performance by removing the outliers
A real-world dataset often contains many missing values and outlier data points. Outliers can be caused by data corruption, measurement or experimental errors, or human errors. Their presence in the dataset can affect the model to a great extent, so handling outliers is an essential component of the feature engineering pipeline.
There are various techniques to detect and handle the outliers present in a dataset, including the interquartile range (IQR) method, box plots, z-scores, and many more.
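As a quick illustration of two of the techniques named above, here is a minimal sketch of IQR-based and z-score-based outlier detection with NumPy. The function names, the sample values, the IQR multiplier of 1.5, and the z-score threshold of 3 are assumptions chosen for illustration (they are common conventions, not values taken from this article).

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]. k=1.5 is an assumed convention."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

def zscore_outlier_mask(values, threshold=3.0):
    """Flag points whose absolute z-score exceeds the (assumed) threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > threshold

# Hypothetical feature column: 15 ordinary readings plus one extreme value.
sample = np.array([10, 12, 11, 13, 12, 11, 10, 12,
                   13, 11, 12, 11, 10, 13, 12, 95], dtype=float)

iqr_mask = iqr_outlier_mask(sample)
z_mask = zscore_outlier_mask(sample)

print("IQR outliers:    ", sample[iqr_mask])
print("z-score outliers:", sample[z_mask])
```

Note that the z-score method depends on the mean and standard deviation, which are themselves pulled by extreme values, so on very small samples it may miss outliers that the IQR method catches.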
Read this article to learn how to detect outliers in an unlabeled dataset using Silhouette Analysis.
Logistic Regression models are not heavily affected by outliers because the sigmoid function tapers their influence. Extreme outliers, however, can still degrade the model's performance. In this article, we will discuss how to improve the performance of…