When we start learning a new machine learning algorithm, it helps to know the strengths and weaknesses of its predecessors in order to understand the new algorithm well, so I've put together a pros-and-cons list of common machine learning models.
Linear Regression
Pros
- Simple method
- Good interpretation
- Easy to implement
Cons
- Assumes a linear relationship between the dependent and independent variables, which often does not hold in practice
- Sensitive to outliers
- With few observations, the model tends to overfit and starts fitting noise
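As a minimal sketch of the "simple and easy to implement" points above (assuming scikit-learn and NumPy are available; the data here is synthetic, generated from a known line y = 3x + 2):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from a known linear relationship: y = 3x + 2, plus noise.
rng = np.random.RandomState(0)
X = rng.rand(100, 1) * 10
y = 3 * X.ravel() + 2 + 0.5 * rng.randn(100)

model = LinearRegression().fit(X, y)

# Interpretability: the fitted slope and intercept map directly
# back to the data-generating line.
print(model.coef_[0], model.intercept_)  # close to 3 and 2
```

The interpretability advantage is visible here: each coefficient is simply "the change in y per unit change in that feature".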
Ridge Regression
Pros
- Trades a small increase in bias for a large reduction in variance (in the presence of collinearity, slightly biased estimates are worth the lower variance)
- Helps prevent overfitting
Cons
- Increases bias
- Requires tuning the regularization strength alpha (a hyperparameter)
- Lower model interpretability
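A small sketch of the variance/bias trade under collinearity (assuming scikit-learn; the two nearly identical columns below are a deliberately pathological synthetic example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two nearly collinear features: plain least squares produces unstable,
# large-magnitude coefficients, while ridge shrinks them toward zero.
rng = np.random.RandomState(0)
x = rng.randn(50)
X = np.column_stack([x, x + 1e-4 * rng.randn(50)])  # almost identical columns
y = x + 0.1 * rng.randn(50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # alpha is the hyperparameter to tune

# Ridge coefficients stay near 0.5 each (splitting the true weight of 1),
# while the OLS coefficients blow up along the collinear direction.
print(np.abs(ols.coef_).max(), np.abs(ridge.coef_).max())
```

In practice, `alpha` is usually chosen by cross-validation (e.g. `RidgeCV`) rather than set by hand.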
LASSO Regression
Pros
- Performs feature selection by shrinking coefficients toward zero (some become exactly zero)
- Helps avoid overfitting
Cons
- Coefficients of the selected features are strongly biased (shrunk toward zero)
- For n << p (n = number of data points, p = number of features), LASSO selects at most n features
- LASSO selects only one feature from a group of correlated features, and the choice is essentially arbitrary
- For different bootstrapped samples of the data, the selected features can differ substantially
- Prediction performance is often worse than ridge regression
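The feature-selection behavior can be seen directly in the fitted coefficients. A sketch (assuming scikit-learn; synthetic data where only the first two of ten features matter):

```python
import numpy as np
from sklearn.linear_model import Lasso

# 10 candidate features, but only the first two actually influence y.
rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.randn(200)

lasso = Lasso(alpha=0.1).fit(X, y)

# LASSO drives the irrelevant coefficients exactly to zero --
# unlike ridge, which only shrinks them.
print(lasso.coef_)
print(np.flatnonzero(lasso.coef_))  # indices of the selected features
```

Note the bias mentioned in the cons: even the selected coefficients come out slightly below their true values of 3 and 2.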
Elastic Net Regression
Pros
- Can select more than n predictors when n << p, whereas LASSO saturates at n features
Cons
- Computationally more expensive than LASSO or Ridge.
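Elastic Net mixes the L1 (LASSO) and L2 (ridge) penalties; in scikit-learn's parameterization (assumed here), `l1_ratio=1.0` is pure LASSO and `l1_ratio=0.0` is pure ridge. A sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# 20 candidate features, but y depends only on the first two.
rng = np.random.RandomState(0)
X = rng.randn(100, 20)
y = X[:, 0] - X[:, 1] + 0.1 * rng.randn(100)

# l1_ratio=0.5 blends the lasso and ridge penalties equally.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

# Still sparse like lasso, but the L2 component makes the selection
# more stable when features are correlated.
print(np.flatnonzero(enet.coef_))
```

The extra cost noted in the cons comes from having two hyperparameters (`alpha` and `l1_ratio`) to tune instead of one.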
Logistic Regression
Pros
- Doesn't assume a linear relationship between the independent and dependent variables (it models the log-odds linearly instead)
- The dependent variable does not need to be normally distributed
- No homogeneity-of-variance assumption
- Results are straightforward to interpret
Cons
- Requires more data to achieve stability.
- Effective mostly when the classes are (close to) linearly separable
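A minimal classification sketch (assuming scikit-learn; two synthetic Gaussian blobs that are close to linearly separable, which is where logistic regression shines):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two well-separated Gaussian blobs in 2D.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + 2, rng.randn(50, 2) - 2])
y = np.array([1] * 50 + [0] * 50)

clf = LogisticRegression().fit(X, y)

# Unlike a hard classifier, logistic regression outputs calibrated-ish
# class probabilities, which aids interpretation.
print(clf.predict([[2, 2]]))        # deep inside the class-1 blob
print(clf.predict_proba([[0, 0]]))  # near the boundary: probabilities near 0.5
```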
Decision Tree
Pros
- Does not require standardization and normalization.
- Easy to implement
- Less data preparation work
- Missing values have little impact (in implementations that handle them natively)
Cons
- Struggles with smooth decision boundaries, since splits are axis-aligned
- Doesn't work well when variables are uncorrelated
- The greedy splitting strategy leads to high variance
- Training can be comparatively slow
- Can grow overly complex without pruning
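The "no standardization needed" point is easy to demonstrate: trees split on raw thresholds, so wildly different feature scales don't matter. A sketch (assuming scikit-learn; synthetic data with an axis-aligned decision rule):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Features on wildly different scales -- no scaling needed for a tree.
rng = np.random.RandomState(0)
X = np.column_stack([rng.rand(200) * 1000, rng.rand(200)])
y = (X[:, 0] > 500).astype(int)  # label depends on one axis-aligned threshold

# max_depth caps the tree's complexity, mitigating the
# high-variance / overfitting con listed above.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.score(X, y))
```

A single split near 500 recovers the rule exactly; a smooth diagonal boundary, by contrast, would force the tree into a staircase of many splits.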
K-Nearest Neighbors (KNN)
Pros
- No training period
- Easy to implement
- New data can be added seamlessly, without retraining
Cons
- Does not work well with high dimensions
- Sensitive to noisy data, missing values and outliers
- Does not work well with large data sets, as the cost of calculating distance is huge
- Requires feature scaling
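A sketch showing both the "no training period" idea (fitting just stores the data) and the scaling requirement (assuming scikit-learn; synthetic blobs):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Two separated blobs; labels 1 and 0.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + 3, rng.randn(50, 2) - 3])
y = np.array([1] * 50 + [0] * 50)

# KNN is distance-based, so scale the features first.
X_scaled = StandardScaler().fit_transform(X)

# "Fitting" only stores the points; all the distance work
# happens at prediction time -- which is why large data sets hurt.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_scaled, y)
print(knn.score(X_scaled, y))
```

The cost structure is inverted relative to most models: training is essentially free, but every prediction pays for distance computations against the stored data.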
Random Forest
Pros
- Each tree is trained on a bootstrap sample, leaving roughly one-third of the data out-of-bag, which can be used for validation
- High performance and accurate
- Provides feature importance estimate
- Can handle missing values automatically (in some implementations)
- No feature scaling is required
Cons
- Low interpretability (a black-box model)
- Can overfit the data
- Requires more computational resources
- Prediction time is high
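Both the out-of-bag validation trick and the feature-importance estimate are one flag away in scikit-learn (assumed here; the data is synthetic, with only the first two of five features carrying signal):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# 5 features, but the label depends only on the first two.
rng = np.random.RandomState(0)
X = rng.randn(300, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# oob_score=True scores each tree on the ~1/3 of rows it never saw,
# giving a validation estimate without a separate test set.
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)

print(rf.oob_score_)            # out-of-bag accuracy estimate
print(rf.feature_importances_)  # the first two features dominate
```

The compute cost in the cons is also visible here: 200 trees means 200 fits at training time and 200 traversals per prediction.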
We cannot rank machine learning models on pros and cons alone. Model selection depends on the business use case we are trying to solve (the "no free lunch" theorem). Still, this comparison should give you some idea of why different models suit different data sets.
I haven't said anything here about boosting methods. There are various boosting methods, such as Gradient Boosting for reducing both bias and variance, Extreme Gradient Boosting (XGBoost) for utilizing your GPU to the fullest, LightGBM, and CatBoost for handling categorical variables and preventing overfitting. And recently, Stanford University released a new algorithm called Natural Gradient Boosting (NGBoost).
In the next story, we can compare various boosting algorithms.