In this article, I will explain the key differences between regression and classification supervised machine learning algorithms. It is important to understand the differences before an appropriate machine learning algorithm can be chosen.
Please read Disclaimer.
I will briefly describe 7 key areas:
- Difference between regression and classification
- Names of common regression and classification algorithms
- Checking the goodness of your algorithm
- Explanation of overfitting
- Methods to avoid overfitting
- Outline of Regularization
- Mention of gradient descent
1. What are the key differences between regression and classification?
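Regression and classification have several things in common. Both:
- Are supervised learning algorithms
- Use historical data to forecast and make decisions
- Fit a function to the data: a best-fit line (or curve) in regression, a boundary that separates the classes in classification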
Supervised learning algorithms require data to be labelled. For more information on supervised machine learning, have a look at my article: Machine Learning In 8 Minutes
Regression requires the target (dependent) variable to have continuous values. First, the factors (independent variables) are identified. Then coefficients (multipliers) for the independent variables are calculated so that they minimise the differences between actual and predicted values. The result is a formula that forecasts the dependent variable (what you want to measure) from the independent variables (what you think your target measure depends on). Because the forecasted values are continuous, regression gives you continuous results.
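To make this concrete, here is a minimal sketch of simple linear regression with one independent variable, fitted by ordinary least squares. The humidity and pollution values are made-up data and the function names are hypothetical. What matters is that the fitted formula returns a continuous value for any input.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) that minimise the sum of squared differences
    between the actual ys and the predictions slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    # The fitted formula: a continuous forecast of the dependent variable.
    return slope * x + intercept


# Hypothetical data: humidity (%) as the independent variable,
# a pollution reading as the dependent variable.
humidity = [30, 40, 50, 60, 70]
pollution = [12.0, 15.0, 18.0, 21.0, 24.0]

slope, intercept = fit_simple_linear_regression(humidity, pollution)
print(slope, intercept)                # approximately 0.3 and 3.0
print(predict(slope, intercept, 55))   # approximately 19.5
```

A humidity of 55 never appears in the data, yet the formula still produces a value in between the observed pollution readings.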
Classification requires the target variable to have discrete values, e.g. categories (classes). First, historical data points are labelled with their categories. Then new input data is assigned to a category based on the historical data, and finally decisions are made. The forecasted values are discrete: classification assigns each data point to one of a fixed set of categories.
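As a sketch of the same steps for classification, the snippet below implements a minimal k-nearest neighbour classifier (K Nearest Neighbour is one of the algorithms named in this article) in plain Python. The points, labels and function name are hypothetical, and Euclidean distance with k = 3 is an assumption chosen for illustration.

```python
import math
from collections import Counter


def knn_classify(training_points, training_labels, new_point, k=3):
    """Assign new_point to the majority class among its k nearest
    labelled (historical) points, using Euclidean distance."""
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(training_points, training_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The result is always one of the known categories, never a value in between.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical historical data: two numeric features per point, labelled "A" or "B".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, (1.8, 1.5)))  # A
print(knn_classify(points, labels, (6.8, 6.2)))  # B
```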
- Use classification if you want to assign the data points in your dataset to explicit categories, for example if you wanted to know whether a name is male or female.
- Use regression if you want a continuous value that distinguishes between individual points, for example to model how pollution levels change with humidity.
2. Common regression and classification algorithms
Three well-known algorithms of each type are:
Regression: Linear Regression, Regression Forest, Regression Neural Networks.
Classification: K-Nearest Neighbours, Logistic Regression, Support Vector Machines.
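Despite its name, Logistic Regression belongs in the classification list: it computes a continuous probability with the logistic (sigmoid) function and then turns it into a discrete class using a threshold. The sketch below assumes the coefficients have already been learned. The weights, bias and 0.5 threshold are hypothetical values.

```python
import math


def sigmoid(z):
    # Maps any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def logistic_predict(weights, bias, features, threshold=0.5):
    """Return (probability, class) for one data point.
    weights and bias are assumed to be already learned (hypothetical here)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    predicted_class = 1 if probability >= threshold else 0
    return probability, predicted_class


weights = [1.5, -2.0]  # hypothetical learned coefficients
bias = 0.25

prob, cls = logistic_predict(weights, bias, [2.0, 1.0])
print(round(prob, 3), cls)  # 0.777 1
```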
A detailed comparison of these algorithms is outlined here: Machine Learning Algorithms Comparison
3. How good is my regression or classification model?
There are various measures to check how accurate your model is:
Adjusted R-Squared (Regression): Measures how much of the variation in the actual values is explained by the predicted values, after penalising for the degrees of freedom (the number of independent variables) in the equation. I have explained how it is calculated in my article: How Good Is My Predictive Model — Regression Analysis
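A minimal sketch of the commonly used formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of independent variables. The actual and predicted values are made up. Scoring the same predictions with p = 1 and p = 2 shows the penalty: the more variables a model needs for the same fit, the lower its adjusted score.

```python
def r_squared(actual, predicted):
    """Share of the variation in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1 - ss_res / ss_tot


def adjusted_r_squared(actual, predicted, num_predictors):
    """Penalise R-squared for the number of independent variables (p)
    relative to the number of observations (n)."""
    n = len(actual)
    r2 = r_squared(actual, predicted)
    return 1 - (1 - r2) * (n - 1) / (n - num_predictors - 1)


# Hypothetical actual and predicted values.
actual = [3, 5, 7, 9, 11]
predicted = [2.8, 5.3, 6.9, 9.2, 10.8]

print(round(r_squared(actual, predicted), 4))              # 0.9945
print(round(adjusted_r_squared(actual, predicted, 1), 4))  # 0.9927
print(round(adjusted_r_squared(actual, predicted, 2), 4))  # 0.989
```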
F1 (Classification): The F1 score is a measure of a model’s performance on a classification task. It is the harmonic mean of the model’s precision and recall. Scores range from 0 to 1: results tending towards 1 are considered the best, whereas those tending towards 0 are treated as the worst. Because F1 does not take true negatives into account, it is used in classification tests where true negatives do not matter as much.
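The F1 calculation itself is short. This sketch computes it from counts of true positives, false positives and false negatives (the terms are defined under Confusion Matrix). It is a standalone, hypothetical helper rather than any library's function, and returning 0 when there are no true positives is a convention assumed here.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; true negatives are not used."""
    if true_positives == 0:
        # Assumed convention: with no true positives, precision or recall is 0
        # (or undefined), so the score is reported as 0.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts from a classifier's test results.
score = f1_score(true_positives=8, false_positives=2, false_negatives=4)
print(round(score, 3))  # 0.727
```

With a precision of 0.8 and a recall of about 0.667, the harmonic mean sits closer to the lower of the two values, which is why a model cannot get a high F1 by excelling at only one of them.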
Confusion Matrix (Classification): In simple terms, a confusion matrix is a table that summarises the results of a classification algorithm when the actual values are known. Several terms are used:
- True Positive: When the actual result is true and the predicted value is also true
- True Negative: When the actual result is false and the predicted value is also false
- False Positive: When the actual result is false but the predicted value is true
- False Negative: When the actual result is true but the predicted value is false
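Counting the four outcomes is straightforward. Below is a minimal sketch for binary labels, assuming True stands for the positive class; the example labels are made up.

```python
def confusion_matrix_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (True = positive)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1   # actual true, predicted true
        elif not a and not p:
            counts["TN"] += 1   # actual false, predicted false
        elif not a and p:
            counts["FP"] += 1   # actual false, predicted true
        else:
            counts["FN"] += 1   # actual true, predicted false
    return counts


actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
print(confusion_matrix_counts(actual, predicted))
# {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```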
4. What is overfitting?
Overfitting happens when a model's expressiveness (complexity) is too high for the data. An overfitted model fits the training data (almost) perfectly, but when you test it against test data it performs badly. During training, the model builds its rules and patterns so closely around the training data that it is unable to generalise to unseen data. This happens because of noise (randomness) in the data: the model accommodates the stochastic behaviour in the training input, and as a consequence it cannot forecast scenarios that it has not experienced before.
In other words, an overfitted model is bad at generalisation. Overfitting is a common issue in machine learning algorithms.
To explain further: to prepare a forecasting model, you need to gather training and test data. If your training data contains randomness, the model you produce may treat that randomness as real values and build equations that reproduce the noisy training values as closely as possible. However, as soon as test data is fed in, the model's predictive ability fails: it generalises inaccurately because it carries the noise with it.
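The effect can be reproduced with a small, made-up experiment. The training data below follow the rule y = 2x plus some fixed noise. An over-complex model (a polynomial that passes exactly through every training point, built with Lagrange interpolation) is compared with a simple least-squares straight line. For simplicity, the test targets are the noise-free values of y = 2x. All data and function names are hypothetical.

```python
def lagrange_predict(xs, ys, x):
    """Over-complex model: the polynomial of degree len(xs) - 1 that passes
    exactly through every training point (Lagrange interpolation)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        total += yj * basis
    return total


def fit_line(xs, ys):
    """Simple model: least-squares straight line with only two parameters."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared difference between predicted and actual values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: y = 2x plus fixed "noise".
train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.5, 1.6, 4.3, 5.4, 8.4, 9.8]
# Hypothetical test data: noise-free values of y = 2x.
test_x = [1.5, 2.5, 3.5, 6.0]
test_y = [2 * x for x in test_x]


def complex_model(x):
    return lagrange_predict(train_x, train_y, x)


simple_model = fit_line(train_x, train_y)

for name, model in [("complex", complex_model), ("simple", simple_model)]:
    train_error = mse(model, train_x, train_y)
    test_error = mse(model, test_x, test_y)
    print(f"{name}: train MSE {train_error:.3f}, test MSE {test_error:.3f}")
```

The complex model's training error is zero because it reproduces the noise exactly, yet its test error is far larger than the straight line's, especially at x = 6, which lies just outside the training range.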
Underfitting, on the other hand, is the opposite of overfitting. If a model is underfitting, it does not capture the patterns in the data well enough and cannot forecast values accurately, even on the training data.
5. Avoiding Overfitting
There are several methods to avoid overfitting:
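1. Increase the size of your training data; more test data also makes overfitting easier to detect.
2. Reduce the number of variables, degrees of freedom and parameters of your model. This keeps your model simple, so it has less capacity to fit the noise (stochastic behaviour) in the training data.
3. Use cross-validation. It estimates the generalisation error of a model by repeatedly training it on one part of the data, testing it on the held-out part and averaging the errors; comparing this average between models shows which one generalises better. A common cross-validation technique is k-fold cross-validation.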
4. Penalise model parameters if they’re likely to cause overfitting. This process is known as regularization.
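A minimal sketch of k-fold cross-validation in plain Python: each fold is held out once as test data while the model is trained on the remaining folds, and the held-out errors are averaged. Assigning every k-th point to the same fold, the two candidate models (a straight line and a model that always predicts the training mean) and the data are assumptions for illustration; in practice the data are usually shuffled before splitting.

```python
def k_fold_indices(n, k):
    """Assign index i to fold i % k, giving k folds of (almost) equal size."""
    return [list(range(fold, n, k)) for fold in range(k)]


def cross_validate(xs, ys, fit, k=3):
    """Average held-out mean squared error over k folds.
    fit(train_x, train_y) must return a function that predicts y from x."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        error = sum((model(xs[i]) - ys[i]) ** 2 for i in test_idx) / len(test_idx)
        fold_errors.append(error)
    return sum(fold_errors) / len(fold_errors)


def fit_line(xs, ys):
    """Candidate model 1: least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_mean(xs, ys):
    """Candidate model 2: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


# Hypothetical data: y = 2x plus fixed noise.
xs = list(range(9))
ys = [0.5, 1.6, 4.3, 5.4, 8.4, 9.8, 12.1, 13.7, 16.2]

print("line:", round(cross_validate(xs, ys, fit_line), 3))
print("mean:", round(cross_validate(xs, ys, fit_mean), 3))
```

The candidate with the lower average held-out error, here the straight line, is the one expected to generalise better to unseen data.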
6. What does regularization mean?
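One of the ways to reduce overfitting is regularization. Extra terms are introduced into the model's cost function to penalise complexity. LASSO (L1) and Ridge (L2) are well-known regularization techniques; they penalise the coefficients by their size and by the square of their size, respectively:
- L1 (LASSO) adds a penalty proportional to the sum of the absolute values of the coefficients. It can shrink some coefficients to exactly zero, effectively removing those variables from the model.
- L2 (Ridge) adds a penalty proportional to the sum of the squared coefficients. It shrinks all coefficients towards zero but rarely makes any of them exactly zero.
The same two norms are also used as loss functions (sum of absolute differences versus sum of squared differences between estimated and actual values). As loss functions, L1 is more robust to outliers, while L2 is considered more stable.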
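To see the difference between the two penalties, consider the simplest case: one independent variable, centred data (mean zero) and no intercept. Minimising the sum of squared errors plus a penalty then has a closed-form answer. Ridge divides by a larger number, while LASSO subtracts a fixed amount from the least-squares solution and stops at zero ("soft thresholding"). The data and the penalty strength `lam` are hypothetical.

```python
import math


def ridge_coefficient(xs, ys, lam):
    """Minimise sum((y - w*x)^2) + lam * w^2 for one centred feature.
    Closed form: w = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_coefficient(xs, ys, lam):
    """Minimise sum((y - w*x)^2) + lam * |w| for one centred feature.
    Closed form: w = sign(sum(x*y)) * max(|sum(x*y)| - lam/2, 0) / sum(x^2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return math.copysign(shrunk, sxy) / sxx


# Hypothetical centred data (both xs and ys have mean zero).
xs = [-2, -1, 0, 1, 2]
ys = [-5, -1, 0, 3, 3]

for lam in [0, 10, 40, 100]:
    print(lam, round(ridge_coefficient(xs, ys, lam), 4), lasso_coefficient(xs, ys, lam))
```

As the penalty grows, the Ridge coefficient keeps shrinking towards zero without reaching it, while the LASSO coefficient becomes exactly zero once lam / 2 reaches |sum(x*y)|, which is how L1 regularization removes variables.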
7. What is gradient descent?
Gradient descent is an optimization algorithm. It aims to find the parameter values at which a function, typically the model's error, is at its minimum. Gradient descent is widely used to train machine learning algorithms. When a machine learning algorithm forecasts data, we can calculate its cost function to estimate how good the algorithm is: the cost function measures the prediction errors of the algorithm. The predictive power of a machine learning algorithm can be improved by altering its parameters. Gradient descent does this iteratively: at each step it computes the gradient (slope) of the cost function with respect to the parameters and moves the parameters a small step in the opposite direction, until the cost function reaches its lowest point, implying that the error of the model is at its minimum. This process is known as gradient descent.
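A minimal sketch of batch gradient descent fitting a straight line y = w*x + b by minimising the mean squared error. The data, learning rate and number of iterations are hypothetical choices; with a learning rate that is too large, the updates would diverge instead of converging.

```python
def gradient_descent(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the
    mean squared error cost, computed over the whole training set."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Move each parameter a small step in the direction that lowers the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # hypothetical data generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```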
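There are several variations of the algorithm, including stochastic gradient descent (SGD), which updates the parameters using one randomly chosen training example (or a small batch of examples) at a time instead of the whole training set. SGD is commonly used to train neural networks.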
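For comparison, here is a sketch of stochastic gradient descent on the same kind of problem: the parameters are updated after every single training example, and the examples are visited in a random order in each epoch. It is applied to a straight line rather than a neural network for brevity; the learning rate, epoch count and seed are assumptions.

```python
import random


def stochastic_gradient_descent(xs, ys, learning_rate=0.01, epochs=1000, seed=0):
    """Fit y = w*x + b, updating w and b after each individual example."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of the squared error of this one example only.
            w -= learning_rate * 2 * error * xs[i]
            b -= learning_rate * 2 * error
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # hypothetical data generated from y = 2x + 1
w, b = stochastic_gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```

Each update is cheaper than a full batch step because it looks at a single example, which is what makes the approach practical on the large datasets used to train neural networks.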
This article explained the key differences between regression and classification supervised machine learning algorithms.