The 5 ML algorithms you NEED to know!

tomrobertson · Analytics Vidhya · Feb 19, 2021 · 5 min read

5 algorithms you need to know, or, more accurately, 5 of the most common machine learning algorithms in use today!

This is only a brief run-through of the algorithms; for a more detailed look at each, I would recommend searching Google Scholar and reading some well-respected research papers!

Linear Regression

The line of best fit: an absolute must-have for any prediction or correlation exercise. Linear regression is a technique used to model the relationships between observed variables. As just stated, linear regression has two main applications:

  • Correlation: Some models fit the data better than others. Linear regression can be used to analyse correlations between variables and to refine statistical models to incorporate further inputs. This type of application is common in scientific tests, e.g. of the effects of a new medicine on the patients in a test study.
  • Predictions: After a series of observations of variables, regression analysis gives a statistical model for the relationship between the variables. This model can be used to generate predictions: given two variables x and y, the model can predict values of y given future observations of x. This idea is used to predict variables in countless situations, e.g. the outcome of political elections, the behaviour of the stock market, or the performance of a professional athlete. A minimal prediction sketch follows below.
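
To make the prediction use case concrete, here is a minimal sketch using scikit-learn's LinearRegression. The data below is made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y is roughly 2x + 1 plus some noise (invented for illustration)
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 1, size=50)

# Fit the line of best fit
model = LinearRegression()
model.fit(X, y)

print("slope:", model.coef_[0])        # should be close to 2
print("intercept:", model.intercept_)  # should be close to 1

# Predict y for a future observation of x
print("prediction at x=7:", model.predict([[7.0]])[0])
```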

Logistic Regression

Logistic regression is similar to linear regression but is used to model the probability of a discrete number of outcomes, typically two. Although it sounds much harder than linear regression, there is actually only one extra step involved.

First, you calculate a score using an equation similar to the line-of-best-fit equation from linear regression: a weighted sum of the inputs, z = w1*x1 + … + wn*xn + b.

The extra step is feeding the score you just calculated into the sigmoid function below, which returns a probability. This probability can then be converted to a binary output, either 1 or 0.

Figure: the sigmoid (logistic) function, σ(z) = 1 / (1 + e^(−z))

To find the weights of the initial equation used to calculate the score, methods like gradient descent or maximum likelihood estimation are used. Since those are beyond the scope of this article, I won't go into much more detail, but now you know how it works!
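
As a rough sketch of those two steps in Python, assuming the weights have already been fitted (the numbers below are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias, as if already found by gradient descent
w = np.array([0.8, -1.2])
b = 0.3

x = np.array([2.0, 1.5])  # a single observation

# Step 1: linear score, just like the line-of-best-fit equation
score = np.dot(w, x) + b

# Step 2: feed the score through the sigmoid to get a probability
probability = sigmoid(score)

# Convert the probability to a binary output (threshold at 0.5)
prediction = 1 if probability >= 0.5 else 0
print(probability, prediction)
```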

Random Forest

Random forest is a supervised learning algorithm that builds multiple decision trees and merges them together to get a more accurate and stable prediction.

One big advantage of random forest is that it can be used for both classification and regression problems, which form the majority of current machine learning systems.

Figure: a random forest with two trees

Random forest has essentially the same parameters as other bagging classifiers. However, there is no need to combine a decision tree with a bagging classifier yourself, because libraries provide a ready-made random forest classifier class. Random forest can also handle regression tasks via the algorithm's regressor.

Random forest adds additional randomness to the model while growing the trees. Instead of searching for the most important feature among all features when splitting a node, it searches for the best feature among a random subset of features. This produces greater diversity among the trees, which generally yields a better model. In short, when splitting a node, random forest considers only a random subset of the features.
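
As an illustration, here is how this might look with scikit-learn's RandomForestClassifier; the dataset and parameter values are just for demonstration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of trees to build and merge;
# max_features="sqrt" = random subset of features considered at each split
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                             random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))

# The same idea works for regression via RandomForestRegressor
```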

Support Vector Machines

SVM, or support vector machine, is a supervised machine learning algorithm used in many classification and regression problems. It remains one of the most robust prediction methods and can be applied to many cases that involve classification.

A support vector machine works by finding an optimal separating boundary, called a 'hyperplane', that accurately separates two or more classes in a classification problem. The goal is to find the optimal hyperplane by training the SVM algorithm on linearly separable data.

More formally, when the data is not linearly separable, the algorithm maps it into a higher-dimensional space in which a separating hyperplane can be found; this aids classification, outlier detection, regression and so on. A good separation of classes is achieved by the hyperplane that has the largest distance (the margin) to the nearest training data points of any class.
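
A minimal scikit-learn sketch of this idea; the synthetic data and kernel choice here are purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data, invented for illustration
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An RBF kernel lets the SVM separate classes in a higher-dimensional
# space when the data is not linearly separable in the original one
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```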

XGBoost

XGBoost is a popular and efficient open-source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm, which attempts to accurately predict a target variable by combining the estimates of a set of simpler, weaker models.

When using gradient boosting for regression, the weak learners are regression trees, and each regression tree maps an input data point to one of its leaves, which contains a continuous score. XGBoost minimises a regularised (L1 and L2) objective function that combines a convex loss function (based on the difference between the predicted and target outputs) and a penalty term for model complexity (in other words, the complexity of the regression tree functions). Training proceeds iteratively: new trees that predict the residuals or errors of prior trees are added, and their outputs are combined with those of the previous trees to make the final prediction. It's called gradient boosting because it uses a gradient descent algorithm to minimise the loss when adding new models.
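
For illustration, here is a minimal sketch using the xgboost Python package on synthetic data; the parameter values are arbitrary, and reg_alpha/reg_lambda correspond to the L1 and L2 penalty terms mentioned above:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Synthetic regression data, invented for illustration
X, y = make_regression(n_samples=500, n_features=8, noise=10.0,
                       random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each boosting round adds a new tree that fits the residuals of the
# ensemble so far; the reg_* terms penalise model complexity
model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=4,
    reg_alpha=0.1,   # L1 penalty
    reg_lambda=1.0,  # L2 penalty
)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```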

Figure: a brief illustration of how gradient tree boosting works (taken from AWS)

Do you agree that these are the top 5 ML algorithms? Let me know of any others…
