Machine Learning
Types of ML Models
We have all heard the term ‘Machine Learning’, and some of us might even be familiar with the details of how it works. Recently, I began studying the various applications of Machine Learning and came across a couple of articles on the different types of ML models developed over the years, classified on the basis of the task they accomplish. In this article, I compile what I have learnt in a comprehensive and crisp manner!
Let us first have an overview of the types of models we have, grouped on the basis of the tasks they perform:
1. Classification
2. Regression
3. Clustering
4. Dimensionality Reduction
5. Deep Learning
Note that this is not an exhaustive list of ML model types, just a compilation of the most common types I have read about so far.
1. Classification
A classification model is a Machine Learning model whose output is always a categorical variable. These models are used whenever we have a set of labelled data and need to assign each datapoint to a certain ‘class’, i.e. a group of objects with similar properties.
One of the most common applications of these models is the image classifier, which sorts images of animals into groups such as cats, dogs, butterflies and so on.
Some common ML Algorithms which serve as classifier models are:
K-Nearest Neighbours
- Simple enough to be understood by beginners
- Computationally expensive
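As an illustration, here is a minimal NumPy sketch of the K-Nearest Neighbours idea, with a toy dataset of my own; it is meant to show the mechanics, not to be an optimised implementation:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Label a query point by majority vote of its k nearest neighbours."""
    # Euclidean distance from the query point to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy 2-D data: class 0 clustered near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([0.5, 0.5])))  # -> 0
print(knn_predict(X, y, np.array([5.5, 5.5])))  # -> 1
```

The full distance computation against every training point is exactly why KNN is computationally expensive at prediction time.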
Naive Bayes
- Based on Bayes’ Theorem
- Updates probabilities based on gained information
- Makes the idealistic assumption that the features are independent of each other, thus titled ‘Naive’
- Performs well on real world data, despite the assumptions
Logistic Regression
- Named regression, but widely used in Binary Classification
- Linear model for classification
- Can be used for more than 2 classes as well, but becomes computationally expensive as the number of classes increases
Support Vector Machine (SVM)
- Used for binary or multi-class classification
- Searches for the curve or hyperplane that separates the classes with the largest margin
Decision Tree
- Used for binary or multiclass classification
- Robust to outliers
- Overfitting may occur
Ensembles
- A combination of two or more of the above classifiers, used together to get a better result than any single one
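As a rough sketch of how several of the classifiers above can be combined into an ensemble, here is a hard-voting ensemble built with scikit-learn (assumed available); the Iris dataset and the particular mix of estimators are my own illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each estimator predicts a class; the majority label wins (hard voting)
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(random_state=0)),
])
ensemble.fit(X_train, y_train)
print(f"test accuracy: {ensemble.score(X_test, y_test):.2f}")
```

Voting is only one way to combine classifiers; bagging and boosting are other common ensemble strategies.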
2. Regression
A regression model is a Machine Learning model whose output can take continuous values. These models are used when we need to establish a relationship between a value we want to predict and other values it may depend on.
Example applications of regression include predicting airplane ticket prices based on seasonal trends, or predicting the temperature on a certain day of the year.
Some common ML Algorithms which serve as regression models are:
Linear Regression
- The simplest of regression models
- Works best when the target has an approximately linear relationship with the predictors
- Issues may arise when multicollinearity is present in the dataset
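To see how simple linear regression really is, here is a minimal NumPy sketch that fits a line using the least-squares normal equations (the toy data is my own, generated from a known line):

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares: solve for w minimising ||X1 w - y||^2."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)  # numerically stabler than inverting X'X
    return w

# Noise-free data generated from y = 2x + 1, so OLS should recover it exactly
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
w = fit_ols(X, y)
print(w)  # -> approximately [1. 2.]  (intercept, slope)
```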
Lasso Regression
- Linear Regression with L1 regularisation
- Reduces the number of predictor variables by shrinking some coefficients to exactly zero
- Robust against outliers
Ridge Regression
- Linear Regression with L2 regularisation
- Does not drop variables; instead keeps them all and adjusts their importance in the final outcome
- Works best when output variable is a function of all input variables
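The ridge closed form, w = (XᵀX + αI)⁻¹Xᵀy, can be sketched in a few lines of NumPy; the deliberately collinear toy data below is my own, chosen to show why the penalty helps:

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Ridge regression closed form: w = (X'X + alpha*I)^-1 X'y.
    The alpha*I term shrinks every coefficient toward zero without dropping any."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Two perfectly correlated predictors: plain OLS has no unique solution here,
# but the penalty makes the system solvable and splits the weight between them
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
w = fit_ridge(X, y, alpha=0.1)
print(w)  # both coefficients equal, each just under 1.0
```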
SVM Regression or SVR
- Similar to the SVM
- The objective is to find the best-fit line or curve, i.e. the hyperplane with the maximum number of points inside its margin of tolerance
Decision Tree Regression
- Tree structure like Decision Tree classifier
- Useful when predictions can have virtually infinite values
3. Clustering
A Machine Learning algorithm that groups together unlabelled data and labels them without manual intervention, based on a certain measure of similarity.
A common application would be grouping together customers of a rock climbing club, based on their age, fitness and athleticism to give them the correct kind of course to practice on.
Some common ML Algorithms which serve as clustering models are:
K-Means
- Simple enough for beginners to understand
- Suffers from high variance
- The ‘K’ value must either be pre-determined or calculated, which can be computationally expensive
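The assign-and-update loop at the heart of K-Means fits in a few lines of NumPy; the initialisation and toy data below are naive illustrative choices of mine (K-Means++ would pick the starting centroids more carefully):

```python
import numpy as np

def kmeans(X, k, n_iter=10):
    """Plain K-Means: repeatedly assign points to the nearest centroid,
    then move each centroid to the mean of its assigned points."""
    # Naive initialisation: k data points spread evenly through the dataset
    centroids = X[np.linspace(0, len(X) - 1, k).astype(int)]
    for _ in range(n_iter):
        # Distance of every point to every centroid -> index of nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Two well-separated 2-D blobs, centred at (0, 0) and (5, 5)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids.round(1))  # centroids should land near (0, 0) and (5, 5)
```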
K-Means++
- Improved version of K-Means, which selects initial centroids in a smarter manner
- Although initialisation takes a little longer than in K-Means, it reduces the time consumed later in the process
K-Medoids
- K-Means gives centroids that may not actually be a part of the dataset, reducing interpretability
- K-Medoids initialises like K-Means++ and iterates like K-Means, but finally chooses an actual data point as the centre of each group, namely the point that gives the least loss with respect to the computed centroid
Agglomerative Clustering
- Hierarchical clustering, bottom up approach
- Begins with all datapoints in individual clusters, ends with one big cluster containing all datapoints, grouped on similarity measures
- An intermediate state of this hierarchy, chosen by cutting the tree at a suitable level, is taken as the output set of clusters
DBSCAN
- Groups together points that are close to each other, usually measured by Euclidean distance
- Basically has 2 parameters:
  - Maximum distance between 2 points to consider them close
  - Minimum number of points labelled close to each other, to be called a high-density group
- The optimum values of these parameters usually depend on the size of the dataset
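Assuming scikit-learn is available, DBSCAN and its two parameters can be sketched as follows; the two-moons dataset and the parameter values are illustrative choices of mine:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moon clusters, a shape K-Means cannot separate
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps: maximum distance for two points to count as neighbours
# min_samples: neighbours required for a point to anchor a high-density region
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

clusters = {label for label in labels if label != -1}  # -1 marks noise points
print(len(clusters))
```

Unlike K-Means, the number of clusters is not specified up front; it falls out of the density parameters.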
4. Dimensionality Reduction
Dimensionality is the number of predictor variables on which the target variable depends.
Often, in real-world cases, we have access to a lot of predictor variables, but only a handful of them considerably affect the target variable, while the others have very little impact. Such low-impact variables may be removed from the equation completely in order to reduce computation costs.
This is known as dimensionality reduction. It can reduce model complexity and the burden on the processor, and increase computational efficiency, while producing similar, sometimes even better, results.
Some common ML Algorithms which serve as dimensionality reduction models are:
Principal Component Analysis (PCA)
- Creates a new, smaller set of predictor variables out of the original set
- Results become less interpretable
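A minimal NumPy sketch of PCA, computed here via the singular value decomposition of the centred data; the toy dataset is my own, built so that the third feature is redundant:

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD: project centred data onto its top principal directions."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # coordinates in the reduced space

# 3-D points that actually lie on a 2-D plane, so two components lose nothing
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 2))
X = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])  # third column is redundant
Z = pca(X, n_components=2)
print(Z.shape)  # -> (50, 2)
```

The new axes are linear mixes of the original features, which is exactly why the reduced results are harder to interpret.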
T-distributed Stochastic Neighbour Embedding (t-SNE)
- Models pairwise similarities between points, rather than maximising variance as in PCA
- A visual difference can be observed on the Swiss Roll dataset when processed with t-SNE versus PCA
- First, a point is chosen
- The Gaussian distribution of distances from this point is calculated
- The size of the Gaussian neighbourhood varies with perplexity, roughly the number of neighbours considered to be in the vicinity
- In the low-dimensional space, the Gaussian is replaced with a Student-t distribution (with one degree of freedom, i.e. a Cauchy distribution), which has a sharper peak and heavy tails
- The heavy tails give far-away points non-negligible similarity probabilities, so only highly similar points (high probability) end up in the same cluster in the lower dimension
- Essentially, it produces an M-dimensional embedding of an N-dimensional dataset, where M < N. Often, M is chosen as 2 or 3 for visualisation purposes.
Singular Value Decomposition (SVD)
- Decomposes a large matrix into smaller component matrices (U, Σ and Vᵀ) using linear algebra
- Uses properties of linear transformations to produce results efficiently
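The decomposition itself can be demonstrated in a few lines of NumPy (the small example matrix is my own):

```python
import numpy as np

# SVD factors any matrix A into U @ diag(S) @ Vt; truncating to the largest
# singular values gives the best low-rank approximation of A.
A = np.array([[3.0, 1.0], [1.0, 3.0], [1.0, 1.0]])
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# The factors reconstruct A exactly
A_rebuilt = U @ np.diag(S) @ Vt
print(np.allclose(A, A_rebuilt))  # -> True

# Keeping only the largest singular value gives a rank-1 approximation
A_rank1 = S[0] * np.outer(U[:, 0], Vt[0])
```

Truncated SVD of this kind is what makes SVD useful for dimensionality reduction and matrix compression.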
5. Deep Learning
The subset of Machine Learning that deals with Neural Networks is known as deep learning.
It has taken the internet by storm, as it is widely used in digital marketing, application personalisation, recommendation systems and many other facets of the internet. Often, we see image recognition using neural networks and real-time object labelling, both of which are products of deep learning.
Some common ML Algorithms which serve as deep learning models are:
- Multi-layer perceptron
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
- Boltzmann Machine
- Autoencoders
- Generative Adversarial Networks (GAN)
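The first of these, the multi-layer perceptron, is simple enough to sketch by hand. Below is a minimal NumPy forward pass with fixed, arbitrary weights of my own choosing; a real network would learn these weights via backpropagation:

```python
import numpy as np

def relu(x):
    """Common non-linearity: zero out negative values."""
    return np.maximum(0, x)

def mlp_forward(x, W1, b1, W2, b2):
    """Input -> hidden layer (ReLU) -> output layer."""
    h = relu(x @ W1 + b1)   # hidden activations
    return h @ W2 + b2      # raw output scores

# A tiny network: 2 inputs -> 3 hidden units -> 1 output, arbitrary weights
W1 = np.array([[1.0, -1.0, 0.5], [0.5, 1.0, -1.0]])
b1 = np.zeros(3)
W2 = np.array([[1.0], [1.0], [1.0]])
b2 = np.zeros(1)

x = np.array([1.0, 2.0])
print(mlp_forward(x, W1, b1, W2, b2))  # -> [3.]
```

CNNs, RNNs and the other architectures listed above build on this same layered idea with more specialised connectivity.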
Conclusion
We have covered a few fundamental and differentiating features of a small set of machine learning model types, as well as the algorithms which fall under those categories. Once I learn more about these models, I shall expand on this in a follow-up article.
If you liked the article so far, you might want to follow me on Medium, here.
If you would like to get in touch, you may do so here.