An Overview of Machine Learning Algorithms

Sushilkumar Yadav
Published in Analytics Vidhya
Jun 3, 2020

Let me ask you a question: what do companies do after generating thousands of gigabytes of data? How do product-based companies make money while offering their products and services for free?

Any answers? Let me come straight to the point.

The ultimate goal is to collect data. Imagine someone who knows your daily habits: what you do at what time, where you go, and so on. They could predict your next step or behavior too. As humans, we rely on data (information) and observations to learn new things. It is often said that the older a person gets, the more experience they gain and the better decisions they can make in life. From the moment we are born, the brain learns by observing things. The counterpart of this in technology is machine learning and artificial intelligence. Just as God has given us a brain, programmers in the technology era have given us algorithms that learn from and analyze data (statistics) and observations. In this blog, I will go through some machine learning algorithms. But before that, let's understand what exactly machine learning is.

Machine learning is the science of getting computer systems to act without being explicitly programmed. It is a branch of artificial intelligence that gives systems the ability to learn and improve automatically from past data. The learning process begins with observations or data and looks for patterns in that data to make better decisions. The ultimate aim is to allow computers to learn automatically without human intervention. Let's take an example: today, companies generate thousands of gigabytes of data daily. All of this data is in raw format, so it is first pre-processed using tools such as Hadoop, MapReduce, Hive, Pig, Sqoop, HBase, etc. The pre-processed data is then used in different ways, such as generating analytics dashboards for the company and using machine learning to predict results from those dashboards.

In the past decade, machine learning has given us self-driving cars, pattern classification and recognition, effective web search, applications in medical science, etc.

Examples of how ML is used in medical science:

1. Survival prediction: given a description of a patient R, predict how long R will survive, i.e. how many additional months.

2. Improving cancer treatments: thousands of therapies are available, and which therapy will work better for which patients can be predicted from certain parameters. This effectively helps doctors treat patients (supervised learning, a classification problem).

3. Finding effective querying and alerting policies

4. Intelligent diabetes management

5. Psychiatric diagnosis and treatment using patient-specific fMRI, etc.

Now that we have seen that machine learning has many good points, let's look at some pain points too.

The pain point I see is that training deep learning models (which have a larger number of hidden layers) quickly requires a high-performance system with a good GPU and CPU. For example, when we generated faces using a GAN (generative adversarial network), training took us a minimum of 8 hours, since we were training on an i3 system.

Let's deep dive into Machine Learning Algorithms

1. Classification Algorithm

Classification uses the supervised learning model. A supervised learning algorithm takes a known set of input data and known responses to that data (outputs) and trains a model to generate reasonable predictions for the response to new data. Use supervised learning if both input and output are known, for example when predicting whether an animal is a dog or a cat. Supervised learning is used when each pattern is assigned a target, that is, a value or a class to predict. For instance, say you want to predict the revenue of a store from different inputs: your model will be trained on historical data and use it to forecast future revenues, hence the model is supervised. The model learns a link between the input and the output. Supervised learning uses classification and regression techniques to develop predictive models.

What is classification? In classification, we use a training set to determine the decision boundary between the classes. Once the decision boundary is determined, the next step is to predict the target class of new data. This whole process is termed classification. Use classification if your data can be tagged, categorized, or separated into specific groups or classes.

Common algorithms for performing classification include support vector machine (SVM), boosted and bagged decision trees, k-nearest neighbor, Naïve Bayes, discriminant analysis, logistic regression, and neural networks.

Example of classification Algorithm :

Classification Algorithm

Types of Classification Algorithm: Linear Classifiers (Logistic Regression, Naive Bayes Classifier, Linear Discriminant Analysis), Support Vector Machines (SVM), Quadratic classifiers, K nearest neighbor (KNN), Decision trees (Random Forest), Neural Network (HebbNet, Perceptron, Adaline, BP), Learning Vector Quantization

The following are the techniques to evaluate a classifier: Cross-Validation, Precision & Recall, ROC Curve. Precision and recall can be calculated using the confusion matrix.

Confusion Matrix

Accuracy = (TP + TN) / (TP + FP + FN + TN)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1 Score = 2 * (Recall * Precision) / (Recall + Precision)
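The formulas above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `classification_metrics` and the counts passed to it are made up for the example, and precision is taken as TP / (TP + FP), its standard definition.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only for illustration.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the sketch reports an accuracy of 0.850, a precision of 0.800, a recall of 0.889 and an F1 score of 0.842.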

We generated the following confusion matrix in one of our deep learning projects

Generated Confusion Matrix and calculated precision and recall

2. Artificial Neural Network

The Artificial Neural Network (ANN) evolved from the idea of the Biological Neural Network (BNN). It draws an analogy from how the human brain processes signals.

If we compare a BNN with an ANN structurally, we find the following similarities:

BNN vs ANN

Dendrite — Input; Soma — Neuron; Axon — Output

There are differences as well, such as complexity, memory management, etc. ANNs have the following architectures:

FeedForward Network, Recurrent Network, Associative Network

Artificial neural networks are classified based on 3 parameters:

  1. According to signal flow direction. Example: MLP, LSTM
  2. According to learning mechanism. Example: error-based learning (Back Propagation), Hebbian or associative learning (Associative Memory, Hopfield Network, etc.), competitive learning (KSOFM: Kohonen self-organizing feature map, LVQ: Learning Vector Quantization)
  3. According to structural expansion. Example: Perceptron (Adaline vs Madaline)

The most popular ANNs are the Perceptron, Multi-layer Perceptron, Backpropagation network, Hopfield network, Bidirectional Associative Memory (BAM), Kohonen Self-Organizing Map, LVQ, Radial Basis Function (RBF) network, etc.
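To make the simplest of these networks concrete, here is a minimal sketch of a single perceptron trained with the classic error-based update rule. The function names, the AND-gate data and the learning rate are assumptions chosen for illustration; they do not come from any project described in this article.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Train one perceptron: nudge weights by (target - output) * input."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # 0 when the prediction is already right
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, inputs):
    """Step activation: fire (1) only if the weighted sum is positive."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


# Hypothetical training set: the logical AND of two binary inputs.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, x) for x, _ in and_gate]
print(weights, bias, predictions)
```

AND is linearly separable, so a single perceptron is enough here; a problem such as XOR would need a multi-layer network.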

3. Clustering Algorithm

Clustering is an unsupervised learning method: in this case, we draw inferences from a dataset consisting of input data without labeled responses.

K Means Cluster

In the above example, we can see the center of each cluster in red, which is the mean of all the observations that belong to that cluster. As we can see, the data points that belong to a given cluster are closer to the center of that cluster than to the centers of the other clusters.
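The idea that each center is the mean of its cluster can be sketched with a small, plain-Python version of K-Means. The points, the starting centers and the function names below are hypothetical, and a real project would more likely use a library implementation.

```python
import math


def assign(points, centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, centers, iterations=10):
    """Alternate between assigning points and moving centers to cluster means."""
    for _ in range(iterations):
        clusters = assign(points, centers)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, assign(points, centers)


# Hypothetical 2-D observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centers, clusters = k_means(points, centers=[points[0], points[3]])
print(centers)
print(clusters)
```

Each returned center is the mean of the points assigned to it, which matches the red centers described above.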

Types of Clustering Algorithms: K-Means Clustering (Centroid based clustering), Mean shift clustering, Kohonen self-organizing maps, Density-based clustering, Distribution based clustering, Hierarchical based clustering, etc

4. Reinforcement Learning

Reinforcement learning is the training of machine learning models to make a sequence of decisions. In this model, the artificial intelligence faces a game-like situation, where the computer uses trial and error to come up with a solution to the problem. For every action it performs, the artificial intelligence gets either a reward or a penalty. Although the programmer sets the reward policy, there is one rule: the programmer gives no hints or suggestions on how to solve the game. It is up to the model to figure out how to maximize the reward, starting from totally random trials and finishing with sophisticated tactics and superhuman skills.

According to Yann LeCun, the head of AI research at Facebook, reinforcement learning is the cherry on the great AI cake, with unsupervised learning the cake itself and supervised learning the icing; without the layers beneath it, the cherry would top nothing.
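The trial-and-error loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm picked here only as an illustration. The corridor environment, the reward of 1 for reaching the last cell, and all names and hyperparameters are assumptions made for the example.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values purely from rewards, with no hints about the goal."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore at random sometimes, and whenever both actions look equal.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_table()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

The agent starts with random moves because every value is zero, and ends up preferring "right" in every cell, since that is the shortest way to the reward.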

Illustration of Reinforcement Learning

Hierarchy of Reinforcement Learning:

Reinforcement Learning Algorithms

5. Regression Techniques

This model estimates the relationships between variables. In simple words, from a list of given input variables or features, it estimates a continuous dependent variable. Typical applications include survival prediction, weather forecasting, etc. Use regression techniques if the response takes real-number values. The regression function can be linear, quadratic, polynomial, non-linear, etc. In the training phase, the model's parameters are optimized with respect to the input values presented during training. The optimization is done by the gradient descent algorithm, also known as the steepest descent algorithm, which is used to update the parameters of the model. If the learning rate is too large, the updates overshoot the minimum; if it is too small, training takes longer to converge. If you are using neural networks, you also need the back-propagation algorithm to compute the gradient at each layer. Once the hypothesis parameters have been trained (that is, when they give the least error during training), the same hypothesis with the trained parameters is applied to new input values to predict outcomes, which are again real values.
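A minimal sketch of gradient descent for a straight-line fit shows the effect of the learning rate. The data (generated from y = 2x + 1), the three learning rates and the function names are hypothetical choices for illustration only.

```python
def fit_line(xs, ys, learning_rate, steps=500):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w  # step against the gradient
        b -= learning_rate * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

losses = {}
for lr in (0.001, 0.05, 0.2):
    w, b = fit_line(xs, ys, lr)
    losses[lr] = mse(xs, ys, w, b)
    print(f"learning rate {lr}: w={w:.3g}, b={b:.3g}, mse={losses[lr]:.3g}")
```

On this data the smallest rate is still short of the minimum after 500 steps, the middle rate converges to w close to 2 and b close to 1, and the largest rate overshoots so badly that the error grows instead of shrinking.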

Example of activation function and learning rate :

Learning Rate Comparison

Types of regression algorithms: Linear Regression, Logistic Regression, Polynomial Regression, Stepwise Regression, Quantile Regression, Ridge Regression, Lasso Regression, Elastic Net Regression, Principal Component Regression (PCR), Partial Least Squares (PLS) Regression, Support Vector Regression, Ordinal Regression, Poisson Regression, Negative Binomial Regression, Quasi Poisson Regression, Cox Regression, Tobit Regression, Adaptive Neuro-Fuzzy Learning, etc.

6. Tree-Based Algorithms

Tree-based learning is one of the most commonly used supervised learning methods. Using a different feature from the dataset at each node, we recursively split the training sample. Note that each split must be effective, and a split is effective when it follows simple decision rules inferred from the training data. The name comes from the way the algorithm classifies an example: it sorts the example down the tree from the root to a leaf node, and the leaf node gives the classification of the example. Each node in the tree acts as a test on some attribute, and the branches descending from the node represent the possible answers to that test. Tree-based algorithms are constructed from branches and nodes.

The classic example of a tree-based algorithm is deciding whether to play badminton on a particular day. We first check whether the weather is sunny, cloudy, or rainy, and then other parameters come in that help us decide whether we will be able to play badminton or not.
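The badminton example can be written down as a tiny hand-built tree. The follow-up attributes ("humidity" and "wind") and the outcomes at the leaves are hypothetical, since the example above does not name them, and a real tree would be learned from training data rather than typed in.

```python
# Each internal node tests one attribute, each branch is one possible answer,
# and each leaf (a plain string) holds the final decision.
# "humidity", "wind" and every outcome below are made up for illustration.
TREE = {
    "attribute": "weather",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "stay home", "normal": "play"},
        },
        "cloudy": "play",
        "rainy": {
            "attribute": "wind",
            "branches": {"strong": "stay home", "weak": "play"},
        },
    },
}


def classify(node, example):
    """Walk from the root to a leaf, following the branch that matches the example."""
    while isinstance(node, dict):
        node = node["branches"][example[node["attribute"]]]
    return node


print(classify(TREE, {"weather": "sunny", "humidity": "high"}))
print(classify(TREE, {"weather": "cloudy"}))
print(classify(TREE, {"weather": "rainy", "wind": "weak"}))
```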

Illustration of Tree-Based Algorithms

Types of decision trees: Classification Trees, Regression Trees, etc.

7. Probabilistic Model

This is an alternative view of machine learning that generalizes pretty much all ML algorithms. In this framework, we explicitly think of learning as a problem of statistical inference. The best example I can give of probabilistic modeling is the Naïve Bayes model. Suppose we have the task of predicting whether a movie review is positive or negative (the label) based on which words (the features) appear in the review.

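Thus the probability of the label for a single data point can be written as

$$p_\theta(y \mid x) = p_\theta(y \mid x_1, x_2, x_3, \ldots, x_D) \qquad (1)$$

Here y is the label and x = (x_1, ..., x_D) are the features.

With many variables, applying the chain rule to the joint distribution gives

$$p_\theta(x_1, x_2, x_3, \ldots, x_D, y) = p_\theta(y)\, p_\theta(x_1 \mid y)\, p_\theta(x_2 \mid y, x_1)\, p_\theta(x_3 \mid y, x_1, x_2) \cdots p_\theta(x_D \mid y, x_1, x_2, \ldots, x_{D-1})$$

$$= p_\theta(y) \prod_{d=1}^{D} p_\theta(x_d \mid y, x_1, x_2, \ldots, x_{d-1})$$

Using the naïve Bayes assumption that the features are independent of each other given the label, each factor reduces to $p_\theta(x_d \mid y)$. Since the posterior in Eq. (1) is proportional to this joint distribution, Eq. (1) becomes

$$p_\theta(y \mid x) \propto p_\theta(y) \prod_{d=1}^{D} p_\theta(x_d \mid y)$$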
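As a rough sketch of the movie review task, the product of per-word probabilities can be computed with simple word counts. The four reviews, the function names and the use of add-one (Laplace) smoothing are assumptions added for the example, and logarithms are used so that the product becomes a sum.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(reviews):
    """Count how often each label and each (label, word) pair occurs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in reviews:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(text, label_counts, word_counts, vocabulary):
    """Pick the label with the highest log p(y) + sum over words of log p(word | y)."""
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = math.log(count / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the product.
            p = (word_counts[label][word] + 1) / (n_words + len(vocabulary))
            score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labeled reviews.
reviews = [
    ("great movie loved it", "positive"),
    ("wonderful acting great story", "positive"),
    ("boring plot terrible acting", "negative"),
    ("terrible movie hated it", "negative"),
]
model = train_naive_bayes(reviews)
print(predict("great story", *model))
print(predict("terrible boring", *model))
```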

I hope this gives a small idea of where probabilistic models are used.

Most of the complex algorithms are built using probabilistic modeling.

Types of probabilistic modeling: Generative probabilistic modeling, Conditional probabilistic modeling, etc.

8. Generative Models

Generative modeling is a type of unsupervised learning. It describes the features in the data, enabling computers to understand the real world. This kind of algorithm processes large volumes of data and reduces them to their digital essence. One type of generative model, the Variational Autoencoder (VAE), is widely used for dimensionality reduction; another type, the Generative Adversarial Network (GAN), is used to generate new photos of a particular object that look like real objects.

Let me demonstrate. Using a VAE, compare the original picture and the regenerated picture:

Example of Normal and VAE Images

Using a GAN, see the regenerated pictures:

LFW Database Samples used for GAN Database Generation
GAN Generated Database Samples

The 3 most popular types of generative models today are:

Widely Used Generative Model Types

9. Deep Learning Models

Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. The key difference between a shallow neural network and a deep one is that a shallow network has only one hidden layer while a deep network has more than one, as seen in the diagram below. Deep learning is typically used to classify high-dimensional data. The multiple layers process features, and generally each layer extracts some piece of valuable information. These networks have an input layer, an output layer, and hidden layers in between. Each layer performs specific types of sorting and ordering in a process that some refer to as a “feature hierarchy”. One of the key uses of these sophisticated neural networks is dealing with unlabeled or unstructured data. The phrase “deep learning” is also used to describe these deep neural networks, as deep learning represents a specific form of machine learning where technologies using aspects of artificial intelligence seek to classify and order information in ways that go beyond simple input/output protocols.

Architectures of Deep Learning

1. Unsupervised Pretrained Networks (UPNs)

2. Convolutional Neural Networks (CNNs)

3. Recurrent Neural Network

4. Recursive Neural Network

Out of all these architectures, we implemented a Convolutional Neural Network for our face recognition model, because Convolutional Neural Networks allow us to extract a wide range of features from images.

CNN Architecture

10. Lazy Algorithms

Lazy algorithms are simple algorithms in which generalization is deferred until a query is sent to the model. A key issue in this method is weighting the examples according to their distance from the query instance, so that the closest examples receive the highest weight. KNN is one example of a lazy learning algorithm.

The lazy algorithm can be illustrated by the diagram below:
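A minimal sketch of distance-weighted KNN shows both ideas: nothing is learned ahead of time, and closer examples count for more. The training points, the inverse-distance weighting and the function name are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def weighted_knn(train, query, k=3):
    """Classify `query` by a distance-weighted vote of its k nearest examples."""
    # No model is built in advance: all the work happens at query time.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        distance = math.dist(point, query)
        votes[label] += 1.0 / (distance + 1e-9)  # closer examples weigh more
    return max(votes, key=votes.get)


# Hypothetical training examples: one "A" point and two "B" points.
train = [((0.0, 0.0), "A"), ((3.0, 0.0), "B"), ((3.0, 1.0), "B")]

# The query sits right next to the single "A" example.
print(weighted_knn(train, (0.5, 0.0), k=3))
# A query in the middle of the "B" examples.
print(weighted_knn(train, (3.0, 0.5), k=3))
```

For the first query a plain majority vote over the three neighbors would say "B", while the weighted vote says "A" because the "A" example is much closer.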

KNN

11. Ensemble Methods

When several machine learning algorithms are combined to achieve better results, the approach is known as an ensemble method. Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance, decrease bias, and improve predictions.

The best example I can give is the face recognition project that we developed. Using different training and test samples, we achieved the following results. The training samples are taken in increasing order (from fewer training samples to more).

Accuracy using CNN and combination of VAE & CNN

Types of Ensemble Algorithms: Adaboost, Bagging, weighted average, etc
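Two of the simplest ways to combine models, majority voting and a weighted average, can be sketched as follows. The three threshold rules, the scores and the weights are hypothetical stand-ins for real trained models, and this is not the CNN and VAE combination used in our project.

```python
from collections import Counter


def majority_vote(models, sample):
    """Combine classifiers by returning the most common prediction."""
    predictions = [model(sample) for model in models]
    return Counter(predictions).most_common(1)[0][0]


def weighted_average(predictions, weights):
    """Combine numeric predictions, trusting some models more than others."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


# Three deliberately crude, hypothetical classifiers for "is x large?".
def rule_a(x):
    return x > 3


def rule_b(x):
    return x > 5


def rule_c(x):
    return x > 7


models = [rule_a, rule_b, rule_c]
print(majority_vote(models, 6))  # rule_a and rule_b outvote rule_c
print(majority_vote(models, 4))  # rule_b and rule_c outvote rule_a
print(weighted_average([0.9, 0.6, 0.3], weights=[3, 2, 1]))
```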

12. Apriori Algorithm

The Apriori algorithm is used to find associations in data. Association rules can be thought of as if-then (antecedent-consequent) relationships. A typical if-then rule is used to determine whether an antecedent (cause or action) implies a consequent (effect or reaction). Suppose we have a rule of the form IF A, THEN B, where A is a set defined on universe X and B is a set defined on universe Y. A classic example: suppose a customer buys some item, say a book. What is the chance that the same customer will also buy a stationery item in the same transaction? Many fuzzy association algorithms rely on the Apriori algorithm.

I hope this article gave you an overview of machine learning algorithms.
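The book and stationery example can be sketched with a compact version of Apriori. The five transactions, the minimum support of 0.4 and the function names are made up for illustration; the chance asked about above corresponds to the confidence of the rule.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    size = 1
    while current:
        frequent.update({s: support(s) for s in current})
        size += 1
        # Join frequent sets into larger candidates, then keep a candidate
        # only if every one of its smaller subsets is itself frequent.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        current = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
            and support(c) >= min_support
        ]
    return frequent


def confidence(frequent, antecedent, consequent):
    """confidence(A -> B) = support(A and B together) / support(A)."""
    return frequent[antecedent | consequent] / frequent[antecedent]


# Hypothetical shopping baskets, one set per transaction ID.
transactions = [
    {"book", "pen"},
    {"book", "pen", "notebook"},
    {"book"},
    {"pen", "notebook"},
    {"book", "pen"},
]
frequent = apriori(transactions, min_support=0.4)
rule = confidence(frequent, frozenset({"book"}), frozenset({"pen"}))
print(f"IF book THEN pen: confidence {rule:.2f}")
```

Here 4 of the 5 baskets contain a book and 3 of those also contain a pen, so the rule IF book THEN pen has a confidence of 0.75.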

Sushilkumar Yadav
Analytics Vidhya

SDE@Jio Platforms ltd | Research and Development | Data Science and AI Enthusiast | API developer using Java and Python | WSO2 developer