8 Types of machine learning systems you may not know where they came from

Mahmoud Ahmed
Published in Analytics Vidhya
6 min read · May 8, 2020

Introduction

Many people classify machine learning systems simply as supervised or unsupervised, but these systems can be grouped along several criteria. The criteria are not mutually exclusive: a single system falls into one category under each of them, and every one of these groupings can correctly be labeled “Types of Machine Learning Systems”.

fig[1] Types of ML systems

Some of the criteria used to classify these systems are:

  • Human Supervision (Supervised, Unsupervised, Semisupervised, and Reinforcement Learning).
  • Incremental Learning (Online and Batch learning).
  • Making Prediction (Instance-Based and Model-Based Learning).

Let’s look deeper into each of these criteria …

Human Supervision

fig[2] Types of ML systems based on human supervision

Machine learning systems can be classified by the type of supervision they receive during training, and they fall into four major categories:

1. Supervised Learning

fig[3] Supervised learning

In supervised learning, the training data fed to the system must include the label (the desired output) of each sample. In a classification problem such as a spam classifier, the algorithm is given many example messages along with their labels (spam or not spam) and uses them to learn how to classify new messages.
In another example, house price prediction, every house specification in the training data needs a label, which in this case is the price. This allows the system to predict the price of a new house from its specifications, and this task is called regression.
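
The spam example above can be sketched in a few lines of Python. This is a minimal illustration, not a real spam filter: the messages are made up, and the word-count scoring in the hypothetical `train` and `classify` helpers is a crude stand-in for a proper learning algorithm. The point is only that every training sample arrives with its label.

```python
from collections import Counter

# Labeled training data: each message comes with its label (made-up examples).
messages = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]


def train(labeled_messages):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in labeled_messages:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def classify(counts, text):
    """Pick the label that has seen the words of this message most often."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)


model = train(messages)
new_message_label = classify(model, "claim your free money")
```

Swapping the labels for numbers (for example prices) and the vote for a numeric prediction turns the same setup into a regression task.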

2. Unsupervised Learning

fig[4] unsupervised learning

In unsupervised learning, the training data fed to the system has no labels describing which sample belongs to which class, so the algorithm must rely on itself to find patterns among the samples, which can reduce its accuracy. However, this is arguably the most important of these categories, because most data is unlabeled and we still need to find the patterns in it.
Some of the most common tasks in unsupervised learning are clustering (splitting the dataset into groups based on similar patterns between data points), anomaly detection (discovering unusual data points in the dataset, which is useful for finding fraudulent data points and for finding outliers in the data preprocessing phase), and dimensionality reduction (also used in the data preprocessing phase, to reduce the number of features in the dataset).
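
As a rough illustration of clustering, here is a sketch of k-means on one-dimensional data. The points and the starting centers are made up, and the `k_means_1d` helper is hypothetical; note that no labels are passed in anywhere.

```python
def k_means_1d(points, centers, iterations=10):
    """Plain k-means on 1-D points, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]  # unlabeled data
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
```

The algorithm ends up with one group around 1 and another around 9.5, found purely from the distances between the points.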

3. Semi-Supervised Learning

fig[5] semi-supervised learning

In semi-supervised learning, the training data is a mix of labeled and unlabeled samples, which allows supervised and unsupervised techniques to be combined in the same system. One way to do this is to train a model on the labeled data, use the trained model to classify the unlabeled data, and then feed the high-probability predictions back in to retrain the model on a larger amount of labeled data, a technique called pseudo-labeling.
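
The pseudo-labeling loop can be sketched as follows. Everything here is an assumption made for illustration: the data is made up, the "model" is a simple nearest-centroid classifier on one feature, and the confidence score and the 0.9 threshold are arbitrary choices rather than a standard recipe.

```python
def fit_centroids(samples):
    """Return {label: mean feature value} for (feature, label) pairs."""
    grouped = {}
    for x, label in samples:
        grouped.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in grouped.items()}


def predict_with_confidence(centroids, x):
    """Nearest-centroid prediction plus a crude confidence in [0.5, 1.0].

    Assumes at least two classes. Confidence is 1.0 when x sits on the
    nearest centroid and 0.5 when the two closest centroids are equally far.
    """
    ranked = sorted(centroids.items(), key=lambda item: abs(x - item[1]))
    (label, c_near), (_, c_far) = ranked[0], ranked[1]
    d_near, d_far = abs(x - c_near), abs(x - c_far)
    total = d_near + d_far
    return label, (0.5 if total == 0 else d_far / total)


def pseudo_label(labeled, unlabeled, threshold=0.9):
    centroids = fit_centroids(labeled)            # 1. train on labeled data
    confident = []
    for x in unlabeled:                           # 2. predict unlabeled data
        label, confidence = predict_with_confidence(centroids, x)
        if confidence >= threshold:               # 3. keep confident predictions
            confident.append((x, label))
    return fit_centroids(labeled + confident), confident  # 4. retrain


labeled = [(1.0, "small"), (2.0, "small"), (9.0, "large"), (10.0, "large")]
unlabeled = [1.5, 5.4, 9.5]
model, added = pseudo_label(labeled, unlabeled)
```

The ambiguous point 5.4 is left out because the model is not confident about it, while 1.5 and 9.5 are added to the training set with their predicted labels.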

4. Reinforcement Learning

fig[6] Reinforcement learning

This type of system is very different from the others: an agent observes the environment and learns from it by performing actions, getting a reward for a good action or a penalty for a bad one. Over time it learns a strategy, called a policy, that defines what action the agent should take in a given situation.
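
To make the agent, reward, and policy idea concrete, here is a small sketch using tabular Q-learning, one common reinforcement learning algorithm (the article does not name a specific one). The corridor environment, the reward values, and the hyperparameters are all made up for illustration.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Environment: move the agent and return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    # Reward for reaching the goal, small penalty for every other move.
    reward = 1.0 if next_state == N_STATES - 1 else -0.1
    return next_state, reward


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: nudge the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# The learned policy: the best action for each non-goal state.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)}
```

After training, the policy maps every cell to "step right", the action that leads to the reward.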

Incremental Learning

Another criterion for classifying machine learning systems is whether they can learn incrementally from a stream of incoming data.

1. Batch Learning

fig[7] Batch learning

This type of system cannot learn incrementally: it must be trained on all of the available data, which generally takes a lot of time and computing resources. It is called offline learning because the system learns offline, is then launched into production, and runs there without learning anymore. When we need a new version of the system, we have to retrain the model from scratch on the full dataset (old and new), away from the production version, and then replace the old model with the newly trained one.
A simple example of this type of system is a cat vs. not-cat image classifier trained on daytime images of cats. If after some time we need to handle night images too, we have to take all of the night and day images, retrain our model, and deploy the new model in the next release.
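
The retrain-and-replace workflow can be sketched like this. The single numeric feature per image and the per-label mean "model" are made-up stand-ins for a real image classifier, and `train_from_scratch` and `predict` are hypothetical helpers.

```python
def train_from_scratch(dataset):
    """Batch training: build a brand-new model from every (feature, label) pair."""
    grouped = {}
    for feature, label in dataset:
        grouped.setdefault(label, []).append(feature)
    return {label: sum(v) / len(v) for label, v in grouped.items()}


def predict(model, feature):
    """Return the label whose mean feature value is closest."""
    return min(model, key=lambda label: abs(feature - model[label]))


# Version 1: trained offline on daytime images only, then deployed and frozen.
day_data = [(0.9, "cat"), (0.8, "cat"), (0.3, "not cat"), (0.2, "not cat")]
model_v1 = train_from_scratch(day_data)
production_model = model_v1

# Later, night images arrive. The deployed model does not learn from them.
# A new version is trained offline on old + new data and then swapped in.
night_data = [(0.6, "cat"), (0.5, "cat"), (0.1, "not cat"), (0.0, "not cat")]
model_v2 = train_from_scratch(day_data + night_data)
production_model = model_v2
```

In this toy data, a night cat with feature 0.5 is misclassified by version 1 and handled correctly by version 2, which only happens because the whole model was rebuilt on the combined dataset.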

2. Online Learning

fig[8] Online learning

The idea behind online learning is to train the system incrementally by feeding it data points sequentially, either one at a time or in small groups called mini-batches. Online learning is great for systems that receive data as a continuous flow, need to adapt quickly to new changes, and don’t care too much about a long history. A well-known example is a stock price prediction model, which learns on the fly to adapt to recent changes in stock prices without worrying too much about old readings.
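
A minimal sketch of incremental training, assuming a linear model with one feature that takes a single gradient step per mini-batch. The `OnlineLinearModel` class, the learning rate, and the simulated stream (which follows y = 2x + 1) are illustrative assumptions, not a stock prediction model.

```python
class OnlineLinearModel:
    """Linear model y = w * x + b, updated one mini-batch at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn(self, batch):
        """One gradient step on half the mean squared error of the mini-batch."""
        n = len(batch)
        errors = [(self.predict(x) - y, x) for x, y in batch]
        grad_w = sum(err * x for err, x in errors) / n
        grad_b = sum(err for err, _ in errors) / n
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b


model = OnlineLinearModel()

# Simulated stream: mini-batches of (x, y) readings arriving over time.
# Each batch is used for one update and then discarded.
for _ in range(300):
    for mini_batch in ([(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]):
        model.learn(mini_batch)

prediction = model.predict(4.0)  # should be close to 2 * 4 + 1 = 9
```

The model never sees the whole dataset at once. It only keeps its current parameters, which is what lets it keep adapting as new data arrives.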

One of the big challenges with this type of system is bad-quality data: if bad data is fed to the system, its performance will gradually decline. Examples of bad data are wrong readings in stock prices or someone spamming a search engine to try to rank high in search results. To reduce this risk, we need to add monitoring layers, such as anomaly detection algorithms that detect abnormal data that could affect model performance and switch learning off until the quality of the data improves, or a data expectations layer that raises an alert when data quality is lower than expected. I’ll devote an article to talking about it in more detail.
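
One possible shape for such a monitoring layer is sketched below. It assumes a simple z-score check on a running mean and standard deviation, with a made-up threshold and warm-up period. The hypothetical `AnomalyGate` skips individual suspicious readings, which is a simplification of switching learning off until data quality recovers.

```python
import math


class AnomalyGate:
    """Flags readings far from the running mean (Welford's online algorithm)."""

    def __init__(self, threshold=3.0, warmup=5):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.threshold = threshold
        self.warmup = warmup

    def is_anomaly(self, value):
        if self.n < self.warmup:
            return False  # not enough history to judge yet
        std = math.sqrt(self.m2 / (self.n - 1))
        return std > 0 and abs(value - self.mean) > self.threshold * std

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)


gate = AnomalyGate()
accepted, rejected = [], []
for price in [100.0, 101.0, 99.0, 100.5, 99.5, 100.2, 1000.0, 100.1]:
    if gate.is_anomaly(price):
        rejected.append(price)  # learning is skipped for this reading
        continue
    gate.update(price)
    accepted.append(price)      # a real system would update its model here
```

Here the reading of 1000.0 is rejected, so it never reaches the model or distorts the running statistics.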

Making Prediction

The last criterion separates machine learning systems into two types based on how the model generalizes to make predictions for new data it has never seen before.

1. Instance-Based Learning

fig[9] Instance-based learning

To classify a new data point, the algorithm calculates the distance between the new data point and the points in your dataset, takes the k closest ones (typically 3 or more), and predicts the class that most of those neighbours belong to. It looks simple, but if we look at its performance I don’t think you’ll be too excited. The most well-known algorithm of this type is the K-Nearest Neighbours algorithm.
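
A minimal sketch of K-Nearest Neighbours classification on made-up 2-D points, using Euclidean distance and a majority vote. The `knn_predict` helper and the choice of k = 3 are illustrative.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]
label = knn_predict(train, (2.0, 2.0))
```

There is no training step: the "model" is the stored dataset itself, and every prediction compares the new point against all stored points.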

2. Model-Based Learning

fig[10] Model-based learning

Another way to generalize from a set of examples is to build a model out of these examples, then use that model to make predictions. This is called model-based learning. Building the model here means choosing equations whose parameters are tuned to fit the data, in contrast to instance-based learning, which relies mainly on similarity.

If we look at the house price prediction problem, we can approach it in two ways:

  • 1st scenario: Using the new house’s specifications to measure the similarity between the new house and all labeled data points, we then set the new house price based on the most similar house in our data or the average of the k most similar houses. This is instance-based learning.
  • 2nd scenario: Building an equation from the features and using an algorithm to tune its parameters so that it fits and generalizes the house price data, then substituting the new house’s feature values into the equation to get its price. This is model-based learning.
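
The two scenarios can be put side by side in a short sketch. The data is made up (a single feature, the area, with prices that happen to follow price = 3 × area), and the helper names are hypothetical. The model-based version assumes a straight-line equation fitted by ordinary least squares.

```python
houses = [(50.0, 150.0), (80.0, 240.0), (100.0, 300.0), (120.0, 360.0)]  # (area, price)


def instance_based_price(houses, area, k=2):
    """1st scenario: average the prices of the k houses with the most similar area."""
    nearest = sorted(houses, key=lambda h: abs(h[0] - area))[:k]
    return sum(price for _, price in nearest) / k


def fit_model(houses):
    """2nd scenario, step 1: tune w and b in price = w * area + b (least squares)."""
    n = len(houses)
    mean_x = sum(a for a, _ in houses) / n
    mean_y = sum(p for _, p in houses) / n
    w = (sum((a - mean_x) * (p - mean_y) for a, p in houses)
         / sum((a - mean_x) ** 2 for a, _ in houses))
    return w, mean_y - w * mean_x


def model_based_price(params, area):
    """2nd scenario, step 2: substitute the feature value into the equation."""
    w, b = params
    return w * area + b


params = fit_model(houses)
similar_estimate = instance_based_price(houses, 90.0)
equation_estimate = model_based_price(params, 90.0)
```

For a 90 m² house both approaches agree on this toy data. For a 200 m² house, far outside the stored examples, the instance-based estimate stays near the prices of the largest known houses while the equation extrapolates along the fitted line, which shows how differently the two approaches generalize.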

Finally, the point of these categories is to look at machine learning systems from multiple perspectives, and to expand your horizons when matching real-life problems with the right type of system.

Mahmoud Ahmed
Analytics Vidhya

Data Engineer passionate about solving real-world problems through ML & data engineering techniques. join me on www.mahmoudahmed.dev