ML Series: Day 11 — Different Types of Learning in ML

Ebrahim Mousavi
10 min read · Mar 26, 2024

We previously classified machine learning methods into three categories: supervised, unsupervised, and reinforcement learning. Machine learning methods can also be grouped along several other axes, and we will discuss the most important ones below.

1. Lazy learning versus eager learning

Eager learning builds a model from the training data, evaluates that model on test data, and, if the evaluation results are satisfactory, uses the model to predict new data. Eager learning therefore does most of the necessary work up front; these methods are called eager because they generalize before encountering a new example. Lazy learning, on the other hand, builds no model before receiving new input: it waits for unclassified data and only then starts building the predictive model. It is called lazy because it postpones model building.

Lazy learning algorithms take a shorter time to train and a longer time to predict. Eager learning algorithms process the data in the training phase and are therefore faster than lazy learning algorithms at prediction time.
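To make the trade-off concrete, here is a minimal sketch (assuming scikit-learn is available; the dataset and hyperparameters are arbitrary illustrations). k-nearest neighbors is a classic lazy learner, while logistic regression is eager:

```python
# A minimal sketch (assuming scikit-learn): k-NN is lazy -- fit() mostly
# stores the training set -- while logistic regression is eager, doing
# its optimization work up front so prediction is cheap.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

lazy = KNeighborsClassifier(n_neighbors=5)
lazy.fit(X, y)            # fast: little more than memorizing (X, y)
lazy.predict(X[:100])     # slower: compares each query to stored samples

eager = LogisticRegression(max_iter=1000)
eager.fit(X, y)           # slower: optimizes the model weights up front
eager.predict(X[:100])    # fast: one matrix product plus a threshold
```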

2. Batch learning (offline) versus online learning

Batch learning refers to training machine learning models in batches, i.e., at regular intervals such as weekly, fortnightly, monthly, or quarterly. In batch learning, the system cannot learn incrementally; the models must be retrained each time using all the available data. Data is collected and accumulated over a period of time, and the models are then trained on the accumulated data at periodic intervals. This training requires a lot of time and computing resources, so it is usually done offline, which is why this method is also called offline learning. Models trained with batch learning are retrained only at certain intervals, based on how the model performs when faced with new data.

Figure 1 shows an overview of the batch training method.

Figure 1. An overview of batch or offline training

Building offline or batch-trained models requires training on the entire training dataset, and improving their performance requires retraining on the entire dataset again. These models are static: once trained, their performance will not improve until a new model is trained. Performance also degrades slowly over time, because the environment and the data the model encounters keep changing while the model itself remains unchanged. This phenomenon is often called model rot or data drift, and one possible solution is to regularly retrain the model on new data.

There are several reasons for using these methods:

1. Business requirements do not call for frequent retraining of the models.

2. The data distribution is not expected to change frequently.

3. The software systems and computing resources needed to retrain models regularly are not available.

4. The expertise needed to build an incremental learning system is not available.

In online learning, by contrast, training happens incrementally as new data arrives in small groups or mini-batches. Each learning step is fast and computationally cheap, so the system can learn new information as soon as it arrives. Online learning is a good option for systems that receive data as a continuous stream (for example, stock prices) and need to adapt to rapid or autonomous changes, as well as for settings with limited computing resources.

Figure 2 shows the overview of the online learning method.

Figure 2. An overview of online (incremental) training

Online learning algorithms are also used to train systems on huge datasets that cannot fit into a machine’s main memory; this is called out-of-core learning. The algorithm loads part of the data, runs a training step on it, and repeats the process until it has run on all the data, as sketched below. One disadvantage of online training is that if the system is trained on bad data, its performance degrades and the end user will quickly notice the effect; in general, the quality of the training data is very important in this method. A concrete application is Gmail, which sometimes asks the user whether a received email is spam and uses this feedback to train its algorithms.
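As a rough sketch of this loop (assuming scikit-learn’s SGDClassifier, whose partial_fit method supports exactly this incremental pattern; the mini-batch generator here is a hypothetical stand-in for reading chunks from disk or a live stream):

```python
# A minimal out-of-core sketch: train incrementally on mini-batches
# with partial_fit() instead of loading the whole dataset at once.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])  # all labels must be declared on the first call

def mini_batches():
    """Stand-in for streaming data; in practice each batch would be
    read from disk or a live feed rather than generated in memory."""
    rng = np.random.default_rng(0)
    for _ in range(100):
        X = rng.normal(size=(32, 10))
        y = (X[:, 0] > 0).astype(int)
        yield X, y

for X_batch, y_batch in mini_batches():
    # Each step is cheap, so the model adapts as new data arrives.
    model.partial_fit(X_batch, y_batch, classes=classes)
```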

3. Discriminative versus generative models

Machine learning models can also be divided into two types: discriminative and generative.

A discriminative model makes predictions on unseen data based on conditional probability and can be used for classification or regression. A generative model, by contrast, focuses on the distribution of the dataset and returns a probability for a given instance. Discriminative models are a class of models used in statistical classification, mainly in supervised machine learning. They are also known as conditional models because they learn the boundaries between classes or labels in a dataset. In a two-class classification problem, the goal of a discriminative model is to learn a function that maps inputs to binary outputs.

Discriminative models cannot generate new data; their ultimate goal is to separate one class from another. When the dataset contains outliers, discriminative models tend to perform better than generative models. Figure 3 gives an overview of discriminative models.

Figure 3. An overview of discriminative models

In other words, the purpose of discriminative models is to learn the separating line or hyperplane between the data. Algorithms based on discriminative models include logistic regression, support vector machines, various neural networks, nearest neighbors, decision trees, and random forests.

Generative models are a class of statistical models that can generate new data. They are used in unsupervised machine learning to perform tasks such as estimating probabilities and similarities between data points, modeling data points, and describing phenomena in data. Because these models often use Bayes’ theorem (which I’ll talk about in the coming days) to find the joint probability, they can handle more complex tasks than discriminative models. The generative approach therefore focuses on the distribution of classes in a dataset and models the underlying patterns or distributions of the data points (e.g., a Gaussian distribution).

Figure 4 shows a view of generative models.

Figure 4. An overview of generative models

Unlike discriminative models, which try to draw a separating line or hyperplane between the data, generative models try to learn the probability distribution of the data. Algorithms based on generative models include naïve Bayes, Bayesian networks, Markov random fields, hidden Markov models (HMMs), generative adversarial networks (GANs), and autoregressive models.

Note: Autoregressive models are a class of machine learning models that predict the next component of a sequence from measurements of the previous inputs in the sequence.
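As a hedged illustration of this note (a toy least-squares autoregression built with NumPy, not a production forecasting method; the series and lag order are arbitrary):

```python
# A minimal autoregressive sketch: predict the next value of a sequence
# as a linear function of its p previous values, fit by least squares.
import numpy as np

def fit_ar(series, p):
    """Return least-squares coefficients for an AR(p) model."""
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    y = series[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(series, coef):
    # The forecast uses the last p observed values of the sequence.
    return series[-len(coef):] @ coef

rng = np.random.default_rng(0)
t = np.arange(200)
series = np.sin(0.1 * t) + 0.05 * rng.normal(size=200)  # noisy toy signal

coef = fit_ar(series, p=3)
print(predict_next(series, coef))  # forecast for the next time step
```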

For a better understanding of these two approaches, their differences can be summarized as follows:

· The main idea of discriminative models is to find and draw the decision boundary in the data space, while generative models try to model how the data is distributed.

· A generative model can generate data, while a discriminative model focuses on predicting data labels.

· Suppose we have input data x and we want to classify it using labels y. A generative model learns the joint probability distribution p(x, y), while a discriminative model learns the conditional probability distribution p(y|x), the probability of y given x (see the sketch after this list). Conditional probability means that we have two events and we want the probability of event y given that event x has already happened; we will discuss this in detail in the future.

· Discriminative models recognize existing data, i.e., they can be used to classify data, while generative models can generate data.

· Generative models are often used for unsupervised learning tasks, and discriminative models for supervised learning tasks.

· Generative models are more affected by outliers than discriminative models.

· Discriminative models have lower computational costs than generative models.
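A small sketch can make the p(y|x) versus p(x, y) distinction tangible (assuming scikit-learn; GaussianNB is generative because it stores per-class means and variances, i.e., p(x|y), which we can sample from):

```python
# A minimal sketch (assuming scikit-learn): logistic regression models the
# conditional p(y|x) directly, while Gaussian naive Bayes models the joint
# p(x, y) = p(x|y) p(y) -- so the generative model can also sample new x.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_blobs(n_samples=300, centers=2, random_state=0)

disc = LogisticRegression().fit(X, y)
gen = GaussianNB().fit(X, y)

x_new = X[:1]
print(disc.predict_proba(x_new))  # p(y|x): all a discriminative model gives

# GaussianNB stores per-class feature means (theta_) and variances (var_),
# so we can draw a synthetic point from class 0's fitted distribution.
rng = np.random.default_rng(0)
fake_x = rng.normal(gen.theta_[0], np.sqrt(gen.var_[0]))
print(fake_x)  # a "generated" point, plausible under class 0
```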

A GAN can be seen as a competition between a generative network and a discriminative network. Figure 5 shows examples of images produced by these algorithms; all of the images were generated by the network.

Figure 5. Examples of images produced by a GAN

4. Instance-based learning versus model-based learning

Instance-based learning (also known as memory-based learning or lazy learning) involves memorizing training data in order to predict new data that the model has not seen. This approach does not require any prior knowledge or assumptions about the data, which makes it easy to implement and understand. However, it may be computationally expensive because all training data must be stored in memory before making a prediction.

In Instance-based learning, the algorithm remembers the training data and when making predictions, it uses a similarity measure and compares the new items with the stored data.

Figure 6 shows how a new sample is predicted based on its neighboring samples.

Figure 6. General form of instance-based learning algorithms
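To see how little “training” an instance-based learner does, here is a from-scratch sketch of a one-nearest-neighbor classifier (a toy illustration; real implementations use smarter data structures than a linear scan):

```python
# A from-scratch sketch of instance-based learning: "training" just stores
# the data, and prediction compares the query to every stored sample
# using a similarity measure (here, Euclidean distance).
import numpy as np

class OneNearestNeighbor:
    def fit(self, X, y):
        self.X, self.y = np.asarray(X), np.asarray(y)  # memorize the data
        return self

    def predict(self, X_new):
        labels = []
        for x in np.asarray(X_new):
            dists = np.linalg.norm(self.X - x, axis=1)  # compare to all
            labels.append(self.y[np.argmin(dists)])     # nearest wins
        return np.array(labels)

X = [[0, 0], [0, 1], [5, 5], [6, 5]]
y = [0, 0, 1, 1]
print(OneNearestNeighbor().fit(X, y).predict([[0.2, 0.1], [5.5, 5.0]]))
# -> [0 1]
```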

Model-based learning methods (also known as structure-based or eager learning) can generalize better than instance-based methods by building models from the training data. This involves using algorithms such as linear regression, logistic regression, random forests, etc., to create a model that can be used to predict new data points.

Figure 7 shows how class prediction is performed using boundaries learned from the training data, rather than by comparing against the stored dataset with a similarity measure.

Figure 7. General form of model-based learning algorithms

Instance-based learning and model-based learning have several key differences:

Generalizability: In model-based learning, the goal is to learn a generalizable model that can predict new data: the model is trained on one dataset and then evaluated on another dataset it has never seen. In contrast, instance-based learning algorithms simply memorize the training examples and use them to predict new data. They make no attempt to learn a generalizable model, so their performance on new data is not as reliable as that of model-based algorithms.

Scalability: Since instance-based learning algorithms simply memorize the training examples, they can be very slow and, in some cases, unusable on large datasets, because the model must keep all training examples in memory and compare new data with each stored sample. Model-based learning algorithms can be more scalable because they do not have to store the training examples; instead, they learn a model that can make predictions without the training data.

Interpretability: Model-based learning algorithms often produce models that are easier to interpret than those of instance-based learning, because they learn a set of rules or parameters that can be inspected to understand how the model predicts. In contrast, instance-based learning algorithms simply store the training examples and use them as the basis for prediction, which can make the predictions difficult to interpret.

5. Parametric versus non-parametric models

In machine learning, a parametric model makes assumptions about the probability distribution of the data and has a fixed number of parameters that does not depend on the amount of training data. Once the parameters are learned from the data, the model can be used to predict new data. A non-parametric model, on the other hand, makes no assumptions about the probability distribution of the data; it cannot be specified by a fixed set of parameters, and its number of parameters grows with the amount of training data. Non-parametric models often require more data to estimate their parameters, and they can be slower to compute than parametric models. Examples of parametric models include linear regression, logistic regression, and naïve Bayes. Examples of non-parametric models include k-nearest neighbors, decision trees, and random forests.

In general, parametric models are useful when the data is relatively simple and follows a known probability distribution, while non-parametric models are more appropriate for complex and heterogeneous data.
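A quick sketch of this contrast (assuming scikit-learn; the dataset sizes are arbitrary): a linear regression keeps the same number of parameters however much data it sees, while a k-NN regressor must retain the entire training set.

```python
# A minimal sketch: a linear model's parameter count is fixed, while a
# k-NN regressor's "state" grows with the training set it must memorize.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

for n in (100, 10_000):
    X, y = make_regression(n_samples=n, n_features=3, random_state=0)

    param = LinearRegression().fit(X, y)
    # Always 4 numbers (3 coefficients + 1 intercept), regardless of n.
    print(n, "samples -> parametric model stores", param.coef_.size + 1)

    nonparam = KNeighborsRegressor(n_neighbors=5).fit(X, y)
    # k-NN must retain the whole training set to predict, so its
    # effective size scales with n (here n * 3 feature values).
    print(n, "samples -> non-parametric model retains", X.size, "numbers")
```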

In Part 11, we talked about Different Types of Learning in ML. In the coming days of this machine learning journey, we will talk about different types of algorithms in AI. In particular, in Part 12: Machine Learning Series: Day 12 — Logistic Regression (Part 1), we will investigate logistic regression, a popular statistical model used for binary classification that estimates the probability of an event occurring based on input variables.
