A Complete Guide to Supervised, Unsupervised, and Reinforcement Learning: Understanding the Mathematics behind Machine Learning Algorithms

Nikhil Malkari
Jun 22, 2023


INTRODUCTION

Machine learning is an exciting field of artificial intelligence that revolves around developing algorithms and models capable of learning from data to make predictions or decisions. Within machine learning, there are three fundamental paradigms: supervised learning, unsupervised learning, and reinforcement learning. In this article, we will explore these paradigms in detail, examining their underlying principles, real-world applications, and the mathematical foundations that support them.

Supervised Learning:

Supervised learning is a type of machine learning where the algorithm learns from labeled examples to make predictions or classify new, unseen data points. The labeled examples consist of input features (X) and their corresponding target labels (Y). The goal is to train a model that can accurately map the input features to the target labels and generalize well to new, unseen data.

Mathematics: In supervised learning, the mathematical representation of the model varies depending on the specific algorithm used. Let’s take two common examples: linear regression for predicting continuous values and logistic regression for binary classification.

Linear Regression: Linear regression aims to find the best-fitting linear relationship between the input features (X) and the continuous target variable (Y). The model can be represented as:

Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ,

where Y is the predicted value, β₀ is the intercept, β₁, β₂, …, βₙ are the coefficients to be learned, and X₁, X₂, …, Xₙ are the input features. The objective is to minimize the difference between the predicted values and the actual labels, usually measured as the sum of squared errors, by optimizing these coefficients. This is typically achieved through ordinary least squares, which has a closed-form solution, or through iterative methods like gradient descent.
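As a minimal sketch of the ordinary least squares approach, the snippet below fits the coefficients with NumPy. The function names `fit_linear_regression` and `predict_linear` and the toy data are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Estimate [β₀, β₁, ..., βₙ] by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept β₀ is learned with the other coefficients.
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq returns the beta that minimizes the sum of squared residuals.
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta

def predict_linear(X, beta):
    """Compute Y = β₀ + β₁X₁ + ... + βₙXₙ for each row of X."""
    X = np.asarray(X, dtype=float)
    return beta[0] + X @ beta[1:]

# Toy data generated from y = 1 + 2*x1 + 3*x2 without noise (made up for illustration).
X_lin = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_lin = 1.0 + 2.0 * X_lin[:, 0] + 3.0 * X_lin[:, 1]
beta_lin = fit_linear_regression(X_lin, y_lin)
print(beta_lin)
```

Because the toy targets are an exact linear function of the features, the recovered coefficients should match the ones used to generate the data up to floating-point error. With noisy real data, the fit would only approximate the underlying relationship.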

Logistic Regression: Logistic regression is used for binary classification problems, where the target variable (Y) has two possible classes, such as “spam” or “not spam.” The logistic regression model uses the logistic function (sigmoid) to estimate the probability of an input belonging to a particular class. The model can be represented as:

P(Y=1|X) = 1 / (1 + exp(-z)),

where P(Y=1|X) represents the probability of the positive class, X represents the input features, and z = β₀ + β₁X₁ + … + βₙXₙ is the linear combination of the input features and their corresponding coefficients. The coefficients are learned by maximum likelihood estimation, which is usually carried out numerically with gradient descent or a similar optimizer.
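The following is a rough sketch of fitting this model by gradient descent on the mean negative log-likelihood. The learning rate, step count, function names, and toy data are all assumptions chosen for illustration. A production implementation would normally add regularization and a convergence check.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, learning_rate=0.1, n_steps=5000):
    """Learn [β₀, β₁, ..., βₙ] by gradient descent on the mean negative log-likelihood."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_design = np.column_stack([np.ones(len(X)), X])  # column of ones for β₀
    beta = np.zeros(X_design.shape[1])
    for _ in range(n_steps):
        p = sigmoid(X_design @ beta)              # current P(Y=1|X) for every example
        gradient = X_design.T @ (p - y) / len(y)  # gradient of the mean negative log-likelihood
        beta -= learning_rate * gradient
    return beta

def predict_proba(X, beta):
    """Return P(Y=1|X) for each row of X."""
    X = np.asarray(X, dtype=float)
    return sigmoid(beta[0] + X @ beta[1:])

# Toy one-feature data with overlapping classes (made up for illustration).
X_log = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
y_log = np.array([0, 0, 0, 1, 0, 1, 1, 1])
beta_log = fit_logistic_regression(X_log, y_log)
print(predict_proba([[0.5], [4.5]], beta_log))
```

The classes in the toy data overlap on purpose: with perfectly separable data the maximum likelihood coefficients grow without bound, which is one reason regularization is common in practice.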

Algorithms and Techniques: Supervised learning encompasses a wide range of algorithms, each with its own mathematical principles and optimization techniques. Some commonly used algorithms are:

Linear Regression: Used for predicting continuous values, linear regression finds the best linear relationship between the input features and the target variable.

Logistic Regression: Used for binary classification, logistic regression estimates the probability of an input belonging to a particular class.

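Decision Trees: Decision trees recursively partition the feature space using simple rules on the features. For classification, they predict the majority class in each partition; for regression, they typically predict the mean target value in each partition.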

Support Vector Machines (SVM): SVMs find the best decision boundary that separates different classes with maximum margin in the feature space.

Neural Networks: Neural networks consist of interconnected nodes (neurons) organized in layers. They can handle complex relationships and are commonly used in deep learning.

These algorithms employ various mathematical principles and optimization techniques to learn from the labeled examples and make predictions or classifications on unseen data. The choice of algorithm depends on the nature of the problem, the available data, and the desired outcome.

In summary, supervised learning involves training a model using labeled examples, where the mathematics behind the models varies depending on the algorithm used. The objective is to find the best-fitting relationship between the input features and the target labels, enabling accurate predictions or classifications on new, unseen data.

Unsupervised Learning:

Unsupervised learning is a branch of machine learning that deals with unlabeled data, meaning the data lacks explicit target labels or class information. In unsupervised learning, the algorithm explores the inherent patterns, structures, or relationships within the data to gain insights and make meaningful interpretations. Unlike supervised learning, where the algorithm is provided with labeled examples to learn from, unsupervised learning relies solely on the data itself to discover patterns and uncover hidden information.

Mathematics: Unsupervised learning algorithms utilize mathematical techniques to model the underlying distribution or structure of the data. These techniques allow the algorithms to identify similarities or dissimilarities between data points and group them accordingly. One common approach in unsupervised learning is clustering, which aims to partition the data into groups or clusters based on their similarity. A clustering algorithm typically optimizes an objective function so that points within the same cluster are as similar as possible while points in different clusters are as dissimilar as possible. For instance, k-means clustering alternates between two steps: it assigns each data point to the cluster with the nearest centroid, then recomputes each centroid as the mean of its assigned points. Neither step can increase the sum of squared distances between the points and their assigned centroids, so the algorithm converges, although possibly to a local rather than a global minimum.
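To make the k-means procedure concrete, here is a small sketch in NumPy. The function name `kmeans`, the random initialization from the data points, and the toy data are assumptions for illustration. Library implementations generally use more careful initialization and several restarts.

```python
import numpy as np

def assign_to_nearest(X, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean distance)."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(X, k, n_iters=100, seed=0):
    """Basic k-means: alternate assignment and centroid updates until centroids stop moving."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels = assign_to_nearest(X, centroids)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = assign_to_nearest(X, centroids)
    # Objective value: sum of squared distances from each point to its assigned centroid.
    inertia = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, inertia

# Two well-separated groups of points (made up for illustration).
X_km = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                 [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels_km, centroids_km, inertia_km = kmeans(X_km, k=2)
print(labels_km, inertia_km)
```

Which group receives label 0 and which receives label 1 depends on the random initialization, so only the grouping itself is meaningful, not the label values.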

Another important technique in unsupervised learning is dimensionality reduction. In high-dimensional datasets, it can be challenging to visualize and analyze the data effectively. Dimensionality reduction methods aim to capture the essential information of the data while reducing its dimensionality. One widely used technique is principal component analysis (PCA), which identifies orthogonal directions, called principal components, that capture the maximum variance in the data. By projecting the data onto a lower-dimensional space defined by these principal components, it becomes easier to visualize and analyze the data. Other dimensionality reduction methods, such as t-distributed stochastic neighbor embedding (t-SNE), focus on preserving the local structure of the data, making it useful for visualizing clusters or uncovering relationships between data points.
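As an illustrative sketch, PCA can be computed from the singular value decomposition of the mean-centered data. The function name `pca` and the toy data below are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components using the SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Sample variance captured along each principal direction.
    explained_variance = (S ** 2) / (len(X) - 1)
    projected = X_centered @ components.T
    return projected, components, explained_variance[:n_components]

# Points lying exactly on the line x2 = 2 * x1 (made up for illustration).
t = np.arange(5, dtype=float)
X_pca = np.column_stack([t, 2.0 * t])
projected_pca, components_pca, variance_pca = pca(X_pca, n_components=1)
print(components_pca, variance_pca)
```

Since the toy points lie on a single line, one principal component captures all of the variance, and its direction is parallel to (1, 2). The sign of a principal component is arbitrary, so it may come out as either (1, 2) or (-1, -2) after normalization.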

Algorithms and Techniques: There are several algorithms and techniques employed in unsupervised learning. Clustering algorithms group similar data points together based on their proximity in the feature space. Some popular clustering algorithms include k-means clustering, which partitions the data into k clusters by minimizing the sum of squared distances, hierarchical clustering, which creates a hierarchy of clusters based on the distance between data points, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise), which groups data points based on density-connected regions.

Dimensionality reduction techniques aim to reduce the number of features while preserving the most informative aspects of the data. In addition to PCA and t-SNE mentioned earlier, other techniques include independent component analysis (ICA), which aims to separate mixed signals into their underlying independent sources, and autoencoders, which are neural network architectures that learn to compress and reconstruct the input data, effectively capturing its essential features.

Unsupervised learning has numerous real-world applications. Clustering can be used for customer segmentation in marketing, anomaly detection in cybersecurity, or grouping similar documents in natural language processing. Dimensionality reduction techniques find applications in data visualization, feature extraction, and noise reduction. Unsupervised learning allows for exploratory data analysis and provides insights that can guide further analysis or decision-making processes.

In summary, unsupervised learning is a powerful branch of machine learning that discovers patterns, structures, and relationships within unlabeled data. By leveraging mathematical techniques such as clustering and dimensionality reduction, unsupervised learning algorithms enable us to gain valuable insights and make sense of complex datasets.

Reinforcement Learning:

Reinforcement learning (RL) is a branch of machine learning that focuses on training agents to make sequential decisions in an environment to maximize cumulative rewards. RL involves an agent, an environment, states, actions, rewards, and a learning algorithm. Let’s explore the key components and mathematical foundations of reinforcement learning in detail.

Components of Reinforcement Learning:

  1. Agent: The learner or decision-making entity that interacts with the environment and takes actions.
  2. Environment: The external system or world in which the agent operates.
  3. State (S): A representation of the environment at a particular time, capturing all relevant information for decision-making.
  4. Action (A): The choices or decisions made by the agent in response to a given state.
  5. Reward (R): Feedback from the environment to evaluate the agent’s actions. Rewards can be positive, negative, or neutral signals, guiding the agent’s learning process.
  6. Policy (π): The strategy or behavior the agent employs to select actions in different states.
  7. Value Function (V): The expected cumulative (typically discounted) reward the agent receives when starting from a particular state and following a specific policy thereafter.
  8. Q-Value Function (Q): The expected cumulative (typically discounted) reward the agent receives when taking a specific action in a particular state and following a specific policy thereafter.

Mathematical Foundations of Reinforcement Learning:

  1. Markov Decision Process (MDP): An MDP is a mathematical framework that models sequential decision-making problems. It assumes the Markov property, which states that the future is conditionally independent of the past given the present state. An MDP consists of a set of states, a set of actions, transition probabilities, a reward function, and a discount factor γ (between 0 and 1) that determines how much future rewards count relative to immediate ones.
  2. Policy Evaluation: Policy evaluation aims to estimate the value function (V) or the action-value function (Q) for a given policy. It involves iteratively updating the value estimates based on the Bellman equation, which expresses the value of a state as the expected immediate reward plus the expected value of the next state discounted by γ.
  3. Policy Improvement: Policy improvement is the process of refining the agent’s policy to make better decisions. It involves selecting actions that lead to higher expected rewards based on the estimated value function.
  4. Exploration and Exploitation: Balancing exploration (trying out different actions to gather information) and exploitation (leveraging the learned knowledge to maximize rewards) is crucial in reinforcement learning. Various strategies, such as ε-greedy, softmax, or Upper Confidence Bound (UCB), can be used to balance exploration and exploitation.
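The ideas in the list above can be sketched on a tiny, fully known MDP. Everything in this example is an assumption made for illustration: the two-state environment, its rewards, γ = 0.9, and the function names. Policy evaluation and policy improvement are alternated (an approach commonly called policy iteration), and a small ε-greedy helper shows one way to mix exploration with exploitation.

```python
import numpy as np

def policy_evaluation(P, R, policy, gamma, tol=1e-10):
    """Iterate the Bellman equation for a fixed deterministic policy until V stops changing."""
    n_states = P.shape[0]
    V = np.zeros(n_states)
    while True:
        # V(s) = R(s, π(s)) + γ * Σ_s' P(s' | s, π(s)) * V(s')
        V_new = np.array([
            R[s, policy[s]] + gamma * P[s, policy[s]] @ V
            for s in range(n_states)
        ])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def policy_improvement(P, R, V, gamma):
    """Act greedily with respect to Q(s, a) = R(s, a) + γ * Σ_s' P(s' | s, a) * V(s')."""
    Q = R + gamma * (P @ V)
    return Q.argmax(axis=1), Q

def policy_iteration(P, R, gamma):
    """Alternate evaluation and improvement until the policy no longer changes."""
    policy = np.zeros(P.shape[0], dtype=int)
    while True:
        V = policy_evaluation(P, R, policy, gamma)
        new_policy, _ = policy_improvement(P, R, V, gamma)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

def epsilon_greedy(q_values, epsilon, rng):
    """With probability ε pick a random action (explore), otherwise the best-known one (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Hypothetical MDP: 2 states, 2 actions (0 = stay, 1 = move to the other state).
# P[s, a, s'] is the transition probability, R[s, a] the immediate reward.
P_mdp = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.0, 1.0], [1.0, 0.0]]])
R_mdp = np.array([[0.0, 1.0],
                  [2.0, 0.0]])
best_policy, best_V = policy_iteration(P_mdp, R_mdp, gamma=0.9)
print(best_policy, best_V)
```

In this toy environment, staying in state 1 pays a reward of 2 at every step, so the resulting policy moves from state 0 to state 1 and then stays there. Note that this sketch assumes the transition probabilities and rewards are known in advance. When they are not, the agent has to estimate values from sampled experience, which is where exploration strategies such as ε-greedy come in.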

In conclusion, supervised learning, unsupervised learning, and reinforcement learning are the three fundamental paradigms in machine learning. Each paradigm has its unique characteristics, applications, and mathematical foundations. By understanding the principles and mathematics behind these paradigms, we can effectively leverage them to build intelligent systems that learn from data and make informed decisions.
