All about Supervised Learning

Humant Sattabhayya
8 min read · Feb 26, 2022

Introduction

This article focuses on supervised learning, an important aspect of Data Science. Supervised learning is the machine learning task of inferring a function from labelled training data. The training data consists of a set of training examples, where each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal).

Before starting, I just want to mention that the future of planet Earth is Artificial Intelligence / Machine Learning. Waking up in this world full of innovation feels more and more like magic. There are many kinds of implementations and techniques for carrying out Artificial Intelligence and Machine Learning to solve real-world problems, and Supervised Learning is one of the most used approaches.

Fig. Types of Machine Learning

In supervised learning, we start by importing a dataset containing the training attributes and the target attribute. The supervised learning algorithm learns the relationship between the training examples and their associated target variables, and then applies that learned relationship to new attribute data to predict the corresponding target attribute.

Basically, your attribute data is mapped to the target attribute through a model.

Mathematically,

Y = f(X)

The ultimate goal of the Supervised learning algorithm is to predict Y with the maximum accuracy for a given new input X.
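As a deliberately tiny sketch of this idea, the snippet below "learns" a mapping Y = f(X) from labelled pairs. The data and the simple threshold rule are made up purely for illustration; real algorithms learn far richer functions.

```python
# Toy labelled training data: hours studied (X) -> passed exam (Y)
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]

def fit_threshold(pairs):
    # Learn f as the midpoint between the largest input labelled 0
    # and the smallest input labelled 1.
    t = (max(x for x, y in pairs if y == 0) +
         min(x for x, y in pairs if y == 1)) / 2
    return lambda x: 1 if x >= t else 0

f = fit_threshold(train)  # the learned mapping Y = f(X)
```

Given a new input such as `f(7)`, the model returns its best guess for Y, which is exactly the goal stated above.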

Based on the nature of the target variable, supervised learning problems are categorized into two types: Classification and Regression.

Classification:

Classification problems are those where the target variable is predicted on a discrete scale. In other words, we receive a data point and have to predict a value for it, and that value can only take a finite number of possibilities.

For example: classifying emails as spam or not spam, or classifying a loan applicant as creditworthy or not creditworthy based on predictors such as income, credit score, age, etc.

Fig. Spam or Not-Spam

Some of the most used Classification algorithms are:

  • Random Forest
  • Decision Trees
  • Logistic Regression
  • Support Vector Machines
  • K Nearest Neighbors
  • Naive Bayes Classifier

Classification Algorithms:

Now let's talk about the various classification algorithms in brief.

  1. Random Forest:

Random Forest is a machine learning algorithm based on decision trees. It belongs to the class of ML algorithms that perform "ensemble" classification. By ensemble we mean (in the Random Forest context) the collective decision of different decision trees. In a Random Forest (RF), we make a prediction about the class not simply based on one decision tree, but by an (almost) unanimous prediction made by 'K' decision trees.

Prediction in a Random Forest (a collection of 'K' decision trees) is truly ensemble: each decision tree predicts the class of the instance, and the class predicted most often is returned.

The 'K' individual decision trees are built from the given dataset by randomly sampling the data and the feature subspace through a process called Bootstrap Aggregation (Bagging), which is random selection with replacement.
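The bagging-plus-majority-vote idea can be sketched in a few lines. This is a minimal illustration, not a production Random Forest: each "tree" here is just a one-feature threshold rule (a decision stump) standing in for a full decision tree, and the data is invented.

```python
import random
from collections import Counter

def train_tree(sample):
    # A stand-in "tree": a single threshold rule learned from the sample.
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros:
        return lambda x: 1
    if not ones:
        return lambda x: 0
    t = (max(zeros) + min(ones)) / 2
    return lambda x: 1 if x >= t else 0

def train_forest(data, k=7, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(k):
        # Bagging: draw a bootstrap sample (random selection WITH replacement)
        sample = [rng.choice(data) for _ in data]
        trees.append(train_tree(sample))
    # Ensemble prediction: majority vote across the k trees
    return lambda x: Counter(tree(x) for tree in trees).most_common(1)[0][0]

data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
predict = train_forest(data)
```

Each bootstrap sample gives a slightly different tree, and the final class is whichever one the most trees vote for.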

2. Decision Trees:

A decision tree is a flowchart-like diagram that shows the various outcomes from a series of decisions. In simple terms, a decision tree consists of nested if-else conditions. It has three main components: a root node, leaf nodes and branches. The root node is the starting point of the tree; the root and internal nodes contain the questions or criteria to be answered, while the leaf nodes hold the final outcomes.

Each branch of the decision tree represents a possible decision, occurrence or reaction to the decision. Decision trees give people a highly effective and easy way to understand the potential options of a decision and its range of possible outcomes.

All decision trees begin with a particular decision. This initial decision is depicted using a square box at the far left of the decision tree. Lines are drawn outward from the box, representing each possible option. At the end of each line, you can analyze the results. If the result of an option is a new decision, a box is drawn at the end of the line, and new lines are drawn out of that decision, representing the new options.
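The "nested if-else" view of a decision tree can be written directly as code. The loan-approval rules and thresholds below are invented for illustration; a real tree learns them from data.

```python
def loan_decision(income, credit_score):
    # Root node: the first question asked of every applicant
    if income >= 50_000:
        # Internal node on the "high income" branch
        if credit_score >= 650:
            return "approve"        # leaf node
        return "manual review"      # leaf node
    # "low income" branch from the root
    if credit_score >= 700:
        return "manual review"      # leaf node
    return "deny"                   # leaf node
```

Following one path from the root to a leaf (for example `loan_decision(80_000, 700)`) is exactly how a trained decision tree classifies a new example.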

3. Logistic Regression:

Logistic regression is generally used where the dependent variable is binary or dichotomous. That means the dependent variable can take only two possible values, such as "Yes or No", "Default or No Default", "Living or Dead", "Responder or Non Responder". Independent factors or variables can be categorical or numerical.

Please note that even though logistic (logit) regression is frequently used for binary variables (2 classes), it can be used for categorical dependent variables with more than 2 classes. In this case it’s called Multinomial Logistic Regression.
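At its core, binary logistic regression squashes a linear score through the sigmoid (logit inverse) function to get a probability, then thresholds it. The weight `w` and bias `b` below are hypothetical stand-ins for values a training procedure would actually learn.

```python
import math

def sigmoid(z):
    # Maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    # Probability that the example belongs to class 1
    return sigmoid(w * x + b)

def predict(x, w, b, threshold=0.5):
    # Dichotomous decision: class 1 if the probability clears the threshold
    return 1 if predict_proba(x, w, b) >= threshold else 0
```

With assumed fitted values like `w = 1.2, b = -6`, `predict(10, 1.2, -6)` lands on class 1 while `predict(2, 1.2, -6)` lands on class 0.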

4. Support Vector Machines:

Support vector machines are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
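The kernel trick can be shown concretely: an RBF kernel measures similarity in an implicit high-dimensional feature space without ever constructing that space. The support vectors, weights (`alphas`) and labels below are hand-picked for illustration; in practice they come out of SVM training.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    # Similarity in an implicit high-dimensional feature space,
    # computed directly from the original inputs (the kernel trick).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, support_vectors, alphas, labels, b=0.0, gamma=0.5):
    # SVM decision function: a signed, weighted sum of kernel
    # similarities between x and the support vectors.
    score = sum(a * y * rbf_kernel(sv, x, gamma)
                for sv, a, y in zip(support_vectors, alphas, labels)) + b
    return 1 if score >= 0 else -1

svs = [(0.0, 0.0), (4.0, 4.0)]   # assumed support vectors
alphas = [1.0, 1.0]              # assumed learned weights
labels = [-1, 1]
```

A point near `(0, 0)` gets classified as -1 and a point near `(4, 4)` as +1, based purely on kernel similarity to the support vectors.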

5. K Nearest Neighbors Classifier:

The K Nearest Neighbors algorithm is a classification algorithm used in Data Science and Machine Learning. The goal is to classify a new data point/observation into one of multiple existing categories. A number of neighbors 'k' is selected (usually k = 5), and the k closest data points are identified (using either Euclidean or Manhattan distance). Of those k closest data points (or 'neighbors'), the category with the most members is the category assigned to the new data point.

Intuitively, this makes sense — a data point belongs in the category it’s most similar to with respect to its features/properties. The most similar data points are the ones that are nearest to that data point, if you visualize it on a graph.
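The whole algorithm fits in a few lines. This is a minimal sketch: the toy points and labels are invented, and squared Euclidean distance is used (Manhattan distance would work the same way).

```python
from collections import Counter

def knn_predict(train, x, k=5):
    # train: list of (point, label) pairs; x: the new point to classify
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]  # the k closest neighbors by squared Euclidean distance
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # majority category wins

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
```

`knn_predict(train, (1, 1), k=3)` returns `"a"`: the new point's three nearest neighbors all belong to that category.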

6. Naive Bayes Classifier:

Naive Bayes is a probabilistic classification method based on Bayes’ theorem. Bayes’ theorem gives the relationship between the probabilities of two events and their conditional probabilities. A naive Bayes classifier assumes that the presence or absence of a particular feature of a class is unrelated to the presence or absence of other features.

The conditional probability of event A occurring given that event B has already occurred is P(A|B).
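A worked example makes Bayes' theorem concrete. The probabilities below are invented for illustration: let A = "email is spam" and B = "email contains the word 'offer'".

```python
p_spam = 0.2                 # prior P(A): fraction of all email that is spam
p_offer_given_spam = 0.6     # likelihood P(B|A)
p_offer_given_ham = 0.05     # P(B|not A)

# Law of total probability: P(B)
p_offer = (p_offer_given_spam * p_spam
           + p_offer_given_ham * (1 - p_spam))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer   # 0.75
```

Seeing the word raises the spam probability from the 20% prior to 75%. A naive Bayes classifier multiplies such per-feature likelihoods together, which is where the independence ("naive") assumption comes in.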

Regression:

Regression problems are those where you are trying to predict or explain one thing (the dependent variable) using what you know about other things (the independent variables). Here the target is numeric, covering anything that can be expressed as a number; linear regression is intended more specifically for continuous values.

For example: predicting house prices using predictors such as location, age of the house, number of rooms, size, zip code, etc.

Fig. Housing price prediction

Some of the most used Regression algorithms are:

  • Linear Regression
  • Polynomial Regression

Regression Algorithms:

Now let's talk about the various regression algorithms in brief.

  1. Linear Regression:

Linear Regression is one of the most fundamental and widely used machine learning algorithms. It models the relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line). The dependent variable is continuous; the independent variable(s) can be continuous or discrete, and the nature of the relationship is linear.

Linear relationships can either be positive or negative. A positive relationship between two variables basically means that an increase in the value of one variable also implies an increase in the value of the other variable. A negative relationship between two variables means that an increase in the value of one variable implies a decrease in the value of the other variable.
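The best-fit line has a closed-form solution (ordinary least squares), sketched below on invented toy data that happens to show a positive relationship.

```python
def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data with a positive relationship: y rises as x rises
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Here the fit recovers slope 2 and intercept 1, so the regression line is Y = 2X + 1; a negative slope would indicate a negative relationship.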

2. Polynomial Regression:

Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth-degree polynomial. It is also called a special case of Multiple Linear Regression in ML, because we add polynomial terms to the multiple linear regression equation to convert it into polynomial regression.

Polynomial regression makes use of a linear regression model to fit complicated, non-linear functions and datasets. Hence, in polynomial regression, the original features are converted into polynomial features of the required degree (2, 3, …, n) and then modelled using a linear model.
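The two-step recipe, expanding the features and then fitting an ordinary linear model, can be sketched with NumPy. The quadratic toy data is invented so the fit has a known answer.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x**2 + 3 * x + 1          # exactly quadratic toy data

# Step 1: convert the original feature into polynomial features [1, x, x^2]
X_poly = np.column_stack([np.ones_like(x), x, x**2])

# Step 2: fit an ordinary *linear* least-squares model on the expanded features
coef, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
# coef ~ [1.0, 3.0, 2.0]  (intercept, x coefficient, x^2 coefficient)
```

The model stays linear in its parameters; only the features are non-linear, which is exactly why polynomial regression counts as a special case of multiple linear regression.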

Benefits and limitations

Supervised learning models have some advantages over the unsupervised approach, but they also have limitations. For example, supervised learning systems are more likely to make judgments that humans can relate to, because humans have provided the basis for those decisions.

However, in the case of a retrieval-based method, supervised learning systems have trouble dealing with new information. If a system with categories for cars and trucks is presented with a bicycle, for example, it would have to be incorrectly lumped in one category or the other. If the AI system was generative (that is, unsupervised), however, it may not know what the bicycle is, but it would be able to recognize it as belonging to a separate category.

Supervised learning also typically requires large amounts of correctly labeled data to reach acceptable performance levels, and such data may not always be available. Unsupervised learning does not suffer from this problem and can work with unlabeled data as well.

Difference between supervised and unsupervised learning

A key difference between supervised and unsupervised learning algorithms is that supervised learning algorithms require labels or categories for the training dataset, as opposed to unsupervised learning algorithms where the goal is to cluster data points into discrete groups.

That's all for Supervised Learning! Keep an eye out for more blogs coming soon that will go into more depth on specific topics.

If you enjoy my work and want to keep up to date with the latest publications, or would like to get in touch, I can be found on LinkedIn at @humantsattabhayya or on Medium at Humantsattabhayya. Thanks!


Humant Sattabhayya is a data science enthusiast with hands-on knowledge of machine learning, time series, deep learning and NLP, capable of interpreting data with statistical knowledge.