
Sarthakkumargupta
11 min read · Jul 31, 2022


Machine Learning

Fig. Types of Machine Learning

Introduction

This article focuses mainly on semi-supervised and supervised machine learning. Semi-supervised, supervised, and unsupervised machine learning are all important aspects of Data Science.

Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data).

Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consists of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal).

In unsupervised machine learning, the labels of the data are either not available or not yet known. A machine learning model discovers hidden patterns in the data by identifying groups of samples with specific characteristics in common.

Before starting, I just want to mention that Artificial Intelligence and Machine Learning are shaping the future of planet Earth. Waking up in this world full of innovation feels more and more like magic. There are many techniques for applying Artificial Intelligence and Machine Learning to solve real-world problems, of which Supervised Learning is one of the most widely used approaches.

1. Semi-Supervised Machine Learning:

The most basic disadvantage of any Supervised Learning algorithm is that the dataset has to be hand-labeled by a Machine Learning Engineer or a Data Scientist. This is a very costly process, especially when dealing with large volumes of data. The most basic disadvantage of Unsupervised Learning is that its application spectrum is limited.

To counter these disadvantages, the concept of Semi-Supervised Learning was introduced. In this type of learning, the algorithm is trained on a combination of labeled and unlabeled data. Typically, this combination contains a very small amount of labeled data and a very large amount of unlabeled data. The basic procedure is that the programmer first clusters similar data using an unsupervised learning algorithm and then uses the existing labeled data to label the rest of the unlabeled data. The typical use cases of this type of algorithm share a common property: acquiring unlabeled data is relatively cheap, while labeling that data is very expensive.

Intuitively, one may imagine the three types of learning algorithms as follows: Supervised learning, where a student is under the supervision of a teacher at both home and school; Unsupervised learning, where a student has to figure out a concept himself; and Semi-Supervised learning, where a teacher teaches a few concepts in class and assigns homework questions based on similar concepts.

A Semi-Supervised algorithm makes the following assumptions about the data:

  • Continuity Assumption: points which are closer to each other are more likely to have the same output label.
  • Cluster Assumption: the data can be divided into discrete clusters, and points in the same cluster are more likely to share an output label.
  • Manifold Assumption: the data lie approximately on a manifold of much lower dimension than the input space. This assumption allows the use of distances and densities defined on that manifold.

Practical applications of Semi-Supervised Learning:

  • Speech Analysis: since labeling audio files is a very labor-intensive task, Semi-Supervised learning is a very natural approach to this problem.
  • Internet Content Classification: labeling every webpage is impractical and unfeasible, so Semi-Supervised learning algorithms are used instead. Even the Google search algorithm uses a variant of Semi-Supervised learning to rank the relevance of a webpage for a given query.
  • Protein Sequence Classification: since protein and DNA sequences are typically very long, Semi-Supervised learning has become prominent in this field.

2. Supervised Machine Learning:

In supervised learning, we start by importing a dataset containing training attributes and target attributes. The Supervised Learning algorithm learns the relation between the training examples and their associated target variables, and applies that learned relationship to new attribute data to predict the corresponding target attribute.

Basically your attribute data will be mapped to the target attribute through a model.

Mathematically,

Y = f(X)

The ultimate goal of the Supervised learning algorithm is to predict Y with the maximum accuracy for a given new input X.
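The fit-then-predict pattern behind Y = f(X) can be sketched in a few lines. The library (scikit-learn) and the toy function f(x) = 2x + 1 are illustrative assumptions:

```python
# Learning Y = f(X): fit a model on known (X, Y) pairs, then
# predict Y for an unseen X. Here the true mapping is f(x) = 2x + 1.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.arange(10).reshape(-1, 1)   # training attributes
Y = 2 * X.ravel() + 1              # target attribute

model = LinearRegression().fit(X, Y)   # learn the mapping f
print(model.predict([[12]]))           # ~25 for the unseen input x = 12
```

The model never sees x = 12 during training; it predicts its target from the relationship learned on the training pairs.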

Based on the given dataset, the machine learning problem is categorized into two types: Classification and Regression.

Classification:

Classification problems are those where the target variable is predicted on a discrete scale. In other words, we receive a data point and have to predict a value for it, and that value can only take a finite number of possible values.

For example: classifying emails as spam or not spam, or classifying a loan applicant as creditworthy or not creditworthy based on predictors such as income, credit score, and age.

Fig. Spam or Not-Spam

Some of the most used Classification algorithms are:

  • Random Forest
  • Decision Trees
  • Logistic Regression
  • Support Vector Machines
  • K Nearest Neighbors
  • Naive Bayes Classifier

Classification Algorithms:

Now let's talk about the various Classification algorithms in brief.

  1. Random Forest:

Random Forest is a machine learning algorithm based on decision trees. It belongs to the class of ML algorithms that perform ‘ensemble’ classification. By ensemble we mean (in the Random Forest context) the collective decisions of different decision trees. In a Random Forest (RF), we predict the class not from a single decision tree but from an (almost) unanimous prediction made by ‘K’ decision trees.

Prediction in a Random Forest (a collection of ‘K’ decision trees) is truly ensemble-based: each decision tree predicts the class of the instance, and the class predicted most often is returned.

The ‘K’ individual decision trees are built from the given dataset by randomly sampling the dataset and the feature subspace, through a process called Bootstrap Aggregation (bagging), which is random selection with replacement.
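A minimal sketch of this K-tree voting ensemble, using scikit-learn's RandomForestClassifier on the Iris dataset (both are assumed choices for illustration):

```python
# Bagged ensemble sketch: n_estimators decision trees, each grown on
# a bootstrap sample of the training data, vote on the class.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# K = 100 trees; bootstrap=True (the default) is the bagging step.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
```

Each of the 100 trees sees a different resampled view of the data, which is what makes the combined vote more robust than any single tree.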

2. Decision Trees:

A decision tree is a flowchart-like diagram that shows the various outcomes of a series of decisions. In simple terms, a decision tree consists of nested if-else conditions. It has three main components: a root node, leaf nodes, and branches. The root node is the starting point of the tree; the root and internal nodes contain questions or criteria to be answered, while the leaf nodes hold the outcomes.

Each branch of the decision tree represents a possible decision, occurrence or reaction to the decision. Decision trees give people a highly effective and easy way to understand the potential options of a decision and its range of possible outcomes.

All decision trees begin with a particular decision. This initial decision is depicted using a square box to the extreme left of the decision tree. Lines are drawn outward from the box, representing each possible option. At the end of each line, you can analyze the results. If the result of an option is a new decision, a box is drawn at the end of the line, and new lines are drawn out of that decision, representing the new options.
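The nested if-else structure described above can be made visible with scikit-learn's export_text, which prints the questions a fitted tree asks at each node (library and dataset are illustrative assumptions):

```python
# A decision tree really is a set of nested if/else rules;
# export_text prints the learned question at each node.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each "feature <= threshold" line is one branching decision.
rules = export_text(tree, feature_names=load_iris().feature_names)
print(rules)
```

Capping max_depth keeps the printed rule set short enough to read as plain if-else logic.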

3. Logistic Regression:

Logistic regression is generally used where the dependent variable is binary or dichotomous. That means the dependent variable can take only two possible values, such as “Yes or No”, “Default or No Default”, “Living or Dead”, or “Responder or Non-Responder”. Independent factors or variables can be categorical or numerical.

Please note that even though logistic (logit) regression is frequently used for binary variables (2 classes), it can be used for categorical dependent variables with more than 2 classes. In this case it’s called Multinomial Logistic Regression.
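A short binary example, using scikit-learn's LogisticRegression on the breast cancer dataset (malignant vs. benign); both choices are illustrative assumptions:

```python
# Logistic regression sketch for a binary (dichotomous) target.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# max_iter raised so the solver converges on these unscaled features.
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

print("Accuracy:", clf.score(X_test, y_test))
# The model outputs class probabilities, not just hard 0/1 labels.
print("Class probabilities:", clf.predict_proba(X_test[:1])[0])
```

The probability output is what distinguishes logistic regression from a plain hard classifier: you can threshold it differently depending on the cost of each kind of error.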

4. Support Vector Machines:

Support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
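The effect of the kernel trick is easy to see on data that is not linearly separable, such as two concentric circles. The library (scikit-learn) and dataset are illustrative assumptions:

```python
# SVM sketch: linear kernel vs. RBF kernel on concentric circles,
# which no straight line can separate.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)  # kernel trick: implicit high-dim map

print("Linear kernel accuracy:", linear.score(X, y))  # near chance
print("RBF kernel accuracy:", rbf.score(X, y))        # near perfect
```

The RBF kernel implicitly maps the points into a space where the inner and outer circles become linearly separable, without ever computing that mapping explicitly.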

5. K Nearest Neighbors Classifier:

The K Nearest Neighbors algorithm is a classification algorithm used in Data Science and Machine Learning. The goal is to classify a new data point/observation into one of multiple existing categories. A number of neighbors ‘k’ is selected (usually k = 5), and the k closest data points are identified (using either Euclidean or Manhattan distance). Of those k closest data points (or ‘neighbors’), the category containing the most neighbors is the category assigned to the new data point.

Intuitively, this makes sense — a data point belongs in the category it’s most similar to with respect to its features/properties. The most similar data points are the ones that are nearest to that data point, if you visualize it on a graph.
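A small sketch of this neighbour vote with k = 5, using scikit-learn on the Iris dataset; the specific query point is a made-up flower measurement chosen for illustration:

```python
# k-NN sketch: a new point takes the majority class of its five
# nearest neighbours (Euclidean distance by default).
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

new_point = [[5.0, 3.5, 1.5, 0.2]]  # hypothetical unseen flower
print("Predicted class:", knn.predict(new_point)[0])

# Inspect the vote: which classes do the 5 neighbours belong to?
distances, indices = knn.kneighbors(new_point)
print("Neighbour classes:", y[indices[0]])
```

Printing the neighbours' classes makes the "majority vote" explicit: the prediction is simply the most common label among them.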

6. Naive Bayes Classifier:

Naive Bayes is a probabilistic classification method based on Bayes’ theorem. Bayes’ theorem gives the relationship between the probabilities of two events and their conditional probabilities. A naive Bayes classifier assumes that the presence or absence of a particular feature of a class is unrelated to the presence or absence of other features.

The conditional probability of event A occurring given that event B has already occurred is written P(A|B). Bayes’ theorem relates it to the reverse conditional: P(A|B) = P(B|A) · P(A) / P(B).
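A minimal sketch with scikit-learn's GaussianNB, which applies Bayes' theorem under the naive independence assumption, treating each feature as normally distributed within a class (library and dataset are illustrative assumptions):

```python
# Gaussian Naive Bayes sketch: features assumed conditionally
# independent given the class, each modeled as a normal distribution.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print("Accuracy:", nb.score(X_test, y_test))
# predict_proba exposes the posterior P(class | features) from Bayes' rule.
print("Posteriors:", nb.predict_proba(X_test[:1])[0])
```

Despite its "naive" independence assumption rarely holding exactly, the classifier is fast and often surprisingly accurate.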

Regression:

Regression problems are those where you are trying to predict or explain one thing (the dependent variable) using what you know about other things (the independent variables). The predicted quantity is a number on a continuous scale, such as a price or a probability; linear regression in particular is intended for continuous numeric targets.

For example: predicting house prices using predictors such as house location, age of house, number of rooms, size, zip code, etc.

Fig. Housing price prediction

Some of the most used Regression algorithms are:

  • Linear Regression
  • Polynomial Regression

Regression Algorithms:

Now let's talk about the various Regression algorithms in brief.

  1. Linear Regression:

Linear Regression is one of the most fundamental and widely used Machine Learning Algorithms. Linear Regression models the relationship between a dependent variable (Y) and one or more independent variables (X) using a best fit straight line (also known as regression line). The dependent variable is continuous. The independent variable(s) can be continuous or discrete, and the nature of the relationship is linear.

Linear relationships can either be positive or negative. A positive relationship between two variables basically means that an increase in the value of one variable also implies an increase in the value of the other variable. A negative relationship between two variables means that an increase in the value of one variable implies a decrease in the value of the other variable.
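A short sketch of fitting a best-fit line with scikit-learn; the true slope and intercept here are made-up values so the recovered coefficients can be checked against them:

```python
# Linear regression sketch: fit a straight line to noisy data and
# read off the learned slope and intercept.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 0.5, size=100)  # true: y = 3x + 2

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0])         # close to 3 (positive relationship)
print("intercept:", model.intercept_)   # close to 2
```

The positive slope recovered by the fit is exactly the "positive relationship" described above: as X increases, the predicted Y increases.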

2. Polynomial Regression:

Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth-degree polynomial. It is also called a special case of Multiple Linear Regression in ML, because we add polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.

Polynomial Regression makes use of a linear regression model to fit complicated, non-linear functions and datasets. In Polynomial Regression, the original features are converted into polynomial features of the required degree (2, 3, …, n) and then modeled using a linear model.
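This feature-expansion-then-linear-fit pipeline can be sketched with scikit-learn's PolynomialFeatures (library and toy target are illustrative assumptions):

```python
# Polynomial regression sketch: expand x into [x, x^2] and fit an
# ordinary linear model on the expanded features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 2  # a parabola: no straight line fits this

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

print("Prediction at x = 4:", model.predict([[4.0]])[0])  # close to 16
```

The model is still linear in its parameters; the non-linearity lives entirely in the expanded features, which is why it remains a special case of linear regression.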

Benefits and limitations

Supervised learning models have some advantages over the unsupervised approach, but they also have limitations. Supervised learning systems are more likely to make judgments that humans can relate to, for example, because humans have provided the basis for decisions.

However, in the case of a retrieval-based method, supervised learning systems have trouble dealing with new information. If a system with categories for cars and trucks is presented with a bicycle, for example, it would have to be incorrectly lumped in one category or the other. If the AI system was generative (that is, unsupervised), however, it may not know what the bicycle is, but it would be able to recognize it as belonging to a separate category.

Supervised learning also typically requires large amounts of correctly labeled data to reach acceptable performance levels, and such data may not always be available. Unsupervised learning does not suffer from this problem and can work with unlabeled data as well.

Difference between supervised and unsupervised learning

A key difference between supervised and unsupervised learning algorithms is that supervised learning algorithms require labels or categories for the training dataset, as opposed to unsupervised learning algorithms where the goal is to cluster data points into discrete groups.

Unsupervised Machine Learning:

Unsupervised machine learning is the second type of machine learning algorithm after supervised learning; it addresses problems or situations where we have little or even no idea what the results will look like (Carter, Dubchak, & Holbrook, 2001; Ghahramani, 2003). In unsupervised learning, there is no feedback derived from the results of the predictions made. The figure below shows the working of an unsupervised machine learning algorithm.

Figure: Flow diagram of the unsupervised machine learning algorithm.

To understand unsupervised learning more simply, consider the analogy of a father and a son learning to drive. In the supervised case, the son is given proper training, and the expected results are already known. In the unsupervised case, when the son says he wants to learn how to drive, he is given the car without any training and with no idea of what the actual result will be. This is how supervised learning and unsupervised learning differ from each other. Considering the algorithms together gives a better view of which one should be used for a given problem, since choosing the right algorithm at the right time is one of the most difficult tasks in machine learning.

That’s all for Supervised Learning! Keep an eye out for more blogs coming soon that will go into more depth on specific topics.

If you enjoy my work and want to keep up to date with the latest publications or would like to get in touch, I can be found on LinkedIn at @sarthak17 or on Medium at Sarthakkumargupta— Thanks!

………………………………………………………………………………………
