Machine Learning Algorithms

Jakki Seshapanpu

Published in Analytics Vidhya · 5 min read · Aug 5, 2020

- A brief overview

What is Machine Learning?

Arthur Samuel (1959): “Field of study that gives computers the ability to learn without being explicitly programmed”.

Tom Mitchell (1997): “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”.

What type of machine learning algorithms to use?

Selecting the right machine learning algorithm depends on several factors, including the size, quality and nature of the data. The choice also reflects the business need, the specification, the time available and, usually, some experimentation. Here we will explore different machine learning algorithms.

There are four types of machine learning algorithms:

Supervised Learning
Semi-supervised Learning
Unsupervised Learning
Reinforcement Learning

Supervised Learning:

In supervised learning, we provide a known dataset that includes inputs and their desired outputs. The machine learns a mapping from inputs to outputs so that it can determine the output for a new set of inputs.

The types of Supervised Learning Algorithms are:

A. Classification: A classification algorithm draws a conclusion from the observed values and determines which category a new observation belongs to.

Here are the different classification algorithms:

1. Logistic Regression: Logistic regression is used to predict the probability of a target variable. In its basic form the target (dependent) variable is dichotomous, which means there are only two possible classes (0 or 1).

Types of logistic regression:
a. Binary or Binomial: The dependent variable has only two possible types (0 or 1).
b. Multinomial: The dependent variable can have three or more possible unordered types, that is, types with no quantitative significance. For example, “Type A”, “Type B” or “Type C”.
c. Ordinal: The dependent variable can have three or more possible ordered types, that is, types with a quantitative significance. For example, T-shirt sizes: “small”, “medium”, “large”, “extra-large”, etc.
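
To make the binary case concrete, here is a minimal sketch of logistic regression with a single input, fitted by plain gradient descent. The toy data (hours studied versus pass/fail), the function names and the learning-rate settings are all assumptions chosen for illustration, not part of any library.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_proba(x, w, b):
    """Estimated probability that the class is 1 for input x."""
    return sigmoid(w * x + b)

# Hypothetical toy data: hours studied -> passed (1) or failed (0)
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print(predict_proba(1.0, w, b), predict_proba(8.0, w, b))
```

The model outputs a probability, and a threshold (commonly 0.5) turns that probability into a class.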

2. Naive Bayes Classifier Algorithm: The Naive Bayes classifier is based on Bayes’ theorem and treats every feature as independent of every other feature, given the class. It allows us to predict a category from a given set of features using probability.
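
As a rough illustration, the sketch below implements a categorical Naive Bayes classifier with add-one (Laplace) smoothing. The weather-style toy data and the names `train_nb` and `predict_nb` are made up for this example.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(label, i)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict_nb(row, model):
    """Pick the class with the highest log P(class) + sum of log P(value | class)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, v in enumerate(row):
            # add-one smoothing keeps unseen values from giving probability zero
            num = feature_counts[(label, i)][v] + 1
            den = count + len(values[i])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (outlook, windy) -> decision
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("sunny", "no"), ("rainy", "yes")]
labels = ["play", "play", "stay", "stay", "play", "stay"]

model = train_nb(rows, labels)
print(predict_nb(("sunny", "no"), model))
```

The "naive" part is the inner loop: each feature contributes its own factor, as if the features did not influence each other.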

3. Support Vector Machine: An SVM essentially sorts data into categories. It is given training examples, each marked as belonging to one of two categories, and it looks for the boundary that separates the two categories with the widest margin. The resulting model assigns new values to one category or the other.
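
One simple way to train a linear SVM is sub-gradient descent on the hinge loss, sketched below. This is only one of several training methods and is not necessarily what a production library does. The data points, the labels (+1 and -1) and the hyperparameters `lam`, `lr` and `epochs` are assumptions for illustration.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM on 2-D points, trained by sub-gradient descent on the hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # the regularisation term shrinks w a little on every step
            w[0] -= lr * lam * w[0]
            w[1] -= lr * lam * w[1]
            if margin < 1:
                # the point is inside the margin or misclassified: push the boundary away
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

def svm_predict(point, w, b):
    """Return +1 or -1 depending on which side of the boundary the point falls."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

# Hypothetical toy data: two well-separated groups
points = [(-2, -2), (-3, -1), (-1, -3), (2, 2), (3, 1), (1, 3)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print(svm_predict((2.5, 2.5), w, b), svm_predict((-2.5, -2.5), w, b))
```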

4. Decision Trees: Decision trees can be used to solve both regression and classification problems. A decision tree is a flowchart-like tree structure that uses a branching method to illustrate every possible outcome of a decision. Each internal node of the tree represents a test on a specific variable, and each branch is an outcome of that test.
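
The sketch below grows a small classification tree by repeatedly choosing the test (feature and threshold) that gives the lowest Gini impurity. Gini impurity is one common splitting criterion and is an assumption here, as are the toy loan data and the function names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when they are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature index, threshold) that lowers impurity the most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """A node is (feature, threshold, left subtree, right subtree); a leaf is a label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict_tree(tree, row):
    """Follow the tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical toy data: (age, income in thousands) -> loan decision
rows = [(25, 20), (30, 60), (45, 80), (35, 30), (50, 40), (23, 70)]
labels = ["reject", "approve", "approve", "reject", "reject", "approve"]

tree = build_tree(rows, labels)
print(tree, predict_tree(tree, (40, 55)))
```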

5. Random Forests: A random forest is an ensemble learning method. It combines many decision trees, each trained on a random sample of the data, to generate better results for classification, regression and other tasks.
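
The ensemble idea can be sketched with a deliberately simplified forest: each "tree" here is only a one-split stump, trained on a bootstrap sample and a randomly chosen feature, and the forest predicts by majority vote. Real random forests use full decision trees; the stumps, the toy data, the function names and the `seed` parameter are assumptions that keep the example short.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Best single-threshold split on one feature, using midpoints as thresholds."""
    majority = Counter(labels).most_common(1)[0][0]
    values = sorted({r[feature] for r in rows})
    best = None  # (errors, threshold, left label, right label)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = Counter(y for r, y in zip(rows, labels) if r[feature] <= t)
        right = Counter(y for r, y in zip(rows, labels) if r[feature] > t)
        l_lab, l_n = left.most_common(1)[0]
        r_lab, r_n = right.most_common(1)[0]
        errors = len(labels) - l_n - r_n
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:  # only one distinct value in the sample: constant prediction
        return (feature, None, majority, majority)
    return (feature, best[1], best[2], best[3])

def predict_stump(stump, row):
    feature, t, left_label, right_label = stump
    if t is None:
        return left_label
    return left_label if row[feature] <= t else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]

# Hypothetical toy data: two groups of 2-D points
rows = [(1, 1), (2, 3), (3, 2), (2, 2), (1, 3), (7, 8), (8, 7), (9, 9), (8, 8), (7, 9)]
labels = ["A"] * 5 + ["B"] * 5

forest = fit_forest(rows, labels)
print(predict_forest(forest, (2, 2)), predict_forest(forest, (8, 8)))
```

Because every stump sees a slightly different sample, individual stumps can be wrong while the vote is still right.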

6. Nearest Neighbors: The K-nearest neighbors algorithm estimates how likely a data point is to be a member of one group or another. It looks at the K data points closest to a given data point to determine which group that point belongs to.
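
A minimal sketch of the idea, assuming Euclidean distance and a simple majority vote (both are common choices, not the only ones). The function name `knn_predict` and the toy points are made up for this example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k nearest points and its share of the votes."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k

# Hypothetical toy data: (point, group)
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]

print(knn_predict(train, (2, 2)))
print(knn_predict(train, (6, 5)))
```

The share of votes is a rough estimate of how likely the point is to belong to the winning group.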

B. Regression: Regression analysis consists of a set of machine learning methods that allow us to predict a continuous outcome variable based on the value of one or more predictor variables.
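
The simplest instance is a straight line fitted to one predictor by ordinary least squares. In the sketch below, the house-size data and the name `fit_line` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy data: house size -> price, generated as price = 2 * size + 10
sizes = [50.0, 60.0, 80.0, 100.0]
prices = [110.0, 130.0, 170.0, 210.0]

slope, intercept = fit_line(sizes, prices)
print(slope, intercept, slope * 70.0 + intercept)
```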

C. Forecasting: Here we make predictions about the future based on past and present data.
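
One of the simplest forecasting methods is simple exponential smoothing, which predicts the next value as a weighted average that favours recent observations. The sketch below assumes this method; the series and the smoothing factor `alpha` are made up for illustration.

```python
def exp_smooth_forecast(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing."""
    level = series[0]
    for value in series[1:]:
        # blend the newest observation with the running estimate
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical toy series, e.g. weekly sales
sales = [10, 12, 14, 16]
print(exp_smooth_forecast(sales))
```

A larger `alpha` reacts faster to recent changes, while a smaller one gives a smoother forecast.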

Semi-Supervised Learning:

Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. The machine builds a model from the labeled data and uses it to predict labels for the unlabeled data, which can then help refine the model.
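
One way to do this is self-training: fit a model on the few labeled points, use it to pseudo-label the unlabeled points, then refit on both. The sketch below assumes a very simple nearest-centroid model on one-dimensional data; the function names and the data are made up for illustration.

```python
def centroid(values):
    return sum(values) / len(values)

def self_train(labeled, unlabeled, rounds=5):
    """labeled: list of (x, label); unlabeled: list of x. Nearest-centroid self-training."""
    pseudo = []  # pseudo-labeled points produced in the previous round
    centroids = {}
    for _ in range(rounds):
        data = labeled + pseudo
        classes = {lab for _, lab in data}
        # refit: one centroid per class from labeled plus pseudo-labeled points
        centroids = {c: centroid([x for x, lab in data if lab == c]) for c in classes}
        # relabel: each unlabeled point takes the label of its nearest centroid
        pseudo = [(x, min(centroids, key=lambda c: abs(x - centroids[c])))
                  for x in unlabeled]
    return centroids, dict(pseudo)

# Hypothetical toy data: two labeled points and six unlabeled ones
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]

centroids, pseudo_labels = self_train(labeled, unlabeled)
print(centroids, pseudo_labels)
```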

Unsupervised learning:

Unsupervised machine learning algorithms infer patterns from a dataset without reference to known or labeled outcomes. Unlike supervised learning, unsupervised learning can’t be applied directly to a regression or classification problem, because we have no information about the output values. Instead, unsupervised learning is used to discover the underlying structure of the data.

The types of unsupervised learning are:

1. Clustering: Clustering is the assignment of a set of observations into subsets, so that observations in the same cluster are similar to one another.

2. K Means Clustering Algorithm: It is used to categorize unlabeled data, i.e. data without defined categories or groups. The algorithm works by finding groups within the data, with the number of groups represented by the variable K. It then works iteratively to assign each data point to one of the K groups based on the features provided.
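
The iterative procedure can be sketched as follows (this is usually called Lloyd's algorithm). To keep the example deterministic, the starting centroids are passed in by the caller, which is an assumption of this sketch; in practice they are usually chosen at random. The data points are made up.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and moving the centroids."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # update step: each centroid moves to the mean of its points
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignment

# Hypothetical toy data with K = 2
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, assignment = kmeans(points, centroids=[points[0], points[3]])
print(centroids, assignment)
```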

3. Dimension reduction: Dimension reduction reduces the number of variables under consideration while keeping the information that is required.
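
A well-known dimension reduction technique is principal component analysis (PCA), which the text above does not name, so treat it as one example among several. The sketch below reduces two variables to one by projecting the data onto the direction of greatest variance, found with power iteration. The data and the function name are assumptions.

```python
import math

def first_principal_component(points, iterations=50):
    """Return the main direction of variation in 2-D data and the 1-D projections."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # entries of the 2x2 sample covariance matrix [[cxx, cxy], [cxy, cyy]]
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    # power iteration: multiply by the covariance matrix and normalise, repeatedly
    v = (1.0, 0.0)
    for _ in range(iterations):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    projected = [x * v[0] + y * v[1] for x, y in centered]
    return v, projected

# Hypothetical toy data: two variables that move together
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
direction, projected = first_principal_component(points)
print(direction, projected)
```

Each point is now described by one number instead of two, with little information lost because the two original variables were strongly related.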

Reinforcement Learning:

Reinforcement learning is a type of machine learning that trains algorithms using a system of reward and punishment. A reinforcement-learning algorithm, or agent, learns by interacting with its environment. The agent receives rewards for performing correctly and penalties for performing incorrectly. It therefore learns from experience and adapts its behaviour in response to the situation to achieve the best possible outcome.
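
A minimal sketch of this loop is tabular Q-learning, one common reinforcement-learning algorithm. The environment below is invented for illustration: a corridor of five cells where the agent starts at the left end and is rewarded only on reaching the right end. The constants and hyperparameters are assumptions as well.

```python
import random

N_STATES = 5        # cells 0..4; reaching cell 4 ends the episode with reward 1
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right

def step(state, action):
    """Environment: move within the corridor and report (next state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)  # explore, or break a tie at random
            else:
                a = 0 if q[state][0] > q[state][1] else 1  # exploit the best known action
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])  # move the estimate toward the target
            state = nxt
            if done:
                break
    return q

q = q_learning()
policy = ["right" if right > left else "left" for left, right in q[:-1]]
print(policy)
```

The agent is never told that moving right is correct. It discovers this because the reward propagates backwards through the estimated values.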

Artificial neural networks (ANNs): These are computing systems inspired by the biological neural networks of humans. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like a synapse in a biological brain, can transmit a signal to other neurons. ANNs learn by example and through experience, and they are extremely useful for modelling non-linear relationships in high-dimensional data, or cases where the relationship among the input variables is difficult to understand.
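
To show how connected neurons can capture a non-linear relationship, here is a tiny network of three artificial neurons that computes XOR, something no single neuron of this kind can do. The weights are hand-picked for illustration rather than learned from examples, and the function names are assumptions.

```python
def step_fn(z):
    """Activation: the neuron fires (1) only if its total input is positive."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    """Weighted sum of the incoming signals plus a bias, passed through the activation."""
    return step_fn(sum(w * x for w, x in zip(weights, inputs)) + bias)

def tiny_network(x1, x2):
    # hidden layer: one neuron behaves like OR, the other like AND
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)
    # output neuron fires when OR is on but AND is off, which is XOR
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, tiny_network(a, b))
```

In a trained network, a learning algorithm would adjust the weights and biases automatically instead of having them set by hand.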