3 MOST IMPORTANT MACHINE LEARNING APPROACHES

Published in

Getting started with Data Science

Oct 18, 2019

It's been a while since we posted a blog, so here we go with a brief overview of ML algorithms.
Note: this is just an overview, maybe a trailer to get you all started ;)
We will go into the statistical depth of each approach in upcoming blogs, so stay in touch!✌
For now, grab yourself some coffee & let’s get started! 😀

Machine learning is based on inductive learning, that is, learning to improve from experience. (umm… just like we learn from our mistakes)

Machine learning is the domain of computer science (and a sub-field of artificial intelligence) that uses statistical techniques to give computers the ability to learn from previously provided data, rather than being explicitly programmed for every task.

As data keeps growing, it becomes quite difficult for humans to understand all of it with good accuracy, precision and speed. This is where machine learning comes into the picture. Algorithms are applied to train a model on the data, and we choose a suitable algorithm depending on the type of data set and the problem.

Let us have a look at the three main approaches to applying ML algorithms:

***Whoa! Overwhelmed?? No worries, let’s make it fun; we’ve got your back!***

Let’s get started with Supervised Learning

1. SUPERVISED LEARNING:

It learns a function from input-output pairs, i.e. a function that maps an input to an output. The function is inferred from training data in which each example consists of an input object (usually a vector of features) and a desired output value, known as the ‘supervisory signal’. The learned function can then be used to map new, unseen examples.
In supervised machine learning we deal with labelled inputs, as shown below:
Working of Supervised Machine Learning (SL):

**After training, given any image as input, the algorithm helps decide whether the image is an apple, a car, a cat or unknown.**

→ SL works only with labelled data (i.e. the machine is trained on inputs whose correct outputs are already known).
→ These known outputs act as direct feedback: the model compares its predictions with them and adjusts itself, thereby improving its accuracy with experience.

→ Learning to Map:

Mapping can be done in the following ways:

a) Regression:-

For a given set of inputs we get a continuous value as the output. A problem falls under the category of regression when the output variable is a real or “continuous” value. Problems such as:
→ Predicting the marks of a student
→ Predicting how long a product will last
→ Predicting how long it will rain
have continuous outputs, so they fall under this category.

A few regression algorithms are:

We need to explore the *relationship between the dependent variable (Y) and the independent variable (X).*

i) Simple Linear Regression: it establishes the relationship between two variables using a straight line. Linear regression draws the line that comes closest to the data by finding the slope and intercept that minimize the regression errors (typically the sum of squared differences between actual and predicted values), so that the straight line approximates the relationship between the dependent and the independent variable.

Hypothesis function: Y = m*X + c
where Y is the output,
X is the input,
m is the slope of the line (the coefficient of X),
c is the intercept.
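To make the hypothesis function concrete, here is a minimal sketch in Python (our choice, since the post has no code yet). It finds m and c by ordinary least squares, i.e. the line that minimizes the sum of squared regression errors. The function names and the experience/salary numbers are hypothetical and were picked only so the arithmetic is easy to follow.

```python
# Hypothetical helper names; a sketch, not a library API.

def fit_simple_linear_regression(xs, ys):
    """Return (m, c) for the least-squares line Y = m*X + c."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(X, Y) / variance(X); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    m = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    c = mean_y - m * mean_x
    return m, c


def predict(m, c, x):
    return m * x + c


# Hypothetical data: years of experience vs. salary (in thousands)
years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]
m, c = fit_simple_linear_regression(years, salary)
print(m, c)              # 5.0 25.0
print(predict(m, c, 6))  # 55.0
```

The toy points lie exactly on a line, so the fitted line reproduces them with zero error. With real, noisy data the same formulas give the best-fitting line rather than a perfect one.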

Real-world example: consider years of experience versus salary.
We know that as a person's years of experience increase, their salary also tends to increase. Here the salary (output/Y) depends on the explanatory variable, i.e. years of experience (input/independent variable/X). The closer the data points lie to the fitted line, the smaller the regression errors (standard error of the estimate).

ii) Multivariate Regression:
→ This type of analysis is used when more than one independent variable (“predictors”) and more than one dependent variable (“responses”) are linearly related to each other.
→ Once the relationship is established, it is widely used to predict how the responses change in response to any change in the predictors.

Real-world example: can a supermarket owner plan the stock of several products at once, depending on a few conditions (for example, sunny weather, a tornado, or an increase in gas prices)?

iii) Multiple Regression: it is a statistical method that uses several explanatory variables to predict the outcome, unlike simple linear regression, which uses only one. It assumes that the independent variables are not too highly correlated with each other.

The aim of multiple regression is to model how a single dependent variable (Y) in a data set is related to a set of independent variables (X1, X2, ..., Xk).

Real-world example: the salary a person is offered (dependent variable) depends on various independent variables, say skill set, communication skills, experience, educational background, etc.
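Multiple regression applies the same least-squares idea to several inputs. The sketch below assumes the model Y = b0 + b1*X1 + ... + bk*Xk and solves the normal equations (X^T X) b = X^T Y with plain Gaussian elimination, so it needs no libraries. In practice you would use a numerical library instead. The helper names and the data are hypothetical.

```python
# Hypothetical helper names; a dependency-free sketch, not a library API.

def solve_linear_system(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b] so row operations update b as well
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution, from the last row up
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_multiple_regression(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 column -> intercept b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical rows of (X1, X2), e.g. years of experience and a skill score,
# generated from Y = 10 + 2*X1 + 3*X2 so the answer is known in advance
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
ys = [12, 13, 15, 17, 21]
coefs = fit_multiple_regression(rows, ys)
print([round(b, 6) for b in coefs])  # [10.0, 2.0, 3.0]
```

The `ValueError` for a singular system ties back to the assumption in the text. When two predictors are perfectly correlated, the coefficients cannot be separated.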

b) Classification :

For a given set of inputs we get a categorical, i.e. discrete, output. A problem falls under classification if its output belongs to one of a fixed set of categories. For instance, any output such as “True”/“False”, “Yes”/“No” or “Male”/“Female” comes under classification.
A classification model aims to draw a conclusion from the observed values, i.e. to assign each new observation to one of the known categories.
A few real-world problems that can be solved using this model:
→ Predicting whether a person has a certain disease or not
→ Predicting the gender of a person based on handwriting
→ Predicting whether it is going to rain or not

Some algorithms that fall under classification:

i) K-Nearest Neighbors: K-Nearest Neighbors (KNN) is one of the simplest algorithms and is also referred to as a “lazy” algorithm. It is called lazy not because of its simplicity but because it does not learn a distinct function from the training data. Instead, it memorizes the entire training data set. To classify a new data point, it looks at the K training points closest to it (by distance) and assigns the class that most of them belong to.
Real-world example: predicting the species of an egg:

**UNDERSTANDING THAT K NEAREST NEIGHBORS IDENTIFIES YOU BY KNOWING YOUR NEIGHBORS !**
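A minimal k-nearest-neighbors classifier fits in a few lines. It also shows why the algorithm is called lazy: there is no training step, only a search through the stored examples at prediction time. This sketch assumes Euclidean distance and a simple majority vote. The egg measurements (length and width in cm) and the species labels are made-up values for illustration.

```python
# Hypothetical helper name and data; a sketch of KNN, not a library API.
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # No model is fitted beforehand: the "lazy" learner only stores the data
    by_distance = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in by_distance[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical egg measurements: (length_cm, width_cm)
eggs = [(5.5, 4.2), (5.7, 4.3), (5.6, 4.1), (6.5, 4.8), (6.7, 4.9), (6.6, 5.0)]
species = ["hen", "hen", "hen", "duck", "duck", "duck"]
print(knn_predict(eggs, species, (5.8, 4.3)))  # hen
print(knn_predict(eggs, species, (6.6, 4.9)))  # duck
```

With two classes, an odd K such as 3 avoids tied votes. In practice features are usually scaled first, so that no single feature dominates the distance.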

ii) Support Vector Machine (SVM):

SVMs can classify images, which can improve search accuracy in image classification. In the above case, the SVM is separating smiling and rude face emojis in a plane.

SVM is a supervised machine learning technique that can be used for both classification and regression, but it is mostly used for classification. Each data point is plotted in an N-dimensional space, where N is the number of features, and a hyperplane is used to separate the data points into classes. Among all hyperplanes that separate the classes, the SVM chooses the one with the largest margin, i.e. the greatest distance to the nearest data points of each class (the support vectors).
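The sketch below trains a linear SVM (a flat hyperplane, no kernel) by subgradient descent on the regularized hinge loss. This is one of several ways to fit an SVM; libraries typically use dedicated solvers instead. Labels are assumed to be +1/-1. The hyperparameters `lam`, `lr` and `epochs`, the function names and the toy points are illustrative choices, not tuned values.

```python
# Hypothetical helper names and hyperparameters; a sketch, not a library API.

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM by subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))); labels must be +1 or -1."""
    n, dims = len(points), len(points[0])
    w, b = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(dims):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical 2-D feature vectors for two classes (+1 and -1)
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([svm_predict(w, b, p) for p in points])  # [1, 1, 1, -1, -1, -1]
```

The `lam * ||w||^2` term is what favours a large margin. Shrinking `w` widens the gap between the two lines w.x + b = ±1, while the hinge term penalizes points that fall inside that gap.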

iii) Decision Trees: a decision tree shows how the data is split again and again according to certain parameters. The two main parts of a tree are the decision nodes and the leaves. The decision nodes are where the data is split (each one tests an input feature), and the leaves hold the final outcome or decision.

**SOLUTION: COMPUTER SCIENCE, AS 2 OUT OF 3 DECISION TREES GIVE CS AS THE SOLUTION (“THE MORE TREES, THE MORE ACCURATE THE RESULT”)**
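At every decision node, a tree has to choose where to split. A common criterion (used, for example, by CART) is Gini impurity: choose the threshold whose two child nodes are as pure as possible. This sketch handles one numeric feature only. The helper names, the exam scores and the branch labels are hypothetical.

```python
# Hypothetical helper names and data; a sketch of one split decision.
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_class^2); 0.0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the split `value <= threshold`
    that makes the two child nodes as pure as possible."""
    n = len(values)
    best_threshold, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send data to both children
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Hypothetical: a student's maths score vs. the branch recommended for them
maths = [40, 55, 60, 75, 85, 90]
branch = ["Arts", "Arts", "Arts", "CS", "CS", "CS"]
print(gini(branch))               # 0.5
print(best_split(maths, branch))  # (60, 0.0)
```

A full tree repeats this search over every feature at every node and recurses into each child until the leaves are pure or a depth limit is reached.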

iv) Random Forest: this algorithm builds a “forest” of many decision trees, each trained on a random sample of the data (and of the features). It combines their predictions by majority vote (or by averaging, for regression). Adding more trees generally makes the result more accurate and stable, up to a point. It is often considered one of the most effective general-purpose models.
It can be used for both classification and regression tasks, though it is most often used for classification. Some implementations can cope with missing (NaN) values while maintaining accuracy, and it handles large data sets with high accuracy.
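A random forest can be sketched by combining two ideas. First, each tree sees a bootstrap sample of the rows (drawn with replacement) and a random subset of the features. Second, the forest predicts by majority vote. To keep the sketch short, every "tree" here is a depth-1 stump, so picking features per tree is the same as picking them per split; real implementations grow much deeper trees. All names and data are hypothetical.

```python
# Hypothetical helper names and data; a simplified forest of decision stumps.
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Depth-1 tree: the single split (feature, threshold) with the lowest
    weighted Gini impurity, plus the majority label on each side."""
    best = None  # (score, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority label
        return (feature_ids[0], float("inf"), majority(labels), majority(labels))
    return best[1:]


def fit_random_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # sqrt(#features) is a common default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap: with replacement
        features = rng.sample(range(n_features), k)     # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees in the forest."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Hypothetical 2-feature data with two well-separated classes
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
forest = fit_random_forest(rows, labels)
print(predict_forest(forest, (0.5, 0.5)), predict_forest(forest, (9.5, 9.5)))  # A B
```

Each stump on its own sees only part of the data, and an unlucky bootstrap sample can make it wrong. The vote across many stumps smooths that out, which is the intuition behind "the more trees, the more accurate the result".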

v) Logistic Regression:
Logistic regression is similar to linear regression, except that it predicts whether something is True or False instead of predicting a continuous value.
Logistic regression fits an “S-shaped” logistic function called the “sigmoid function”, sigmoid(z) = 1 / (1 + e^(-z)). This S-shaped curve lies between 0 and 1, so its output can be read as the probability that the dependent variable (Y) is True, given the independent variable (X).

Real-world examples:
→ Predicting whether a certain product will be profitable
→ Predicting the outcome (win/lose) of a match
→ Predicting the probability of an earthquake on a certain day

Logistic regression is a traditional statistical technique that is also very popular as a machine learning tool. It can provide probabilities and classify new data using both continuous and discrete inputs.
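Here is a minimal single-feature logistic regression sketch. The sigmoid turns m*X + c into a probability, and m and c are fitted by gradient descent on the log-loss (cross-entropy). The function names, the learning rate, the number of epochs and the hours-studied/pass data are assumptions for illustration only.

```python
# Hypothetical helper names, hyperparameters and data; a sketch, not a library API.
import math


def sigmoid(z):
    """S-shaped logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, lr=0.1, epochs=5000):
    """Fit P(Y = 1 | X) = sigmoid(m*X + c) for a single feature by gradient
    descent on the average log-loss; ys must be 0 or 1."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(m * x + c) - y for x, y in zip(xs, ys)]
        # For the log-loss, the gradient is the mean error (times x for the slope)
        grad_m = sum(e * x for e, x in zip(errors, xs)) / n
        grad_c = sum(errors) / n
        m -= lr * grad_m
        c -= lr * grad_c
    return m, c


# Hypothetical data: hours studied -> passed the exam (1) or not (0)
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]
m, c = train_logistic_regression(hours, passed)
for h in (1, 6):
    p = sigmoid(m * h + c)
    print(h, round(p, 3), "pass" if p >= 0.5 else "fail")
```

Thresholding the probability at 0.5 turns it into a True/False prediction. Because these toy classes are perfectly separable, m keeps growing slowly the longer you train, which is one reason real implementations usually add regularization.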

2. UNSUPERVISED LEARNING

In unsupervised learning, our main goal is not to predict a known output but to discover patterns.

→ There is no training on labelled examples (i.e. NO supervisor providing the correct answers)
→ The model has no knowledge of the output classes
→ Data is “unlabelled” (the target values are unknown)
→ No supervisor is required; therefore, these are “self-guided” algorithms

🏆 This model discovers patterns in the unlabelled data by itself, including patterns we did not know to look for.

This can be done in two main ways:
a) Clustering :

**Clustering the data into two different classes from a mixture (GIF source: Sud Recycling)**

Here the data points in the data set are divided into subgroups such that points within a subgroup are similar to each other and dissimilar to the points in other subgroups. In short, clustering groups similar objects together and keeps different objects apart.
Real-world example: a hospital needs to expand its network by constructing more emergency wards. A clustering algorithm can divide the areas of the state into major and minor accident-prone areas, so that more emergency wards can be built around the major accident-prone areas than around the minor ones.
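One common clustering algorithm is k-means (Lloyd's algorithm), sketched below. It alternately assigns each point to its nearest centroid and moves each centroid to the mean of its points, until nothing changes. The post does not name a specific clustering algorithm, so k-means is our choice here. The accident-location coordinates and the function name are hypothetical.

```python
# Hypothetical helper name and data; a sketch of k-means, not a library API.
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, clusters) for k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical (x, y) map coordinates of past accidents in two parts of a state
accidents = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = k_means(accidents, k=2)
print(sorted(sorted(c) for c in clusters))
# [[(1, 1), (1.5, 2), (2, 1)], [(8, 8), (8.5, 9), (9, 8)]]
```

k-means needs the number of clusters k up front and can depend on the starting centroids. A common practice is to run it from several random starts and keep the best result.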

b) Association: it is about how frequently things occur together.
Association learning deals with discovering interesting relationships hidden in large data sets. These relationships are expressed as frequent itemsets, i.e. sets of items that often occur together.

Let us now look at a real-world application:
→ Many supermarkets these days use association algorithms to analyse transactions. They allow the shop owner to identify items that people frequently buy together.
→ If a person buys milk, they are likely to also buy bread, apples, jam or other breakfast items.

This analysis finds, for transactions in which one item (event 1) occurs, how frequently another item also occurs (OR) how frequently an itemset occurs in the data set.
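The two quantities described above are usually called confidence (how often the second item appears given the first) and support (how often an itemset occurs). The sketch below computes both and brute-forces all frequent item pairs. Algorithms such as Apriori perform the same search more efficiently on large data. The helper names and the baskets are made-up examples.

```python
# Hypothetical helper names and baskets; brute force, not the Apriori algorithm.
from itertools import combinations


def count(transactions, itemset):
    """Number of transactions containing every item of `itemset`."""
    needed = set(itemset)
    return sum(1 for t in transactions if needed <= set(t))


def support(transactions, itemset):
    """How frequently the itemset occurs in the data set (a fraction)."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """How often `consequent` is bought in transactions that contain `antecedent`."""
    base = count(transactions, antecedent)
    both = count(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


def frequent_pairs(transactions, min_support):
    """Every item pair whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(transactions, pair)
        if s >= min_support:
            result[pair] = s
    return result


# Hypothetical supermarket baskets
baskets = [
    {"milk", "bread", "jam"},
    {"milk", "bread"},
    {"milk", "apple"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
print(support(baskets, {"milk"}))                # 0.8
print(confidence(baskets, {"milk"}, {"bread"}))  # 0.75
print(frequent_pairs(baskets, min_support=0.4))
# {('bread', 'butter'): 0.4, ('bread', 'milk'): 0.6}
```

Read the rule "milk → bread" as follows: 80% of baskets contain milk, and 75% of those also contain bread.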

3. REINFORCEMENT LEARNING

It learns to control the behavior of a system using observations gathered from the interaction between an agent and its environment. The agent performs an action, the action changes the state of the environment, and the environment returns a reward. The agent's goal is to choose actions that maximize the total reward (or minimize the cost/risk).
Here, the agent learns continuously from the environment through repeated trial and error. The feedback given to the agent in the form of a reward is used by the algorithm as learning experience; this feedback is known as the reinforcement signal.

MAZE GAME USING REINFORCEMENT LEARNING (GIF SOURCE: i.imgur.com)

List of common algorithms:
* Q-Learning
* SARSA (State-Action-Reward-State-Action)
* DQN (Deep Q-Network)
* DDPG (Deep Deterministic Policy Gradient)

Real-world examples:
i) Game Playing
ii) AI based Games
iii) Robot Management
iv) Maze Game
v) Autonomous Helicopter flight

It is applied in such a way that the system incurs the minimum cost, or equivalently collects the maximum reward. The learning agent is driven to learn from the reward given to it by the environment, which is why it can be thought of as reward-driven learning. The estimated value of each state (or state-action pair) is constantly updated.
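To show the reward-driven update in code, here is a tabular Q-learning sketch (Q-learning is the first algorithm in the list above) on a hypothetical 1-D maze. The maze has five cells in a row; the agent starts in cell 0 and is rewarded for reaching cell 4. The maze, the rewards, the function names and the values of alpha (learning rate), gamma (discount) and epsilon (exploration rate) are all illustrative assumptions.

```python
# Hypothetical maze, rewards and hyperparameters; a sketch of tabular Q-learning.
import random

N_STATES = 5          # hypothetical 1-D maze: cells 0..4
GOAL = N_STATES - 1   # reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right


def step(state, action):
    """Environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, -0.01, False  # small cost per move encourages short paths


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: one row per state
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move Q(s, a) toward reward + discounted best value of the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(GOAL)]
print(policy)  # ['right', 'right', 'right', 'right']
```

The single update line is the "constantly updated" state value from the text. Each move nudges one Q-table entry toward the reward it just received plus the discounted value of where it landed.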

😃 Well, then that’s it for this blog.
Hope you liked it 👍
For any queries, please feel free to ask Renish Sundrani or Kiran Lakhani.
In the upcoming blogs we will explore each of these algorithms and see how they work in detail, along with some coding, so stay connected!✌

You can also connect with us on LinkedIn: Renish Sundrani & Kiran Lakhani
Links to our previous blogs:
Installation of Jupyter Notebook
Data Cleaning and removing Outliers