Machine Learning: A Deja Vu?

Neenad Sahasrabuddhe
Published in GDSC GHRCE
7 min read · Aug 7, 2020

You may have noticed that whenever you search on Google or watch a show on Netflix, you automatically get recommendations similar to what you searched for or watched before. That’s where Machine Learning comes in!

Machine learning learns from your past activity to recommend similar results. Machine learning (ML) is the study of computer algorithms that improve automatically through experience. So, let’s start from scratch and see what machine learning is and how it works.

The term machine learning was coined in 1959 by Arthur Samuel, a pioneer in the field of computer gaming and artificial intelligence. Later, Tom M. Mitchell gave a widely quoted, more formal definition of the algorithms studied in the machine learning field:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

Today, many sources treat machine learning as a subfield of Artificial Intelligence, although some practitioners argue that machine learning and AI are separate.

APPROACHES

There are three main types of Machine Learning approaches, depending on the nature of the “signal” or “feedback” available to the learning system: Supervised Learning, Unsupervised Learning, and Reinforcement Learning. There are other types too, such as Semi-supervised Learning and Transfer Learning, but in this blog I am only going to talk about these three pillars of machine learning.

Before going on to theoretical definitions, I will give you three cases; tell me if they are relatable!

CASE I:- In kindergarten, your teacher teaches you the alphabet and numbers, and shows you images of fruits, vegetables, etc. After practicing and revising a few times, you are able to recognize the fruits and vegetables at home. Have you ever wondered how you classified them? Or how you still remember the alphabet and numbers? The answer is: by practicing, by training your brain.

CASE II:- In your childhood, you would have saved some money in your piggy bank. When you get curious about how much you have saved, you start to count the money; but wait, how did you count? There are notes of 10/-, 20/-, 50/-, 100/- and coins of 1/-, 2/-, 5/-, 10/-. You would have sorted them by the similarities you found amongst them. Am I correct?

CASE III:- While training a dog, we show him a stick and throw it. There are two possibilities: either the dog fetches the stick and brings it back to you, or he doesn’t. If he does as you said, you reward him; otherwise you scold him, right?

A] Supervised learning is the method of teaching a model by feeding it input data along with the correct output data. These input-output pairs are usually referred to as “labelled data”. In other words, the model learns an input-output mapping.

The goal of supervised learning is to build an artificial system that can learn the mapping from inputs to outputs and predict the output for new inputs. In case I, the alphabet, numbers, and images of fruits and vegetables are ‘labelled data’. This is similar to a teacher-student scenario; that’s why supervised learning is also called teacher-student learning.

Applications: With the assistance of supervised learning we can build a recommendation system, predict housing prices, predict whether a tumour is malignant or benign, and detect spam. Self-driving cars also rely on supervised learning, for example to recognize objects on the road.

Algorithms: There are two main types under which these algorithms are categorized.

1. Regression:

Linear Regression- The algorithm assumes that there is a linear relationship between the two variables, Input (X) and Output (Y), in the data it learns from. The Output variable is called the Dependent Variable and the Input variable is called the Independent Variable. When unseen data is passed to the algorithm, it uses the fitted function to map the input to a continuous value for the output.
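To make this concrete, here is a minimal sketch of fitting a straight line by ordinary least squares in plain Python. The function names (fit_line, predict) and the tiny area/price dataset are made up for illustration; a real project would normally use a library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Map an input to a continuous output using the fitted line."""
    return slope * x + intercept


# Made-up labelled data: input X (area) and output Y (price), generated from y = 2x + 1.
areas = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(areas, prices)
new_price = predict(6.0, slope, intercept)  # prediction for an unseen input
```

Because the toy data lies exactly on a line, the fit recovers that line; with noisy real data the line would only approximate the points.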

2. Classification:

Random Forest- Random Forest is a classifier that builds several decision trees on random subsets of the given dataset and combines their predictions to improve predictive accuracy. Instead of relying on one decision tree, the random forest takes the prediction from each tree and outputs the class that receives the majority of votes. (When it is used for regression, it averages the trees’ predictions instead.)
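The voting idea can be sketched in a few lines of plain Python. Please read this as a heavy simplification and not as a real Random Forest: each "tree" here is only a one-split stump, trained on a bootstrap sample using one randomly chosen feature. The function names and the fruit data are my own assumptions for the example.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        # Each side of the split predicts its own most common label.
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw as many rows as the dataset has, with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        feature = rng.randrange(len(rows[0]))  # one random feature per stump
        forest.append(train_stump(sample_rows, sample_labels, feature))
    return forest


def predict(forest, row):
    votes = [left_label if row[feature] <= threshold else right_label
             for feature, threshold, left_label, right_label in forest]
    return Counter(votes).most_common(1)[0][0]  # majority vote


# Made-up labelled data: two numeric features per fruit.
rows = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.0), (1.2, 1.6),
        (8.0, 8.5), (8.5, 9.0), (9.0, 8.2), (7.8, 9.1)]
labels = ["apple"] * 4 + ["banana"] * 4

forest = train_forest(rows, labels)
small_fruit = predict(forest, (0.5, 0.5))
large_fruit = predict(forest, (9.5, 9.5))
```

A single stump trained on an unlucky sample can be wrong, but the majority vote over many stumps is much harder to fool, which is the whole point of the forest.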

B] Unsupervised learning is used to group cases based on similar attributes, or naturally occurring trends, patterns, or relationships in the data. It looks for previously undetected patterns and similarities in a dataset with no preexisting labels. It can be thought of as a child who learns by independently finding similarities in the given input.

The goal of unsupervised learning is to build an artificial system that can learn independently, with no or minimal human supervision, and uncover the structure hidden in the data. In case II there are two groups, notes and coins. These are further grouped by the value on the notes and coins. These groups were not ‘labelled data’, yet you were still able to find relationships amongst them.

Applications: The main applications of unsupervised learning include clustering, visualization, dimensionality reduction, finding association rules, and anomaly detection.

Algorithms: There are two main types under which these algorithms are categorized.

1. Clustering:

K-Means Clustering- The algorithm categorizes the items into k groups by similarity. To measure similarity, Euclidean distance is typically used. The algorithm creates clusters that are as homogeneous as possible by repeatedly calculating the centroid of each cluster and assigning each data point to the cluster whose centroid is nearest. Because every data point belongs only to the cluster with the smallest distance, the clusters do not overlap.
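The assign-then-recompute loop described above can be sketched as follows. This is a bare-bones version that assumes the caller supplies the starting centroids (real implementations usually pick them randomly or with smarter seeding), and the points are made up.

```python
import math


def kmeans(points, centroids, n_iters=10):
    """Plain k-means with caller-supplied starting centroids."""
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(coord) / len(cluster) for coord in zip(*cluster)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters


# Made-up unlabelled data with two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
```

Note that no labels were passed in anywhere: the two groups come out of the distances alone.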

2. Association:

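FP-Growth- The Frequent Pattern (FP) Growth algorithm first counts how often each item occurs in the dataset and adds the counts to a table; items that fail to meet the support threshold are dropped. The items of each transaction are then sorted from most to least frequent and added into a tree, so the most frequent items sit closest to the root and transactions with a common prefix share a branch, with the support counted at every node. Frequent patterns are then read off the tree, and if a particular branch fails to meet the support threshold, it is trimmed.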
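Here is a rough sketch of the first half of FP-Growth only: counting support, dropping infrequent items, and building the tree. The mining step that extracts the frequent patterns from the tree is left out to keep it short. The nested-dict tree layout, the function name build_fp_tree, and the shopping baskets are my own assumptions.

```python
from collections import Counter


def build_fp_tree(transactions, min_support):
    # Pass 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_support}

    # Pass 2: insert each transaction, keeping only frequent items and ordering
    # them from most to least frequent so that common prefixes share a path.
    root = {"count": 0, "children": {}}  # the root itself holds no item
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node["children"].setdefault(
                item, {"count": 0, "children": {}})
            child["count"] += 1  # support of the path ending at this node
            node = child
    return frequent, root


# Made-up shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "jam"],
]
frequent, tree = build_fp_tree(transactions, min_support=2)
bread_node = tree["children"]["bread"]
```

With a support threshold of 2, "jam" appears only once and never makes it into the tree, while the three baskets containing "bread" all pass through the same first node.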

C] Reinforcement learning employs trial and error to come up with a solution to the problem. To get the model to do what the programmer wants, the model gets either rewards or penalties for the actions it performs.

In case III, the dog gets either a reward or a penalty for his actions. There was neither labelled data nor a correct answer given in advance. By leveraging search and many trials, reinforcement learning is currently one of the most effective ways to hint at a machine’s creativity.

Applications: RL requires a lot of data, therefore it is most applicable in domains where simulated data is readily available, like gameplay and robotics. AlphaGo was the first computer program to defeat a world champion in the ancient Chinese game of Go, and its successor AlphaGo Zero, which learned purely by playing against itself, is a perfect example of reinforcement learning.

Algorithms: There are 2 approaches to implement a Reinforcement Learning algorithm.

1. Model-Free:

In Model-Free RL, the agent does not have access to a model of the environment. By a model of the environment I mean a function which predicts state transitions and rewards. The agent has to learn purely from the experience it gathers. As of the time of writing, model-free methods are more popular and have been researched extensively.
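As an illustration, here is a small sketch of tabular Q-learning, one well-known model-free method. The five-cell corridor, the reward of 1 for reaching the last cell, and all the parameter values are assumptions I made up for the example. For simplicity there are only rewards here, no penalties.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
LEFT, RIGHT = 0, 1


def step(state, action):
    """The environment. The agent only sees what this returns, never the rules."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on the episode length
            # Explore with probability epsilon, or when both actions look equal.
            explore = rng.random() < epsilon or q[state][LEFT] == q[state][RIGHT]
            if explore:
                action = rng.randrange(2)  # trial and error
            else:
                action = LEFT if q[state][LEFT] > q[state][RIGHT] else RIGHT
            next_state, reward = step(state, action)
            # Nudge the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == N_STATES - 1:
                break
    return q


q = q_learning()
# Greedy policy for the non-terminal cells: 1 means "go right".
policy = [RIGHT if row[RIGHT] > row[LEFT] else LEFT for row in q[:-1]]
```

The agent never looks inside step; it only tries actions and sees what comes back, yet it ends up preferring to walk right toward the reward in every cell.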

2. Model-Based:

In Model-Based RL, the agent has access to a model of the environment. The main advantage is that this allows the agent to plan ahead, seeing what would happen for a range of possible choices before acting. Agents can then distill the results from planning ahead into a learned policy. A famous example of Model-Based RL is AlphaZero.
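To show what planning with a model can look like, here is a minimal sketch using value iteration on a made-up five-cell corridor where reaching the last cell gives a reward of 1. This is a textbook planning method chosen for brevity, not how AlphaZero works; the function names and numbers are assumptions for the example.

```python
N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)      # 0 = left, 1 = right


def model(state, action):
    """Known model of the environment: predicts the next state and the reward."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def plan(gamma=0.9, sweeps=50):
    """Value iteration: plan by querying the model instead of acting."""
    values = [0.0] * N_STATES  # the goal cell is terminal, so its value stays 0
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            values[s] = max(r + gamma * values[s2]
                            for s2, r in (model(s, a) for a in ACTIONS))

    def action_value(s, a):
        s2, r = model(s, a)
        return r + gamma * values[s2]

    # Distill the plan into a policy: the best action in every non-terminal cell.
    policy = [max(ACTIONS, key=lambda a, s=s: action_value(s, a))
              for s in range(N_STATES - 1)]
    return values, policy


values, policy = plan()
```

No trial and error happens here at all: the agent works out the value of every cell just by asking the model what would happen.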

Conclusion

I have covered much of the basic knowledge underlying the field of Machine Learning here, but of course, I have only scratched the surface. Keep in mind that to apply the knowledge contained in this introduction to real-life machine learning problems, a much deeper understanding of the topics discussed here is necessary. The demand for Machine Learning engineers is expected to grow enormously, offering incredible opportunities, and I hope you will consider getting in on the action to be a part of something big.

Neenad Sahasrabuddhe
GDSC GHRCE

Developer Student Club (Core Team Member)| Data Science| Machine Learning | Cloud | He/Him