Getting Familiar with Machine Learning
In this post, I will write about what machine learning is and why it is the GOAT.
Machine learning, in the words of Tom M. Mitchell, is defined as follows: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
I found this definition particularly interesting because it pretty much sums up what machine learning does. In layman's terms, all we are saying is that ML gives us a way to provide “experience” to a computer. Using this “experience”, the machine can then predict the associated value/output for new data. ML is a subset of AI in which machines are given the ability to learn without being explicitly programmed.
All the recommendations you get on YouTube, while shopping online, or while watching good old Netflix are made possible with ML.
Why do we need Machine Learning?
The importance of ML in modern times is something we can’t ignore. The amount of unstructured data has become immeasurable and manually processing such data becomes a tedious chore. But machine learning has helped in not only providing structure to data but also in deducing patterns and giving insights on the given data. This in turn helps us to solve complex problems. By building predictive models and using statistical analysis, machine learning can be used to find the hidden patterns and provide us with essential details regarding our data.
Some applications of machine learning are:
1. Facial recognition technology, which allows users on social media to tag and share pictures with their friends.
2. Recommendation engines, which provide suggestions for videos and television series based on the user’s preferences.
3. Google’s spam filter, which uses classification algorithms (a type of ML algorithm) and Natural Language Processing to analyze emails in real time and classify them as spam or not-spam.
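To make the spam-filter idea concrete, here is a toy sketch of a Naive Bayes classifier over word counts, one of the classic classification algorithms for this task. The tiny “emails” below are made-up examples, and real filters like Gmail’s are far more sophisticated than this.

```python
import math
from collections import Counter

# Made-up training "emails" with their labels (not real data).
train = [
    ("win money now claim prize", "spam"),
    ("claim your free prize money", "spam"),
    ("meeting notes for tomorrow", "not-spam"),
    ("lunch tomorrow with the team", "not-spam"),
]

# Count how often each word appears in each class.
word_counts = {"spam": Counter(), "not-spam": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

def classify(text):
    """Pick the class with the higher Naive Bayes score (add-one smoothing)."""
    vocab = set().union(*word_counts.values())
    best_label, best_score = None, float("-inf")
    for label in word_counts:
        # log P(class) + sum over words of log P(word | class)
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / len(train))
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify("claim free money"))        # -> spam
print(classify("notes from the meeting"))  # -> not-spam
```

Even this tiny version shows the core idea: words like “prize” and “claim” push an email toward the spam class, while “meeting” and “notes” push it the other way.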
Before we jump into the fun stuff, we need to get familiar with some commonly used terms in ML:
1. Algorithms: programs that are used to learn patterns and draw significant information from them. They improve with experience without human intervention.
2. Model: it is the output of our machine learning algorithm run on our dataset and represents what was learned from the data.
3. Predictor Variable: It is a feature in our input data that helps to predict our output.
4. Response Variable: It is the output variable that is to be predicted using our predictor variables.
5. Training dataset: it’s the set of data used to train our algorithm and learn the underlying pattern. The pattern deduced here is then used to predict the output.
6. Test dataset: This dataset is used to measure the accuracy of the prediction of the algorithm.
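To tie the last two terms together, here is a minimal sketch of splitting a toy dataset into a training set and a test set. The rows (rooms, area, price) are made-up housing numbers used purely for illustration:

```python
import random

# Each row is (predictor variables, response variable):
# predictors are (rooms, area in sq ft), the response is price.
data = [
    ((2, 800), 150_000),
    ((3, 1200), 210_000),
    ((4, 1600), 280_000),
    ((3, 1100), 200_000),
    ((5, 2000), 350_000),
    ((2, 900), 160_000),
]

def train_test_split(rows, test_fraction=0.33, seed=42):
    """Shuffle the rows and split them into a training set and a test set."""
    rows = rows[:]                       # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (training set, test set)

train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))     # 4 training rows, 2 test rows
```

The algorithm would learn its pattern from `train_set` only, and the held-out `test_set` would then measure how well that pattern predicts prices it has never seen.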
Types of Machine Learning:
There are mainly three ways in which a machine can learn:
1. Supervised learning:
Here the dataset contains “labeled data”, meaning each input has an output value associated with it. This lets us map input data to output labels and learn the pattern between them, so that we can accurately predict the output label for new data. There are mainly two types of supervised learning:
Classification: When we can group or classify our dataset into two or more groups based on output labels. The most common example would be a spam filter where our emails are segregated into spam or not-spam categories.
Regression: This is used when our output label values are continuous, for example when predicting housing prices. We cannot meaningfully categorize housing prices; we are only concerned with predicting the price given the features of our house, like the number of rooms, area, etc.
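The regression idea can be sketched with simple one-variable linear regression, fit using the closed-form least-squares formulas. The areas and prices below are made-up illustrative numbers, not real market data:

```python
# Made-up data: house areas (predictor) and prices in $1000s (response).
areas  = [800, 1000, 1200, 1500, 2000]
prices = [160, 200, 240, 300, 400]

n = len(areas)
mean_x = sum(areas) / n
mean_y = sum(prices) / n

# Least squares: slope = covariance(x, y) / variance(x),
# and the intercept follows from the two means.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, prices))
         / sum((x - mean_x) ** 2 for x in areas))
intercept = mean_y - slope * mean_x

def predict(area):
    """Predicted price (in $1000s) for a house of the given area."""
    return intercept + slope * area

print(predict(1100))  # roughly 220, i.e. about $220,000
```

The output is a continuous number rather than a category, which is exactly what separates regression from classification.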
Some of the applications of supervised learning include bioinformatics, speech recognition, and spam detection.
Examples of supervised learning algorithms: Linear Regression, Logistic Regression, K-Nearest Neighbors, Naïve Bayes classifier.
2. Unsupervised learning:
This type of learning involves input data with no corresponding output labels. Since labels aren’t given, the machine has to deduce the pattern without any guidance. So, unsorted data is basically grouped in terms of similarity in features. There are two types of unsupervised learning:
Clustering: Input data is grouped based on the features it shares. An input is more similar to the other inputs in its group/cluster than to any input outside that cluster.
For example, if we had to identify bananas and apples but there were no labels or guide to help us classify them, we could simply use their features to cluster the given fruits into the two categories: a banana should be yellow and longer, while an apple should be red and shorter.
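The fruit example can be sketched with a tiny K-Means implementation (one of the standard clustering algorithms). The 2-D features, yellowness on a 0-1 scale and length in cm, and their values are made up for illustration:

```python
import random

# Made-up fruit features: (yellowness, length in cm).
points = [
    (0.9, 18), (0.85, 20), (0.95, 17),   # banana-like: yellow and long
    (0.1, 7),  (0.15, 8),  (0.05, 6),    # apple-like: red (low yellowness), short
]

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k=2, iters=10, seed=0):
    # Start with k random points as the initial centroids.
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

for cluster in kmeans(points):
    print(cluster)  # the banana-like and apple-like points end up in separate clusters
```

No labels were given anywhere: the algorithm separates the fruits purely from the similarity of their features.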
Association: We find interdependencies between inputs in large datasets. Such rules or dependencies can then be used to describe large sets of data.
For example, if a person buys a notebook then they are most likely to buy a pen or other kinds of stationery products as well.
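The notebook-and-pen rule can be quantified with the support and confidence measures that association-rule algorithms such as Apriori are built on. The shopping baskets below are made up for illustration:

```python
# Made-up shopping baskets, one set of items per customer.
baskets = [
    {"notebook", "pen", "eraser"},
    {"notebook", "pen"},
    {"notebook", "ruler"},
    {"bread", "milk"},
]

n = len(baskets)
# Support: how often an itemset appears across all baskets.
support_notebook = sum("notebook" in b for b in baskets) / n        # 3/4
support_both = sum({"notebook", "pen"} <= b for b in baskets) / n   # 2/4
# Confidence of the rule "notebook -> pen": of the baskets with a
# notebook, what fraction also contain a pen?
confidence = support_both / support_notebook                        # 2/3
print(support_both, confidence)
```

A rule with high enough support and confidence (here, “a notebook buyer also buys a pen two times out of three”) is kept as a description of the dataset.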
Some of the applications of unsupervised learning include credit-card fraud detection, anomaly detection, and recommender systems that suggest which products to buy next.
Examples of unsupervised learning algorithms: K-Means Clustering, Apriori Algorithm, Hierarchical clustering, FP-Growth Algorithm
3. Reinforcement learning:
It involves three components:
1. Agent (the learner or decision-maker)
2. Environment (everything the agent interacts with)
3. Actions (what the agent can do in the environment)
Here an agent learns from the outcomes of its actions. It involves a feedback system, which gives a reward when the agent acts toward our goal state and a penalty when it moves away from the goal state. This type of learning mostly involves trial and error.
For example: when we first play any video game, we would play it rather clumsily. But with trial and error, we would learn to avoid actions that could cause a penalty.
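This trial-and-error loop can be sketched with a tiny Q-learning agent (a classic reinforcement learning algorithm, used here as one concrete example) on a toy 1-D world. The states, rewards, and hyperparameters below are all illustrative choices:

```python
import random

# Toy world: states 0..4 on a line, the goal state is 4.
# The agent (decision-maker) can step left or right through the
# environment; reaching the goal earns a reward, any other step
# costs a small penalty.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                       # step left, step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration
rng = random.Random(0)

for episode in range(200):
    state = 0
    while state != GOAL:
        # Explore occasionally, otherwise exploit the best known action.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt = min(max(state + action, 0), N_STATES - 1)
        reward = 10.0 if nxt == GOAL else -1.0   # reward at the goal, penalty elsewhere
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt

# After many clumsy episodes, the greedy policy should be "step right" everywhere.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Early episodes wander randomly, just like our first clumsy attempts at a video game; the reward and penalty signals gradually shape the Q-values until the agent heads straight for the goal.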
So, this is it, guys. I hope you learned something from this blog. Thank you very much for making it to the end!