Elements of Machine Learning
(Basic framework for designing, developing and evaluating any machine learning project/application)
A thousand-foot overview of machine learning:
Humans collect data and curate it to perform some specific task. To complete that task, we apply an approximating model (ŷ = f̂(x)), train it by tuning various hyperparameters (epochs, learning rate, activation function, initialization method, regularisation, etc.) while managing the bias-variance tradeoff, and then evaluate the model's efficiency by counting the number of correct predictions it makes.
To understand these elements, we first need to understand supervised and unsupervised learning and the difference between them.
Supervised Learning: Given a labeled training dataset, we train a model of our own by inferring the relationship between the inputs (x's) and outputs (y's), search for the best parameters, and test the model on unseen data points.
Unsupervised Learning: Given an unlabeled dataset, we try either to group the data points based on some attributes (demographics/behavior/likes/dislikes) or to generate a new sequence of inputs based on the available inputs.
“Supervised Learning (Classification/Regression) has created 99% of all economic value in Artificial Intelligence” — Andrew Ng (CEO, deeplearning.ai)
“If intelligence is a cake, the bulk of the cake is unsupervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning (RL).” — Yann LeCun
There are six elements of machine learning:
1. Data
2. Defining a Task
3. Applying Model
4. Calculating Loss
5. Learning Algorithm
6. Evaluation
First Element:
1. Data (the fossil fuel of machine learning):
A typical dataset required for any ML prediction is high-dimensional, meaning it consists of millions of rows (data points/observations) and typically thousands or even millions of columns (parameters/features). A dataset that comes with inputs as well as their corresponding outputs is ideal for a supervised learning task, where we learn the relationship between the inputs and the outputs; if the dataset doesn't contain outputs corresponding to the inputs, we can only perform an unsupervised learning task.
Data can be structured (represented in tabular form, e.g. sales data or file records) or unstructured (e.g. incoming feeds on social media websites).
Source: https://upload.wikimedia.org/wikipedia/commons/6/6d/Data_types_-_en.svg
Data Curation (pre-processing the data):
The data fed to the model for training should be in machine-readable form, i.e. it should be encoded as numbers.
E.g.:
- Text data such as reviews can be represented in numerical format using one-hot encoding (see the sketch after this list).
- Image data can be represented as arrays of RGB pixel values.
- Video (a collection of frames) can be represented frame by frame in numerical format.
- Speech data can be represented in numerical format using variations in amplitude.
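To make the encoding step concrete, here is a minimal sketch, assuming NumPy, of one-hot encoding a word and of an image as a numeric array; the vocabulary, words, and image size are made up for illustration.

```python
import numpy as np

# Hypothetical vocabulary for a tiny review dataset.
vocab = ["good", "bad", "great", "poor"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # One-hot vector: 1 at the word's index, 0 everywhere else.
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("great"))  # [0. 0. 1. 0.]

# An image is already numeric: a 32x32 RGB image is a 32x32x3
# array of intensity values in the range [0, 255].
image = np.random.randint(0, 256, size=(32, 32, 3))
print(image.shape)  # (32, 32, 3)
```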
Where can we get the data from?
Some of the open data repositories that serve as crowdsourcing platforms are:
- Google.ai (available since 2017)
- data.gov.in (since 2012)
- Wikidata (central storage for the data of all Wikimedia projects)
We can either use real datasets, collected from government surveys, account books, or real-time data generated on social media/e-commerce sites, to perform analysis and prescribe further actions/remedies, or we can create our own dummy data using simulation and image editing/processing tools.
Second Element:
2. Task (setting an objective for the ML project with the curated dataset):
Source: https://www.guru99.com/images/tensorflow/082918_1102_WhatisMachi5.png
Based on the procured/curated dataset, we can define our task accordingly. If we have a labeled training dataset, i.e. it contains inputs (x's) and their corresponding labels (y's), we can easily perform supervised learning (classification/regression); if the dataset doesn't contain the corresponding labels (y's), we can only perform unsupervised learning (clustering/generation). The sketch after the list below illustrates this choice.
- Classification (binary/multi-class): Each data point is mapped to some category/label. The label could be binary, such as like/dislike, or one of many classes, such as the letters of the English alphabet (A-Z).
- Regression: In this task, the model tries to predict the output for an unseen data point as a real value, such as a stock price, or as a probability, such as the probability that a given image contains a signboard.
- Clustering: Given the dataset, we try to group data points on the basis of similarities in some attributes (demographics/popularity/relevance/reviews).
- Generation: In this task, given the available data points (x's), the model tries to generate new inputs that follow the pattern of the previous data points, such as suggesting the next word as you type a letter in a keyboard application.
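As a minimal sketch of how the presence or absence of labels determines the task, the toy example below uses scikit-learn (my choice for illustration, not prescribed by this article) with made-up data: with labels we can fit a classifier, and without labels we can still cluster.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Made-up data: 4 data points with 2 features each.
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y = np.array([0, 0, 1, 1])  # binary labels, e.g. dislike=0 / like=1

# Labels available -> supervised learning (classification).
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.2, 1.9]]))  # label predicted for an unseen point

# No labels -> unsupervised learning (clustering).
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)  # groups inferred from the inputs (x's) alone
```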
Third Element:
3. Model (mathematical formulation of the task):
True relationship (unknown): y = f(x)
Our approximation: ŷ = f̂(x)
An ML/DL model is typically the ML engineer's approximation f̂ of the unknown relationship y = f(x) between the data points (x's) and their corresponding labels (y's). The model could be simple or complex depending on the number of parameters it has.
Some of the complex models used for training are:
- Sigmoid Neuron (1/(1+e^-x))
- Feed Forward Neural Network (FFN)
- Convolutional Neural Network (CNN)
- Recurrent Neural Network (RNN)
- Long Short Term Memory (LSTM)
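As a concrete example, here is a minimal sketch of the simplest item in the list above, a single sigmoid neuron ŷ = sigmoid(w·x + b); the input and parameter values are made up, and in practice w and b are estimated by the learning algorithm (the fifth element).

```python
import numpy as np

def sigmoid(z):
    # The sigmoid function 1 / (1 + e^-z) from the list above.
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(x, w, b):
    # A single sigmoid neuron: y_hat = sigmoid(w . x + b),
    # where w (weights) and b (bias) are the model's parameters.
    return sigmoid(np.dot(w, x) + b)

# Made-up values, just to show the call.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.05
print(sigmoid_neuron(x, w, b))  # a value in (0, 1)
```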
Fourth Element:
4. Loss Function (how do we decide which model best estimates the correct relationship for the task?):
The loss function calculates the difference between the true output (y) and the approximated value (f̂(x)). It is denoted by L.
If L = 0, then the estimate of our model is exactly accurate.
Different loss functions are:
- Squared Error Loss
- Cross-Entropy Loss
- KL divergence
- Hinge Loss
- Huber Loss, etc.
The lower the loss, the better the model is at making accurate predictions.
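As a minimal sketch, here are two of the losses listed above, squared error and (binary) cross-entropy, implemented with NumPy; the true outputs and predictions are made up for illustration.

```python
import numpy as np

def squared_error_loss(y_true, y_pred):
    # Mean squared difference between true outputs y and predictions y_hat.
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # Binary cross-entropy; eps guards against log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])  # made-up model outputs
print(squared_error_loss(y_true, y_pred))  # lower is better
print(cross_entropy_loss(y_true, y_pred))  # lower is better
```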
Fifth Element:
5. Learning Algorithm (how do we estimate the parameters?):
The success of machine learning lies in this element.
Parameter estimation in machine learning is a kind of search operation. We compute the parameters through a learning algorithm, which turns training into an optimization problem: we optimize the parameters by minimizing the loss. Hence, the learning algorithm and the loss function go hand in hand.
Some of the popular learning algorithms/optimization solvers are:
- Gradient Descent
- Adagrad
- RMSProp
- ADAM
- Backpropagation
Backpropagation
Source: https://miro.medium.com/max/2000/1*fnU_3MGmFp0LBIzRPx42-w.png
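To show how a learning algorithm minimizes the loss, here is a minimal gradient-descent sketch for a one-parameter model ŷ = w·x trained with squared error loss; the data and the learning rate are made up for illustration.

```python
import numpy as np

# Made-up data, roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

w = 0.0              # initial guess for the parameter
learning_rate = 0.01

for epoch in range(200):
    y_hat = w * x
    # Gradient of the loss L = mean((y - w*x)^2) with respect to w.
    grad = np.mean(-2.0 * x * (y - y_hat))
    w = w - learning_rate * grad  # step opposite the gradient

print(w)  # converges close to 2.0, the value that minimizes the loss
```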
Sixth Element:
6. Evaluation (how do we compute the accuracy of an ML/DL model?):
Accuracy = (No. of correct predictions) / (Total no. of predictions)
Calculating accuracy indicates how efficient the model is, and it is more interpretable for the end user than the loss value.
Top-K Accuracy: If the correct output can be found among the top-k predictions made by the model, we count the prediction as correct and can accept the model.
Top-K Accuracy = (No. of predictions where the correct output is in the top-k) / (Total no. of predictions)
The evaluation has to be done on the test data (data points that the machine hasn't seen during training).
The standard evaluation metrics in cases such as object detection, where some action is required to be taken, are precision and recall.
Precision = (No. of correct actions) / (Total no. of actions taken)
Recall = (No. of correct actions taken) / (Total no. of times a correct action should have been taken)
Source: https://miro.medium.com/max/848/1*7SgzmX05T81Ojaor9s5HWQ.png
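Here is a minimal sketch of the metrics above (accuracy, top-k accuracy, precision, and recall), assuming NumPy; the function names and the sample predictions are made up for illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    # No. of correct predictions / total no. of predictions.
    return np.mean(np.array(y_true) == np.array(y_pred))

def top_k_accuracy(y_true, scores, k):
    # A prediction counts as correct if the true label is among
    # the model's k highest-scoring classes.
    top_k = np.argsort(scores, axis=1)[:, -k:]
    return np.mean([label in row for label, row in zip(y_true, top_k)])

def precision_recall(y_true, y_pred, positive=1):
    # Precision: correct actions / actions taken.
    # Recall: correct actions taken / times an action was needed.
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    return tp / np.sum(y_pred == positive), tp / np.sum(y_true == positive)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))            # 0.75
scores = np.array([[0.1, 0.3, 0.6], [0.5, 0.4, 0.1]])  # per-class scores
print(top_k_accuracy([2, 1], scores, k=2))             # 1.0
print(precision_recall([1, 0, 1, 1], [1, 0, 0, 1]))    # (1.0, 0.666...)
```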
Hence, we can take up any machine learning project by understanding the above six elements in detail and designing our process accordingly.
