6 Jars perspective of Machine learning

4 min read · Feb 10, 2019

Machine learning is a subset of artificial intelligence. It is a category of algorithms that allows a software application to predict outcomes without being explicitly programmed.

Let us try to digest the above flowchart by understanding the 6 Jars of Machine Learning (the 6 Jars approach is taken from the Deep Learning course by One Fourth Labs; https://padhai.onefourthlabs.in).

The 6 Jars of Machine Learning are:

Data
Tasks
Model
Loss function
Learning
Evaluation

Jar #1 Data

Data is everywhere. The collection and interpretation of data started with cavemen drawing on cave walls and progressed through civilization via handwritten and printed books and records. With the invention of technologies such as the internet, electronic sensors, cameras and video recorders, and IoT, humans are collecting data at an exponential rate.

For example:

Product page visits and shopping history collected by Amazon.
Product reviews collected by Amazon.
Photos and comments collected by Facebook.

How is data collected?

Publicly available data, such as https://ai.google, https://data.gov.in, https://registry.opendata.aws, etc.
Data collected through crowdsourcing or data curators, such as Amazon Mechanical Turk (https://www.mturk.com), mysurvey (https://www.mysurvey.com), Data turks (https://dataturks.com), etc.
Our own data, collected through manual recording, sensors, IoT, etc.

Jar #2 Tasks

The availability of huge amounts of data helps us gain additional insight. A task is the kind of insight we want to gain from the data. The different types of tasks that can be done with data are:

Is it A or B or C? - Classification (supervised learning)
Is it weird? - Anomaly detection (similar to binary classification, but often treated as unsupervised learning)
How much or how many? - Regression (supervised learning)
How is it organized? - Clustering (unsupervised learning)
What to do next? - Reinforcement learning

Note: These questions are taken from the Machine Learning basics course from Microsoft Azure.

For example:

Suggest products based on previous browsing and shopping history.
Predict product sales based on customer reviews.
Predict user emotion based on comments.
Identify a person in a photo and tag them.

Supervised learning vs. unsupervised learning

A supervised learning task uses labelled data (data with both input and output), where we know the target, value, or class to be predicted.

An unsupervised learning task uses unlabelled data, where we look for patterns in the data provided.
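To make the distinction concrete, here is a minimal Python sketch. The toy numbers, the labels "small" and "large", and both helper functions are assumptions made up for illustration. The supervised helper copies the label of the closest labelled example; the unsupervised helper has no labels at all and simply splits the values at the largest gap.

```python
# Labelled data: each input comes with the target we want to predict.
labelled = [(1.0, "small"), (1.2, "small"), (8.0, "large"), (8.5, "large")]

# Unlabelled data: inputs only, no targets.
unlabelled = [1.0, 1.2, 8.0, 8.5]


def predict_nearest(x, examples):
    """Supervised: return the label of the closest labelled example."""
    _, label = min(examples, key=lambda pair: abs(pair[0] - x))
    return label


def split_into_two_groups(values):
    """Unsupervised: split the sorted values at the largest gap between neighbours."""
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


print(predict_nearest(7.0, labelled))
print(split_into_two_groups(unlabelled))
```

The first call needs the labels to answer; the second finds structure using the inputs alone.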

Jar #3 Model

A model is a pre-defined function that approximates the relationship between input and output.

y = f(x) (True relationship, unknown)

ŷ = f̂(x) (Approximate relationship)
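As a rough illustration, the two relationships can be written as Python functions. In practice the true relationship is unknown, so the `true_relationship` function below is only a stand-in, and the straight-line form and the parameter values 1.8 and 1.5 are assumptions chosen for the example.

```python
def true_relationship(x):
    # Stand-in for the unknown f(x); in practice we only observe its outputs.
    return 2.0 * x + 1.0


def make_linear_model(m, c):
    """Return an approximate relationship f_hat(x) = m * x + c."""
    def f_hat(x):
        return m * x + c
    return f_hat


# A model with hand-picked (not yet learned) parameters.
f_hat = make_linear_model(m=1.8, c=1.5)

for x in [0.0, 1.0, 2.0]:
    print(x, true_relationship(x), f_hat(x))
```

The model family (here, straight lines) is fixed in advance; only the parameters m and c are left to be tuned.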

Jar #4 Loss Function

A loss function measures the difference between the approximate relationship learned by the model and the true relationship between input and output.

Different types of loss functions:

Mean squared error loss
Cross-entropy loss
KL divergence
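The three losses listed above can be sketched in plain Python as follows. The function names and the small `eps` guard against log(0) are assumptions for this example; the cross-entropy and KL divergence versions expect discrete probability distributions given as lists.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(p_true, q_pred, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i) for two discrete distributions."""
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_true, q_pred))


def kl_divergence(p_true, q_pred, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(
        p * math.log(p / max(q, eps))
        for p, q in zip(p_true, q_pred)
        if p > 0
    )


one_hot = [0.0, 1.0, 0.0]      # true class is the second one
predicted = [0.2, 0.7, 0.1]    # model's predicted probabilities

print(mean_squared_error([0.0, 0.0], [1.0, 3.0]))
print(cross_entropy(one_hot, predicted))
print(kl_divergence(one_hot, predicted))
```

For a one-hot true distribution the cross-entropy and the KL divergence coincide, because the entropy of the true distribution is zero.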

Jar #5 Learning Algorithm

The aim of the learning algorithm is to minimize the chosen loss function by tuning the parameters of the chosen model.

For example:

Model: y = mx + c

Loss function: Mean squared error loss

The learning algorithm will find the values of m and c for which the mean squared error loss is minimum.
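A minimal sketch of this example, using plain gradient descent as the learning algorithm. The toy data (generated from y = 2x + 1), the learning rate, and the number of steps are assumptions chosen so that the loop converges; they are not values from the course.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = m*x + c by gradient descent on the mean squared error."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors for the current parameters.
        errors = [(m * x + c) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to m and c.
        grad_m = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_c = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the loss.
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

m, c = fit_line(xs, ys)
print(m, c)
```

With this data the learned m and c approach 2 and 1. A learning rate that is too large would make the updates diverge instead.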

Different types of learning algorithms:

Gradient Descent
AdaGrad
RMSProp
Adam
Backpropagation (computes the gradients that the optimizers above use when training neural networks)

Jar #6 Evaluation

Once a model is trained, we use test data to check the accuracy of its predictions. There are various methods to evaluate the trained model.

Accuracy (used for classification) = Total number of correct predictions / Total number of predictions
Confusion matrix and F1 score, which use true positives, true negatives, false positives and false negatives to evaluate the trained model
Mean absolute error and mean squared error (used for regression)
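These metrics can be sketched in a few lines of Python. The function names and the toy labels are assumptions for this example, and the classification metrics assume a binary problem where class 1 is the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Correct predictions divided by total predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

print(confusion_counts(y_true, y_pred))
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

Accuracy alone can be misleading when one class is rare, which is one reason to look at the confusion matrix and F1 score as well.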

Note: Choosing the model, choosing the loss function, choosing and running the learning algorithm, and evaluating the trained model are repeated iteratively until we have the best trained model for our task on the data provided.

Conclusion

The 6 Jars perspective helps us understand the bigger picture of machine learning and gives us a framework for understanding a problem area.

For example:

Problem: A user is unable to understand a sign board.

Data: Pairs of words in the sign board's language (English) and the native language (Hindi).

Task: Identify the words on the sign board and translate them from English to Hindi.

Model: Neural Network (RNN / CNN)

Loss function: Likelihood loss / cross-entropy loss

Learning algorithm: Backpropagation

Evaluation: Accuracy or F1 score

Written by venkatachalam ramalingam