My Machine Learning Scratchpad, Day 2

ML is not magic, it's math!

Yesterday, I learned what ML is, the types of algorithms, and their applications. Today I am going to get started with the basics of algorithm development and my first supervised learning algorithm.

Building a Supervised Learning Algorithm:
“Given the ‘right answer’ for each example in the data, predict a real-valued output”
2 main concepts: 
 i. Building Model/Hypothesis
ii. Minimizing the Cost Function
Notations:
m: Number of training examples
x: input variable/feature
y: output variable/target
a single training example: (x, y)
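
To make the notation concrete, here is a tiny Python/NumPy sketch with made-up house sizes and prices (the numbers are purely illustrative):

```python
import numpy as np

# Toy training set (made-up numbers): house size in sq. ft -> price in $1000s
x = np.array([650, 800, 1200, 1500])   # x: input variable/feature
y = np.array([70, 95, 150, 190])       # y: output variable/target

m = len(x)               # m: number of training examples (here, 4)
example = (x[0], y[0])   # a single training example (x, y)
print(m, example)
```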
Basic Algorithm Flow Chart:
 i. Training Set >> Learning Algorithm >> hypothesis/model
ii. input >> hypothesis/model >> output
 e.g. size of house >> hypothesis/model >> price of house
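
A rough Python sketch of this two-step flow; the `train` function here is only a placeholder, since the actual learning step (gradient descent) comes later:

```python
def train(training_set):
    # Placeholder learning step: in a real implementation this would fit
    # theta0 and theta1 to the data (via gradient descent, covered later).
    theta0, theta1 = 0.0, 0.125   # hard-coded, illustrative values
    return lambda size: theta0 + theta1 * size   # the hypothesis h(x)

# i. Training Set >> Learning Algorithm >> hypothesis/model
h = train([(650, 70), (800, 95), (1200, 150), (1500, 190)])

# ii. input >> hypothesis/model >> output
print(h(1000))   # size of house >> predicted price of house (in $1000s)
```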

  1. Linear Regression Algorithm
    i. Model / Hypothesis

    Equation: hθ(x) = θ0 + θ1·x
    This is known as univariate linear regression (linear regression with one variable), since there is only one input variable x.
    This is the equation of a line fitted to the data points; it maps a continuous input to a continuous output.
    While implementing it, our aim is to find the values of θ0 and θ1 such that the mean squared error between the predictions hθ(x) and the actual values y is minimum. To find this optimum line we require a cost function.
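
A minimal sketch of the hypothesis as a Python function; the parameter values passed in are purely illustrative, and finding good ones is the job of the cost function and gradient descent:

```python
import numpy as np

def h(x, theta0, theta1):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * np.asarray(x)

# Predictions for a few house sizes, with made-up parameter values
print(h([650, 800, 1200], theta0=0.0, theta1=0.125))
```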
    ii. Cost Function
    The aim is to choose (θ0, θ1) such that the difference between hθ(x) and y is as small as possible across the training set.
    J(θ0,θ1) = (1/2m) · ∑ (hθ(x(i)) − y(i))²
    where hθ(x) = θ0 + θ1·x
    To minimize the value of the cost function, i.e. to find the optimal values of θ0 & θ1, we need to look at another algorithm called Gradient Descent.
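
Here is a short sketch of the cost function in NumPy, following the formula above (the data and parameter values are made up for illustration):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = 1/(2m) * sum((h_theta(x_i) - y_i)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x        # h_theta(x) for every training example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Made-up data: house sizes and prices (in $1000s)
x = np.array([650, 800, 1200, 1500])
y = np.array([70, 95, 150, 190])
print(cost(0.0, 0.125, x, y))   # lower is better; gradient descent will minimize this
```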

Notes:
1. Generally, for supervised learning we have a data set with the correct answers (labels).
2. We plot the data and try to find a model/hypothesis that fits it.
3. This model can be a linear or non-linear function. Right now I am learning a very basic univariate linear function, i.e. a line.
4. The accuracy of the model is determined by how close the values of h(x) and y are on average.
5. To find the optimum h(x), we need to tweak the values of the parameters θ0 & θ1, which can be done with another algorithm, “Gradient Descent”.

PS: You can access all the articles here.