Machine Learning — I

sai harini — Mon, 12 Feb 2018 16:59:49 GMT

What is Machine Learning?

A computer program learns from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. (Tom Mitchell)

There are several types of learning algorithms. Supervised learning and unsupervised learning are the two most important types. Others include reinforcement learning and recommender systems.

Supervised Learning :

For example, suppose you want to sell a house and have collected data about the sizes of houses and their prices. Now you have a house of some S sqft and want to know what it might cost in the market. Plot the available data with house size on the X axis and price on the Y axis, then draw a straight line or a curve that fits the data points as closely as possible. You can now estimate the price of your S sqft house by reading off the value of the line or curve at S. The term supervised learning refers to the fact that we gave the algorithm a dataset where the right answers are known. The above example is also a regression problem, i.e. we are trying to predict a continuous value, which here is the price of the house. If we are trying to predict a discrete value instead, it is a classification problem.
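A minimal sketch of the house-price example, assuming a handful of made-up (size, price) pairs and a straight-line fit with NumPy's `polyfit`. The data values and the helper name `predict_price` are illustrative assumptions, not real market data.

```python
import numpy as np

# Hypothetical training data: house sizes (sqft) and their known prices.
# These values are invented for illustration only.
sizes = np.array([800.0, 1000.0, 1200.0, 1500.0, 2000.0])
prices = np.array([160000.0, 200000.0, 240000.0, 300000.0, 400000.0])

# Fit a straight line price = slope * size + intercept (least squares).
slope, intercept = np.polyfit(sizes, prices, deg=1)

def predict_price(size_sqft):
    """Read the price off the fitted line for a house of the given size."""
    return slope * size_sqft + intercept

s = 1300.0  # the size S of the house we want to price
print(round(predict_price(s)))
```

The "right answers" here are the known prices; the fitted line is what lets us estimate a price for a size that is not in the data.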

Unsupervised Learning :

In unsupervised learning, we are given a dataset but are not told what class a particular data point belongs to. Instead we just tell the algorithm: here is the dataset, can you find some structure in it? The algorithm may then cluster the data into groups that are similar in some way. For example, clustering is used in Google News, which looks at thousands of news stories on the web and groups them into cohesive stories. Another example is the cocktail party problem, where two people talk simultaneously and the algorithm separates their audio. The algorithm doesn't know which audio belongs to whom; it just finds structure in the recordings and separates them.
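To make "find some structure in the data" concrete, here is a small sketch of one clustering algorithm, k-means. The post does not name a specific algorithm, so k-means is an assumption chosen for illustration, and the points are made up.

```python
import numpy as np

def kmeans(points, k, n_iters=20, seed=0):
    """Plain k-means: returns (centroids, labels) for an (n, d) float array."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Hypothetical unlabeled data: two visibly separate groups of 2-D points.
points = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                   [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]])
centroids, labels = kmeans(points, k=2)
print(labels)
```

Note that no labels are passed in: the grouping comes only from how close the points are to each other.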

Linear Regression :

As mentioned above, when we are trying to predict a continuous value, the problem is called regression. We have a training dataset where the right answers are known, and we try to fit a straight line to it so that we can predict the output value for a given input, i.e. our job is to learn from the data how to predict the value.

Some notation: let m be the number of training examples in the dataset, x the input variable (feature), and y the output (target) variable. (x, y) denotes a single training example and (x^i, y^i) the ith training example; the superscript i is an index, not a power.

We give our training data to the learning algorithm, and its job is to come up with a function, the hypothesis h, which takes the input of a new example and outputs the predicted value. So h is a function that maps from x's to y's. How do we represent h? h(x) = theta0 + theta1*x. This predicts y as a straight-line function of x. This is linear regression with one variable (x here), also called univariate linear regression.
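The hypothesis above can be written directly as a function. This is only a sketch; the parameter values below are hypothetical and picked to show the mapping from x to a predicted y.

```python
def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Hypothetical parameter values, chosen only to illustrate the mapping.
theta0, theta1 = 50.0, 0.2
for x in (1000.0, 1500.0, 2000.0):
    print(x, hypothesis(theta0, theta1, x))
```

Changing theta0 shifts the line up or down, and changing theta1 changes its slope.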

Cost Function (with Linear Regression example) :

The cost function helps us figure out how to fit the best possible straight line to our data. theta0 and theta1 in the hypothesis function are the parameters. Let's see how to choose them. We should come up with values such that the hypothesis fits the data, i.e. h(x) is as close as possible to y for our training examples (x, y). The squared error cost function measures this: J(theta0, theta1) = (1/(2m)) * sum over i = 1..m of (h(x^i) - y^i)^2. We choose the parameters that minimize J, i.e. that minimize the squared difference between the actual answer y and the answer h(x) calculated by the hypothesis.
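A short sketch of the squared error cost, assuming the common convention of dividing the summed squared errors by 2m. The function name and the tiny dataset are illustrative.

```python
def squared_error_cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum of (h(x) - y)^2 over the training set."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = theta0 + theta1 * x  # h(x)
        total += (prediction - y) ** 2
    return total / (2 * m)

# Hypothetical training set lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
print(squared_error_cost(0.0, 1.0, xs, ys))  # the line y = x fits perfectly
print(squared_error_cost(0.0, 0.5, xs, ys))  # a flatter line fits worse
```

The better the line passes through the points, the smaller the returned value.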

An important point to remember is that the hypothesis is a function of x: with theta held constant, it calculates the expected output for a given input value. The cost function, on the other hand, is a function of theta, and each value of theta corresponds to a different hypothesis. The objective of linear regression is to minimize this cost function, i.e. to find the straight line that fits the data well. The output of the cost function simply tells us how far we are from the actual answers if we choose a particular theta. What we really want now is an efficient algorithm that automatically finds the thetas for which the cost function is at its minimum.
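To see the cost as a function of theta, the sketch below fixes theta0 at 0 and evaluates J for several candidate values of theta1 on a made-up dataset. Each candidate is a different line, and the one with the lowest cost is the best fit among them. The data and the candidate grid are assumptions for illustration.

```python
import numpy as np

# Hypothetical training set lying exactly on the line y = x.
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 3.0])

def cost(theta1, theta0=0.0):
    """Squared error cost J for the line h(x) = theta0 + theta1 * x."""
    residuals = theta0 + theta1 * xs - ys
    return float(np.sum(residuals ** 2) / (2 * len(xs)))

# Try a grid of slopes: 0.0, 0.25, ..., 2.0.
candidates = np.linspace(0.0, 2.0, 9)
costs = [cost(t) for t in candidates]
best = candidates[int(np.argmin(costs))]

for t, j in zip(candidates, costs):
    print(f"theta1={t:.2f}  J={j:.4f}")
print("best theta1:", best)
```

Trying values on a grid like this does not scale to many parameters, which is why an automatic method is needed.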

Gradient descent :

We have seen the cost function; now we will look at the gradient descent algorithm for minimizing the cost function J. It is used not only in linear regression but for minimizing cost functions all over ML, and more generally it can minimize any differentiable function, not just a cost function. The outline of the algorithm is: 1. Start with some random thetas. 2. Keep changing the thetas to reduce J until we hopefully end up at a minimum. The update rule is: theta_j = theta_j - alpha * partial_derivative_of_J_wrt_theta_j. Compute this for all the thetas first and then update them all together (a simultaneous update). The rule is repeated until convergence. Alpha is the learning rate, which controls how big a step we take downhill with gradient descent. If alpha is very large, we take huge steps and may overshoot the minimum or even diverge; if it is very small, we take small baby steps down the hill and convergence is slow. Gradient descent can converge even with a constant learning rate: as we approach a local minimum, the partial derivatives shrink, so the steps automatically get smaller and there is no need to decrease alpha over time.
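A minimal sketch of the update rule on a simple two-parameter function, assuming J(a, b) = (a - 3)^2 + (b + 1)^2, whose minimum is at a = 3, b = -1. The function, starting point, and step counts are illustrative choices, not taken from the post.

```python
def gradient_descent(grad, start, alpha, n_steps):
    """Repeat theta_j := theta_j - alpha * dJ/dtheta_j for every j."""
    thetas = list(start)
    for _ in range(n_steps):
        # Evaluate all partial derivatives at the current point first,
        # then update every theta together (simultaneous update).
        partials = grad(thetas)
        thetas = [t - alpha * g for t, g in zip(thetas, partials)]
    return thetas

def grad_j(thetas):
    """Partial derivatives of the assumed J(a, b) = (a - 3)^2 + (b + 1)^2."""
    a, b = thetas
    return [2.0 * (a - 3.0), 2.0 * (b + 1.0)]

# A moderate, constant learning rate settles close to the minimum (3, -1).
good = gradient_descent(grad_j, start=[0.0, 0.0], alpha=0.1, n_steps=200)
print(good)

# A learning rate that is too large overshoots further on every step.
bad = gradient_descent(grad_j, start=[0.0, 0.0], alpha=1.1, n_steps=20)
print(bad)
```

With alpha = 0.1 the steps shrink on their own as the gradient gets smaller, which is why a constant learning rate is enough here.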

Also, for the cost function of linear regression there is only one minimum, i.e. the local minimum and the global minimum are the same, because the cost function for linear regression is always bowl shaped (convex). The algorithm we have seen so far is called batch gradient descent: each step uses all the training examples, because the partial derivative of J contains a sum (sigma) over all m examples.
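Putting the pieces together, here is a sketch of batch gradient descent for univariate linear regression. It assumes the squared error cost with the 1/(2m) factor, for which the partial derivatives are (1/m) * sum(h(x^i) - y^i) for theta0 and (1/m) * sum((h(x^i) - y^i) * x^i) for theta1. The data, learning rate, and step count are made up, and the thetas start at zero rather than at random values so the run is reproducible.

```python
def batch_gradient_descent(xs, ys, alpha=0.1, n_steps=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_steps):
        # Every step sums over all m training examples ("batch").
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Hypothetical data generated from the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = batch_gradient_descent(xs, ys)
print(theta0, theta1)
```

Because the cost is bowl shaped, the result does not depend on where the thetas start, only on running enough steps with a small enough alpha.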

Stories by sai harini on Medium
