A Career in Data Science — Part 1: Machine Learning — Linear Regression

Harris Mohammed
6 min read · Nov 30, 2018


This is my first post on Machine Learning concepts. If you are viewing my blog for the first time, here’s the link to my introductory post. I hope you enjoy the content on my blog, and I’ll be glad to have you on board throughout this journey.

© Kat Yukawa, Unsplash

Humans learn from their past experiences. Machines follow instructions given by humans. But what if humans could train machines to learn from past data?

Well, that’s called Machine Learning. It’s something more than just learning; it’s about understanding and reasoning.

The two main families of algorithms in predictive machine learning are Classification and Regression.

  1. Classification answers yes-or-no questions, for example: Is this email spam or not? Are the given biometrics authentic? Is the given image of a dog, a cat, or a wolf?
  2. Regression answers questions of the form “how much?”, for example: How much does this boat cost? How many seconds do you expect someone to spend reading this post?

Suppose your friend gives you some coins of 3 different denominations: 1 Rupee, 1 Euro and 1 Dirham. Each coin has a different weight: the 1 Rupee coin weighs 3.8 grams, the 1 Euro coin weighs nearly 7.5 grams, whereas the 1 Dirham coin weighs 4.3 grams. You are asked to predict the denomination of each coin based on its weight. In this model the weight becomes the feature of the coin, and the currency is the label to be predicted.

When you feed the data to your model, it learns which feature is associated with which label; for example, it will learn that if a coin weighs 3.8 grams, then it is a 1 Rupee coin. Given a new coin, the model will predict its currency based on its weight. This is a simple example of the Supervised Learning method, where labelled data is used to train the model. Unsupervised Learning (unlabelled data) and Reinforcement Learning (reward based) are the two other learning methods.
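The coin example above can be sketched in a few lines of code. This is a minimal illustration, not from the original post: the nearest-weight rule below is an assumption standing in for a trained model, just to show the feature-to-label mapping idea.

```python
# Labelled training data: (feature = weight in grams, label = currency)
training_data = [(3.8, "Rupee"), (7.5, "Euro"), (4.3, "Dirham")]

def predict_currency(weight):
    """Predict the label whose training weight is closest to the input weight."""
    return min(training_data, key=lambda pair: abs(pair[0] - weight))[1]

print(predict_currency(3.8))  # Rupee
print(predict_currency(7.4))  # Euro — closest known weight is 7.5 g
```

A real model would generalise from many noisy examples rather than memorising three exact weights, but the supervised idea is the same: features in, labels out.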

Simple Linear Regression

Image courtesy: en.wikipedia.org

Before we dive into the actual technique of Linear Regression, let’s build some intuition.

Suppose the following values of X and Y are given to you: (1,1), (2,2), (3,3), (100,100), (20,20). What is the value of Y when X = 6?

The answer is 6. Not very difficult, right?

Now, let’s take a look at another example. Say you have the following pairs of X and Y: (1,1), (2,4), (3,9), (100,10000), (20,400). Can you calculate the value of Y when X = 6?

The answer is 36. Was this difficult?

Let’s try to understand what exactly happened in the above examples. The first example exhibited the relationship between X and Y as Y = X. Similarly, in the second example, the relationship was Y = X*X.

Regression is usually described as determining the relationship(s) between two or more variables. In the examples above, X and Y are the variables: X is termed the independent variable and Y the dependent variable. Simple linear regression models take the form “ y = mx + b ”, with ‘m’ as the slope and ‘b’ as the y-intercept.

Finding the BEST fitting line!

We now know what Linear Regression is. But how do we find the best values for ‘m’ and ‘b’? There is an infinite set of values to choose from, and picking the most accurate pair by hand would be quite a tedious job indeed.

The most suitable values of ‘m’ and ‘b’ are the ones that produce the least error across all given X and Y.
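For simple linear regression there is in fact a direct formula for the error-minimising ‘m’ and ‘b’, the ordinary least-squares solution. A minimal sketch (not from this post; the data points are made up to follow Y = 2X):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # exactly y = 2x, so we expect m = 2, b = 0

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², b = ȳ − m·x̄
m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - m * mean_x

print(m, b)  # 2.0 0.0
```

The gradient descent technique introduced below reaches the same answer iteratively, which becomes essential once models have many more parameters than this closed form can handle.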

The error measures how far a given point is from the line. This distance from each point to the line is what guides us in changing the orientation and position of the line to achieve the best fitting line. In other words, the error needs to be minimized to obtain the equation of the best fitting line.

In general, we consider this error to be either:

1. Mean Squared Error: MSE = (1/n) Σ (yᵢ − ŷᵢ)², the average of the squared differences between the actual values yᵢ and the predicted values ŷᵢ.

2. Mean Absolute Error: MAE = (1/n) Σ |yᵢ − ŷᵢ|, the average of the absolute differences.
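Both error measures are straightforward to compute for a candidate line y = m·x + b. A small sketch (the sample points are illustrative, chosen to lie on y = x):

```python
def mse(m, b, points):
    """Mean Squared Error: average of squared point-to-line residuals."""
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)

def mae(m, b, points):
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(y - (m * x + b)) for x, y in points) / len(points)

points = [(1, 1), (2, 2), (3, 3)]
print(mse(1.0, 0.0, points))  # 0.0 — the line y = x fits these points perfectly
print(mse(0.5, 0.0, points))  # nonzero — the line y = 0.5x misses every point
```

MSE penalizes large residuals much more heavily than MAE because of the squaring, which is one reason it is the usual choice for gradient-based fitting.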

How do we minimize the error efficiently?

To minimize the error, we are going to use the “Gradient Descent” technique. Let’s build some basic intuition for it.

Suppose you are standing on top of a mountain whose height measures how big our error is, and you want to descend from it. To descend, we need to minimize our height (the distance from the bottom of the mountain). In other words, we reduce the error of the model by descending the mountain.

Descending the mountain is equivalent to getting the line closer to the points. If I wanted to descend, I would look at all the possible directions in which I could walk and pick the one that makes me descend the most. After a few steps we finally reach the bottom of the mountain, i.e. we find the line that best fits the given set of points. That is gradient descent.

Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.

J(w) is the error & w is the weights assigned to the feature | Image Courtesy : Madhu Sanjeevi, Medium

How does it decide how much to descend?

  1. Draw the tangent line at the given point.
  2. Find the slope of that line.
  3. Identify how much change is required by taking the partial derivative of the error function with respect to the weights assigned to the features.
  4. Multiply this change value by a variable called alpha (the learning rate), usually set to 0.01. The lower the value, the slower we travel along the downward slope. While a low learning rate might be a good idea for making sure we do not miss any local minima, it can also mean taking a long time to converge — especially if we get stuck on a plateau region.
  5. Subtract this change value from the earlier value of the weights to get the new value for the weights.
θ here is considered as the weights assigned to the features. | Image Courtesy : Madhu Sanjeevi, Medium
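The steps above can be sketched for simple linear regression with the MSE error. This is a minimal illustration, not the post’s own code; the data points are made up (they lie on y = x), and alpha = 0.01 follows the value suggested in step 4.

```python
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]

def step(m, b, points, alpha=0.01):
    """One gradient descent update for the line y = m*x + b under MSE."""
    n = len(points)
    # Partial derivatives of MSE with respect to m and b (steps 1–3)
    grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in points) / n
    grad_b = sum(-2 * (y - (m * x + b)) for x, y in points) / n
    # Scale by the learning rate and subtract (steps 4–5)
    return m - alpha * grad_m, b - alpha * grad_b

m, b = 0.0, 0.0  # arbitrary starting point on the "mountain"
for _ in range(5000):
    m, b = step(m, b, points)

print(round(m, 3), round(b, 3))  # converges to m ≈ 1, b ≈ 0, i.e. the line y = x
```

Note that with a learning rate this small, thousands of iterations are needed even on three points; raising alpha speeds things up but, if it is set too high, the updates overshoot and the error diverges instead of descending.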

The link to the next Machine Learning concept:

Enjoyed the article? Click & Hold the ❤ below to recommend it to other interested readers!
