Learning Linear Regression using Numpy Python

Neha Kushwaha
Analytics Vidhya

--

An approach to implementing the Linear Regression algorithm using NumPy in Python: worth knowing before you start using inbuilt libraries to solve your data-set problems.

Here, I will take you through a basic implementation of the Linear Regression algorithm. If you know the theory behind this algorithm and are looking for an implementation from scratch using Python, then this article is for you. Even if you don’t, I recommend you visit my earlier article, Detailed view on Linear Regression, before you start coding.

Steps in a nutshell:

  1. Assume a hypothesis that relates a dependent/target variable (y) to one or more independent/predictor variables (x) on the training data set.
  2. Initialize the weights (the parameters of the hypothesis) with random real numbers.
  3. Initialize a learning rate (α) for the step size and a tolerance value for the stopping condition.
  4. Calculate the cost and the gradients, and update all model weights simultaneously to reduce the error between actual and predicted values.
  5. Keep updating the weights until you reach the stopping condition (tolerance value), which gives the optimum solution and thus minimizes the gap between the actual and predicted target.
Let’s start Implementing

Part 1 : Creating a simulated Data-set and perform some EDA on it:

We use NumPy’s linspace function to create 100 points, generate a target from them with random noise added, and reshape data_x into a 2-D array to ease our matrix multiplication.

Hypothesis : hθ(x) = θ1*x1 + θ0
Here, as an example, I generate linear random data:
data_y = 29 * data_x + 30 * np.random.rand(100,1)
The weights are θ1 and θ0, which we will find through the gradient descent method.
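A minimal sketch of this step (the range 1 to 10 and the random seed are my assumptions; the article only fixes the formula for data_y):

```python
import numpy as np

np.random.seed(0)  # assumed seed, only for reproducibility

# 100 evenly spaced points, reshaped into a 2-D column vector;
# the range 1..10 is an assumption for illustration
data_x = np.linspace(1.0, 10.0, 100).reshape(100, 1)

# simulated target: roughly linear in x, plus uniform noise
data_y = 29 * data_x + 30 * np.random.rand(100, 1)

# prepend a column of ones so the intercept theta_0 folds into one matrix product
x = np.hstack((np.ones_like(data_x), data_x))
```

The column of ones lets us write the hypothesis as a single matrix product x·w, with w = [θ0, θ1].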

Part 2 : Plot the Data-set:

This code displays the one-feature data plot. Because of the added randomness, the data doesn’t lie exactly on a line, but it almost follows a linear path.
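A plotting sketch along these lines (the Agg backend and the output filename are my assumptions, added so the script also runs headless):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the plot interactively
import matplotlib.pyplot as plt

np.random.seed(0)
data_x = np.linspace(1.0, 10.0, 100).reshape(100, 1)
data_y = 29 * data_x + 30 * np.random.rand(100, 1)

# scatter of the single feature against the noisy target
plt.scatter(data_x, data_y, s=10)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Simulated data: roughly linear, with noise")
plt.savefig("data_plot.png")  # or plt.show() in an interactive session
```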

Part 3: Create train, test data:

I am again using NumPy to split the data as an example, but you can also experiment with the train_test_split() function from sklearn.model_selection.
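One way to do the split with plain NumPy (the 80/20 ratio is my assumption, not stated in the article):

```python
import numpy as np

np.random.seed(0)
data_x = np.linspace(1.0, 10.0, 100).reshape(100, 1)
data_y = 29 * data_x + 30 * np.random.rand(100, 1)
x = np.hstack((np.ones_like(data_x), data_x))

# shuffle the indices, then hold out 20% for testing (the ratio is an assumption)
order = np.random.permutation(len(x))
test_size = int(0.2 * len(x))
test_idx, train_idx = order[:test_size], order[test_size:]

train_x, train_y = x[train_idx], data_y[train_idx]
test_x, test_y = x[test_idx], data_y[test_idx]
```

Shuffling before slicing matters here: linspace produces sorted data, so a plain head/tail split would put all large x values in one set.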

Part 4: Gradient Calculation function

Now comes the main logic of Linear Regression: calculating the error, the cost function, and the gradient.
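A sketch of that function, written so that it matches the y_estimate and gradient lines quoted later in this article (the function name get_gradient is my assumption):

```python
import numpy as np

def get_gradient(w, x, y):
    """Cost and gradient for the hypothesis h(x) = x @ w.

    x is the (n, 2) design matrix with a leading column of ones,
    w the weight vector [theta_0, theta_1], and y the (n, 1) targets.
    """
    y_estimate = x.dot(w).flatten()
    error = y.flatten() - y_estimate
    # mean squared error between actual and predicted values
    mse = (1.0 / len(x)) * np.sum(np.power(error, 2))
    # gradient of the (half-)MSE cost with respect to each weight
    gradient = -(1.0 / len(x)) * error.dot(x)
    return gradient, mse
```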

Part 5: Finding the weights of Hypothesis

This is where the last steps from the nutshell at the start of this article come together: we initialize random weights, the alpha value, and the stopping condition.

We keep updating the weights until we meet the tolerance condition and voilà, we arrive at our final weights.

Posting the final equation with the updated weights!
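The pieces above can be combined into a sketch like this (the values of alpha, the tolerance, and the data range are my assumptions; the article's actual choices were in the omitted code):

```python
import numpy as np

np.random.seed(0)
data_x = np.linspace(1.0, 10.0, 100).reshape(100, 1)
data_y = 29 * data_x + 30 * np.random.rand(100, 1)
x = np.hstack((np.ones_like(data_x), data_x))

def get_gradient(w, x, y):
    y_estimate = x.dot(w).flatten()
    error = y.flatten() - y_estimate
    mse = (1.0 / len(x)) * np.sum(np.power(error, 2))
    gradient = -(1.0 / len(x)) * error.dot(x)
    return gradient, mse

w = np.random.randn(2)   # random initial weights [theta_0, theta_1]
alpha = 0.01             # learning rate (assumed value)
tolerance = 1e-5         # stopping condition on the size of the update step

iterations = 1
while True:
    gradient, error = get_gradient(w, x, data_y)
    new_w = w - alpha * gradient
    # stop when the weights barely move between iterations
    if np.sum(abs(new_w - w)) < tolerance:
        break
    iterations += 1
    w = new_w

print(f"Converged after {iterations} iterations")
print(f"Final equation: y = {w[1]:.2f} * x + {w[0]:.2f}")
```

With this data the recovered slope lands close to 29 and the intercept close to 15 (the mean of the 30 * rand noise), which is exactly the least-squares fit we hoped gradient descent would find.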

Part 6: Actual vs. predicted vs. hypothesis plot

At last, a picture makes the understanding much better: the blue dots are the training data, the red dots represent the test set, and the green line is the hypothesis we found in the form of the final equation, which you can use to predict new points.
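A self-contained sketch of such a plot; for brevity, the weights here come from a closed-form least-squares solve, which lands on (approximately) the same θ0, θ1 that gradient descent converges to:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the plot interactively
import matplotlib.pyplot as plt

np.random.seed(0)
data_x = np.linspace(1.0, 10.0, 100).reshape(100, 1)
data_y = 29 * data_x + 30 * np.random.rand(100, 1)
x = np.hstack((np.ones_like(data_x), data_x))

# same shuffled 80/20 split as before
order = np.random.permutation(len(x))
test_idx, train_idx = order[:20], order[20:]

# stand-in for the gradient-descent weights: closed-form least squares
w = np.linalg.lstsq(x, data_y, rcond=None)[0].flatten()

plt.scatter(x[train_idx, 1], data_y[train_idx], color="blue", label="training data")
plt.scatter(x[test_idx, 1], data_y[test_idx], color="red", label="test data")
plt.plot(data_x, w[0] + w[1] * data_x, color="green", label="hypothesis")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("fit_plot.png")  # or plt.show() in an interactive session
```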

This was a simple illustration for a linear function, but what if your hypothesis is not linear? In that case, a few changes to the above code, which is quite robust, let it handle even a non-linear function.

I will give you some hints but leave you to try it on your own.

In Part 1: replace data_y with a non-linear function, say
data_y = np.sin(data_x) + 0.1 * np.power(data_x,2) + 0.5 * np.random.rand(100,1)
In Part 4: replace y_estimate and gradient with the assumed hypothesis
y_estimate = (x**2).dot(w).flatten()
gradient = -(1.0/len(x)) * error.dot(x**2)
This is just an example; you can try your own linear and non-linear hypotheses and see the prediction. With the above changes, the final plot will look as shown below:
A plot of non-linear hypothesis

I hope this article helped you get a better practical understanding of the Linear Regression algorithm. Will meet you soon with a new algorithm. Stay tuned!! Keep learning!! And stay safe!! :)
