Linear Regression in Python WITHOUT Scikit-Learn

Tan Moy
Published in We Are Orb
4 min read · Dec 10, 2017

After thinking a lot about how to present this article to fellow ML beginners, I have arrived at the conclusion that I can’t do a better job of explaining root concepts than the present masters. I won’t even try.

I will just tell you this: before we start implementing linear regression in Python, make sure you have watched the first two weeks of Andrew Ng’s Machine Learning Course.

Once you have watched the lectures and grokked the concepts, you should try to implement it yourself and should you need some help, well, that is exactly why this article exists :-)

With that said, let’s get started. The data set and code files are present here. I recommend using Spyder, as it’s got a fantastic variable viewer that Jupyter Notebook lacks.

Step 1. Import the libraries:
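Something along these lines will do (a minimal sketch; the snippet in the repo may differ slightly):

import numpy as np                 # matrix and array operations
import matplotlib.pyplot as plt    # plotting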

This is self-explanatory. We just import numpy and matplotlib. I haven’t used pandas here, but you certainly can. Read this excellent article to get started with Pandas.

Step 2. Read the data and create matrices:
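A minimal sketch of this step (the file name my_data.csv is an assumption; use whatever the data set in the repo is called):

my_data = np.genfromtxt('my_data.csv', delimiter=',')   # load the CSV into a numpy array
X = my_data[:, 0].reshape(-1, 1)                        # first column as a one-column matrix
ones = np.ones([X.shape[0], 1])                         # column of ones for the intercept term
X = np.concatenate([ones, X], axis=1)
y = my_data[:, 1].reshape(-1, 1)                        # second column as the target matrix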

In the second line, we slice the data set and save the first column as an array in X. reshape(-1, 1) tells Python to convert the array into a matrix with one column; “-1” tells Python to figure out the number of rows by itself. Then we create an array of ones and concatenate it to the X matrix. Finally, we create the y matrix. At this point, if we plot the graph using,

plt.scatter(my_data[:, 0].reshape(-1,1), y)

We get:

A linear trend can be clearly seen :-)

Step 3. Set the hyper parameters:
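Something like this works (the exact values are just a starting point to experiment with):

alpha = 0.01               # learning rate
iters = 1000               # number of gradient descent iterations
theta = np.zeros([1, 2])   # initial theta: [intercept, slope], both zero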

Now we should define the hyperparameters, i.e. the learning rate and the number of iterations. We can also define the initial theta values here.

In case you are wondering, the theta values are the slope and intercept of the line equation, i.e. the values of m and c in the equation y = c + mx. In this case, yhat = theta[0][0] + theta[0][1]*x.

Step 4. Create the cost function:
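Here is one way to write it, following the vectorized form from the course (a sketch; the file in the repo may differ in details):

def computeCost(X, y, theta):
    inner = np.power((X @ theta.T) - y, 2)   # squared differences between predictions and y
    return np.sum(inner) / (2 * len(X))      # half the mean of the squared differences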

The computeCost function takes X, y, and theta as parameters and computes the cost. The calculations inside the function are exactly what Andrew teaches in the class. Basically, “inner” holds the difference between the dot product of X and theta and the y values, raised to the power of two. Those values are then summed up, divided by 2 * the length of X, and returned.

What it means is that we find the difference between the predicted values (we use the line equation and theta values to predict yhat) and the original y values (already in the data set, i.e. the y matrix), square those differences, and sum them up. Then we take half the average and return it. The returned value is the cost.

We can run the cost function now and it gives a very high cost. We have to reduce it. Somehow. (¬‿¬)

computeCost(X, y, theta) # outputs 319.40631589398157

Step 5. Create the Gradient Descent function:
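Here is a sketch of such a function, written to match the computeCost signature above (the implementation in the repo may differ in details):

def gradientDescent(X, y, theta, alpha, iters):
    for i in range(iters):
        # step theta a little in the direction that reduces the cost
        theta = theta - (alpha / len(X)) * np.sum(X * (X @ theta.T - y), axis=0)
    return theta, computeCost(X, y, theta)   # fitted parameters and the final cost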

Of course, we are going to use Gradient Descent to minimize the cost function.

Gradient Descent is the heart of this article and can certainly be tricky to grasp, so if you have not done it yet, now would be a good time to check out Andrew Ng’s Coursera course. Andrew’s explanations are spot on. Once you grasp it, the code will make sense.

Basically, what it does is find the optimal values for the theta parameters so that the cost decreases.

Now we can run the gradient descent function and see what happens:

g, cost = gradientDescent(X, y, theta, alpha, iters)  
print(g, cost)

The above code outputs:

g = array([[ 1.03533399,  1.45914293]])
cost = 56.041973777981703

Going from 319.40631589398157 to 56.041973777981703 is a huge decrease in cost.

Go on, change the hyperparameters and the theta values. See what happens. Play around. See if you can decrease the cost further.

Step 6. Another plot:
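A sketch of the plotting code for this step (the variable names match the questions below):

x_vals = np.linspace(my_data[:, 0].min(), my_data[:, 0].max(), 100)   # evenly spaced x values
y_vals = g[0][0] + g[0][1] * x_vals        # yhat = intercept + slope * x, using the fitted theta

plt.scatter(my_data[:, 0], my_data[:, 1])  # the original data points
plt.plot(x_vals, y_vals, 'r')              # the fitted line
plt.show()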

Did you understand the above code? What do you think x_vals is? And y_vals? Does it remind you of something? Line equation perhaps? Can you use this technique to predict any y value given the x value?

As you ponder these questions, take a look at what the above code outputs:

Ooh! The line of best fit…

So there you go. A complete linear regression algorithm from scratch. I wonder what happens when there are multiple features ¯\_(ツ)_/¯

But that’s a topic for another article.

Though I said I won’t explain the relevant concepts in this article, you can certainly post your doubts in the comments below or hit me up on Twitter and I will try to clear them up.

Show us some ❤ and 👏 and follow our publication for more awesome articles on data science from authors 👫 around the globe and beyond. Thanks for reading.
