Linear Regression…Huh?

Rohan Jagtap
Published in The Startup
4 min read · Dec 11, 2020

What’s up, readers! I’m starting a series on Medium: one article for each concept I learn in my focus area for the next few months, Artificial Intelligence in Python. This is the first of many to come, so stay tuned for more insight and explanations!

If you’re new to Artificial Intelligence, chances are you’ve heard the term “Linear Regression” being thrown around. Jeez...could they have picked a more intimidating name? Anyway, I’m here to tell you that it is in no way as complex as the name makes it out to be. Simply put, it is the equation of a straight line…and then some.

First of all, what is Linear Regression? It asks whether a line-of-best-fit can be drawn through a set of data points, which tells us whether a linear relation lies within the data. Essentially: does the x (independent) variable have a direct impact on the y (dependent) variable?

When I was in 9th grade, the math course I took essentially revolved around the formula that forms the basis for much of higher mathematics. This legendary equation:

y = mx + b

Where “b” is the y-intercept and “m” represents the slope of the line. We aim to find a quantitative relation between “x” and “y”.

The only major difference between this formula and the one used for linear regression is the symbol epsilon (ε), which turns the equation into y = mx + b + ε. Within the context of Artificial Intelligence, this symbol represents the vertical distance from any data point to the line-of-best-fit. The point of these programs is to make these distances as small as possible across all the points, so that you end up with the most accurate line possible.
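To make epsilon a bit more concrete, here is a minimal sketch that measures the vertical distance from each point to a candidate line. The data points and the values of m and b are made up for illustration, and adding up the squared distances is just one common way to score a line, not necessarily the only one.

```python
import numpy as np

# Hypothetical data points, made up for this sketch.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 6.0, 9.0])

# A candidate line y = m*x + b (m and b are guesses, not fitted values).
m, b = 2.0, 1.0

# Epsilon for each point: the actual y minus the y the line predicts.
predicted = m * x + b
epsilon = y - predicted

# One common way to score the line: add up the squared distances.
sum_squared = float(np.sum(epsilon ** 2))

print(epsilon)
print(sum_squared)  # 1.0
```

Only the third point sits off this line (by one unit), so the total comes out to 1.0. A better line would push that total closer to zero.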

Now that you’ve hopefully gotten your head around this concept, let’s see what this looks like in code. First things first, ✨import statements✨. Let’s bring in all the libraries we will need:
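A minimal sketch of the imports, assuming NumPy for the arrays, Matplotlib for the graph, and scikit-learn for the LinearRegression class (your own setup may use different libraries):

```python
import numpy as np                                 # arrays and reshaping
import matplotlib.pyplot as plt                    # drawing the graph
from sklearn.linear_model import LinearRegression  # the model itself
```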

Since we’ve imported everything that we will need, let's proceed to store the data within our code. Although you may have a .csv file containing your data, I just made up some data for the sake of this example and stored it in an array:
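Something along these lines would do. The nine values below are stand-ins invented for this sketch, and splitting them into separate x and y arrays is an assumption on my part:

```python
import numpy as np

# Nine made-up (x, y) pairs; the exact numbers are placeholders.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([2, 4, 5, 4, 6, 7, 9, 8, 10])

print(x.shape)  # (9,)
print(y.shape)  # (9,)
```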

Now, we have the data (Woohoo 🎉), but we need to reshape it in a way that can be used for this graph. But what is shaping data anyway?

Picture a block of data of shape (3, 2): 3 rows by 2 columns, 6 values in total. To line those values up in a single column, we can reshape the block into (6, 1). We still have the same data, but we’ve simply changed the way it is stored. So back to our example:
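Here is roughly what that reshape step might look like, using placeholder x values. scikit-learn expects its input as a 2D array with one row per data point, which is why the flat array gets turned into a single column:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical values

# Turn the flat array of 9 values into 9 rows of 1 column each.
x = x.reshape(-1, 1)

print(x.shape)  # (9, 1)
```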

HOLD ON, I know the -1 doesn’t make sense…jeez. But the “-1” is used as a placeholder when we do not explicitly know how “long” our data is and want the computer to figure it out. (Since the next parameter is 1, and I have 9 pieces of data, it does not take a whole lotta processing power to figure out that the -1 is actually a 9, but whatever.)
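To see the placeholder in action, here is a small sketch on a (3, 2) block like the one described earlier. The values inside the block are made up; the point is that spelling out the 6 and letting NumPy infer it with -1 give the same result:

```python
import numpy as np

# A made-up block of shape (3, 2): 3 rows, 2 columns.
block = np.array([[1, 2],
                  [3, 4],
                  [5, 6]])

explicit = block.reshape(6, 1)   # we state the length ourselves
inferred = block.reshape(-1, 1)  # NumPy works out that -1 means 6

print(block.shape, explicit.shape, inferred.shape)  # (3, 2) (6, 1) (6, 1)
print(np.array_equal(explicit, inferred))           # True
```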

Next, I’ll make an object of the LinearRegression class and fit my data. This will essentially train my model to figure out the equation of my best fit line:
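A minimal sketch of that step, assuming scikit-learn’s LinearRegression and the same placeholder data as before. The names m and b are my own, chosen to match the y = mx + b formula:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data: x reshaped into a column, y left flat.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 6, 7, 9, 8, 10])

# Create the model and train it on the data.
model = LinearRegression()
model.fit(x, y)

# After fitting, the slope and y-intercept are stored on the model.
m = model.coef_[0]
b = model.intercept_
print(f"y = {m:.2f}x + {b:.2f}")
```

Once fit has run, coef_ holds the slope and intercept_ holds the y-intercept of the best fit line.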

The final few steps are to use the predict function to compute a predicted “y” value for each “x” value that I have, plot the original data points, and then draw the predicted values as a line on the same graph.
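Put together, those last steps might look something like this. The data is the same made-up set, and the colors, labels, and output file name are arbitrary choices for the sketch:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Placeholder data: x reshaped into a column, y left flat.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 6, 7, 9, 8, 10])

# Train the model, then predict a y value for every x value.
model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

# Original points as dots, predictions as the line of best fit.
plt.scatter(x, y, label="data")
plt.plot(x, y_pred, color="red", label="line of best fit")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()

# Save the graph to a file; in an interactive session, plt.show() works too.
plt.savefig("linear_regression.png")
```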

After all this setup, you should get a result similar to this:

BOOM! A linear regression graph! Congrats on completing your first steps to becoming an A.I. developer. You have just learned the bare essentials of linear regression 🧑‍💻.

However, something is still missing…how exactly does the computer take into account the vertical distance for each point and make sure the line-of-best-fit is as accurate as possible? In future articles, I will cover the Least Squares Method to first find the optimum line-of-best-fit, the R Squared Method to check how well the line fits the given set of data, and how to optimize this line using Gradient Descent. All coming soon…hopefully.

Thanks for reading this article. My name is Rohan, I am a 16-year-old high school student learning about disruptive technologies, and I’ve chosen to start with A.I. To reach me, contact me by email or through my LinkedIn. I’d be more than glad to provide any insight or to learn from insights that you may have. Additionally, I would appreciate it if you could join my monthly newsletter. Until the next article 👋!

Hey everyone! My name is Rohan, a Third Year student at the University of Waterloo learning about Artificial Intelligence.