Linear Regression Model

Akshita Guru
5 min read · Feb 11, 2024

--

In the previous article we looked at machine learning and its main varieties; you're welcome to look at it here. We also introduced supervised learning, which we will now explore in more detail.

Supervised learning problems fall into two categories:

  • Regression
  • Classification

The linear regression model is a supervised learning algorithm. It simply means fitting a straight line to your data, and it is probably the most widely used learning algorithm in the world today.

Let’s start with a problem that can be addressed using linear regression. Say you want to predict the price of a house based on the size of the house.

Here we have a graph where the horizontal axis is the size of the house in square feet, and the vertical axis is the price of a house in thousands of dollars.

Let’s plot the data points for the houses in the dataset. Each data point, each of these little crosses, is a house with its size and the price it most recently sold for.
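If you want to reproduce this kind of plot yourself, here is a minimal sketch using matplotlib. All of the (size, price) values except the first row of the table, (2104, 400), are made up purely for illustration.

```python
# A minimal sketch of the size-vs-price scatter plot described above.
# All data points except (2104, 400) are hypothetical.
import matplotlib.pyplot as plt

sizes = [2104, 1416, 1534, 852, 1852, 1203]   # square feet (mostly hypothetical)
prices = [400, 232, 315, 178, 370, 239]       # thousands of dollars (mostly hypothetical)

plt.scatter(sizes, prices, marker="x", c="red", label="houses in the training set")
plt.xlabel("size (square feet)")
plt.ylabel("price (thousands of dollars)")
plt.title("Housing prices")
plt.legend()
plt.show()
```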

Now, let’s say you want to sell your house. How much do you think you can get for it? This dataset might help you estimate the price. You start by measuring the size of the house, and it turns out to be 1,250 square feet. How much do you think it could sell for?

One thing you could do is build a linear regression model from this dataset. Your model will fit a straight line to the data. Based on this straight-line fit, you can find 1,250 square feet on the horizontal axis, go up until you intersect the best-fit line, then trace across to the vertical axis on the left and read off the price, maybe about $220,000.
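As a rough sketch of that idea, the snippet below fits a straight line to a small, made-up version of the dataset using NumPy's least-squares polynomial fit and then predicts the price of a 1,250-square-foot house. Only the first row (2104, 400) comes from the article; the other numbers are invented for illustration, so the exact prediction will differ from the ~$220,000 read off the plot.

```python
# A minimal linear-regression sketch: fit y = w*x + b by least squares,
# then predict the price of a 1,250-square-foot house.
# All rows except (2104, 400) are hypothetical.
import numpy as np

sizes = np.array([2104, 1416, 1534, 852, 1852, 1203])   # square feet
prices = np.array([400, 232, 315, 178, 370, 239])       # thousands of dollars

w, b = np.polyfit(sizes, prices, deg=1)   # slope and intercept of the best-fit line

x_new = 1250                              # size of the house we want to price
predicted = w * x_new + b
print(f"Predicted price for {x_new} sq ft: about ${predicted:.0f},000")
```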

This is an example of what’s called a supervised learning model. We call this supervised learning because you first train the model by giving it data that contains the right answers. The linear regression model is a particular type of supervised learning model. It’s called a regression model because it predicts numbers as the output, like prices in dollars.

In addition to visualizing this data as a plot here on the left, there’s one other way of looking at the data that would be useful, and that’s a data table here on the right.

The data comprises a set of inputs, the size of the house, in one column, and outputs, the price you’re trying to predict, in the other column. Notice that the horizontal and vertical axes of the plot correspond to these two columns, the size and the price.

If you have, say, 47 rows in this data table, then there are 47 of these little crosses on the plot on the left, each cross corresponding to one row of the table. For example, the first row of the table is a house with a size of 2,104 square feet that sold for $400,000, and it appears as one of the crosses on the plot.

The dataset that you just saw, which is used to train the model, is called a training set. Note that your house is not in this dataset because it hasn’t been sold yet, so no one knows what its price is.

To predict the price of your house, you first train your model on the training set, and that model can then predict the price of your house.

In machine learning, the standard notation for the input is lowercase x. We call this the input variable; it is also called a feature or an input feature.

For example, for the first house in your training set, x is the size of the house, so x equals 2,104. The standard notation for the output variable you’re trying to predict, which is also sometimes called the target variable, is lowercase y.

Here, y is the price of the house, and for the first training example this is equal to 400, so y equals 400 (remember that prices are in thousands of dollars).

The dataset has one row for each house, and in this training set there are 47 rows, with each row representing a different training example. We’re going to use lowercase m to refer to the total number of training examples, so here m equals 47.

To indicate a single training example, we’re going to use the notation (x, y). For the first training example, this pair of numbers is (2104, 400).

Now we have a lot of different training examples. We have 47 of them in fact.
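Putting that notation into code, here is a minimal sketch. The array names x_train and y_train are my own choice, and all values except the first pair (2104, 400) are hypothetical.

```python
# A minimal sketch of the training-set notation: x is the input feature (size),
# y is the target (price), m is the number of training examples, and (x, y)
# denotes a single example. Only (2104, 400) comes from the article; the rest
# of the values are hypothetical.
import numpy as np

x_train = np.array([2104, 1416, 1534, 852])   # sizes in square feet
y_train = np.array([400, 232, 315, 178])      # prices in thousands of dollars

m = len(x_train)                              # total number of training examples
print(f"m = {m}")

# The first training example (x, y):
x_1, y_1 = x_train[0], y_train[0]
print(f"First training example: ({x_1}, {y_1})")   # prints (2104, 400)
```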

Here you saw what a training set looks like, as well as the standard notation for describing it. We’ll discuss the rest in an upcoming post.

  • There is a difference between classification and regression. In classification, there are only a small number of possible outputs. If your model is recognizing cats versus dogs, that’s two possible outputs, so there is a discrete, finite set of possible outputs; we call this a classification problem. In regression, by contrast, there are infinitely many possible numbers that the model could output.

I hope you found this summary of supervised learning to be interesting. Thank you for reading, and keep checking back for our upcoming related topics.

You can connect me on the following:

Linkedin | GitHub | Medium | email : akshitaguru16@gmail.com
