Multivariate Linear Regression

Dharti Dhami
4 min read · Nov 19, 2019


Let’s take an example where we want to predict house prices in a city. We will need historical data about house sales to train a machine learning model.

In machine learning terminology, the information about the house is called the features, and the price we want to predict is called the label.

When we have multiple features and we want to train a model that can predict the price given those features, we can use multivariate linear regression. The model has to learn the parameters (theta 0 through theta n) from the training dataset so that, when we want to predict the price of a house that has not sold yet, it gives us a prediction close to what the house will actually sell for.
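With n features x₁ through xₙ, the model we fit is the standard multivariate linear regression hypothesis

    h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

where θ₀ acts like a base price and each θⱼ says how much the predicted price changes per unit of feature xⱼ.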

Cost Function and Gradient Descent for Multivariate Linear Regression
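With m training examples (x⁽ⁱ⁾, y⁽ⁱ⁾), the cost is the average squared prediction error, and gradient descent repeatedly updates every parameter in the direction that lowers that cost:

    J(θ) = (1 / 2m) · Σ ( h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )²                      (sum over i = 1 … m)

    repeat {
        θⱼ := θⱼ − α · (1 / m) · Σ ( h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) · xⱼ⁽ⁱ⁾     (for all j = 0 … n, updated simultaneously)
    }

Here α is the learning rate, and x₀⁽ⁱ⁾ is defined to be 1 so that θ₀ follows the same update rule as the other parameters.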

Practical Ideas for Making Gradient Descent Work Well

1. Use feature scaling to help gradient descent converge faster. Get every feature into roughly the -1 to +1 range. It doesn’t have to be exactly -1 to +1, but it should be close to that range.

2. In addition to dividing each feature by its maximum value, people sometimes use mean normalization: subtract the feature’s mean and then divide by its range (maximum minus minimum), so the feature ends up roughly centered around zero.

3. If we plot the value of the cost function after each iteration of gradient descent, we should see it decrease on every iteration and eventually flatten out as it converges (a small sketch of this check follows this list). The number of iterations gradient descent needs to converge varies a lot from one problem to another.
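Here is a minimal NumPy sketch of these ideas. The function names, learning rate, and iteration count are illustrative choices, not anything prescribed by the original post.

    import numpy as np

    def mean_normalize(X):
        # Mean normalization: subtract each feature's mean and divide by its range,
        # which puts every feature roughly into the -1 to +1 range.
        mu = X.mean(axis=0)
        spread = X.max(axis=0) - X.min(axis=0)
        return (X - mu) / spread, mu, spread

    def gradient_descent(X, y, alpha=0.1, num_iters=400):
        # X: (m, n) matrix of scaled features, y: (m,) vector of prices.
        m, n = X.shape
        Xb = np.c_[np.ones(m), X]                 # prepend a column of 1s for theta_0
        theta = np.zeros(n + 1)
        cost_history = []
        for _ in range(num_iters):
            error = Xb @ theta - y                # h_theta(x) - y for every example
            theta -= alpha * (Xb.T @ error) / m   # simultaneous update of all thetas
            cost_history.append((error @ error) / (2 * m))
        return theta, cost_history

Plotting cost_history against the iteration number is the convergence check from point 3: the curve should drop steeply at first and then flatten out. If it ever goes up, the learning rate α is probably too large.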

Polynomial Regression

Let’s say we have a housing price data set, plotted as price against the size of the house.

Then there are a few different models we might fit to this.

One thing we could do is fit a quadratic model, since a straight line doesn’t fit this data very well. But then we may decide that a quadratic model doesn’t make sense either, because a quadratic function eventually comes back down, and we don’t think housing prices should drop as the size gets very large. So we might choose a different polynomial model instead, a cubic function with a third-order term; the cubic curve is a somewhat better fit to the data because it doesn’t eventually come back down. We can do this using the machinery of multivariate linear regression: just by choosing three features (the size, the square of the size, and the cube of the size) and applying linear regression, we can fit this model and end up with a cubic fit to the data.

The important thing to note here is feature scaling. If the size of the house ranges from 1 to 1,000, then the square of the size ranges from 1 to 1,000,000, and the third feature, the cube of the size, ranges from 1 to 10⁹. These three features take on very different ranges of values, and it’s important to apply feature scaling so that gradient descent sees them in comparable ranges.
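Continuing the NumPy sketch from the gradient descent section above (reusing its mean_normalize and gradient_descent helpers; the house sizes, prices, and hyperparameters below are made up purely for illustration), the cubic fit is just three derived features plus scaling before running the same linear regression machinery:

    # size of each house and its sale price (made-up numbers for illustration)
    size = np.array([100.0, 250.0, 400.0, 650.0, 1000.0])
    price = np.array([150.0, 280.0, 355.0, 425.0, 480.0])

    # the three features: size, size squared, size cubed
    X_poly = np.column_stack([size, size**2, size**3])

    # feature scaling matters here: the raw columns span roughly 10^3, 10^6 and 10^9
    X_scaled, mu, spread = mean_normalize(X_poly)

    theta, cost_history = gradient_descent(X_scaled, price, alpha=0.3, num_iters=2000)

To predict the price of a new house, its size, size², and size³ have to be scaled with the same mu and spread before applying theta.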

Another reasonable choice here might be to say that the price of a house is theta zero plus theta one times the size, and then plus theta two times the square root of the size.
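Written out, that model is

    price = θ₀ + θ₁·(size) + θ₂·√(size)

and, as long as θ₁ and θ₂ come out non-negative, it keeps rising as the size grows (with the growth slowing down) and never comes back down the way the quadratic does.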

By having insight into the shape of a square root function and into the shape of the data, and by choosing different features, we can sometimes get better models.

It may seem a little bewildering that, with all these different feature choices, we have to decide which features to use. In subsequent posts, we will talk about algorithms that automatically choose which features to use, and automatically decide whether to fit a quadratic function, a cubic function, or something else.

But until then, we should know that we have a choice in which features to use, and that by designing different features we can fit more complex functions to our data than just a straight line.
