Machine Learning # 3 — Linear Regression (Ridge & Lasso Functions)

Göker Güner · Published in Analytics Vidhya · Mar 28, 2021

This post is the third article in the Machine Learning series that I am writing to consolidate and share what I have learned. For the second article of the series, published on Analytics Vidhya: link

For the Turkish version of this article: link

Cover image source: https://unsplash.com/photos/oc9Mi40XY-0

Hello! In the third notebook of my Machine Learning series, we will learn:

  • The concept of linear regression,
  • The concept of loss calculation and the methods used to measure it,
  • The subtypes of linear regression.

Similar to the previous articles, we again start by importing the libraries we will use and examining the data.

In addition to the methods mentioned in my previous articles, we also import the libraries that give us the LinearRegression, Ridge, and Lasso functions we will meet in this article, and the mean_squared_error method we will use for error calculation.
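A minimal sketch of what these imports might look like (the exact set may differ slightly from the original notebook):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import mean_squared_error
```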

We will use the dataset you can access from this link to examine the regression problem. The ‘Boston House Prices’ dataset is a classic that you will often see in beginner-level tutorials.

Note: I would also recommend exploring the Kaggle platform, from which we obtained the dataset for this article, in more detail. It is a very rich Data Science & Machine Learning platform with courses, datasets, competitions, and sample notebooks.

We see that our dataset consists of 404 rows in total, and that all values are numeric and non-null. In other words, we can continue getting to know our dataset without any preprocessing steps.
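For reference, this inspection step might look like the following; the file name train.csv is an assumption about how the Kaggle download was saved:

```python
# Load the Boston house prices data (assumed file name from Kaggle).
df = pd.read_csv("train.csv")

# 404 rows, all columns numeric and non-null
df.info()
```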

Our main goal is to estimate house prices using all the features we have, but let’s first get acquainted with the concept of Linear Regression by examining the effect of the number of rooms alone on the house price.

Descriptions of the variables inside the data set.

From the Kaggle page where we obtained the dataset, we can examine what the column names mean. The medv variable, the median house price, is our target. In this step, respectively (a code sketch follows the list):

  • We drop the target variable from the input (X) variables of our dataset,
  • We assign the target variable to the output (y) variable,
  • To examine the relationship between the number of rooms and the house price, we assign the number-of-rooms column to a separate variable.
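A sketch of these steps, assuming the standard Boston column names (medv for the target, rm for the number of rooms):

```python
# Separate the target (medv) from the input features.
X = df.drop("medv", axis=1)
y = df["medv"].values

# Single feature for the first example: the number of rooms (rm).
X_rooms = X["rm"].values
```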

When we examine the shape of the number-of-rooms variable, we see that one dimension is missing: it is a one-dimensional array. It should be a two-dimensional structure of 404 rows and 1 column so that we can work with it. For this, we get help from the reshape function, as follows. The reshaping process is of critical importance, especially in Deep Learning studies; when we come to those parts of the article series, we will talk more about it.
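A sketch of the reshaping step:

```python
print(X_rooms.shape)   # (404,) — a one-dimensional array

# Reshape into 404 rows and 1 column so scikit-learn can use it.
X_rooms = X_rooms.reshape(-1, 1)

print(X_rooms.shape)   # (404, 1)
```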

We observe that the houses in our dataset mostly have around 6 rooms and that, in general, the price of a house increases as its number of rooms increases.
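The scatter plot behind this observation could be reproduced with something like:

```python
plt.scatter(X_rooms, y)
plt.xlabel("Number of rooms")
plt.ylabel("House price ($1000s)")
plt.show()
```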

Least Squares Method

We drew a line representing the relationship between these two variables. Our whole purpose in regression problems is to minimize the distance of the data points from the line, and thus obtain the line that fits the data best. In this example our variable was the number of rooms and our target was the house price, so we can model the relationship with a single-variable line function.

If we write this equation as y = ax + b, the mathematical purpose of our Linear Regression model is to determine the values of a and b that draw the line passing closest to the data points.

This is called the Least Squares Method.
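As a sketch, fitting and drawing this single-variable line with scikit-learn might look like:

```python
# Fit a single-variable line: price ≈ a * rooms + b.
reg = LinearRegression()
reg.fit(X_rooms, y)

# Draw the fitted line over the observed range of room counts.
line = np.linspace(X_rooms.min(), X_rooms.max(), 100).reshape(-1, 1)
plt.scatter(X_rooms, y)
plt.plot(line, reg.predict(line), color="black")
plt.show()
```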

Now, let’s include all the features instead of just the number of rooms, and re-code the prediction steps we saw in the previous notebooks.
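A sketch of these prediction steps; the test_size and random_state values are example choices, not necessarily the original ones:

```python
# Hold out 30% of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

reg_all = LinearRegression()
reg_all.fit(X_train, y_train)
y_pred = reg_all.predict(X_test)
```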

Root Mean Square Error (RMSE) is one of the metrics used to measure the difference between the values predicted by a model (or estimator/predictor) and the actual (observed) values. If this metric is 0, the model made no mistakes. The calculation is simple: take the squares of the errors (losses), average them, and then take the square root.
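With the predictions above, the RMSE can be computed as:

```python
# Square root of the mean squared error between truth and prediction.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("RMSE: {:.2f}".format(rmse))
```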

Let’s finish this part of the article by splitting our data with cross-validation and taking the average of the results we obtain.
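A sketch of this cross-validation step, using 5 folds as an example:

```python
# scikit-learn reports negative MSE for this scoring option,
# so negate it before taking the square root.
scores = cross_val_score(
    LinearRegression(), X, y, cv=5, scoring="neg_mean_squared_error"
)
print("Mean RMSE:", np.sqrt(-scores).mean())
```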

Ridge Function

Ridge regression is a variant of linear regression that uses L2 regularization. Before we continue, let’s briefly touch on why we need it.

Remember that our main goal is to determine the coefficients of our equation optimally (with the smallest RMSE value). Our model determines a coefficient for each input variable separately. These coefficients can sometimes grow very large, causing our model to overfit. To avoid such situations, we need to apply regularization.

The hyperparameter we need to determine, for both the Ridge function and the Lasso function that we will meet shortly, is the alpha value.

An alpha value of 0 does not solve our problem: it removes the penalty entirely and leaves the model prone to overfitting, while an alpha value that is too high, on the contrary, causes underfitting.

In the loss calculation, the Ridge effect appears as an extra penalty term: alpha times the sum of the squared coefficients is added to the ordinary least squares loss.
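A sketch of fitting a Ridge model; the alpha value here is only an example:

```python
# Ridge adds alpha * (sum of squared coefficients) to the OLS loss.
ridge = Ridge(alpha=0.1)   # alpha=0.1 is an assumed example value
ridge.fit(X_train, y_train)
ridge_pred = ridge.predict(X_test)
print("Ridge RMSE:", np.sqrt(mean_squared_error(y_test, ridge_pred)))
```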

Lasso (Least Absolute Shrinkage and Selection Operator) Function

There are two main differences from the Ridge function. It uses L1 regularization to shrink the coefficients, and the other, more important difference is that it can completely eliminate some features: because it penalizes the absolute values of the coefficients instead of their squares, it can drive some coefficients exactly to zero.

In the loss calculation, the Lasso effect appears as alpha times the sum of the absolute values of the coefficients, added to the ordinary least squares loss.
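A sketch of the Lasso fit and of inspecting its coefficients; again, the alpha value is only an example:

```python
# Lasso adds alpha * (sum of absolute coefficients), which can
# shrink some coefficients exactly to zero.
lasso = Lasso(alpha=0.1)   # alpha=0.1 is an assumed example value
lasso.fit(X, y)

# Inspect which features survive and which are zeroed out.
plt.plot(range(len(X.columns)), lasso.coef_)
plt.xticks(range(len(X.columns)), X.columns, rotation=60)
plt.ylabel("Coefficient")
plt.show()
```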

While the effect of the number-of-rooms variable, which we used as an example at the beginning of the article, is clearly visible, we also see that the coefficients of some features the model finds insignificant have been shrunk to exactly zero.

Finally, we end our article by bringing in the GridSearch and pipeline we used in the previous articles and observing the results. See you in the next articles!
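As a closing sketch, here is a hypothetical pipeline that standardizes the features and searches over a small alpha grid for Lasso; the scaler step and the grid values are my assumptions, not necessarily the author's exact setup:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Scale the features, then fit Lasso, searching over alpha.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("lasso", Lasso()),
])
param_grid = {"lasso__alpha": [0.001, 0.01, 0.1, 1.0]}  # example grid

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
grid.fit(X_train, y_train)

print("Best alpha:", grid.best_params_)
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, grid.predict(X_test))))
```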
