A gentle introduction to Linear Regression, The Dart way

Ilia Gyrdymov
5 min read · Jun 20, 2022


Hi all!

Last time, I introduced you to several machine learning libraries written in the Dart programming language through a simple usage example of the LinearRegressor class. But what exactly is linear regression? Let’s roll up our sleeves and dig in.

In simple words, regression is the task of predicting a number based on a set of variables (also known as features or independent variables).

Imagine you have a task to predict ̶s̶t̶o̶c̶k̶ ̶p̶r̶i̶c̶e̶s̶ house prices. Say, you want to sell your house, and you want to get a fair price for your premises. The first approach to solving this task might be randomly guessing the value. Yeah, great. Maybe someone is ok with this approach, but I think it’s a bad idea to sell your house for an overly low random price (you get too little money) or, on the contrary, an extremely high one (no one will buy your house).

There should be a smarter way to predict house prices. We should choose a feature on which the price depends. For instance, this feature may be the distance in meters to the nearest metro station. Say our house is 500 meters away from the metro station. So what does this give us? To be honest, nothing, at least on its own. This information becomes meaningful only in aggregate: we should collect several examples of houses with their prices and distances to the nearest metro stations. Ok, say we collected some information:

So, we can see that our possible house price falls between the first and the third record, since 500 meters is greater than 250 meters and less than 800 meters. The cost might be, for instance, $480000. This valuation is way better than random guesswork, but it is still guesswork. Can we improve our prediction? The answer is yes. Here is where the math comes into play. Just look more closely at the data. Does it remind you of something? Here is a hint:

It’s a system of linear equations! To predict the price for an arbitrary distance, we should find the value of “x”: then we can multiply any distance by “x” and get the price. There are several ways to find “x”, but unfortunately, we can never find a value that satisfies all the equations at once. The system is overdetermined, so the best we can do is approximate “x”.
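To see why no single exact “x” exists, take a few records (the numbers below are made up for illustration and are not the article’s original table) and divide each price by its distance:

```dart
// Hypothetical (distance, price) records; the numbers are made up
// and are not the article's original table.
final records = [
  [250.0, 515000.0],
  [500.0, 486000.0],
  [800.0, 470000.0],
];

void main() {
  // If a single "x" solved every equation distance * x = price exactly,
  // all of these ratios would be equal. They are not:
  for (final r in records) {
    print('x = ${r[1] / r[0]}'); // 2060.0, 972.0, 587.5
  }
}
```

Each equation demands a different “x”, so we can only look for the value that fits all of them as well as possible.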

In machine learning, we call such an “x” a “coefficient” or a “weight” and usually denote it as “w” (which stands for “weight”). The features (only one variable in our case, the distance to the nearest metro station) are usually denoted as “x”, and the outcomes as “y”. So in our example, we have one feature and one coefficient.

Geometrically, by finding the coefficient, we are trying to find the slope of a line. That is why the algorithm is called “Linear regression”:

The Y-axis shows the price values; the X-axis shows the distance values.

Looking at the graph, we can make some predictions just by drawing a perpendicular line from the X-axis until it intersects our dashed line:

We see that the line perpendicular to the X-axis starting at x=180 (a house 180 meters away from the nearest metro station) intersects our prediction line at a Y coordinate slightly greater than $500000. That is our predicted price for such a house.

The dashed line is our best guess. It can’t pass through all the dots on the graph, and that is the nature of the task: we are looking for the line that captures the “trend” of the data. If we connected all the dots, the generalization ability of the line on new, previously unseen dots would be meagre.

As you can notice, our data is quite biased: all the dots are in the upper part of the graph. It means that finding just one coefficient isn’t enough, and we need to take that into account. Let’s recall the equation of a line in two-dimensional space:

y = w_1 * x + w_0

The “w_0” term, also known as the intercept or bias, is what we want to take into account: it tells us how far our line is shifted from the origin.
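Here is a tiny sketch of what the bias term does, using made-up coefficients (w_1 = 2 and w_0 = 100; these are not fitted values):

```dart
// Two lines with the same slope w_1 but different bias w_0
// (the coefficients are made up for illustration).
double withoutBias(double x) => 2.0 * x; // w_0 = 0: passes through the origin
double withBias(double x) => 2.0 * x + 100.0; // w_0 = 100: shifted upwards

void main() {
  print(withoutBias(0)); // 0.0
  print(withBias(0)); // 100.0: the whole line is lifted above the origin
}
```

The slope is identical, but the non-zero “w_0” lifts every prediction by the same amount, which is exactly what our upper-part-of-the-graph data needs.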

In the ml_algo library, the first way to find the coefficients is the so-called “Closed-form solution”. You will learn more about the algorithm behind this solution in my following articles, so stay tuned. One can find the coefficients using the closed-form solution of the ml_algo library in the following way:
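A sketch of what the call looks like is below. Note the caveats: the table values here are hypothetical (so this exact data won’t reproduce the output that follows), and details such as the DataFrame layout and the LinearOptimizerType.closedForm enum member are assumptions that may differ between ml_algo versions, so check the library docs for the version you use.

```dart
// A sketch of fitting a linear regression with ml_algo's closed-form
// solution. The data is hypothetical and the exact API may differ
// between ml_algo versions.
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_dataframe/ml_dataframe.dart';

void main() {
  // The first row is the header; the rest are the collected records.
  final data = DataFrame([
    ['distance', 'price'],
    [250, 515000],
    [800, 470000],
    // ...the rest of the collected records
  ]);

  final regressor = LinearRegressor(
    data,
    'price',
    optimizerType: LinearOptimizerType.closedForm, // assumed enum member
  );

  print('Coefficients: ${regressor.coefficients}');
  print(regressor.predict(DataFrame([
    ['distance'],
    [500],
  ])));
}
```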

The last two instructions print the following:

Coefficients: (514310.0, -56.05095291137695)
Prediction:
DataFrame (1 x 1)
price
486284.53125

The first coefficient, 514310, is our bias coefficient (the “w_0” term), and the second one, -56.05, is the coefficient of our feature, the distance to the nearest metro station (the “w_1” term).

The predicted sum of $486284 looks more trustworthy than the guess of $480000, which we made above.
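We can sanity-check the prediction by plugging the printed coefficients back into the line equation y = w_1 * x + w_0:

```dart
void main() {
  const w0 = 514310.0; // bias coefficient printed above
  const w1 = -56.05095291137695; // distance coefficient printed above
  const distance = 500.0; // our house, 500 meters from the metro station

  final price = w0 + w1 * distance;
  print(price); // ~486284.52
}
```

The result agrees with the library’s 486284.53125 up to rounding; the small difference comes from the library doing the math in lower precision.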

Another way to find the coefficients is Gradient descent. It is an iterative method: on each iteration, we get an updated coefficient value and check whether it is good enough. If we are happy with the value, we stop the algorithm; otherwise, we run another iteration. You will learn more about gradient descent in my following articles. Again, stay tuned.
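Here is a minimal sketch of the idea (an illustration only, not ml_algo’s implementation): for a single feature, we repeatedly nudge w_0 and w_1 against the gradient of the squared error, using made-up data generated from y = 2x + 1:

```dart
// A toy gradient descent for one feature; illustration only,
// not ml_algo's implementation.
List<double> fit(List<double> xs, List<double> ys,
    {double learningRate = 0.05, int iterations = 2000}) {
  var w0 = 0.0, w1 = 0.0;

  for (var step = 0; step < iterations; step++) {
    var g0 = 0.0, g1 = 0.0;
    for (var i = 0; i < xs.length; i++) {
      final error = (w0 + w1 * xs[i]) - ys[i]; // prediction minus target
      g0 += error; // gradient with respect to the bias w0
      g1 += error * xs[i]; // gradient with respect to the slope w1
    }
    // Step against the averaged gradient.
    w0 -= learningRate * g0 / xs.length;
    w1 -= learningRate * g1 / xs.length;
  }

  return [w0, w1];
}

void main() {
  // Made-up data generated from y = 2x + 1.
  final w = fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]);
  print('w0 = ${w[0]}, w1 = ${w[1]}'); // approaches w0 = 1, w1 = 2
}
```

Each pass shrinks the error a little; after enough iterations the coefficients settle near the values that generated the data.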

The next point of our journey is the Ordinary Least Squares problem.

That’s pretty much it!

If you have any questions, you can reach me on Twitter.

Cheers :)


Ilia Gyrdymov

Frontend engineer (Dart, Vue/Vuex/Nuxt, React/Redux + Typescript) with an interest in Machine Learning, living in Cyprus