# Linear Regression in Ruby

A bit of math and a bit of code: machine learning in Ruby

Linear Regression is one of the simplest machine learning algorithms, but can still perform pretty well. More importantly, it is easy to pick up some of the core concepts of machine learning with it, without being distracted by the more complex algorithms out there.

# Linear Regression, a quick overview

Linear regression is a supervised algorithm that predicts a continuous value. Let’s unpack those two terms.

Supervised: You tell the algorithm the expected results while training it. In other words, you already know the desired results for your training data.

Continuous value: Usually a value like a price or an environmental measurement, in contrast to discrete values that would let you classify something. When your algorithm can say, for example, “There is a car in the picture”, it outputs a discrete value. Right, sometimes it is easier to explain something by contrasting it with something that it is not :)

Ok, so the basic flow (and terminology) is as follows: You have some values x, which you will use to predict your y values. More specifically, the x values are often called your features, while the y values are your labels. For example, you are Skynet and sending your terminators against those pesky humans. You have the following features:

number of terminators: 10, number of humans: 1000, number of dogs: 10

Maybe Skynet is interested in the resulting human losses. This would be your label.

[terminators: 10, humans: 1000, dogs: 10] -> human losses

# Hypothesis

Linear regression is pretty simple, which means that the hypothesis is too. All we do is assign a “weight” to each feature. Each weight is multiplied by its feature, and all of those products are summed up.

The result is the value of your label.

Let’s assume we have the following weights: terminators: 200, humans: -0.1, dogs: -10

The resulting human losses: 10 × 200 + 1000 × (-0.1) + 10 × (-10) = 1800
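As a minimal sketch, the same weighted sum can be written with plain Ruby arrays. The method name `predict` and the variable names are illustrative assumptions, not taken from the original code:

```ruby
# Features of one battle and the weights assumed in the text.
features = [10, 1000, 10]      # terminators, humans, dogs
weights  = [200, -0.1, -10]

# Hypothesis: multiply each feature by its weight and sum the products.
# (`predict` is a hypothetical helper name.)
def predict(features, weights)
  features.zip(weights).sum { |feature, weight| feature * weight }
end

prediction = predict(features, weights)
puts prediction.round(2) # => 1800.0
```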

Hmm, that doesn’t sound like our current weights are doing a great job. We can’t have more human losses than there are humans, after all! (Thanks to @mediafinger for pointing that out :) )

Since this is a supervised problem, we already have a few feature values and known labels in the beginning. In our example, Skynet has recorded 3 battles against humans and dogs, along with the resulting human losses (I hope the example is not too gruesome - I started with this example and now I gotta stick with it. On the bright side, there are no dog losses!):

10 terminators, 1000 humans, 10 dogs: 900 human losses

5 terminators, 800 humans, 2 dogs: 600 human losses

12 terminators, 2500 humans, 3 dogs: 1800 human losses

Each of those rows is called an instance.

I know I promised code and we haven’t written a single line yet! Stick with me a bit longer, I would like to introduce matrices to you. Did you know that Ruby has matrices?

Anyways, the reason I am mentioning matrices is that the calculation to predict the labels for the whole dataset above can be done with one matrix multiplication. Assuming we use the weights we picked before: 200, -0.1, -10

The result would be [1800, 900, 2120]. We are off quite a bit for all the battles.

Finally, some code!
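A minimal sketch of that matrix multiplication, assuming Ruby’s `matrix` library (part of the standard distribution; in Ruby 3.1 and later it ships as a bundled gem). The variable names are illustrative:

```ruby
require 'matrix'

# One row per recorded battle: terminators, humans, dogs.
battles = Matrix[
  [10, 1000, 10],
  [5,   800,  2],
  [12, 2500,  3]
]

# The weights picked in the text, as a vector.
weights = Vector[200, -0.1, -10]

# A single matrix-vector product yields one prediction per battle.
predictions = (battles * weights).to_a.map { |loss| loss.round(2) }
p predictions # => [1800.0, 900.0, 2120.0]
```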

A machine learning algorithm would use these instances to come up with the best values for the weights. “Best values” means: the labels predicted with these weights should be as close as possible to the recorded labels. “As close as possible” is where the cost function comes in.

Once we have the best weights, we can use them to predict unknown labels (Skynet might be planning a large-scale attack on the humans, sending in 3000 terminators against 19000 humans and 220 dogs, and it would like to know its chances beforehand).

# Cost Function

In linear regression we use the mean squared error as the cost function. Scary math incoming!
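A reconstruction of the formula that the following steps walk through, written under the assumption that the cost uses the common 1/(2m) scaling (that choice reproduces the cost of 167066.67 quoted for the initial weights). Here h_θ(x) stands for the hypothesis, i.e. the weighted sum of the features of one instance:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2
```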

Alright, let’s go through this step by step:

theta: this is the vector with all your weights. J(theta) means: The cost when using these weights.

m: number of instances. In our example above that is 3 (rows).

error: predicted label – label

squared: why? Well, for one thing, you get rid of the sign if the error is negative. That simplifies the cost calculation.

Weird mathematical symbol in front of the error: sum. It sums up all the squared errors. In Ruby that would be, for example, a call to reduce.

I hope it will become a bit more understandable once you see the Ruby code for the cost function:
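A minimal sketch of such a cost function, assuming the `matrix` library and the 1/(2m) scaling (an assumption that matches the cost quoted for the initial weights). The method and variable names are illustrative:

```ruby
require 'matrix'

# Squared-error cost J(theta); `cost` is a hypothetical helper name.
# The 1/(2m) factor is an assumption consistent with the numbers in the text.
def cost(features, labels, weights)
  m = features.row_count                    # number of instances
  errors = features * weights - labels      # predicted label - label
  squared_sum = errors.to_a.reduce(0) { |sum, error| sum + error**2 }
  squared_sum / (2.0 * m)
end

battles = Matrix[[10, 1000, 10], [5, 800, 2], [12, 2500, 3]]
losses  = Vector[900, 600, 1800]
weights = Vector[200, -0.1, -10]

initial_cost = cost(battles, losses, weights)
puts initial_cost.round(2) # => 167066.67
```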

If we use this cost function to calculate the cost for our initial weights [200, -0.1, -10], we get 167066.67 (rounded). This is J(theta) for those weights: the squared errors of the three battles (900², 300² and 320²) summed up and divided by 2m = 6.

Ok… and now what? There is only one thing left: We need to do something to reduce the cost, to get it as close to zero as possible. Which means, we need to find the optimal weights for our data.

# Normal Equation

Anyways, on to the normal equation, which computes the optimal weights directly instead of approaching them step by step: I will throw in the formula and then a Ruby implementation here, but not go into any more detail, since it would require quite a bit of math to explain this properly. If you want to go a bit deeper, I would rather recommend you dig into gradient descent and try to figure out how to implement it in Ruby. See, I even got some homework for you ;)

(Gradient descent would use the cost function, in case you are wondering where it is actually used.)
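The normal equation in its standard form is `theta = (X^T X)^-1 X^T y`, where X is the feature matrix and y the vector of labels. A minimal sketch with the `matrix` library follows; the method name `normal_equation` and the variable names are illustrative assumptions:

```ruby
require 'matrix'

# theta = (X^T X)^-1 X^T y; `normal_equation` is a hypothetical helper name.
def normal_equation(features, labels)
  transposed = features.transpose
  (transposed * features).inverse * transposed * labels
end

battles = Matrix[[10, 1000, 10], [5, 800, 2], [12, 2500, 3]]
losses  = Vector[900, 600, 1800]

# Convert the result to floats and round for display.
best_weights = normal_equation(battles, losses).to_a.map { |w| w.to_f.round(3) }
p best_weights
```

This only works when `X^T X` is invertible, which is the case for the three battles above.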

The normal equation will give us the following weights: [5.0, 0.675, 17.5]. Time for a sanity check: go ahead and use these values to predict the labels. I think you will be at least slightly impressed! Now, don’t expect to always find such perfectly matching weights (btw, I just picked some random values when I was selecting the features and labels. Yes, imagine this, I did not use data from real, apocalyptic battles).
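For the impatient, the sanity check could look like this minimal sketch (again assuming the `matrix` library, with illustrative variable names):

```ruby
require 'matrix'

battles = Matrix[[10, 1000, 10], [5, 800, 2], [12, 2500, 3]]
weights = Vector[5.0, 0.675, 17.5]

# Predict the losses for all three recorded battles at once.
checked = (battles * weights).to_a.map { |loss| loss.round(2) }
p checked # => [900.0, 600.0, 1800.0]
```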

To summarize:

You “train your model”, which means, you minimize the cost function by finding the best weights, with gradient descent or the normal equation. You predict values with your hypothesis function. You can measure your predictions with your cost function.

Here is the whole code, together with some tests that reflect the things we were checking before:
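A minimal sketch of how the pieces could fit together, assuming a hypothetical `LinearRegression` class (the class and method names are illustrative, not necessarily the author’s) and a few plain `raise` checks in place of a test framework:

```ruby
require 'matrix'

# Hypothetical wrapper around hypothesis, cost function and normal equation.
class LinearRegression
  attr_reader :weights

  # Trains the model with the normal equation: (X^T X)^-1 X^T y.
  def fit(features, labels)
    transposed = features.transpose
    @weights = (transposed * features).inverse * transposed * labels
    self
  end

  # Hypothesis: weighted sum of the features, for every instance at once.
  def predict(features)
    features * weights
  end

  # Squared-error cost, assuming the 1/(2m) scaling.
  def cost(features, labels)
    errors = predict(features) - labels
    errors.to_a.sum { |error| error**2 } / (2.0 * features.row_count)
  end
end

battles = Matrix[[10, 1000, 10], [5, 800, 2], [12, 2500, 3]]
losses  = Vector[900, 600, 1800]
model   = LinearRegression.new.fit(battles, losses)

# Checks that mirror the numbers from the text.
rounded_weights = model.weights.to_a.map { |w| w.to_f.round(3) }
raise 'unexpected weights' unless rounded_weights == [5.0, 0.675, 17.5]

rounded_losses = model.predict(battles).to_a.map { |l| l.to_f.round(2) }
raise 'unexpected predictions' unless rounded_losses == [900.0, 600.0, 1800.0]

raise 'cost should be zero' unless model.cost(battles, losses).round(6).zero?

# The planned large-scale attack: 3000 terminators, 19000 humans, 220 dogs.
attack = Matrix[[3000, 19000, 220]]
puts model.predict(attack).to_a.first.to_f
```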

# How can I use this?

Some great websites to find datasets:

https://www.kaggle.com/datasets

# What’s next?

That said, there are plenty of machine learning resources, even for Ruby.

The one online resource I would recommend most strongly, though, is the machine learning course on Coursera. It will give you a great foundation for all things machine learning. If you want to get into machine learning, or even just want a good refresher on some concepts, that is the go-to resource.

And there are some cool data science/ML/AI podcasts out there.

Shout out to @mediafinger for finding an error in the text!

Written by Ömür Özkir, who loves tinkering with Rust and Julia, digging into interesting data and pushing pixels like it’s the 80s.