# Simple Linear Regression from Scratch Using Kotlin

In this tutorial, we’ll learn how to use Kotlin to train and test a simple linear regression model. Simple linear regression is one of the simplest models in machine learning, which makes it a great candidate to begin with.

This article doesn’t use any external library: the goal is to write everything from scratch to allow for a better understanding of the mechanics behind the scenes.

# Simple Linear Regression

Simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the x and y coordinates in a Cartesian coordinate system) and finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective "simple" refers to the fact that the outcome variable is related to a single predictor.

In other words, given the value of one variable, a simple linear regression model predicts, more or less accurately, the value of another variable linked to it.

There are multiple examples of how simple linear regression can be used:

• Number of children in household -> Liters of milk consumed
• Years of experience -> Salary
• IQ -> Job Performance
• etc.

# Data

## Independent Variable & Dependent Variable

Throughout this article, the independent variable is `x` while the dependent one is `y`.

The goal of the exercise is, of course, to approximate the optimal values of β₀ and β₁ in the simple linear regression formula:

y = β₀ + β₁*x

In this formula, y is the dependent variable, x is the independent variable, β₀ is the constant, or intercept (it moves our line up and down the y-axis), and β₁ is the coefficient of the independent variable (it sets the slope of our line).
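To make the roles of the two coefficients concrete, here is a minimal sketch of the formula as a Kotlin function. The name `lineValue` and the coefficient values used below are ours, chosen only for illustration.

```kotlin
// Hypothetical helper (not part of the article's model): evaluates y = b0 + b1 * x.
// b0 is the intercept (the value of y at x = 0) and b1 is the slope
// (how much y changes when x grows by 1).
fun lineValue(b0: Double, b1: Double, x: Double): Double = b0 + b1 * x
```

With the made-up coefficients b0 = 1.0 and b1 = 2.0, `lineValue(1.0, 2.0, 0.0)` returns the intercept 1.0, and `lineValue(1.0, 2.0, 3.0)` returns 7.0.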

# Build & Train

```kotlin
import java.io.File

// Run inside main() or a Kotlin script. Each line is expected to hold "x,y".
val xTrain = mutableListOf<Double>()
val yTrain = mutableListOf<Double>()
val trainFileName = "train.csv"
File(trainFileName).forEachLine {
    val split = it.split(",")
    xTrain.add(split[0].toDouble()) // first column: independent variable
    yTrain.add(split[1].toDouble()) // second column: dependent variable
}

val xTest = mutableListOf<Double>()
val yTest = mutableListOf<Double>()
val testFileName = "test.csv"
File(testFileName).forEachLine {
    val split = it.split(",")
    xTest.add(split[0].toDouble())
    yTest.add(split[1].toDouble())
}
```
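The loading code above expects every line to hold two numeric fields. If the files might contain a header row or blank lines (an assumption on our side, the article doesn’t say), a more forgiving parser could look like the sketch below. The function name `parsePairs` is hypothetical.

```kotlin
// Hypothetical helper: turns "x,y" lines into two lists, silently skipping
// lines that don't hold two numeric fields (blank lines, a header row, ...).
fun parsePairs(lines: List<String>): Pair<List<Double>, List<Double>> {
    val xs = mutableListOf<Double>()
    val ys = mutableListOf<Double>()
    for (line in lines) {
        val fields = line.split(",")
        if (fields.size < 2) continue
        // toDoubleOrNull() returns null instead of throwing on non-numeric text
        val x = fields[0].trim().toDoubleOrNull()
        val y = fields[1].trim().toDoubleOrNull()
        if (x != null && y != null) {
            xs.add(x)
            ys.add(y)
        }
    }
    return xs to ys
}
```

It could be fed with `File("train.csv").readLines()`. Skipping bad lines silently is a design choice that suits a tutorial; for real data, we would probably prefer to report them.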

## Model

```kotlin
val model = SimpleLinearRegressionModel(independentVariables = xTrain, dependentVariables = yTrain)
```

I left out the code for `SimpleLinearRegressionModel` on purpose because we’ll discover it method by method, field by field. For now, we just need to understand that we’ve filled the two fields `independentVariables` & `dependentVariables`.

## Mean X & Mean Y

```kotlin
private val meanX: Double = independentVariables.sum().div(independentVariables.count())
private val meanY: Double = dependentVariables.sum().div(dependentVariables.count())
```
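As a side note, Kotlin’s standard library already provides `average()` for collections of `Double`, which computes the same value. Here is a small sketch; the wrapper name `mean` and the emptiness check are our additions.

```kotlin
// Hypothetical wrapper around the standard library's average().
// An empty list is rejected up front, since its mean is undefined.
fun mean(values: List<Double>): Double {
    require(values.isNotEmpty()) { "Cannot compute the mean of an empty list" }
    return values.average()
}
```

For example, `mean(listOf(1.0, 2.0, 3.0, 4.0))` returns 2.5.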

## Variance & Covariance

β₁ = covariance / variance

For us to get the value of β₁, we’ll have to calculate both of those.
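Written out for n training points (xᵢ, yᵢ) with means x̄ and ȳ, the ratio looks as follows. The notation is ours rather than the code’s; the useful observation is that the 1/n factors cancel, so plain sums are enough to get β₁.

```latex
\beta_1
  = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
  = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2}
```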

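Here, the variance is computed as the sum of the squared differences between each value of the independent variable and their mean. The usual division by the number of points is skipped, which doesn’t change β₁ as long as the covariance is computed the same way.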

```kotlin
// Requires `import kotlin.math.pow` at the top of the file
private val variance: Double = independentVariables.sumOf { (it - meanX).pow(2) }
```

Calculating the covariance requires a bit more code but is still quite manageable. It can be described as the sum, over every point of the graph, of the product of (x - meanX) and (y - meanY).

Hope the code is easier to understand…

```kotlin
private fun covariance(): Double {
    var covariance = 0.0
    for (i in 0 until independentVariables.size) {
        val xPart = independentVariables[i] - meanX
        val yPart = dependentVariables[i] - meanY
        covariance += xPart * yPart
    }
    return covariance
}
```
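The same sum can also be written without an index loop. The sketch below is a standalone variant using `zip` and `sumOf`; the function name `sumOfCrossDeviations` is ours, and it recomputes the means itself so that it can be tried in isolation.

```kotlin
// Hypothetical standalone variant of the covariance computation above.
// Like the article's version, it returns the plain sum (no division by n).
fun sumOfCrossDeviations(xs: List<Double>, ys: List<Double>): Double {
    require(xs.size == ys.size && xs.isNotEmpty()) { "xs and ys must be non-empty and of equal size" }
    val meanX = xs.average()
    val meanY = ys.average()
    // zip pairs each x with the y at the same position
    return xs.zip(ys).sumOf { (x, y) -> (x - meanX) * (y - meanY) }
}
```

For the points (1, 2), (2, 4) and (3, 6), the means are 2 and 4, the three products are 2, 0 and 2, and the function returns 4.0.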

## β₀ & β₁

As a reminder, their respective formulas are the following:

β₁ = covariance / variance

β₀ = meanY - (meanX * β₁)

```kotlin
// covariance is a function, so it has to be called with ()
private val b1 = covariance().div(variance)
private val b0 = meanY - b1 * meanX
```
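To see how the pieces could fit together, here is one possible assembly. It is a sketch, not the article’s actual `SimpleLinearRegressionModel`: the class name `LinearFitSketch`, the public `b0`/`b1` fields and the input checks are assumptions of ours.

```kotlin
// A sketch only: one way the fields and methods above might be combined.
class LinearFitSketch(
    private val independentVariables: List<Double>,
    private val dependentVariables: List<Double>
) {
    init {
        require(independentVariables.size == dependentVariables.size) { "x and y must have the same size" }
        require(independentVariables.isNotEmpty()) { "at least one point is needed" }
    }

    // Property initializers run top to bottom, so the means must come first.
    private val meanX = independentVariables.average()
    private val meanY = dependentVariables.average()

    // Sum of squared deviations of x from its mean.
    private val variance = independentVariables.sumOf { (it - meanX) * (it - meanX) }

    init {
        // If every x is identical, the best line would be vertical and b1 is undefined.
        require(variance > 0.0) { "x values must not all be equal" }
    }

    // Sum of the products of the x and y deviations.
    private fun covariance(): Double =
        independentVariables.indices.sumOf { i ->
            (independentVariables[i] - meanX) * (dependentVariables[i] - meanY)
        }

    val b1: Double = covariance() / variance
    val b0: Double = meanY - b1 * meanX

    fun predict(x: Double): Double = b0 + b1 * x
}
```

A quick sanity check is to feed it points that lie exactly on a known line. With x = 1, 2, 3, 4 and y = 3, 5, 7, 9 (that is, y = 1 + 2x), the sketch should recover b0 ≈ 1 and b1 ≈ 2.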

# Test

We’ll also calculate the RMSE (root-mean-square error) and the R² (coefficient of determination) to evaluate the precision of our model.

```kotlin
// Requires `import kotlin.math.pow` and `import kotlin.math.sqrt`
fun test(xTest: List<Double>, yTest: List<Double>) {
    var sst = 0.0 // total sum of squares, measured around meanY (the training mean)
    var ssr = 0.0 // sum of squared residuals, i.e. squared prediction errors
    for (i in 0 until xTest.count()) {
        val x = xTest[i]
        val y = yTest[i]
        val yPred = predict(x)
        sst += (y - meanY).pow(2)
        ssr += (y - yPred).pow(2)
    }
    // RMSE is the square root of the mean squared residual
    println("RMSE = " + sqrt(ssr.div(xTest.size)))
    println("R² = " + (1 - (ssr / sst)))
}

fun predict(independentVariable: Double) = b0 + b1 * independentVariable
```
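The `test` method above measures the total sum of squares around the model’s `meanY`, which is the mean of the training data. A more common convention is to use the mean of the observed test values instead. The sketch below shows both metrics as standalone functions under that convention; the names `rmse` and `rSquared` are ours.

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// Root-mean-square error between observed and predicted values.
fun rmse(observed: List<Double>, predicted: List<Double>): Double {
    require(observed.size == predicted.size && observed.isNotEmpty()) { "lists must be non-empty and of equal size" }
    val squaredErrors = observed.indices.sumOf { i -> (observed[i] - predicted[i]).pow(2) }
    return sqrt(squaredErrors / observed.size)
}

// R² = 1 - SSR / SST, with SST measured around the mean of the observed values.
// If all observed values are equal, SST is 0 and the result is not meaningful.
fun rSquared(observed: List<Double>, predicted: List<Double>): Double {
    require(observed.size == predicted.size && observed.isNotEmpty()) { "lists must be non-empty and of equal size" }
    val mean = observed.average()
    val ssr = observed.indices.sumOf { i -> (observed[i] - predicted[i]).pow(2) }
    val sst = observed.sumOf { (it - mean).pow(2) }
    return 1 - ssr / sst
}
```

As a small made-up example, with observed values 1, 2, 3 and predictions 1, 2, 4, the only error is 1 on the last point: SSR is 1, SST is 2, so R² is 0.5 and the RMSE is the square root of 1/3.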

Now that we have everything set up, our model prints the following results for `RMSE` & `R²`:

```
RMSE = 3.07130626802983
R² = 0.9888226846629965
```

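This is a great result for our model: the closer `R²` is to 1, the better, and an `RMSE` of about `3.071` is more than OK in this case. Keep in mind that the `RMSE` is expressed in the same units as `y`, so it should always be judged against the scale of the data.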

# Conclusion

In the next articles, we’ll see how Multiple Linear Regression works, and introduce the concept of Gradient Descent to minimize errors of our model.

Written by

## Data Driven Investor
