Machine Learning: Simple Linear Regression with React JS and Plotly

For developers, integrating data analysis and prediction models is crucial. Learn Simple Linear Regression, a fundamental machine learning technique.

Filip Jerga
Eincode
9 min read · May 22, 2024


Introduction

For developers, integrating data analysis and prediction models into applications is a valuable skill. Simple Linear Regression, a fundamental machine learning technique, helps us understand the relationship between two variables and make predictions. In this post, we’ll implement simple linear regression using React and Plotly, covering the mathematical background and providing step-by-step coding instructions.

Resources

Full Course: https://academy.eincode.com/courses/machine-learning-primer-with-js-regression

Full Code: https://github.com/Jerga99/ML-Regression-vol1

Mathematical Background of Simple Linear Regression

Simple Linear Regression is used to predict the value of a dependent variable (Y) based on the value of an independent variable (X). The relationship between X and Y is modeled by fitting a linear equation to observed data.

The linear equation can be represented as:

$$Y = b_0 + b_1 X$$

Where:

  • Y is the dependent variable (exam scores).
  • X is the independent variable (study hours).
  • b0 is the intercept of the regression line (the value of Y when X is 0).
  • b1 is the slope of the regression line (the change in Y for a one-unit change in X).

Calculating the Parameters

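To find the best-fitting line, we need to calculate b0 and b1 using the following least-squares formulas:

$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

$$b_0 = \bar{Y} - b_1 \bar{X}$$

Where:

  • Xi and Yi are the individual sample points.
  • X̄ and Ȳ are the means of X and Y, respectively.
  • n is the number of sample points.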
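The two formulas translate almost directly into code. The sketch below is one way to write them as a standalone function; the name fitLine and its error messages are illustrative and not part of the project's code. It also guards against the case where every X value is identical, which would make the denominator zero.

```javascript
// Illustrative helper (not from the original project): fits y = b0 + b1 * x
// by ordinary least squares.
function fitLine(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("xs and ys must have the same length (at least 2)");
  }
  const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;
  const meanX = mean(xs);
  const meanY = mean(ys);

  // Sum of (Xi - meanX) * (Yi - meanY)
  const numerator = xs.reduce((sum, x, i) => sum + (x - meanX) * (ys[i] - meanY), 0);
  // Sum of (Xi - meanX)^2
  const denominator = xs.reduce((sum, x) => sum + (x - meanX) ** 2, 0);
  if (denominator === 0) {
    throw new Error("all X values are identical, so the slope is undefined");
  }

  const b1 = numerator / denominator;
  const b0 = meanY - b1 * meanX;
  return { b0, b1 };
}

// Points that lie exactly on y = 1 + 2x
console.log(fitLine([0, 1, 2, 3], [1, 3, 5, 7])); // { b0: 1, b1: 2 }
```

Keeping the calculation in a pure function like this makes it easy to test separately from any UI code.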

Setting Up the Data

Let’s start by setting up our data in a React component. We’ll use two arrays to represent our study hours and corresponding exam scores.

const studyHoursData = [1, 2, 3, 4, 5];
const examScoresData = [55, 70, 80, 85, 90];

Calculating the Means

The next step is to calculate the means of our data sets.

const meanStudyHours = studyHoursData.reduce((sum, val) => sum + val, 0) / studyHoursData.length;
const meanExamScores = examScoresData.reduce((sum, val) => sum + val, 0) / examScoresData.length;

Building the React Component

Now, we’ll build a React component that will perform the regression calculations and display the results using Plotly for visualization.

import { useEffect, useState } from 'react';
import Plot from "react-plotly.js";

function ExamScorePrediction() {
  const [regressionLine, setRegressionLine] = useState([]);
  const [regressionParams, setRegressionParams] = useState({b0: 0, b1: 0});
  const [inputHours, setInputHours] = useState("");
  const [predictedScore, setPredictedScore] = useState(null);

  // Two traces: the observed data points and the fitted regression line
  const data = [{
    x: studyHoursData,
    y: examScoresData,
    mode: "markers",
    type: "scatter",
    marker: { color: "blue" }
  }, {
    x: studyHoursData,
    y: regressionLine,
    mode: "lines",
    type: "scatter",
    name: "Regression Line",
    line: {color: "red"}
  }];

  const layout = {
    title: "Study hours vs Exam Scores",
    xaxis: {
      title: "Study hours",
      autorange: true,
    },
    yaxis: {
      title: "Exam scores",
      autorange: true,
    },
  }

  // Recompute the prediction whenever the input or the model changes
  useEffect(() => {
    const hours = parseFloat(inputHours);
    if (Number.isNaN(hours) || hours < 0) {
      // Empty, non-numeric, or negative input: clear any stale prediction
      setPredictedScore(null);
    } else {
      const score = regressionParams.b0 + regressionParams.b1 * hours;
      // Cap the prediction at 100, the maximum exam score
      setPredictedScore(Math.min(score, 100).toFixed(2));
    }
  }, [inputHours, regressionParams]);

  // Train the model once, when the component mounts
  useEffect(() => {
    trainModel();
  }, []);

  useEffect(() => {
    // Skip the untrained initial state, where both parameters are still 0.
    // (Checking "> 0" here would wrongly skip models with a negative slope or intercept.)
    if (regressionParams.b0 !== 0 || regressionParams.b1 !== 0) {
      // Model Testing
      const predictionsFromInputs = studyHoursData.map((x) => regressionParams.b0 + regressionParams.b1 * x);
      const residuals = predictionsFromInputs.map((y, i) => examScoresData[i] - y);

      const ssResiduals = residuals.reduce((sum, residual) => sum + Math.pow(residual, 2), 0);
      const ssTotal = examScoresData.reduce((sum, score) => sum + Math.pow(score - meanExamScores, 2), 0);

      const r2 = 1 - (ssResiduals / ssTotal);
      const mae = residuals.reduce((sum, residual) => sum + Math.abs(residual), 0) / residuals.length;
      const mse = residuals.reduce((sum, residual) => sum + Math.pow(residual, 2), 0) / residuals.length;

      console.log("MSE:", mse);
      console.log("MAE:", mae);
      console.log("R2:", r2);
    }
  }, [regressionParams]);

  const trainModel = () => {
    // Step 1 - Means are precomputed (meanStudyHours, meanExamScores)
    // Step 2 - Compute Slope (B1, m)
    const numerator = studyHoursData.reduce((sum, hour, i) => sum + (hour - meanStudyHours) * (examScoresData[i] - meanExamScores), 0);
    const denominator = studyHoursData.reduce((sum, hour) => sum + Math.pow(hour - meanStudyHours, 2), 0);
    const b1 = numerator / denominator;
    // Step 3 - Compute Intercept (B0)
    const b0 = meanExamScores - b1 * meanStudyHours;

    // Step 4 - Compute the y-values of the fitted line and store the results
    const regressionYs = studyHoursData.map(x => b0 + b1 * x);
    setRegressionLine(regressionYs);
    setRegressionParams({b0, b1});
  }

  return (
    <div>
      <div style={{textAlign: "center"}}>
        <input
          type="number"
          value={inputHours}
          onChange={(e) => {
            setInputHours(e.target.value);
          }}
          placeholder="Enter study hours"
          style={{marginBottom: 10}}
        />
        { predictedScore !== null &&
          <div>
            Predicted exam score: {predictedScore}
          </div>
        }
        <div>b0: {regressionParams.b0}</div>
        <div>b1: {regressionParams.b1}</div>
      </div>
      <Plot
        style={{width: "100%", height: 500}}
        data={data}
        layout={layout}
      />
    </div>
  );
}

export default ExamScorePrediction;

Code Explanation

Component State and Data

  • State Variables:
      • regressionLine: Stores the y-values of the regression line.
      • regressionParams: Holds the slope (b1) and intercept (b0) of the regression line.
      • inputHours: Keeps track of the user's input for study hours.
      • predictedScore: Stores the predicted exam score based on the input hours.
  • Data for Plotly:
      • We create two objects in the data array: one for the scatter plot of the original data points and one for the regression line.

Layout for Plotly

  • Layout Object:
      • Defines the title and axis labels for the plot to provide a clear and informative visualization.

useEffect Hooks

  • Predicted Score Calculation:
      • This hook updates the predicted score whenever inputHours or regressionParams changes. If the input is valid, it calculates the predicted score using the regression equation and updates the state.
  • Model Training:
      • This hook calls the trainModel function to calculate the regression parameters when the component mounts.
  • Logging Metrics:
      • This hook calculates and logs regression metrics (MSE, MAE, R²) whenever the regression parameters are updated, helping us evaluate the model’s performance.
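The prediction hook mixes React state with a small amount of arithmetic. One way to make that arithmetic easier to test is to pull it out into a pure function, as in the sketch below. The name predictScore and the maxScore parameter are assumptions made for illustration; the component in this post does the calculation inline inside useEffect.

```javascript
// Hypothetical helper: returns the predicted score as a string with two
// decimals, or null when the input is empty, non-numeric, or negative.
function predictScore({ b0, b1 }, rawHours, maxScore = 100) {
  const hours = parseFloat(rawHours);
  if (Number.isNaN(hours) || hours < 0) {
    return null;
  }
  const score = b0 + b1 * hours;
  // Cap the prediction: a score above maxScore is not meaningful for an exam.
  return Math.min(score, maxScore).toFixed(2);
}

// b0 and b1 as given by the least-squares formulas for the sample data
const params = { b0: 50.5, b1: 8.5 };
console.log(predictScore(params, "3"));  // 76.00
console.log(predictScore(params, "10")); // 100.00
console.log(predictScore(params, ""));   // null
```

Inside the component, the effect would then only need to call this function and store the result.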

Train Model Function

  • Training the Model:
      • Calculates the slope (b1) and intercept (b0) using the formulas for simple linear regression.
      • Computes the y-values for the regression line and updates the state with these values and the regression parameters.

Explaining the trainModel Function

The trainModel function is the core part of the code where the simple linear regression model is trained. This involves calculating the slope (b1) and intercept (b0) of the regression line, which are then used to predict the dependent variable based on the independent variable. Let's break down each step of the function:

1. Compute Means

Before calculating the slope and intercept, we need the means of the independent (study hours) and dependent (exam scores) variables, which are precomputed as meanStudyHours and meanExamScores.

2. Compute the Slope (b1)

The slope (b1) indicates the change in the dependent variable for a one-unit change in the independent variable. It is calculated using the formula:

$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

In the code:

const numerator = studyHoursData.reduce((sum, hour, i) => sum + (hour - meanStudyHours) * (examScoresData[i] - meanExamScores), 0);

  • Numerator Calculation:
      • This line calculates the sum of the product of the deviations of each data point from their respective means.
      • hour - meanStudyHours: The deviation of each study hour from the mean study hours.
      • examScoresData[i] - meanExamScores: The deviation of each exam score from the mean exam scores.
      • The product of these deviations is summed up for all data points.

const denominator = studyHoursData.reduce((sum, hour) => sum + Math.pow(hour - meanStudyHours, 2), 0);

  • Denominator Calculation:
      • This line calculates the sum of the squared deviations of each study hour from the mean study hours.
      • Math.pow(hour - meanStudyHours, 2): The square of the deviation of each study hour from the mean.
      • These squared deviations are summed up for all data points.

const b1 = numerator / denominator;

  • The slope b1 is then calculated as the ratio of the numerator to the denominator.

3. Compute the Intercept (b0)

The intercept (b0) is the value of the dependent variable when the independent variable is zero. It is calculated using the formula:

$$b_0 = \bar{Y} - b_1 \bar{X}$$

In the code:

const b0 = meanExamScores - b1 * meanStudyHours;
  • The intercept b0 is computed by subtracting the product of the slope (b1) and the mean of the independent variable (meanStudyHours) from the mean of the dependent variable (meanExamScores).
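As a check on the slope and intercept formulas, here is the arithmetic worked by hand for the five sample points used in this post (study hours 1 to 5, exam scores 55, 70, 80, 85, 90):

```latex
\begin{aligned}
\bar{X} &= \frac{1+2+3+4+5}{5} = 3, \qquad
\bar{Y} = \frac{55+70+80+85+90}{5} = 76 \\
\sum_i (X_i-\bar{X})(Y_i-\bar{Y}) &= (-2)(-21) + (-1)(-6) + (0)(4) + (1)(9) + (2)(14) = 85 \\
\sum_i (X_i-\bar{X})^2 &= 4 + 1 + 0 + 1 + 4 = 10 \\
b_1 &= \frac{85}{10} = 8.5 \\
b_0 &= 76 - 8.5 \cdot 3 = 50.5
\end{aligned}
```

So the fitted line for this data set is Y = 50.5 + 8.5X: in this sample, each additional hour of study is associated with 8.5 more exam points.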

4. Compute the Regression Line Values

Using the slope (b1) and intercept (b0), we can now calculate the predicted values (y-values) for the regression line.

const regressionYs = studyHoursData.map(x => b0 + b1 * x);
  • Predicted Values:
  • This line computes the predicted exam scores (regressionYs) for each study hour in the dataset.
  • For each study hour x, the predicted value is calculated as b0+b1⋅x.
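Because the fitted model is a straight line, two points are enough to draw it. The sketch below builds a line trace from only the smallest and largest X value instead of one point per sample. The helper name regressionTrace is hypothetical, and the trace fields (x, y, mode, type, name, line) are the same ones the component in this post already passes to Plotly.

```javascript
// Hypothetical helper: builds a two-point line trace for the fitted model.
function regressionTrace(xs, { b0, b1 }) {
  const xMin = Math.min(...xs);
  const xMax = Math.max(...xs);
  return {
    x: [xMin, xMax],
    y: [b0 + b1 * xMin, b0 + b1 * xMax],
    mode: "lines",
    type: "scatter",
    name: "Regression Line",
    line: { color: "red" },
  };
}

// b0 = 50.5 and b1 = 8.5 are the least-squares values for the sample data
const trace = regressionTrace([1, 2, 3, 4, 5], { b0: 50.5, b1: 8.5 });
console.log(trace.x, trace.y); // [ 1, 5 ] [ 59, 93 ]
```

This variant also keeps working if the X values are not sorted, since it does not depend on their order.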

5. Update State

Finally, we update the state with the calculated regression line values and parameters.

setRegressionLine(regressionYs);
setRegressionParams({b0, b1});
  • setRegressionLine: Updates the state with the array of predicted values (regressionYs), which will be used to plot the regression line.
  • setRegressionParams: Updates the state with the calculated slope (b1) and intercept (b0).

Understanding R², MAE, and MSE in Simple Linear Regression

When evaluating the performance of a regression model, it’s crucial to understand how well the model’s predictions align with the actual data. Three common metrics used for this purpose are the R² (R-squared) value, the Mean Absolute Error (MAE), and the Mean Squared Error (MSE). Let’s break down these metrics and see how they are calculated and interpreted.

1. R-squared (R²)

R², also known as the coefficient of determination, measures the proportion of the variance in the dependent variable that is predictable from the independent variable. For a least-squares line evaluated on the data it was fitted to, it ranges from 0 to 1, where:

  • 0 means that the model does not explain any of the variance in the dependent variable.
  • 1 means that the model explains all the variance in the dependent variable.

Calculation of R²

The formula for R² is:

$$R^2 = 1 - \frac{SS_{residual}}{SS_{total}}$$

Where:

  • SSresidual is the sum of squared residuals (the differences between observed and predicted values), which measures the variation the model leaves unexplained.
  • SStotal is the total sum of squares (the squared differences between each observed value and the mean), which measures the total variation in the observed data.

In the code:

const ssResiduals = residuals.reduce((sum, residual) => sum + Math.pow(residual, 2), 0);
const ssTotal = examScoresData.reduce((sum, score) => sum + Math.pow(score - meanExamScores, 2), 0);
const r2 = 1 - (ssResiduals / ssTotal);
  • ssResiduals is calculated by summing the squared residuals.
  • ssTotal is calculated by summing the squared differences between each observed value and the mean of the observed values.
  • r2 is then computed as 1 − (ssResiduals / ssTotal).
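To make the calculation concrete, here is R² worked by hand for the sample data, using b0 = 50.5 and b1 = 8.5 (the values the least-squares formulas give for this data set), so the predictions are 59, 67.5, 76, 84.5, and 93:

```latex
\begin{aligned}
\text{residuals } (y_i - \hat{y}_i) &: \; -4,\; 2.5,\; 4,\; 0.5,\; -3 \\
SS_{residual} &= 16 + 6.25 + 16 + 0.25 + 9 = 47.5 \\
SS_{total} &= (-21)^2 + (-6)^2 + 4^2 + 9^2 + 14^2 = 441 + 36 + 16 + 81 + 196 = 770 \\
R^2 &= 1 - \frac{47.5}{770} \approx 0.938
\end{aligned}
```

In other words, the fitted line accounts for roughly 94% of the variation in the exam scores in this sample.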

2. Mean Absolute Error (MAE)

MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average over the absolute differences between the predicted and actual values. MAE gives an idea of how wrong the predictions were on average.

Calculation of MAE

The formula for MAE is:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

Where:

  • n is the number of observations.
  • yi is the actual value.
  • ŷi is the predicted value.

In the code:

const mae = residuals.reduce((sum, residual) => sum + Math.abs(residual), 0) / residuals.length;
  • mae is calculated by summing the absolute values of the residuals and then dividing by the number of observations.

3. Mean Squared Error (MSE)

MSE measures the average of the squares of the errors, that is, the average squared difference between the predicted values and the actual values. Because each error is squared, large errors are penalized more heavily than small ones.

Calculation of MSE

The formula for MSE is:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

In the code:

const mse = residuals.reduce((sum, residual) => sum + Math.pow(residual, 2), 0) / residuals.length;
  • mse is calculated by summing the squared residuals and then dividing by the number of observations.
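The three metrics share the same residuals, so they can be computed together. The following is a minimal sketch of a standalone helper; the name regressionMetrics is an assumption for illustration, and the component in this post computes the same values inline inside a useEffect hook.

```javascript
// Illustrative helper: computes R², MAE and MSE for paired arrays of
// actual and predicted values.
function regressionMetrics(actual, predicted) {
  if (actual.length !== predicted.length || actual.length === 0) {
    throw new Error("actual and predicted must be non-empty and of equal length");
  }
  const n = actual.length;
  const residuals = actual.map((y, i) => y - predicted[i]);
  const meanActual = actual.reduce((sum, y) => sum + y, 0) / n;

  const ssResiduals = residuals.reduce((sum, r) => sum + r * r, 0);
  const ssTotal = actual.reduce((sum, y) => sum + (y - meanActual) ** 2, 0);

  return {
    // Note: r2 is not a finite number if every actual value is identical (ssTotal = 0)
    r2: 1 - ssResiduals / ssTotal,
    mae: residuals.reduce((sum, r) => sum + Math.abs(r), 0) / n,
    mse: ssResiduals / n,
  };
}

const hours = [1, 2, 3, 4, 5];
const scores = [55, 70, 80, 85, 90];
// b0 = 50.5 and b1 = 8.5 are the least-squares values for this data
const predictions = hours.map((x) => 50.5 + 8.5 * x);
const { r2, mae, mse } = regressionMetrics(scores, predictions);
console.log(r2.toFixed(4), mae.toFixed(2), mse.toFixed(2)); // 0.9383 2.80 9.50
```

For this sample, the predictions are off by 2.8 points on average (MAE), and the MSE of 9.5 is larger because the bigger misses of 4 points dominate once squared.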

Conclusion

This guide walked through implementing simple linear regression in a React application, using Plotly for visualization. We’ve covered the mathematical background, the data setup, a step-by-step explanation of the code, and the metrics used to evaluate the fit.

Feel free to experiment with the code and dataset to gain a deeper understanding of how linear regression works and how it can be applied in different scenarios. As developers, integrating such statistical methods into our applications can significantly enhance data analysis and predictive capabilities. Happy coding!
