Q#101: Writing a simple linear regression function

Linear Regression

Note: I believe everyone should be able to gain value from data freely and readily, so I pledge to keep this series free regardless of how large it grows.

Write a function that will read in an arbitrary number of data points (data will be collected from user input in our solution) and return the line of best fit using linear regression without sklearn!

TRY IT YOURSELF

ANSWER

Back to the basics: as a data scientist, you cannot avoid linear regression. More often than not, it is the only solution you need.

Linear regression is a powerful tool in data science for modeling the relationship between two variables. In this blog post, we’ll guide you through creating a Python function that takes an arbitrary number of data points and returns the line of best fit using simple linear regression. The line of best fit has the form y = mx + b, where x is the explanatory variable, y is the dependent variable, m is the slope, and b is the intercept.
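For reference, the line of best fit is usually taken to be the one that minimizes the sum of squared vertical distances between the points and the line. For n points (x_i, y_i), that choice gives the standard closed-form expressions for the slope and intercept:

```latex
\[
m = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
         {n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2},
\qquad
b = \frac{\sum_{i=1}^{n} y_i - m \sum_{i=1}^{n} x_i}{n}
\]
```

The denominator of m is zero when every x value is the same, in which case the points lie on a vertical line and no unique slope exists.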

Here's the Python code for the linear regression function:

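def linear_regression(data_points):
    """Return the slope and intercept of the least-squares line through data_points."""
    n = len(data_points)
    if n < 2:
        raise ValueError("Need at least two data points.")

    x, y = zip(*data_points)

    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in data_points)
    sum_x_squared = sum(xi**2 for xi in x)

    # The denominator is zero when every x value is identical (a vertical line).
    denominator = n * sum_x_squared - sum_x**2
    if denominator == 0:
        raise ValueError("All x values are identical; the slope is undefined.")

    # Calculate slope (m) and intercept (b)
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - m * sum_x) / n

    return m, b


# Example usage:
data_points = [(1, 2), (2, 4), (3, 5), (4, 4.5), (5, 5.5)]
slope, intercept = linear_regression(data_points)

print(f"Line of Best Fit: y = {slope}x + {intercept}")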
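The example above hard-codes its data, while the problem statement asks for points collected from user input. Below is a minimal sketch of one way to do that parsing. The helper name `parse_points` and the input format (one point per line, written as "x y" or "x,y") are assumptions made for illustration; the post does not specify either.

```python
def parse_points(lines):
    """Parse an iterable of text lines into a list of (x, y) float pairs.

    Assumed format (not specified in the post): one point per line,
    written as "x y" or "x,y". Blank lines are skipped.
    """
    points = []
    for line_number, line in enumerate(lines, start=1):
        # Treat commas as whitespace so both "1 2" and "1,2" are accepted.
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 numbers, got {len(fields)}"
            )
        # float() raises ValueError on non-numeric text.
        x, y = (float(field) for field in fields)
        points.append((x, y))
    return points


# Demo with hard-coded lines standing in for typed input.
sample_lines = ["1 2", "2,4", "", "3 5", "4, 4.5", "5 5.5"]
data_points = parse_points(sample_lines)
print(data_points)
```

The resulting list can be passed straight to linear_regression. For interactive use, one option is parse_points(iter(input, "")), which keeps calling input() until the user enters an empty line; reading from sys.stdin works as well.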
