Implementation of Linear Regression

Part 3/3 in Linear Regression

Ridley Leisy
2 min read · Dec 11, 2019

Part 1/3: Linear Regression Intuition

Part 2/3: Linear Regression Derivation

We’ve built up our intuition and derived simple linear regression; now let’s put it into practice. In this article, we run regression on two examples:

A random numpy array — simple example

The automobile dataset from UCI — predicting car prices

Leaning on our derivation, let’s create two functions to find the slope (B) and the intercept (A) of our line.

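import numpy as np

def get_b(x, y):
    # Slope: sum(x*y - mean(y)*x) / sum(x**2 - mean(x)*x)
    numerator = np.sum((x*y) - (y.mean()*x))
    denominator = np.sum((x**2) - (x.mean()*x))
    return numerator / denominator

def get_a(x, y):
    # Intercept: mean(y) - slope * mean(x)
    return np.mean(y) - get_b(x, y)*np.mean(x)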
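As a quick sanity check on these two functions, here is a minimal sketch that runs them on points lying exactly on a known line. The line y = 2x + 1 and the four sample points are assumptions chosen for illustration; they are not data from this article.

```python
import numpy as np

def get_b(x, y):
    # Slope from the closed-form least-squares solution
    numerator = np.sum((x * y) - (y.mean() * x))
    denominator = np.sum((x ** 2) - (x.mean() * x))
    return numerator / denominator

def get_a(x, y):
    # Intercept: the fitted line passes through (mean(x), mean(y))
    return np.mean(y) - get_b(x, y) * np.mean(x)

# Illustrative points on the line y = 2x + 1 (made up for this check)
x_check = np.array([0.0, 1.0, 2.0, 3.0])
y_check = 2.0 * x_check + 1.0

b_check = get_b(x_check, y_check)
a_check = get_a(x_check, y_check)
print(b_check, a_check)  # slope close to 2, intercept close to 1
```

If the functions are right, the recovered slope and intercept should match the line the points were drawn from.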

Now that we have our functions defined, let’s dive into the random array.

Numpy Example

After initializing, our random x, y values are as follows…

[X,Y]
[0.8552671736547468, 0.06574905845928791]
[1.7857650704191752, 1.5009363865864078]
[3.331867782121395, 3.4630893010428965]
[6.2809897821643235, 4.615731400464071]
[7.236338602122297, 4.635285264478657]
[7.547147132975976, 5.717878577071596]
[8.364661657951926, 7.460871490839706]
[8.969944067662954, 7.531910515363959]
[9.079173339499562, 7.69118762903434]
[9.702895818675557, 7.892704932609882]

The data visualized

[Figure: Points Plotted]

Utilizing Our Regression Functions

b = get_b(x,y)
a = get_a(x,y)
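To make the step above reproducible, here is a sketch that loads the ten listed points into NumPy arrays and runs both functions on them. The cross-check against np.polyfit is an addition for illustration and is not part of the original walkthrough.

```python
import numpy as np

# The ten (x, y) pairs listed above
points = np.array([
    [0.8552671736547468, 0.06574905845928791],
    [1.7857650704191752, 1.5009363865864078],
    [3.331867782121395, 3.4630893010428965],
    [6.2809897821643235, 4.615731400464071],
    [7.236338602122297, 4.635285264478657],
    [7.547147132975976, 5.717878577071596],
    [8.364661657951926, 7.460871490839706],
    [8.969944067662954, 7.531910515363959],
    [9.079173339499562, 7.69118762903434],
    [9.702895818675557, 7.892704932609882],
])
x, y = points[:, 0], points[:, 1]

def get_b(x, y):
    numerator = np.sum((x * y) - (y.mean() * x))
    denominator = np.sum((x ** 2) - (x.mean() * x))
    return numerator / denominator

def get_a(x, y):
    return np.mean(y) - get_b(x, y) * np.mean(x)

b = get_b(x, y)
a = get_a(x, y)

# np.polyfit with degree 1 returns [slope, intercept]
slope_ref, intercept_ref = np.polyfit(x, y, 1)
print(b, a)
print(np.isclose(b, slope_ref), np.isclose(a, intercept_ref))
```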

Utilizing Sklearn’s Regression

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(x.reshape(-1,1),y)  # sklearn expects a 2-D feature array
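Once the model is fit, its slope and intercept are available as model.coef_ and model.intercept_, which can be compared directly with our own values. The sketch below uses made-up random data (the seed, range, and noise level are assumptions, not the article's values) and inlines the same formulas as get_b and get_a to stay self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration (not the article's values)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, size=10))
y = 0.8 * x + rng.normal(0, 0.5, size=10)

model = LinearRegression()
model.fit(x.reshape(-1, 1), y)

sk_b = model.coef_[0]    # slope for the single feature
sk_a = model.intercept_  # intercept

# Closed-form slope and intercept, same formulas as get_b and get_a
b = np.sum(x * y - y.mean() * x) / np.sum(x ** 2 - x.mean() * x)
a = y.mean() - b * x.mean()

print(np.isclose(b, sk_b), np.isclose(a, sk_a))
```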

Our functions give a slope and intercept that match sklearn’s. Let’s see if that holds true for a real dataset.

UCI Dataset

import pandas as pd

# Column 21 is horsepower and column 25 is price; '?' marks missing values
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data',
                 header=None, usecols=[21,25], na_values=['?'])
df.columns = ['horsepower','price']
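Because '?' entries are read in as NaN, any row missing a horsepower or price value would turn the sums and means in our functions into NaN. One way to handle this, sketched here under the assumption that dropping incomplete rows is acceptable, is to remove those rows before extracting x and y. The small frame below is a hypothetical stand-in with made-up rows, not real values from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded frame: made-up rows, not real UCI values
df = pd.DataFrame({
    "horsepower": [111.0, 154.0, np.nan, 102.0, 115.0],
    "price": [13495.0, 16500.0, 9000.0, np.nan, 17450.0],
})

# Drop rows where either column is missing so x and y stay aligned
clean = df.dropna(subset=["horsepower", "price"])

# Convert to NumPy arrays, which is what get_b and get_a expect
x = clean["horsepower"].to_numpy(dtype=float)
y = clean["price"].to_numpy(dtype=float)

print(len(df), len(clean))
```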

The data visualized

Utilizing Our Regression Functions

b_3 = get_b(x,y)
a_3 = get_a(x,y)
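With a slope and intercept in hand, predicting a price is just evaluating the line a + b*x. The following sketch shows that step on made-up horsepower and price pairs; the data and the predict_price helper are hypothetical and only for illustration.

```python
import numpy as np

# Made-up (horsepower, price) pairs for illustration, not real UCI rows
x = np.array([70.0, 90.0, 110.0, 150.0, 200.0])
y = np.array([7000.0, 9500.0, 13000.0, 18000.0, 30000.0])

# Same closed-form slope and intercept as get_b and get_a
b_3 = np.sum(x * y - y.mean() * x) / np.sum(x ** 2 - x.mean() * x)
a_3 = y.mean() - b_3 * x.mean()

def predict_price(horsepower):
    # Hypothetical helper: evaluate the fitted line a + b * x
    return a_3 + b_3 * horsepower

residuals = y - predict_price(x)
print(predict_price(120.0))
print(residuals.sum())  # approximately zero for a least-squares line with an intercept
```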

Utilizing Sklearn’s Regression

model = LinearRegression()
model.fit(x.reshape(-1,1), y)

Wrapping Up

Boom! For both the random data and the real data, the slopes and intercepts from our functions and from sklearn are nearly identical. Pull up the Jupyter Notebook and poke around for yourself.

Jupyter Notebook
