Linear Regression with Numpy & Scipy
y = mx + b
For our example, let’s create a data set where y = mx + b, with a little noise added.
x will be 200 values (N = 200) drawn from a normal distribution with a mean value μ (mu) of 5 and a standard deviation σ (sigma) of 1.
The standard deviation ‘σ’ measures how much the members of a group typically differ from the mean of the group.
The slope ‘m’ will be 3 and the intercept ‘b’ will be 60.
import numpy as np
x = np.random.normal(5.0,1.0,200) # (mean, std. deviation, N)
m = 3
b = 60
y = m * (x + np.random.normal(0,0.2,200)) + b # add Gaussian noise (std. deviation 0.2) to x for more realistic data
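Because the noise is added to x before multiplying by m, it reaches y scaled by the slope: a standard deviation of 0.2 on x becomes roughly 3 × 0.2 = 0.6 on y. The sketch below checks this on a large sample. The seeded generator `rng` and the larger sample size are assumptions made here for reproducibility, not part of the original example.

```python
import numpy as np

# Assumption: a seeded generator and a large N, chosen only for reproducibility.
rng = np.random.default_rng(42)
n_samples = 100_000

m, b = 3, 60
x = rng.normal(5.0, 1.0, n_samples)      # mean 5, std. deviation 1
noise = rng.normal(0.0, 0.2, n_samples)  # noise added to x, std. deviation 0.2
y = m * (x + noise) + b

# Vertical distance of each point from the true line y = m*x + b
residuals = y - (m * x + b)
print('Noise std. deviation on y: ', np.std(residuals))  # close to m * 0.2 = 0.6
```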
Normal distribution or ‘Gaussian’:
import matplotlib.pyplot as plt
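The call that draws the histogram is not shown in this section. A minimal sketch, assuming 50 bins and a freshly generated x (both are choices made here, not taken from the original), could look like this:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(5.0, 1.0, 200)  # same distribution as x above

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(x, bins=50)  # 50 bins is an arbitrary choice
ax.set_xlabel('x')
ax.set_ylabel('count')
# In a notebook or interactive session, drop the 'Agg' line and call plt.show()
```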
A histogram of x shows how the data is spread around the mean value of our normal distribution.
Let’s visualise our data.
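A scatter plot is one way to do this. The sketch below regenerates the data so that it runs on its own; the marker size and axis labels are illustrative choices.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(5.0, 1.0, 200)
y = 3 * (x + np.random.normal(0, 0.2, 200)) + 60  # m = 3, b = 60

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=10)  # one marker per (x, y) pair
ax.set_xlabel('x')
ax.set_ylabel('y')
# In a notebook or interactive session, drop the 'Agg' line and call plt.show()
```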
stats.linregress will give us the values of m and b. The r_value it returns is the correlation coefficient; its square, r-squared, tells us how well our line fits the data, with a value between 0 and 1, from bad to good fit.
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
print('Slope: ',slope,'\nIntercept: ',intercept)
def predict_y_for(x):
    return slope * x + intercept
plt.plot(x, predict_y_for(x), c='r')
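For a single predictor, the least-squares slope and intercept have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the two means. The sketch below computes both with NumPy only, along with r-squared. The function name `fit_line` and the seed are hypothetical, and the results should agree with `stats.linregress` up to floating-point error.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = covariance(x, y) / variance(x)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    # r is the Pearson correlation coefficient; its square lies between 0 and 1
    r_value = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r_value ** 2

rng = np.random.default_rng(0)  # assumption: seeded for reproducibility
x = rng.normal(5.0, 1.0, 200)
y = 3 * (x + rng.normal(0, 0.2, 200)) + 60

slope, intercept, r_squared = fit_line(x, y)
print('Slope: ', slope, '\nIntercept: ', intercept, '\nr-squared: ', r_squared)
```

With the noise level used here, the slope should land near 3, the intercept near 60, and r-squared close to 1.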
Variance and Standard deviation
Get the mean and standard deviation with Numpy
print('Mean: ',np.mean(x),'\nStandard deviation: ',np.std(x))
Standard deviation: 0.972660025762
The variance ‘σ²’ is the average of the squared differences from the mean.
The standard deviation ‘σ’ is the square root of the variance.
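N = len(x)
mu = sum(x) / N  # mean of x
from math import sqrt
def calc_std_dev(x):
    N = len(x)
    mu = sum(x) / N  # mean of x
    v = 0
    for n in x:
        v += (n - mu) ** 2  # accumulate squared differences from the mean
    pop_variance = v / N  # population variance: divide by N
    sigma = sqrt(pop_variance)
    return sigma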
print('Standard deviation: ',calc_std_dev(x))
Standard deviation: 0.9726600257624177
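The same calculation can be written without an explicit loop. A short sketch, assuming x is a NumPy array, that compares the vectorised result with `np.std`:

```python
import numpy as np

x = np.random.normal(5.0, 1.0, 200)

mu = x.mean()
pop_variance = np.mean((x - mu) ** 2)  # average squared difference from the mean
sigma = np.sqrt(pop_variance)

# np.std divides by N by default, so the two values should match
print('Manual: ', sigma, '\nnp.std: ', np.std(x))
```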
N or N-1, population or sample
The population variance σ², the average of the squared differences from the mean, is the sum of squared differences divided by N, where N = len(population).
σ² = ∑ (x-μ)² / N
The sample variance S² is the sum of squared differences divided by N-1, where N = len(sample) and μ is the mean of the sample, for example when working on a training set of data.
S² = ∑ (x-μ)² / (N-1)
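In NumPy the divisor is controlled by the `ddof` ("delta degrees of freedom") argument of `np.var` and `np.std`: the default `ddof=0` divides by N, and `ddof=1` divides by N-1. A small sketch with made-up values:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up values, mean 5
N = len(data)

pop_variance = np.var(data)             # ddof=0: divide by N
sample_variance = np.var(data, ddof=1)  # ddof=1: divide by N-1

print('Population variance: ', pop_variance)   # 32 / 8 = 4.0
print('Sample variance: ', sample_variance)    # 32 / 7, about 4.57
print('Population std. deviation: ', np.std(data))
print('Sample std. deviation: ', np.std(data, ddof=1))
```

The sample variance is always the larger of the two, by a factor of N / (N-1), so the difference shrinks as N grows. The standard library makes the same distinction with `statistics.pvariance` (population) and `statistics.variance` (sample).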