# Linear Regression with NumPy & SciPy

## y = mx + b

For our example, let’s create a data set where y = mx + b.

x will be N = 200 values drawn from a normal distribution with a mean value μ (mu) of 5 and a standard deviation σ (sigma) of 1.

Standard deviation ‘σ’ is the value expressing by how much the members of a group differ from the mean of the group.

The slope ‘m’ will be 3 and the intercept ‘b’ will be 60.

`import numpy as np`
`x = np.random.normal(5.0, 1.0, 200)  # (mean, std. deviation, N)`
`m = 3`
`b = 60`
`y = m * (x + np.random.normal(0, 0.2, 200)) + b  # add noise (std. deviation 0.2) to get more realistic data`

Normal distribution or ‘Gaussian’:

`import matplotlib.pyplot as plt`
`plt.hist(x, 50)`
`plt.show()`

The histogram shows how the values of x are spread around the mean value, following our normal distribution.

Let’s visualise our data.

`plt.scatter(x, y)`
`plt.show()`

### stats.linregress( )

`stats.linregress` gives us the values of m (the slope) and b (the intercept). It also returns `r_value`, the correlation coefficient, which tells us how well our line fits the data. Its square, r-squared, is a value between 0 and 1, from bad to good fit.

`from scipy import stats`
`slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)`
`print('Slope: ', slope, '\nIntercept: ', intercept)`
Slope: 2.98104902278
Intercept: 60.1146144847
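
For intuition about where these two numbers come from: a least-squares line through one variable has a closed form, with the slope equal to the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept chosen so the line passes through the point of means. The sketch below is plain Python on a tiny noise-free data set. `fit_line` is a hypothetical helper written for illustration, not part of SciPy.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    b = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 60 for x in xs]  # noise-free line with m = 3 and b = 60
m, b = fit_line(xs, ys)
print('Slope: ', m, '\nIntercept: ', b)
```

With no noise the helper recovers m = 3 and b = 60 up to floating-point rounding. With noisy data, as in our example, the estimates land close to the true values without matching them exactly.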

### r-squared :

`r_value**2`
0.96018831950537364

`def predict_y_for(x):`
`    return slope * x + intercept`
`plt.scatter(x, y)`
`plt.plot(x, predict_y_for(x), c='r')`
`plt.show()`
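
Another way to read r-squared is as the share of the variation in y that the line explains: 1 minus the ratio of the residual sum of squares to the total sum of squares. For a least-squares line with an intercept this should agree with `r_value**2`. The sketch below uses made-up values and a hypothetical helper, `r_squared`, to show the two extremes.

```python
def r_squared(ys, predicted):
    """1 - SS_res / SS_tot for observed ys and predicted values (illustrative helper)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                # total variation
    return 1 - ss_res / ss_tot


ys = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]  # predictions that match every point
flat = [2.5, 2.5, 2.5, 2.5]     # always predicting the mean of ys

print(r_squared(ys, perfect))  # 1.0
print(r_squared(ys, flat))     # 0.0
```

Predicting every point exactly gives 1, and doing no better than the mean gives 0, which matches the "bad to good fit" scale.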

### Variance and Standard deviation

Get the mean and standard deviation with NumPy:

`print('Mean: ',np.mean(x),'\nStandard deviation: ',np.std(x))`
Mean: 5.04321665207
Standard deviation: 0.972660025762

The variance ‘σ²’ is the average of the squared differences from the mean.
The standard deviation ‘σ’ is the square root of the variance.

`N = len(x)`
`mu = sum(x) / N`
`print('Mean: ', mu)`
Mean: 5.04321665207
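`from math import sqrt`
`def calc_std_dev(x):`
`    N = len(x)`
`    mu = sum(x) / N  # mean of x`
`    v = 0`
`    for n in x:`
`        v += (n - mu) ** 2  # accumulate squared differences from the mean`
`    pop_variance = v / N`
`    sigma = sqrt(pop_variance)`
`    return sigma`
`print('Standard deviation: ', calc_std_dev(x))`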
Standard deviation: 0.9726600257624177

### N or N-1, population or sample

The population variance σ², the average of the squared differences from the mean, is the sum of squared differences divided by N, where N = len(population).

σ² = ∑ (x-μ)² / N

The sample variance is the sum of squared differences from the sample mean x̄ divided by N-1, where N = len(sample). Use it when the data is only a sample of a larger population, for example when working on a train set of data.

S² = ∑ (x-x̄)² / (N-1)
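
To make the N versus N-1 difference concrete, here is a small sketch that uses Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by N-1. The data values are made up for illustration. In NumPy, `np.var` and `np.std` use the population formula by default, and passing `ddof=1` switches them to the sample formula.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values, mean is 5
N = len(data)
mean = sum(data) / N
sum_sq_diff = sum((value - mean) ** 2 for value in data)

manual_pop_var = sum_sq_diff / N           # population: divide by N
manual_sample_var = sum_sq_diff / (N - 1)  # sample: divide by N - 1

# The standard library implements both definitions
pop_var = statistics.pvariance(data)
sample_var = statistics.variance(data)

print('Population variance: ', pop_var, '\nSample variance: ', sample_var)
```

The sample variance is always the larger of the two, and the gap shrinks as N grows. With N = 200, as in our example, the difference is small.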