# Linear Regression with Numpy & Scipy

## y = mx + b

For our example, let’s create a data set where **y** is *mx* + *b*.

**x** will be a random normal distribution of **N** = 200 with a standard deviation **σ** (sigma) of 1 around a mean value **μ** (mu) of 5.

Standard deviation ‘**σ**’ is the value expressing by how much the members of a group differ from the mean of the group.

The slope ‘**m**’ will be 3 and the intercept ‘**b**’ will be 60.

```python
import numpy as np

x = np.random.normal(5.0, 1.0, 200)  # (mean, std. deviation, N)

m = 3
b = 60

# add Gaussian noise (std. deviation 0.2) to get more realistic data
y = m * (x + np.random.normal(0, 0.2, 200)) + b
```

**Normal distribution** or ‘**Gaussian**’:

```python
import matplotlib.pyplot as plt

plt.hist(x, 50)  # histogram with 50 bins
plt.show()
```

The histogram shows how the data is spread around the mean value, following our normal distribution.
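As a rough numeric check of that spread, the sketch below counts how many samples fall within one and two standard deviations of the mean; for a normal distribution these shares are about 68% and 95%. The seeded generator `rng` and the names `within_1_sigma` and `within_2_sigma` are assumptions added for illustration, not part of the original example.

```python
import numpy as np

# Seeded generator so the sketch is reproducible
# (an assumption; the article calls np.random.normal directly)
rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, 200)  # (mean, std. deviation, N)

mu, sigma = np.mean(x), np.std(x)

# Share of samples within 1 and 2 standard deviations of the mean
within_1_sigma = np.mean(np.abs(x - mu) <= sigma)
within_2_sigma = np.mean(np.abs(x - mu) <= 2 * sigma)

print('Within 1 sigma:', within_1_sigma)
print('Within 2 sigma:', within_2_sigma)
```

With only 200 samples the shares will wobble by a few percentage points from run to run if the seed is changed.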

Let’s visualise our data.

```python
plt.scatter(x, y)
plt.show()
```

### stats.linregress()

`stats.linregress()` will give us the values of **m** and *b*. The **r_value** is used to determine how well our line fits the data.

**r-squared** will give us a value between 0 and 1, from bad to good fit.

```python
from scipy import stats

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

print('Slope: ', slope, '\nIntercept: ', intercept)
```

```
Slope: 2.98104902278
Intercept: 60.1146144847
```
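For a single predictor, the least-squares slope and intercept also have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept is mean(y) − slope · mean(x). The sketch below recomputes both with NumPy and compares them with `stats.linregress`; the seeded `rng` and the `manual_` names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Seeded data with the same shape as the article's example (the seed is an assumption)
rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, 200)
y = 3 * (x + rng.normal(0, 0.2, 200)) + 60

# Closed-form least-squares estimates
x_mean, y_mean = np.mean(x), np.mean(y)
manual_slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
manual_intercept = y_mean - manual_slope * x_mean

result = stats.linregress(x, y)
print('Manual:    ', manual_slope, manual_intercept)
print('linregress:', result.slope, result.intercept)
```

Both lines should print the same pair of numbers up to floating-point rounding.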

### r-squared

```python
r_value**2
```

```
0.96018831950537364
```
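r-squared can also be read as the share of the variance in y that the fitted line explains, 1 − SS_res / SS_tot. For a simple linear regression with an intercept this equals `r_value**2`. A minimal sketch, with the seeded `rng` and the names `ss_res`, `ss_tot` and `r_squared` as assumptions:

```python
import numpy as np
from scipy import stats

# Seeded data with the same shape as the article's example (the seed is an assumption)
rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, 200)
y = 3 * (x + rng.normal(0, 0.2, 200)) + 60

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
y_pred = slope * x + intercept

ss_res = np.sum((y - y_pred) ** 2)      # variation left unexplained by the line
ss_tot = np.sum((y - np.mean(y)) ** 2)  # total variation of y around its mean
r_squared = 1 - ss_res / ss_tot

print('1 - SS_res/SS_tot:', r_squared)
print('r_value**2:       ', r_value ** 2)
```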

```python
def predict_y_for(x):
    return slope * x + intercept

plt.scatter(x, y)
plt.plot(x, predict_y_for(x), c='r')  # fitted line in red
plt.show()
```
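NumPy alone can produce the same line. As a cross-check, the sketch below fits a degree-1 polynomial with `np.polyfit`, which returns the coefficients from highest degree to lowest, so slope first and intercept second. The seeded `rng` and the names `np_slope`, `np_intercept` and `y_at_5` are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Seeded data with the same shape as the article's example (the seed is an assumption)
rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, 200)
y = 3 * (x + rng.normal(0, 0.2, 200)) + 60

# Degree-1 least-squares fit: coefficients come back as [slope, intercept]
np_slope, np_intercept = np.polyfit(x, y, 1)

result = stats.linregress(x, y)
print('polyfit:   ', np_slope, np_intercept)
print('linregress:', result.slope, result.intercept)

# Predict y for a new x value with the fitted coefficients
y_at_5 = np.polyval([np_slope, np_intercept], 5.0)
print('Predicted y at x = 5:', y_at_5)
```

Since the data was generated with m = 3 and b = 60, the prediction at x = 5 should land close to 75.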

### Variance and Standard deviation

Get the mean and standard deviation with NumPy:

```python
print('Mean: ', np.mean(x), '\nStandard deviation: ', np.std(x))
```

```
Mean: 5.04321665207
Standard deviation: 0.972660025762
```

The variance ‘**σ²**’ is the average of the squared differences from the mean.

The standard deviation ‘**σ**’ is the square root of the variance.

```python
N = len(x)
mu = sum(x) / N  # mean: sum of the values divided by their count

print('Mean: ', mu)
```

```
Mean: 5.04321665207
```

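```python
from math import sqrt

def calc_std_dev(x):
    N = len(x)
    mu = sum(x) / N  # mean of the data
    v = 0
    for n in x:
        v += (n - mu) ** 2  # accumulate squared differences from the mean
    pop_variance = v / N
    sigma = sqrt(pop_variance)
    return sigma

print('Standard deviation: ', calc_std_dev(x))
```

```
Standard deviation: 0.9726600257624177
```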
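The same population standard deviation can be written without an explicit loop by letting NumPy operate on the whole array. A minimal sketch, assuming a seeded `rng` and a hypothetical helper name `std_dev_vectorized`:

```python
import numpy as np

# Seeded data with the same shape as the article's x (the seed is an assumption)
rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, 200)

def std_dev_vectorized(values):
    """Population standard deviation: square root of the mean squared deviation."""
    values = np.asarray(values, dtype=float)
    return np.sqrt(np.mean((values - values.mean()) ** 2))

print('Vectorized:', std_dev_vectorized(x))
print('np.std:    ', np.std(x))
```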

### N or N-1, population or sample

The population variance **σ²**, the average of the squared differences, is defined by dividing the sum of squared differences by **N**, where **N** = len(population).

σ² = ∑ (x − μ)² / N

The sample variance **S²** is defined by dividing the sum of squared differences from the sample mean **x̄** by **N − 1**, where **N** = len(sample), for example when working on a training set of data.

S² = ∑ (x − x̄)² / (N − 1)
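In NumPy the divisor is controlled by the `ddof` ("delta degrees of freedom") argument of `np.var` and `np.std`: the divisor is N − ddof, and the default `ddof=0` gives the population form. A small worked example on a hand-checkable list (the data values are made up for illustration):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean is 5, N is 8

# Sum of squared differences from the mean: 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32
sum_sq_diff = np.sum((data - data.mean()) ** 2)

pop_variance = np.var(data)             # divides by N:     32 / 8
sample_variance = np.var(data, ddof=1)  # divides by N - 1: 32 / 7

print('Population variance:', pop_variance)
print('Sample variance:    ', sample_variance)
```

With N = 200, as in the example data set, the two variances differ by only about 0.5%, but the gap grows for small samples.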