# Linear Regression

Single and Multiple Independent Variables

Regression means predicting a continuous value. Regression analysis is a form of predictive modelling that investigates the relationship between a dependent variable and one or more independent variables.

When implementing linear regression of some dependent variable 𝑦 on the set of independent variables 𝐱 = (𝑥₁, …, 𝑥ᵣ), where 𝑟 is the number of predictors, you assume a linear relationship between 𝑦 and 𝐱: 𝑦 = θ₀ + θ₁𝑥₁ + ⋯ + θᵣ𝑥ᵣ + 𝜀. This equation is the **regression equation**. θ₀, θ₁, …, θᵣ are the **regression coefficients**, and 𝜀 is the **random error**.

It is a form of supervised learning, since we are given both x and y for training.

For instance, we may want to predict a student's marks given how many hours they study (a single feature), or predict the price of a house given many features such as area, proximity to a market or hospital, and so on.

Our task is to find the best-fit line given many data points. The equation is *y = mx + c*, where **y** is the dependent variable that is to be predicted with the help of the **independent variable x**. The gradient/slope, denoted **m**, is the change in y for a unit change in x along the line, and the intercept, denoted **c**, is the value of y at the point where the line crosses the y-axis.

How do we learn parameters?

- We train our model using training data.
- The learning algorithm produces a hypothesis.
- Using that hypothesis, we accomplish our goal of predicting.

**How do we choose the best line?**

**There can be many possible lines, and every line will have a different θ.**

θ = [θ₀, θ₁, …, θᵣ], where θ₀ is c (the intercept), θ₁ is m (the slope), and the remaining θ values are coefficients for the other features.

The best fit is attained by reducing the error (loss), which can be measured by the difference between the predicted and actual values.

We can use the Mean Squared Error (MSE) as the cost function.

So our task will be to minimize this error (loss) function. We will denote it by J(θ).
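To make the cost concrete, here is a minimal sketch with made-up toy values; `mse_cost` is a hypothetical helper for a single-feature model, not part of the article's later code:

```python
import numpy as np

# Toy data: y = 1 + 2x exactly (illustrative values)
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

def mse_cost(theta0, theta1, X, y):
    """Mean Squared Error J(theta) for the model y = theta0 + theta1*x."""
    predictions = theta0 + theta1 * X
    return np.mean((y - predictions) ** 2)

print(mse_cost(1.0, 2.0, X, y))  # perfect parameters -> 0.0
print(mse_cost(0.0, 0.0, X, y))  # bad parameters -> large cost (41.0)
```

The better the parameters fit the data, the smaller J(θ) becomes.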

# Gradient Descent

We will use the gradient descent method to minimize J(θ). Generally, when we want to find a minimum point, we look at the derivative of the function. In simple terms, gradient descent is an iterative method for finding a minimum.

Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent.

So we take a *learning rate* (step size) **α**, a constant, and keep updating each parameter opposite to the gradient, θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ, until we reach the minimum; at the minimum, the partial derivatives become zero. The initial value does not matter when the function is convex, while for non-convex functions the initial value also plays a role.
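To see the update rule in one dimension, here is a minimal sketch (with an illustrative function and a learning rate chosen just for this example) that minimizes f(θ) = (θ − 3)² by stepping opposite to its derivative:

```python
# f(theta) = (theta - 3)**2 has derivative 2*(theta - 3) and minimum at theta = 3.
theta = 0.0   # initial value
alpha = 0.1   # learning rate (step size)
for _ in range(100):
    grad = 2 * (theta - 3)        # gradient at the current point
    theta = theta - alpha * grad  # step in the opposite direction
print(round(theta, 4))  # converges to 3.0
```

Because this f is convex, any initial value of θ would converge to the same minimum.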

Procedure:

- Initialize θ randomly, or with zeros.
- Keep updating θ until we get the best fit.
- Apply gradient descent to obtain θ.
- Apply the hypothesis to get predicted values.

```python
import numpy as np

def hypothesis(x, theta):
    y_ = 0.0
    n = x.shape[0]
    for i in range(n):
        y_ += theta[i] * x[i]
    return y_

def error(X, y, theta):
    e = 0.0
    m = X.shape[0]
    for i in range(m):
        y_ = hypothesis(X[i], theta)
        e += (y[i] - y_) ** 2
    return e / m

def gradient(X, y, theta):
    m, n = X.shape
    grad = np.zeros((n,))
    # for all values of j
    for j in range(n):
        # sum over all examples
        for i in range(m):
            y_ = hypothesis(X[i], theta)
            grad[j] += (y_ - y[i]) * X[i][j]
    # out of loops
    return grad / m

def gradient_descent(X, y, learning_rate=0.1, max_epochs=300):
    m, n = X.shape
    theta = np.zeros((n,))
    error_list = []
    for i in range(max_epochs):
        e = error(X, y, theta)
        error_list.append(e)
        # gradient descent update
        grad = gradient(X, y, theta)
        for j in range(n):
            theta[j] = theta[j] - learning_rate * grad[j]
    return theta, error_list
```
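As a quick sanity check on the loop-based gradient, here is a self-contained sketch that restates `hypothesis` and `gradient` so it runs on its own, with made-up toy values:

```python
import numpy as np

def hypothesis(x, theta):
    y_ = 0.0
    for i in range(x.shape[0]):
        y_ += theta[i] * x[i]
    return y_

def gradient(X, y, theta):
    m, n = X.shape
    grad = np.zeros((n,))
    for j in range(n):
        for i in range(m):
            grad[j] += (hypothesis(X[i], theta) - y[i]) * X[i][j]
    return grad / m

# Two examples: a bias column of ones plus one feature
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([3.0, 5.0])
theta = np.zeros(2)

# With theta = 0, predictions are 0, so grad[j] = mean(-y[i] * X[i][j])
print(gradient(X, y, theta))  # [-4., -6.5]
```

The hand computation matches: grad[0] = ((0−3)·1 + (0−5)·1)/2 = −4 and grad[1] = ((0−3)·1 + (0−5)·2)/2 = −6.5.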

This method is quite slow! We can improve it by vectorizing our code.

```python
def hypothesis(X, theta):
    return np.dot(X, theta)

def error(X, y, theta):
    y_ = hypothesis(X, theta)
    m = X.shape[0]
    return np.sum((y - y_) ** 2) / m

def gradient(X, y, theta):
    y_ = hypothesis(X, theta)
    m = X.shape[0]
    return np.dot(X.T, (y_ - y)) / m

def gradient_descent(X, y, learning_rate=0.1, max_iter=300):
    n = X.shape[1]
    theta = np.zeros((n,))
    error_list = []
    for i in range(max_iter):
        e = error(X, y, theta)
        error_list.append(e)
        # gradient descent update
        grad = gradient(X, y, theta)
        theta = theta - learning_rate * grad
    return theta, error_list
```

Here we perform the same operations, but using NumPy's vectorized functions, which greatly increases the speed of our code.
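As an end-to-end check, here is a self-contained sketch that restates the vectorized update and fits made-up toy data; the column of ones in X supplies the intercept θ₀:

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, max_iter=300):
    # vectorized gradient descent, as above
    n = X.shape[1]
    theta = np.zeros((n,))
    for _ in range(max_iter):
        y_ = np.dot(X, theta)
        grad = np.dot(X.T, (y_ - y)) / X.shape[0]
        theta = theta - learning_rate * grad
    return theta

# Toy data: y = 1 + 2x, with a column of ones for the intercept
x = np.linspace(0, 1, 50)
X = np.stack([np.ones_like(x), x], axis=1)
y = 1 + 2 * x

theta = gradient_descent(X, y, learning_rate=0.5, max_iter=2000)
print(np.round(theta, 3))  # close to [1., 2.]
```

The learned θ recovers the intercept and slope used to generate the data.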

# R2 Score

What Is Goodness-of-Fit for a Linear Model? — The R² Score

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.

R-squared is always between 0 and 100%:

- 0% indicates that the model explains none of the variability of the response data around its mean.
- 100% indicates that the model explains all the variability of the response data around its mean.

```python
def r2Score(y, y_):
    num = np.sum((y - y_) ** 2)
    deno = np.sum((y - y.mean()) ** 2)
    score = 1 - num / deno
    return score * 100
```

In general, the higher the R-squared, the better the model fits your data.
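For example, here is a self-contained sketch that restates `r2Score` and evaluates it on made-up values at the two extremes:

```python
import numpy as np

def r2Score(y, y_):
    num = np.sum((y - y_) ** 2)
    deno = np.sum((y - y.mean()) ** 2)
    return (1 - num / deno) * 100

y = np.array([3.0, 5.0, 7.0, 9.0])

print(r2Score(y, y))                           # perfect predictions -> 100.0
print(r2Score(y, np.full_like(y, y.mean())))   # always predicting the mean -> 0.0
```

A model that always predicts the mean explains none of the variability, hence 0%; a model that predicts every point exactly explains all of it, hence 100%.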

I hope this article explains most of linear regression to you.