Regression: Machine Learning

Anoop Singh
5 min read · Mar 16, 2018


Regression models (both linear and non-linear) are used for predicting a real value, such as a salary. If your independent variable is time, you are forecasting future values; otherwise your model is predicting present but unknown values. Regression techniques vary from linear regression to SVR (Support Vector Regression) and random forest regression.

Some Machine Learning Regression models:

  1. Simple Linear Regression
  2. Multiple Linear Regression
  3. Polynomial Regression
  4. Support Vector Regression (SVR)
  5. Decision Tree Regression
  6. Random Forest Regression
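As a quick illustration (not from the article; synthetic, roughly linear toy data with default hyperparameters), several of the regressors listed above share the same fit/predict interface in scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Synthetic, roughly linear data (invented for this sketch)
rng = np.random.RandomState(0)
X = rng.rand(100, 1) * 10
y = 3 * X.ravel() + rng.randn(100)

models = [LinearRegression(), SVR(),
          DecisionTreeRegressor(random_state=0),
          RandomForestRegressor(n_estimators=10, random_state=0)]
for model in models:
    model.fit(X, y)  # identical fit/score API across all four regressors
    print(type(model).__name__, round(model.score(X, y), 3))
```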

Implementation of simple linear regression

Simple linear regression is the simplest regression method, but it has proven very useful in a large number of situations.

Y = b0 + b1*x

Here Y is the dependent variable and x is the independent variable.
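For intuition, b0 and b1 can be estimated by ordinary least squares. A minimal sketch on made-up numbers (the data below is invented for illustration):

```python
import numpy as np

# Made-up numbers: x = years of experience, y = salary in $1000s
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([40.0, 50.0, 65.0, 70.0, 85.0])

# Ordinary least squares estimates of the slope (b1) and intercept (b0)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # 29.0 11.0
```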

(Figure: Salary plotted against years of experience)

To implement simple linear regression in Python, we follow these steps.

Step-1

First we import the libraries and load the dataset:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Salary_Data.csv')
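The later steps assume a feature matrix X and target vector y extracted from the dataset. A sketch of that step (the column names below are assumptions for illustration, not taken from the article's CSV):

```python
import pandas as pd

# Stand-in for the CSV; column names are assumed, not confirmed by the article
dataset = pd.DataFrame({
    'YearsExperience': [1.1, 2.0, 3.2, 4.5],
    'Salary': [39343.0, 43525.0, 54445.0, 61111.0],
})
X = dataset.iloc[:, :-1].values  # every column except the last -> features
y = dataset.iloc[:, -1].values   # last column -> target
print(X.shape, y.shape)  # (4, 1) (4,)
```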

Step-2

Then we split the dataset into a training set and a test set.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)

The test_size can be set as needed, but 20% of the dataset is a common choice (for a big dataset).
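For example, with 30 samples a test_size of 1/3 holds out 10 of them (toy data, invented for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 30 samples
X = np.arange(30).reshape(-1, 1)
y = np.arange(30)

# test_size = 1/3 holds out 10 of the 30 samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
print(len(X_train), len(X_test))  # 20 10
```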

Step-3

Next we fit simple linear regression to the training set.

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
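After fitting, the estimated intercept and slope correspond to b0 and b1 from the equation above. A small self-contained check on invented data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data lying exactly on y = 1 + 2x
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

regressor = LinearRegression()
regressor.fit(X_train, y_train)
print(regressor.intercept_, regressor.coef_)  # b0 ~ 1.0, b1 ~ 2.0
```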

Step-4

After this we predict the test set results:

y_pred = regressor.predict(X_test)
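To judge the predictions numerically rather than only visually, one option (not used in the article) is to compare y_pred against y_test, for example with the mean absolute error. A toy sketch on invented data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data on the exact line y = 2x
X_train = np.array([[1.0], [2.0], [3.0]])
y_train = np.array([2.0, 4.0, 6.0])
X_test = np.array([[4.0], [5.0]])
y_test = np.array([8.0, 10.0])

regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)
print(np.abs(y_pred - y_test).mean())  # mean absolute error, ~0 here
```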

Step-5

Then we visualise the training set results:

plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Output: a scatter plot of the training points with the fitted regression line.

Step-6

After this we can also visualise the test set results:

plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Note that we do not change regressor.predict(X_train) to regressor.predict(X_test): the regressor is already fitted, so it produces the same line either way; only the scattered points change.

Output: the test points scattered around the same fitted line.

The predictions on the test set lie close to the line learned from the training set, so the model generalises well.

Implementation of Multiple Linear Regression

Simple linear regression involves only one independent variable, while multiple linear regression involves two or more. Multiple linear regression is a statistical method for predicting or explaining a continuous variable as a linear combination of several variables.

Y = b0 + b1*x1 + b2*x2 + ... + bn*xn

Assumptions of Linear Regression:

  1. Linearity
  2. Homoscedasticity
  3. Multivariate normality
  4. Independence of errors
  5. Lack of Multicollinearity

Dummy Variable Trap: The Dummy Variable trap is a scenario in which the independent variables are multicollinear — a scenario in which two or more variables are highly correlated; in simple terms one variable can be predicted from the others.
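A small sketch of the trap on a hypothetical categorical column: the one-hot dummy columns always sum to one, so any one of them is perfectly predictable from the others, and dropping one removes the redundancy:

```python
import numpy as np

# Hypothetical category column (e.g. a 'State' feature)
states = ['NY', 'CA', 'FL', 'CA']
categories = sorted(set(states))                               # ['CA', 'FL', 'NY']
dummies = np.array([[1 if s == c else 0 for c in categories]   # one-hot encode
                    for s in states])
print(dummies.sum(axis=1))   # every row sums to 1 -> perfect multicollinearity
dummies = dummies[:, 1:]     # dropping any one dummy column breaks the trap
print(dummies.shape)         # (4, 2)
```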

STEP-1: We import the libraries and dataset as in simple linear regression.

STEP-2: Next we encode the categorical data. In current scikit-learn this is done with OneHotEncoder inside a ColumnTransformer (the older LabelEncoder + categorical_features approach has been removed):

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer([('encoder', OneHotEncoder(), [3])], remainder = 'passthrough')
X = np.array(ct.fit_transform(X))

STEP-3: Encoding creates one dummy column per category, so we avoid the Dummy Variable Trap by dropping the first dummy column:

X = X[:, 1:]

STEP-4: We split the dataset into a training set and a test set, and fit multiple linear regression to the training set just as in simple linear regression. Even here we use the LinearRegression class for fitting.

from sklearn.linear_model import LinearRegression
regressor = LinearRegression().fit(X_train, y_train)

STEP-5: We use the regressor fitted above to predict the test set results.

y_pred = regressor.predict(X_test)
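Putting these steps together, a minimal end-to-end sketch on synthetic data (invented coefficients, purely numeric features so no encoding is needed):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with known coefficients: y = 1 + 2*x1 - 3*x2 + 0.5*x3
rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.5 * X[:, 2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)
print(np.round(regressor.coef_, 2))  # recovers the known coefficients 2, -3, 0.5
```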

Implementation of Polynomial Linear Regression

Polynomial regression is a special case of multiple linear regression that adds terms with degree greater than one to the model. The real-world curvilinear relationship is captured by transforming the training data with polynomial terms, which are then fitted in the same manner as in multiple linear regression. Here we implement polynomial regression to predict whether an employee is telling the truth or bluffing about their previous salary.

General form of polynomial linear regression:

Y = b0 + b1*x + b2*x^2 + ... + bn*x^n

STEP:1

First we import the libraries and dataset, as in simple linear regression.

Note: here we also want to compare linear regression with polynomial regression.

Dataset: a table of positions, position levels, and the related salaries.

STEP:2

Fitting Linear Regression to the dataset

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)

STEP:3

Fitting Polynomial Regression to the dataset

from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
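To see what fit_transform produces: PolynomialFeatures expands each single-feature sample x into the columns [1, x, x^2, ...] up to the chosen degree, which the linear model then fits as ordinary features. A tiny check:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])
poly = PolynomialFeatures(degree = 3)
X_poly = poly.fit_transform(X)   # columns: 1, x, x^2, x^3
print(X_poly)
# [[ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```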

STEP:4 Visualising the Linear Regression results

plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg.predict(X), color = 'green')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

Result: the straight linear regression line plotted over the data points.

STEP:5 Visualising the Polynomial Regression results

plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), color = 'blue')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

Result: the polynomial regression curve plotted over the data points.

STEP:6 Visualising the Polynomial Regression results (for higher resolution and smoother curve)

X_grid = np.arange(X.min(), X.max(), 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, lin_reg_2.predict(poly_reg.transform(X_grid)), color = 'blue')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

Result: the same polynomial curve drawn on a finer grid for a smoother plot.

STEP:7 Predicting a new result with Linear Regression

lin_reg.predict([[6.5]])

Result:

lin_reg.predict([[6.5]])
Out[8]: array([ 330378.78787879])

Predicting a new result with Polynomial Regression

lin_reg_2.predict(poly_reg.transform([[6.5]]))

Result:

lin_reg_2.predict(poly_reg.transform([[6.5]]))
Out[9]: array([ 158862.45265153])

So polynomial regression gives a much better prediction than simple linear regression in this case.

You can access all the code with the dataset here……
