Linear Regression with Python

Diwakar
Beer&Diapers.ai
4 min read · May 21, 2019

Data Details

‘Avg. Area Income’: Average income of residents in the city where the house is located

‘Avg. Area House Age’: Average age of houses in the same city

‘Avg. Area Number of Rooms’: Average number of rooms for houses in the same city

‘Avg. Area Number of Bedrooms’: Average number of bedrooms for houses in the same city

‘Area Population’: Population of the city where the house is located

‘Price’: Price the house sold for

‘Address’: Address of the house

Import Libraries

Check out the Data

Exploratory Data Analysis

Training a Linear Regression Model

X and y arrays
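As a minimal sketch of this step, the features (X) and the target (y) can be separated as below. The three-row DataFrame is made up purely for illustration and only borrows the column names used in this article; the variable names `df` and `feature_columns` are assumptions, not the original notebook's code.

```python
import pandas as pd

# Made-up placeholder rows; only the column names come from the article.
df = pd.DataFrame({
    "Avg. Area Income": [65000.0, 72000.0, 58000.0],
    "Avg. Area House Age": [5.5, 6.2, 4.8],
    "Avg. Area Number of Rooms": [7.0, 6.5, 8.1],
    "Avg. Area Number of Bedrooms": [4.0, 3.0, 5.0],
    "Area Population": [30000.0, 41000.0, 26000.0],
    "Price": [1100000.0, 1350000.0, 980000.0],
    "Address": ["placeholder 1", "placeholder 2", "placeholder 3"],
})

# Model inputs: every numeric column except the target.
# 'Address' is free text, so it is left out of the features.
feature_columns = [
    "Avg. Area Income",
    "Avg. Area House Age",
    "Avg. Area Number of Rooms",
    "Avg. Area Number of Bedrooms",
    "Area Population",
]
X = df[feature_columns]  # features, one row per house
y = df["Price"]          # target we want to predict

print(X.shape, y.shape)
```

'Address' is excluded because a linear model needs numeric inputs and the address text cannot be used as-is.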

Train Test Split
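A hedged sketch of the split, assuming scikit-learn's `train_test_split` is the tool in use. The random arrays stand in for the real X and y, and the values `test_size=0.25` and `random_state=0` are arbitrary choices for illustration, not settings taken from the original post.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows, 5 feature columns, one target value per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)
```

The model is later fit only on the training rows, so the test rows give an honest estimate of how it performs on data it has not seen.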

Creating and Training the Model
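The following is a small sketch of creating and fitting a model, assuming scikit-learn's `LinearRegression`. To keep it checkable, the training data is synthetic and built from known values (slopes 3 and -2, intercept 5), so the fitted numbers can be compared with the ones used to generate it. The name `lm` is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data with two features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))

# Noise-free target built from known values: y = 3*x1 - 2*x2 + 5
y_train = 3.0 * X_train[:, 0] - 2.0 * X_train[:, 1] + 5.0

# Create the model and fit it on the training data only.
lm = LinearRegression()
lm.fit(X_train, y_train)

# With noise-free data the fit should recover values very close to
# the slopes (3, -2) and the intercept (5) used above.
print(lm.coef_, lm.intercept_)
```

After fitting, `lm.coef_` holds one coefficient per feature column and `lm.intercept_` holds the constant term.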

Model Evaluation

Interpreting the coefficients:

  • Holding all other features fixed, a one-unit increase in Avg. Area Income is associated with a price **increase of $21.52**.
  • Holding all other features fixed, a one-unit increase in Avg. Area House Age is associated with a price **increase of $164,883.28**.
  • Holding all other features fixed, a one-unit increase in Avg. Area Number of Rooms is associated with a price **increase of $122,368.67**.
  • Holding all other features fixed, a one-unit increase in Avg. Area Number of Bedrooms is associated with a price **increase of $2,233.80**.
  • Holding all other features fixed, a one-unit increase in Area Population is associated with a price **increase of $15.15**.
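To make the "holding all other features fixed" reading concrete, here is a small worked sketch using the coefficients listed above. The feature values for the example house are made up, and the intercept is not given in this article, so `INTERCEPT = 0.0` is a placeholder; it cancels out when two predictions are subtracted.

```python
# Coefficients as reported in the list above.
coefficients = {
    "Avg. Area Income": 21.52,
    "Avg. Area House Age": 164883.28,
    "Avg. Area Number of Rooms": 122368.67,
    "Avg. Area Number of Bedrooms": 2233.80,
    "Area Population": 15.15,
}

# Placeholder: the fitted intercept is not stated in the article.
INTERCEPT = 0.0


def predict_price(features):
    """Linear prediction: intercept plus the sum of coefficient * feature value."""
    return INTERCEPT + sum(
        coefficients[name] * value for name, value in features.items()
    )


# Made-up feature values for one house.
house = {
    "Avg. Area Income": 68000.0,
    "Avg. Area House Age": 6.0,
    "Avg. Area Number of Rooms": 7.0,
    "Avg. Area Number of Bedrooms": 4.0,
    "Area Population": 36000.0,
}

# Same house, but with the average house age one unit higher.
older = dict(house)
older["Avg. Area House Age"] += 1.0

# The gap between the two predictions is the House Age coefficient
# (up to floating-point rounding), whatever the other values are.
delta = predict_price(older) - predict_price(house)
print(round(delta, 2))
```

Changing any other feature by one unit gives the matching coefficient in the same way, which is what each bullet states.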

Predictions from our Model

Residual Histogram
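A rough sketch of the idea behind a residual histogram: subtract the predictions from the true test values and look at how the differences are distributed. The arrays below are simulated placeholders for `y_test` and `predictions`, not results from the article's model, and the text-based bars are only a stand-in for a real plot.

```python
import numpy as np

# Simulated placeholders: "true" prices and predictions that miss by random noise.
rng = np.random.default_rng(0)
y_test = rng.normal(loc=1_200_000, scale=350_000, size=500)
predictions = y_test - rng.normal(loc=0, scale=100_000, size=500)

# Residual = actual value minus predicted value.
residuals = y_test - predictions

# Bin the residuals; a plotting library would draw these counts as bars.
counts, bin_edges = np.histogram(residuals, bins=20)
for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:]):
    print(f"{left:>12,.0f} to {right:>12,.0f}: {'#' * (int(count) // 5)}")
```

A roughly bell-shaped histogram centered near zero suggests the model is not missing consistently in one direction. With matplotlib installed, `plt.hist(residuals, bins=20)` draws the same bins as a chart.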

Regression Evaluation Metrics

Here are three common evaluation metrics for regression problems:

Mean Absolute Error (MAE) is the mean of the absolute value of the errors:

$$\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|$$

Mean Squared Error (MSE) is the mean of the squared errors:

$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2$$

Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors:

$$\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}$$
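The three definitions above can be computed directly. This is a plain-Python sketch on four made-up values, written out by hand so that each function mirrors its formula; the function names are illustrative.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


# Made-up true values and predictions; the errors are 10, 0, -20, -10.
y_true = [100, 50, 30, 20]
y_pred = [90, 50, 50, 30]

print(mean_absolute_error(y_true, y_pred))      # 10.0
print(mean_squared_error(y_true, y_pred))       # 150.0
print(root_mean_squared_error(y_true, y_pred))  # about 12.247
```

In practice, `sklearn.metrics` provides `mean_absolute_error` and `mean_squared_error`, and taking the square root of the latter gives RMSE.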

Comparing these metrics:

  • MAE is the easiest to understand, because it is the average absolute error, expressed in the same units as “y”.
  • MSE is often preferred over MAE, because squaring “punishes” larger errors more heavily, which tends to be useful when large errors are especially costly.
  • RMSE is often preferred over MSE, because it keeps that sensitivity to large errors while being interpretable in the “y” units.
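A small illustration of the comparison above, using two made-up sets of errors that have the same MAE. One set spreads the error evenly and the other puts it all into a single large miss, which shows how MSE and RMSE react to large errors while MAE does not.

```python
import math


def mae(errors):
    """Mean absolute error for a list of errors (true minus predicted)."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error for a list of errors."""
    return sum(e ** 2 for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error for a list of errors."""
    return math.sqrt(mse(errors))


even_errors = [5, 5, 5, 5]     # four moderate misses
one_big_error = [0, 0, 0, 20]  # three perfect predictions, one large miss

print(mae(even_errors), mae(one_big_error))    # 5.0 5.0
print(mse(even_errors), mse(one_big_error))    # 25.0 100.0
print(rmse(even_errors), rmse(one_big_error))  # 5.0 10.0
```

MAE treats the two cases as equally good, while MSE and RMSE rate the single large miss as clearly worse. RMSE reports that in the same units as the errors themselves.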

All three can serve as loss functions: lower values mean better predictions, so we want to minimize them.
